Json and PHP serializers should throw excepiton on failure

Created on 13 February 2024, about 1 year ago

Problem/Motivation

Currently the PhpSerialize and Json serialization classes do not provide indication of failure.

It is documented on the SerializationInterface API for \Drupal\Component\Serialization\Exception\InvalidDataTypeException to be thrown however neither implementation appears to to do so, as such errors are silently hidden which can lead to failure, The YAML serializer does raise exceptions on failure.

Setting as Critical per Priority as this can lead to Data Loss or Corruption.

Steps to reproduce

$serializer = \Drupal::service('serialization.json');
$data = $serializer->decode('s:8:"not json"');
var_dump($data); // NULL
$data= $serializer->decode('null');
var_dump($data); // NULL

Note: These serializers are part of the Base core install not part of the Serialization module.

Proposed resolution

  • Json: Use the JSON_THROW_ON_ERROR flag and catch \JsonException to raise \InvalidDataTypeException on both encode and decode. (Added PHP 7.3.0)
  • PHP: serialize() has no published fault scenario, unserialize() however will return FALSE on an error and set an error status. We can capture the FALSE and compare if the string matches a serialized FALSE, if not its a valid error. Inspired by discussions in ๐Ÿ› UserData only returns strings for scalar values Needs review

Sample versions of the above implementations with tests can be found in a contrib issue I'm working on.
https://git.drupalcode.org/issue/rabbitmq-3416009/-/commit/f12e4725e6110...

Remaining tasks

Patch

User interface changes

None

API changes

None

Data model changes

None

Release notes snippet

Json and PHP serilaizers will now detect faults and raise exception as indicated in the API specifications.

๐Ÿ› Bug report
Status

Active

Version

11.0 ๐Ÿ”ฅ

Component
Baseย  โ†’

Last updated about 3 hours ago

Created by

๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @cmlara
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara
  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia anandhi karnan Chennai

    anandhi karnan โ†’ made their first commit to this issueโ€™s fork.

  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia anandhi karnan Chennai

    I have updated the serializers to handle failure scenarios more robustly:

    PHP Serializer (PhpSerialize):

    • Added exception handling for unserialization errors.
    • Incorporated the allowed_classes option to enhance security and prevent unserialization of objects.

    JSON Serializer (Json):

    • Implemented exception handling for JSON encoding and decoding, including handling invalid JSON and circular references.

    Testing:

    • Added tests to verify correct behavior for both valid and invalid inputs, covering various edge cases.
  • Status changed to Needs review 7 months ago
  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia anandhi karnan Chennai
  • Pipeline finished with Failed
    7 months ago
    Total: 767s
    #284968
  • Status changed to Needs work 7 months ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    This broke all the tests it seems.

    Also made validation go down so that should be looked at.

  • Pipeline finished with Failed
    7 months ago
    Total: 907s
    #286828
  • Pipeline finished with Failed
    6 months ago
    Total: 789s
    #324325
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara

    Removing ['allowed_classes' => FALSE] fixed the majority of the failures.

    On a cursory glance many of the remaining failures appear to be test design failures in JSON API related tests:

    Json::decode((string) $response->getBody()); Appears to be called in several places where getBody() may be an empty string (which is NOT a valid JSON string).

    ResourceTestBase::assertResourceResponse() (a basic helper method) appears to trigger this as it will attempt to cast the response body as a decoding string to be used in the message without any knowledge of what the string value will be.

    Later in assertResourceResponse() it appears the test (incorrectly) was written to depend upon GetDocumentFromResponseTrait::getDocumentFromResponse() returning NULL on invalid JSON which is off-spec for the API.

    In Oembed:
    core/modules/media/src/OEmbed/ProviderRepository.php:116 appears to be checked by looking at the value being empty rather than looking for an error (another off-spec use of the API).
    core/modules/media/src/OEmbed/ResourceFetcher.php:67 appears to check the internal use of json_decode with json_last_error(). #3186415: Make oEmbed resource fetcher more tolerant of unexpected Content-Type headers โ†’ called out that the JSON Decoder failed to throw an exception.

    Essentially: Legitimate test failures caused by off-api usage of methods.

  • Pipeline finished with Failed
    6 months ago
    Total: 149s
    #325451
  • Pipeline finished with Failed
    6 months ago
    Total: 696s
    #325456
  • Pipeline finished with Success
    6 months ago
    Total: 1112s
    #325539
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara

    We have a passing test stack now. Setting needs review to get eyes on below:

    $ grep -R Json::decode *|grep -v Test

    The following entries are already converted to handle the exception (based on failing tests):

    core/modules/system/src/SecurityAdvisories/SecurityAdvisoriesFetcher.php:        $json_payload = Json::decode($response);
    core/modules/jsonapi/tests/src/Traits/GetDocumentFromResponseTrait.php:      $document = Json::decode((string) $response->getBody());
    core/modules/media/src/OEmbed/ResourceFetcher.php:        $data = Json::decode($content);
    core/modules/media/src/OEmbed/ProviderRepository.php:      $providers = Json::decode((string) $response->getBody());

    We may want to convert the following areas of code with no apparent failing tests to catch the exception. If so do we need to do that here or is that a followup? If we need to do that here can we skip the test gate? These are well out of scope of the bug being reported here.

    core/modules/announcements_feed/src/AnnounceFetcher.php:        $announcements = Json::decode($feed_content);	
    core/modules/jsonapi/src/Controller/EntityResource.php:      $document = Json::decode($request->getContent());
    core/modules/jsonapi/src/Controller/EntityResource.php:    $body = Json::decode($request->getContent());
    core/modules/ckeditor5/src/Plugin/Editor/CKEditor5.php:    $toolbar_items = Json::decode($json);
    

    Unsure on if we want these to be patched, or if we want the exceptions to rise through the stack:

    core/modules/package_manager/src/PathExcluder/UnknownPathExcluder.php:    $repositories = Json::decode($repositories);
    core/modules/package_manager/src/PathExcluder/UnknownPathExcluder.php:    $extra = Json::decode($this->composerInspector->getConfig('extra', $packages['drupal/core']->path . '/composer.json'));
    core/modules/package_manager/src/Validator/AllowedScaffoldPackagesValidator.php:    $extra = Json::decode($this->composerInspector->getConfig('extra', $path . '/composer.json'));
    core/modules/package_manager/src/Validator/ComposerPatchesValidator.php:      $extra = Json::decode($this->composerInspector->getConfig('extra', $working_dir));
    core/modules/package_manager/src/Validator/PhpTufValidator.php:    $repositories = Json::decode($repositories);
    core/modules/contextual/contextual.module:    $element['#links'] = Json::decode(rawurldecode($encoded_links));

    package_manager: We might want those errors to show, in either case as the module is experimental can we deffer that to package_manager project to figure out before it makes it to stable?

    contextual.module: This appears to be developer facing, we may want this to rise through the stack.

    Suggest we do not consider patching:
    core/modules/media/src/OEmbed/ResourceFetcher.php: return Json::decode($data);
    This is decoding data that just went through encode(), if this fails something is significantly wrong with the deployment.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Needs a manual rebase. This is one may need to ping a framework manager or core committer (usually same group)

  • Pipeline finished with Success
    4 months ago
    Total: 1163s
    #375352
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara

    Manual rebase (conflict was due to new comments in ResourceTestBase.php).

    Reran jobs with random failures.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

    Coming from ๐Ÿ“Œ Make igbinary the default serializer if available, it saves 50% time on unserialize and memory footprint Active , the use case there is that we would have a serializer that uses igbinary when available, but we need to update the database cache backend to deal with such an error for edge cases when receiving a igbinary serialized string when that isn't available.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara

    I mentioned in Slack that the raised concern with Database Cache Backend likely needs significant architectural decisions that should not be rushed into this issue however I would implement it if it is the only way to get this to move forward.

    It turns out that there already is an issue to handle the request in ๐Ÿ› Database backends cannot deal with corrupt serialized data: app completey broken Needs review .

    Setting back to NR.

  • Pipeline finished with Success
    about 1 month ago
    Total: 825s
    #451286
  • Status changed to RTBC about 1 month ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Rebased as the MR had gotten 400+ commits back. All tests are still passing

    Looking at the MR I saw only 1 thread for feedback that was addressed so closed that.

    Not sure if this needs framework manager sign off but all feedback appears to be addressed, fingers crossed.

  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    One specific question on the MR, but also in general I'm wondering whether this needs to be a deprecation (or unsilenced E_USER_WARNING) changing to an exception in 12.x

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States cmlara

    Responded to MR comment.

    wondering whether this needs to be a deprecation (or unsilenced E_USER_WARNING) changing to an exception in 12.x

    My personal opinion is no. I can understand why this would be asked since this can be disruptive and given by the code being corrected there is clear proof that some implementations have misused this

    However the API is well defined, the proper response is an exception. Code that has properly implemented SerilizationInterface::decode would already have a try/catch block surrounding it.

    Today 3rd parties could replace the decoder using a class alias at the auto loader meaning the code that is broken by us fixing this can already be broken by a contrib module following reasonable design standards.

    The FALSE return was/is an internal component of the JSON decode method that leaked and should not have been utilized.

    As noted in the issue summary, this failure to throw an exception as documented on the API can lead to undetected data loss for those applications that properly follow the documented API.

    This bug fundamentally breaks the ability to use DI of the serialization services.

    I have no proven case to show this however a worst case fear is that silent data corruption could lead to security incidents.

Production build 0.71.5 2024