Ensure trailing whitespace at the end of a cache ID results in a unique cache item

Created on 22 June 2023, over 1 year ago
Updated 18 September 2024, 2 months ago

Drupal\Core\Cache\DatabaseBackend->get() can raise an "Undefined array key" error.

------ Original report ------
Sometimes I get the following in watchdog when saving a file to s3. The strange thing is that the file is uploaded.

Warning: Undefined array key "s3fs:uri:s3://attachments/test.pdf" in Drupal\Core\Cache\DatabaseBackend->getMultiple() (regel 121 van /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php)
#0 /data/code/project/www/core/includes/bootstrap.inc(347): _drupal_error_handler_real(2, 'Undefined array...', '/data/code/jobb...', 121)
#1 /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php(121): _drupal_error_handler(2, 'Undefined array...', '/data/code/jobb...', 121)
#2 /data/code/project/www/core/lib/Drupal/Core/Cache/DatabaseBackend.php(92): Drupal\Core\Cache\DatabaseBackend->getMultiple(Array, false)
#3 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1391): Drupal\Core\Cache\DatabaseBackend->get('s3fs:uri:s3://a...')
#4 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1362): Drupal\s3fs\StreamWrapper\S3fsStream->readCache('s3://attachment...')
#5 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1283): Drupal\s3fs\StreamWrapper\S3fsStream->getS3fsObject('s3://attachment...')
#6 /data/code/project/www/modules/contrib/s3fs/src/StreamWrapper/S3fsStream.php(1098): Drupal\s3fs\StreamWrapper\S3fsStream->stat('s3://attachment...')
#7 [internal function]: Drupal\s3fs\StreamWrapper\S3fsStream->url_stat('s3://attachment...', 6)
#8 /data/code/project/www/core/lib/Drupal/Core/File/FileSystem.php(559): file_exists('s3://attachment...')
#9 /data/code/project/www/modules/contrib/s3fs/src/S3fsFileService.php(429): Drupal\Core\File\FileSystem->getDestinationFilename('s3://attachment...', 0)
#10 /data/code/project/www/core/modules/file/src/Upload/FileUploadHandler.php(193): Drupal\s3fs\S3fsFileService->getDestinationFilename('s3://attachment...', 0)
#11 /data/code/project/www/core/modules/file/file.module(846): Drupal\file\Upload\FileUploadHandler->handleFileUpload(Object(Drupal\file\Upload\FormUploadedFile), Array, 's3://attachment...', 0)
#12 /data/code/project/www/core/modules/file/file.module(722): file_save_upload('field_objection...', Array, 's3://attachment...', NULL, 0)
#13 /data/code/project/www/core/modules/file/file.module(1447): _file_save_upload_from_form(Array, Object(Drupal\Core\Form\FormState))
#14 /data/code/project/www/core/modules/file/src/Element/ManagedFile.php(76): file_managed_file_save_upload(Array, Object(Drupal\Core\Form\FormState))
#15 /data/code/project/www/modules/custom/project/src/Plugin/Field/FieldWidget/AdditionsFileWidget.php(324): Drupal\file\Element\ManagedFile::valueCallback(Array, Array, Object(Drupal\Core\Form\FormState))
#16 [internal function]: Drupal\project\Plugin\Field\FieldWidget\AdditionsFileWidget::value(Array, Array, Object(Drupal\Core\Form\FormState))
#17 /data/code/project/www/core/lib/Drupal/Core/Form/FormBuilder.php(1268): call_user_func_array(Array, Array)

Steps to reproduce

$cache = \Drupal::cache();
$cache->set('test1', 'lowercase');
$cache->get('test1'); //  Good here
$cache->get('test1 '); // The trailing space causes an issues because mysql ignores trailing spaces on primary keys and will return a result of cid 'test1' instead of 'test1 '

Proposed resolution

When MySQL queries with a condition on a varchar, and the string has a trailing space, MySQL will return a result where there is a value for the same string without a trailing space. In the database cache backend, this creates a mis-match between the passed in cache IDs and the returned cache ID, resulting in a PHP warning.

When we normalize cache IDs, we avoid hashing ASCII cache IDs and send the literal. When the cache ID has a trailing space, we can hash that too, then it'll genuinely be unique (i.e. return a cache miss).

Test only job: https://git.drupalcode.org/project/drupal/-/jobs/2310869

πŸ› Bug report
Status

Fixed

Version

10.3 ✨

Component
CacheΒ  β†’

Last updated 3 days ago

Created by

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @sboden
  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    I'm going to send this over to Drupal Core as this error itself would appear to originate from the cache system.

    If we could see the file URI that would be helpful for the core developers.

    I was able to reproduce this with the following code (going off my knowledge of database primary key limitations for mysql) which may or may not be the cause of the original post

    $cache = \Drupal::cache();
    $cache->set('test1', 'lowercase');
    $cache->get('test1'); //  Good here
    $cache->get('test1 '); // The trailing space causes an issues because mysql ignores trailing spaces on primary keys and will return a result of cid 'test1' instead of 'test1 '
    

    https://dev.mysql.com/doc/refman/5.7/en/char.html for documentation on trailing pad characters and character primary keys.

    @sboden Note that even if this is space related that even if core fixes this we will still not support paths with trailing spaces until 4.x at the earliest, see πŸ› Issue with directories and trailing whitespaces Needs work . While we have work to do in s3fs, the cache system is suppose to handle converting strings into a form that can be stored/retrieved so a trailing slash in a cache::get() is not to my knowledge a fault in s3fs.

  • πŸ‡ΈπŸ‡ͺSweden twod Sweden

    We get the same type of error when page cache stores the results of a view with an exposed filter of some value, and you filter again for the same string with a different combination of uppercase/lowercase characters.

    Page cache generated a cache id with the first search string and stored that in the cache table via the database backend.
    The next time page cache asks for the same string (only case differs) the database backend performs a case-insensitive query and returns the original item, which has a different cache id compared to what it put in the mapping array.

    IIRC MariaDB, MySQL and SQLite all do case-insensitive queries by default, unsure about PostgreSQL, so maybe it would be easiest to make the database backend just lowercase the cids when normalizing them.
    Other cache backends, like memory, will of course treat these as different cache ids. The interface does not explicitly mention how this is intended to work so maybe we can get away with that?

  • πŸ‡ΈπŸ‡ͺSweden twod Sweden

    There is indeed a test which validates that all cache backends have case-sensitive cache ids in GenericCacheBackendUnitTestBase, but what makes it case-[in]sensitive is (obvious in hind-sight), the selected collation for the cid column.

    We were using utf8mb4_sv_0900_ai_ci, and switching to utf8mb4_sv_0900_as_cs for just that column on the cache tables fixes the issue with notices for us.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Can people who have the problem originally reported here confirm the following details:

    1. What database you using - including version - you can get this information on admin/reports/status
    2. If you connect to your database run the query select cid from cache_page limit 1; and then use the cid returned to a select where you have changed the case of the cid select cid from cache_page where cid = 'cid_case_changed';
    3. The output of the SQL statement show create table cache_page; (mysql) or the CLI command pg_dump --table cache_page --schema-only (postgres)

    For the create table I get

    | cache_page | CREATE TABLE `cache_page` (
      `cid` varchar(255) CHARACTER SET ascii COLLATE ascii_bin NOT NULL DEFAULT '' COMMENT 'Primary Key: Unique cache ID.',
      `data` longblob COMMENT 'A collection of data to cache.',
      `expire` bigint NOT NULL DEFAULT '0' COMMENT 'A Unix timestamp indicating when the cache entry should expire, or -1 for never.',
      `created` decimal(14,3) NOT NULL DEFAULT '0.000' COMMENT 'A timestamp with millisecond precision indicating when the cache entry was created.',
      `serialized` smallint NOT NULL DEFAULT '0' COMMENT 'A flag to indicate whether content is serialized (1) or not (0).',
      `tags` longtext COMMENT 'Space-separated list of cache tags for this entry.',
      `checksum` varchar(255) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL COMMENT 'The tag invalidation checksum when this entry was saved.',
      PRIMARY KEY (`cid`),
      KEY `expire` (`expire`),
      KEY `created` (`created`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Storage for the cache API.' |
    

    And although the collation for the table is utf8mb4_0900_ai_ci the queries are case-sensitive.

    mysql>  select cid from cache_page where cid = 'http://drupal8alt.test/fr:';
    +----------------------------+
    | cid                        |
    +----------------------------+
    | http://drupal8alt.test/fr: |
    +----------------------------+
    1 row in set (0.00 sec)
    
    mysql>  select cid from cache_page where cid = 'Http://drupal8alt.test/fr:';
    Empty set (0.00 sec)
    

    That's because the cid column has it's pwn collation of ascii_bin.

  • First commit to issue fork.
  • πŸ‡¬πŸ‡§United Kingdom catch

    The trailing space issue is easy to reproduce without needing any special collation, i.e. the steps given in #2. I've pushed some test coverage that should show it on gitlab too hopefully.

  • Merge request !9030Add test coverage. β†’ (Closed) created by catch
  • Pipeline finished with Failed
    4 months ago
    Total: 567s
    #241148
  • πŸ‡¬πŸ‡§United Kingdom catch

    Here we go:

    https://git.drupalcode.org/project/drupal/-/jobs/2308350

    Fail      Other      phpunit-55.xml       0 Drupal\KernelTests\Core\Cache\Datab
        PHPUnit Test failed to complete; Error: PHPUnit 10.5.29 by Sebastian
        Bergmann and contributors.
        
        Runtime:       PHP 8.3.10
        Configuration: /builds/project/drupal/core/phpunit.xml.dist
        
        ..........W...                                                    14 / 14
        (100%)
        
        Time: 00:11.572, Memory: 8.00 MB
        
        1 test triggered 1 PHP warning:
        
        1)
        /builds/project/drupal/core/lib/Drupal/Core/Cache/DatabaseBackend.php:139
        Undefined array key "trailing-space-test"
        
        Triggered by:
        
        * Drupal\KernelTests\Core\Cache\DatabaseBackendTest::testSetGet
         
        /builds/project/drupal/core/tests/Drupal/KernelTests/Core/Cache/DatabaseBackendTest.php:51
        
    
  • Status changed to Needs review 4 months ago
  • πŸ‡¬πŸ‡§United Kingdom catch

    And pushed a separate commit with a suggested fix - comment could probably use some work, I couldn't find MySQL docs specific to treatment of trailing spaces in indexes as opposed to comparisons in general.

    Major because this is potentially contributing to the 404 issue reported in πŸ› After updating Drupal from version 10.2.7 to version 10.3.0, a 404 error occurred! Needs review .

  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces. β€œComparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant.

    For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

    https://dev.mysql.com/doc/refman/5.7/en/char.html

  • Pipeline finished with Success
    4 months ago
    Total: 517s
    #241168
  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¬πŸ‡§United Kingdom catch
  • Merge request !9045Don't trim trailing space this time β†’ (Closed) created by catch
  • Status changed to RTBC 4 months ago
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    This looks great now. Nice reviews @cmlara and nice MR @catch.

  • Pipeline finished with Success
    4 months ago
    Total: 457s
    #242226
  • πŸ‡¬πŸ‡§United Kingdom catch

    Oh good spot on the whitespace, I briefly noticed it but thought it was my eyes playing tricks on me instead of an actual bug :(

    Retitling for the actual problem we're fixing.

    • larowlan β†’ committed fb2469d3 on 11.x
      Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...
    • larowlan β†’ committed ea0088d7 on 10.4.x
      Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...
    • larowlan β†’ committed 71b0922a on 11.0.x
      Issue #3368537 by catch, TwoD, alexpott, cmlara, sboden: Ensure trailing...
  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    Committed to 11.x and backported to 10.4.x and 11.0.x
    Waiting for a second opinion about 10.3.x, leaving at RTBC

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    @larowlan thanks for the commit. We have decent evidence this is behind some of the original reports on πŸ› After updating Drupal from version 10.2.7 to version 10.3.0, a 404 error occurred! Needs review and will fix a possible regression in 10.3.x

  • Status changed to Fixed 4 months ago
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Discussed with @catch and @larowlan - we agreed to backport this to 10.3.x

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • πŸ‡§πŸ‡ͺBelgium wim leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    WOW, what a crazy edge case! 🀯

Production build 0.71.5 2024