Change default escaping of str_getcsv() and fgetcsv() to empty string

Created on 20 October 2024, 2 months ago

Problem/Motivation

Follow-up to #3477324-7: Fix usage of str_getcsv() and fgetcsv() for PHP 8.4 β†’

Reading https://www.php.net/manual/en/function.fgetcsv.php

Warning
When escape is set to anything other than an empty string ("") it can result in CSV that is not compliant with Β» RFC 4180 or unable to survive a roundtrip through the PHP CSV functions. The default for escape is "\\" so it is recommended to set it to the empty string explicitly. The default value will change in a future version of PHP, no earlier than PHP 9.0.

Let's change our default to an empty string or at least to discuss whether we should change this.

Proposed resolution

As PHP docs propose the default should be changed to empty string

Remaining tasks

- Figure out how it may affect code
- patch/review/commit

User interface changes

Introduced terminology

API changes

Data model changes

Release notes snippet

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component

base system

Created by

πŸ‡«πŸ‡·France andypost

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @andypost
  • πŸ‡«πŸ‡·France andypost

    There's only 2 places will left after parent issue fixed

    core$ git grep str_getcsv -- core/lib
    core/lib/Drupal/Core/Mail/Plugin/Mail/PhpMail.php:101:        $value = str_getcsv($value, ',');
    core/lib/Drupal/Core/Mail/Plugin/Mail/SymfonyMailer.php:130:            $value = str_getcsv($value, ',');
    
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    We're not really using this on CSV data but rather as a way to:

            // Split values by comma, but ignore commas encapsulated in double
            // quotes.
    

    I wonder what happens an email address contains \\

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    FWIW I'm pretty sure that this code is incorrect because

    $value = '"te,st"@example.com,test2@example.com';
    $value = str_getcsv($value, ',');
    

    results in a value of

    [
        'te,st@example.com',
        'test2@example.com',
    ]
    

    whereas I'm pretty sure we should be leaving the quotes in, ie...

    [
        '"te,st"@example.com',
        'test2@example.com',
    ]
    
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Interestingly \Drupal\Tests\system\Kernel\Mail\MailTest::testFromAndReplyToHeader seems to prove things are working as expected - BUT you cannot set Drupal's mail address to "mail@test"@example.com - which is valid according to https://github.com/egulias/EmailValidator/blob/4.0.2/tests/EmailValidato... so it's hard to properly test.

  • πŸ‡«πŸ‡·France andypost

    Looking for RFC I got answer from GPT

    Escaping should be reconsidered for email parsing: While RFC 4180 is the standard for CSV files, parsing email addresses should follow the more specific email RFCs (5321, 5322). If str_getcsv() no longer handles escape sequences, a custom parsing function or a more specialized library like Egulias's EmailValidator might be required to handle these cases properly.

  • πŸ‡«πŸ‡·France andypost

    For #5

    So looks like here 2 options:
    - set escape: "\\" with comment referencing RFCs for quoted part of email (which can be anything, even comment)
    - delegate parsing to some library

    but surely it needs a test at least for '"te,st"@example.com' which should not become 'te,st@example.com'

  • πŸ‡«πŸ‡·France andypost

    parent commited so the issue unblocked

Production build 0.71.5 2024