Maxlength validation counts newlines twice

Created on 3 August 2021, almost 4 years ago
Updated 11 February 2023, over 2 years ago

Steps to reproduce

  • Add a maxlength attribute to a textarea and set it to 400
  • Write a text with new lines with exactly 400 characters including some new lines where every new line counts as 1 character (like .val().length in JavaScript counts)

We found this issue with some custom JavaScript which shows to the user a counter how many characters are left. The counter shows "0 characters left" but the PHP validation claims that too many characters are used.

Proposed resolution

Problem is in FormValidator class the method performRequiredValidation(). mb_strlen($elements['#value']) counts evey new line as two characters. As suggested here https://stackoverflow.com/questions/68618268/mb-strlen-counts-new-lines-... the numbers of new lines should be deducted so that each new line becomes counted ony once:

   if (isset($elements['#maxlength'])) {
      $countCharacters = mb_strlen($elements['#value']) - substr_count($elements['#value'], "\n");
      if ($countCharacters > $elements['#maxlength']) {
        $form_state->setError($elements, $this->t('@name cannot be longer than %max characters but is currently %length characters long.', ['@name' => empty($elements['#title']) ? $elements['#parents'][0] : $elements['#title'], '%max' => $elements['#maxlength'], '%length' => $countCharacters]));
      }
    }

Remaining tasks

Tests
Code review

πŸ› Bug report
Status

Needs work

Version

9.5

Component
FormΒ  β†’

Last updated 2 days ago

Created by

πŸ‡©πŸ‡ͺGermany tobiberlin Berlin, Germany

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    This issue is being reviewed by the kind folks in Slack, #needs-review-queue-initiative. We are working to keep the size of Needs Review queue [2700+ issues] to around 400 (1 month or less), following Review a patch or merge request β†’ as a guide.

    For the tests requested in #4

  • Status changed to Active 3 months ago
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I verified the problem on my Windows laptop. This is still an issue with the latest 11.x-dev.

    The issue summary has been updated with information about the actual problem. This isn't the result of PHP "counting new line characters twice." The problem is that in Windows new lines are literally two characters, a carriage return and line feed (\r\n). JavaScript and browsers ignore the \r when calculating string length. PHP counts them both.

    Thus, the former proposed fix and patch in #3 are unacceptable solutions. They would cause strings to pass form validation, but would eventually result Drupal trying to save strings that are too long into database fields whose sizes are set to the given maxlengths. That would result in even worse errors, probably WSoDs.

    I don't know what the best solution is to the problem. A framework manager will have to decide.

  • Merge request !11677Added a basic FunctionalJavascript test β†’ (Open) created by dcam
  • Pipeline finished with Failed
    3 months ago
    Total: 235s
    #460492
  • Pipeline finished with Failed
    3 months ago
    Total: 593s
    #460495
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I wrote a very basic FunctionalJavascript test that demonstrates the issue. The test fails because it finds the server-side validation error message after submitting the form. It will have to be modified appropriately after a solution for the bug has been decided, but it demonstrates that there's a problem for now. I wanted to make it verify the contents of the textarea field before submitting the form, but I don't know how. This is literally the first FunctionalJavascript test that I've written.

    Anyway, I'm leaving the Needs Tests tag since it will have to be improved or modified. But I'm changing the status to Needs Review to get the attention of a maintainer for an opinion on a fix.

  • πŸ‡ΊπŸ‡ΈUnited States dcam
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    Form validation issues are described as having Major status.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Not 100% sure putting into review will grab the attention.

    Though it's good to have the hard part done in a test.

    Will post in slack though to try and kick start it.

  • Looking at the HTML5 Standard, the maxlength attribute for a form control is validated by its "API Value".

    In the case of textarea elements, the API value and value differ. In particular, newline normalization is applied before the maximum allowed value length is checked (whereas the textarea wrapping transformation is not applied).

    And the newline normalization converts "\r\n" and "\r" to "\n". So in PHP, the maxlength should be validated against the submitted value with the same newline normalization.

    This likely affects submissions regardless of the user's brower/OS. Also per the HTML5 standard, the submitted form values should have "\n" and "\r" normalized to "\r\n". So the form values in PHP should have "\r\n" exclusively. Testing on Chrome in MacOS, I can validate that this is what I've observed. Putting a breakpoint on these lines in Drupal\Core\Form\FormValidator::performRequiredValidation():

        // Verify that the value is not longer than #maxlength.
        if (isset($elements['#maxlength']) && mb_strlen($elements['#value']) > $elements['#maxlength']) {
          $form_state->setError($elements, $this->t('@name cannot be longer than %max characters but is currently %length characters long.', [
            '@name' => empty($elements['#title']) ? $elements['#parents'][0] : $elements['#title'],
            '%max' => $elements['#maxlength'],
            '%length' => mb_strlen($elements['#value']),
          ]));
        }
    

    If I only hit "return" 20 times in a textarea with #maxlength 20 and submit the form, then the value of $elements['#value'] is "\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n", and mb_strlen($elements['#value']) is 40.

    I think best way forward is to do the same newline normalization in Drupal\Core\Render\Element\Textarea::valueCallback(), so that the maxlength validation is equivalent on the frontend and backend. I think it also makes sense to store this normalized value, so that there isn't an error if the same character limit is applied on the database column, but it needs to be validated that when the value is retrieved and presented back to browsers, that this doesn't create any issues across browsers/OSes.

  • πŸ‡ΊπŸ‡ΈUnited States dcam

    Updated the issue summary with the appropriate solution.

  • Pipeline finished with Canceled
    3 months ago
    Total: 127s
    #467942
  • Pipeline finished with Success
    3 months ago
    Total: 912s
    #467944
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I implemented the prescribed fix per the HTML5 standard. The FunctionalJavascript test was reverted in favor of Unit test cases in the existing TextareaTest.

  • Because of the browser behavior described in #19, I think it makes sense to include a Browser test after all. I think it'd also make sense to test that the saved value(s) do not exceed DB column sizes, so I think the test could look like this:

    • Create an content entity type with a text field (Drupal\text\Plugin\Field\FieldType\TextItem) and set the maxlength storage setting to something low
    • Visit entity create form and submit the form where the text field value length should equal the maxlength. This actually should be done three times, with "\r\n", "\n", and "\r" separately included in the text value
    • Save should occur without a database exception
    • Load the entity from database storage after save and confirm that the text contains "\n" and not "\r\n" or "\r"
  • πŸ‡ΊπŸ‡ΈUnited States dcam

    @godotislate this issue is about textarea elements. It looks like core's fields with textarea elements don't have settings for maxlength attributes. Am I missing something?

  • Ah right, I'd forgotten that the textarea widget isn't used on text field items. Maybe the unit test is enough, so putting to RTBC.

  • πŸ‡«πŸ‡·France nod_ Lille

    There is suggestion in the MR

  • Pipeline finished with Canceled
    2 months ago
    Total: 77s
    #494898
  • Pipeline finished with Success
    2 months ago
    Total: 425s
    #494899
  • Applied the MR suggestion and rebased. Since the change is minor and there were no other suggestions, going to push this back to RTBC.

  • Status changed to RTBC 13 days ago
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Some thoughts while reviewing this issue.

    1. Is it really okay to change user input. Normally Drupal saves what the user enters and makes changes on output. This is pretty much how the filter system works. We do have precedence for changing the value on input TextField does return str_replace(["\r", "\n"], '', $input);.
    2. Should we make the change to \Drupal\Core\Form\FormValidator::performRequiredValidation so any contrib or custom implementations benefit given that that is where max length validation occurs.
    3. What about \Drupal\Core\Validation\Plugin\Validation\Constraint\LengthConstraint and the constraints - looking at our text field types we are not using constraints - but we probably should at some point.
    4. This will make values submitted via forms behave different from JSON:API - which will be surprising.

    I'm not sure what all these thoughts mean at this point but I just wanted to note them down.

  • Is it really okay to change user input. Normally Drupal saves what the user enters and makes changes on output.

    The reason to do that here is that it's possible that database storage has the same character limit, so without converting "\r\n" to "\n", it's possible there could be a database error if it passes JS and PHP validation. I don't think core has any instance of character-limited storage for a field that has textareas used for entry, but it seems possible for something in contrib or custom code.

    Also, the browser is already converting "\n" or "\r" to "\r\n" on submit, so technically for users on non-Windows clients, the input they entered has already been changed.

    I think I did try awhile ago to see if other web projects might be handling "\r\n" for similar concerns, but didn't find anything conclusive.

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    Also, the browser is already converting "\n" or "\r" to "\r\n" on submit, so technically for users on non-Windows clients, the input they entered has already been changed.

    Really? Yes it does.... I've confirmed that with a debugger. This is quite messed up right? What a PITA. FWIW though how real is it to enforce max length on the database level for a field using a text area? And is doing that even correct. I don't think it is. Sure you can enforce max length on the field level but backing a text area field with anything other than big text field feels like you are opening a can of worms.

    Also I think that point 4 from #27 is important.

  • Really? Yes it does.... I've confirmed that with a debugger. This is quite messed up right? What a PITA.

    Yes, this was surprising to me as well, but apparently it is HTML spec. See #19 πŸ› Maxlength validation counts newlines twice Needs work .

    Sure you can enforce max length on the field level but backing a text area field with anything other than big text field feels like you are opening a can of worms.

    I don't think I necessarily disagree, esp for sites with content translation, but is it obvious? #5 πŸ› Maxlength validation counts newlines twice Needs work and #13 πŸ› Maxlength validation counts newlines twice Needs work brought up the storage concerns.
    Also, it's not uncommon for stakeholders to request to have title (replacement) fields that support formatted text and new lines. This could be implemented as a TextItem field, with an alter to allow a textarea widget to be used. But maybe it's too edge-casey?

    Also I think that point 4 from #27 is important.

    Agreed, not sure about what to do about that and open to ideas. Is maxlength checked at all for JSON:API requests, since the validation is done in FormValidator?

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    In \Drupal\Core\Render\Element\Textfield::valueCallback we have return str_replace(["\r", "\n"], '', $input); which unless I'm mistaken is also problematic here because in the case of windows, \r\n becomes \n\n and would result in a different max-length validation to that of the browser - should we be handling that (And any other text-like inputs like telephone, machine name etc) here as well - the issue title doesn't specifically mention text areas.

  • In \Drupal\Core\Render\Element\Textfield::valueCallback we have return str_replace(["\r", "\n"], '', $input);

    That str_replace() strips out every instance of \r or \n from the input, so \r\n would be stripped out as well. Though technically, textfields and other <input> elements don't allow new lines in input: https://stackoverflow.com/a/42744064. That's a stackoverflow answer, but from a quick Google, official resources seem to say the same without being as declarative.

    Doing some git archaeology, I found that that str_replace comes from way back in #53480: required checkbox not stylized like a required field β†’ , and it was for form input hardening (see #6 β†’ ):

    We remove linebreaks from mere textfields, there is no way this breaks user data, only hackers are interested. This puts many header injection attacks at rest.

    If we need to add similar hardening for non-form submissions to telephone, machine_name, etc., I think that'd be a separate issue?

    Back to textareas, a couple thoughts:

    I don't think there are any field widgets in core that set the #maxlength property on a Textarea form element object. So if we gate replacing "/\r\n?/" with "\n" based on whether #maxlength is set, something like this:
    return isset($element['#maxlength']) ? preg_replace("/\r\n?/", "\n", (string) $input) : (string) $input;
    then values submitted via JSON:API requests will be the same as content entity form submissions, except for contrib/custom field widgets that set #maxlength on the Textarea. Arguably it would be on the devs implementing the field widgets, and possibly their respective field types, to add the appropriate Constraints and normalize the JSON:API data.

    An alternative idea is to add a kernel request event subscriber that goes through all properties in $request->query and $request->request, and request content as well for requests with JSON bodies, and replace all \r\n with \n. But doing this for request content is tricky, and I think this approach overall might be overly aggressive.

  • πŸ‡ΊπŸ‡ΈUnited States dcam

    I don't think there are any field widgets in core that set the #maxlength property on a Textarea form element object.

    There aren't, unless something changed recently. I checked before writing #23.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    Gating it behind fields with a maxlength is a great idea - nice one

  • First commit to issue fork.
  • Pipeline finished with Failed
    9 days ago
    Total: 239s
    #537634
  • πŸ‡¬πŸ‡§United Kingdom oily Greater London

    Some work needed on the test case. Add newlines to the random text and test how that affects the length?

  • Pipeline finished with Failed
    9 days ago
    Total: 696s
    #537638
  • Pipeline finished with Success
    9 days ago
    Total: 884s
    #538362
  • Gating it behind fields with a maxlength is a great idea - nice one

    Added this to the MR and additional tests. Ready for review again.

  • πŸ‡¬πŸ‡§United Kingdom oily Greater London

    @godotislate I have looked at your new kernel test coverage. Can you please advise me if you think I should refactor my test coverage 'borrowing' from your code to take account of new lines? I am not sure my test coverage is currently fit for purpose?

  • Can you please advise me if you think I should refactor my test coverage 'borrowing' from your code to take account of new lines?

    I don't think it's easily accomplishable in a unit test. As for the rest, I would just leave it as is.

  • πŸ‡¬πŸ‡§United Kingdom oily Greater London

    #29 +1. I realised I had come across this before with Symfony textarea field. I wanted to allow the user to enter a list, actually Black Cab runs which students of the Knowledge study and memorise, for example,

    Leave on Left Green Lanes
    Right Highbury New Park
    etc
    etc
    etc
    Forward into Gibson Square

    All the runs follow a similar pattern, but they vary a lot in length (how many lines in total).

    You have a choice to either present one textbox per line or a textarea for all the lines. Thinking it through I realised that the newlines are easily handled by explode() on the TEXT (string) value. Then custom validation can be applied to the array. The flexibility of the textarea field type, including the length of the field value, is what seems to make it so useful.

    I think there might be edge cases where a maxlength value is desirable. But IMO it is best to leave the core textarea field as it is. Developers can create a custom field or use custom validation. But such cases will be relatively infrequent.

Production build 0.71.5 2024