REST views: double encoding of apostrophes in REST Export display

Created on 6 December 2017, almost 7 years ago
Updated 5 March 2024, 7 months ago

Problem/Motivation

In the sample REST export view output below (JSON format), an apostrophe character (ASCII code 39) is double encoded as \u0026#039;. The apostrophe is first HTML-escaped to &#039;, and the ampersand of that entity is then escaped again as \u0026 in the JSON.

[{"book_background_pattern":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_s18_background_pattern.jpg","cover":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_doc_S18_cover.jpg","dark_color":"0073b9","accent_color":"a92825","light_color":"c7d5ee","header_background":"\/sites\/default\/files\/a_visit_to\/background_images\/avt_s18_header.png","title":"The Doctor\u0026#039;s Office: A 4D Book","vuforia_device_database":"\/sites\/default\/files\/a_visit_to\/doctors_office\/targets\/a_visit_to_doctors_office.zip","id":"8799","author":"Blake A. Hoena","illustrator":"","series":"A Visit to...","series_id":"268"}]

Steps to reproduce

  1. Create a node with a title containing an apostrophe character
  2. Create a view containing a REST Export display
  3. Set the view format to "Fields"
  4. Add the "Content:Title" field to the field list
  5. Preview the results of the view
  6. Observe that the apostrophe character is double encoded as \u0026#039; and not the expected '
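
The double encoding in step 6 can be reproduced outside Drupal. The following is a minimal Python sketch, not Drupal code: it assumes the field value is HTML-escaped first and then JSON-encoded by an encoder that hex-escapes ampersands. The helper names `html_escape_like_php` and `json_encode_hex_amp` are hypothetical and exist only for this illustration.

```python
import html
import json


def html_escape_like_php(text: str) -> str:
    # Hypothetical helper: escapes the same characters as an HTML escaper
    # that renders the apostrophe as &#039; (Python's html.escape uses &#x27;).
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#039;")
    )


def json_encode_hex_amp(value) -> str:
    # Hypothetical helper: json.dumps leaves "&" alone, so hex-escape it by
    # hand to imitate an encoder that writes "&" as \u0026 (an assumption).
    return json.dumps(value).replace("&", "\\u0026")


title = "The Doctor's Office: A 4D Book"
double_encoded = json_encode_hex_amp({"title": html_escape_like_php(title)})
print(double_encoded)

# A consumer has to undo both layers to get the original title back.
json_layer_removed = json.loads(double_encoded)["title"]
fully_decoded = html.unescape(json_layer_removed)
print(fully_decoded)
```

Parsing the JSON alone still leaves the HTML entity &#039; in the title, which is why API consumers see the artifact even though the JSON itself is valid.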

Proposed resolution

Roll back the special character encoding in both the preview and the output, escaping double quotes with a backslash.

The regex searches the $output string for all occurrences of \\uXXXX, where each X is a hexadecimal digit (0-9 or A-F, case insensitive), e.g. \\u0026.

For each match that it finds, it uses the mb_convert_encoding() function to convert the matched character from one encoding to another. Then any double quote characters (") are prefixed with a backslash character (\) so that they're properly escaped according to the JSON string requirements.
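
The approach described above can be sketched as follows. This is a Python illustration under stated assumptions, not the patch itself: `unescape_json_unicode` is a hypothetical name, `chr()` stands in for the mb_convert_encoding() call, and only a decoded double quote is re-escaped.

```python
import json
import re

# Matches a literal \uXXXX sequence with four hexadecimal digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-f]{4})", re.IGNORECASE)


def unescape_json_unicode(output: str) -> str:
    """Replace \\uXXXX escapes in a JSON string with the literal character."""

    def decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # A literal double quote would terminate the JSON string early,
        # so it is written back with a backslash in front of it.
        return '\\"' if char == '"' else char

    return UNICODE_ESCAPE.sub(decode, output)


raw = '{"name":"C\\u00f4te d\\u0026#039;Ivoire","note":"\\u0022quoted\\u0022"}'
cleaned = unescape_json_unicode(raw)
print(cleaned)

# The result is still valid JSON and parses to the same data as the input.
assert json.loads(cleaned) == json.loads(raw)
```

This sketch only removes the JSON-level escaping, so HTML entities such as &#039; are left untouched. It also does not re-escape a decoded backslash or control character, which a production implementation would need to handle.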

Remaining tasks

  • ✅ Update issue summary, to include the proposed resolution
  • ✅ Roll back special character encoding in the Views output
  • Roll back special character encoding in the Views preview
  • Add a test, showing the problem

User interface changes

API changes

Data model changes

Release notes snippet

🐛 Bug report
Status

Needs work

Version

11.0 🔥

Component
REST 

Last updated 15 days ago

Created by

🇺🇸United States alex.stone.filament

  • VDC

    Related to the Views in Drupal Core initiative.

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

  • Needs issue summary update

    Issue summaries save everyone time if they are kept up-to-date. See Update issue summary task instructions.

Comments & Activities

  • 🇩🇪Germany Internetter Erfurt, Thüringen

    I discovered other problems with the REST export of views and image URLs. Perhaps they are related:

    The ampersand "&" separating URL parameters was encoded as "\u0026amp;" in multi-parameter URLs produced by the image URL formatter (when using focal_point).

  • Status changed to Needs review over 1 year ago
  • Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    29,366 pass
  • 🇩🇰Denmark ressa Copenhagen

    Thanks @RichardDavies! Your patch works perfectly in Drupal 10, leaving single quotes (') alone and making the output a lot cleaner. Before and after:

    • "name": "C\u00f4te d\u0026#039;Ivoire",
      "name": "Côte d'Ivoire"
      
    • "name": "Pes\u00e4pallo",
      "name": "Pesäpallo"
      

    Also, much cleaner looking HTML (before and after):

    • "title": "Facts about C\u00f4te d\u0026#039;Ivoire"
      "title": "Facts about Côte d'Ivoire"
      
    • "field_body": "\u003Ch2\u003E1. Here are some facts\u003C\/h2\u003E ..."
      "field_body": "<h2>1. Here are some facts<\/h2> ..."
      

    There's also the related issue 🐛 "single quote character not escaped in REST output" (Active), about single quotes ('). I believe they don't need to be HTML-encoded into &#039;, because proper JSON strings are delimited by double quotes, so single quotes need no escaping.

    Should it be looked at here, or in the other issue?

    I am attaching a re-rolled patch for Drupal 10.1, since I have had bad experiences with rebasing Drupal core MRs in Drupal's GitLab. Also, because it is static, this patch can then be applied through Composer.

  • 🇩🇰Denmark ressa Copenhagen

    Also, fixing 🐛 Allow JSON format when "Accepted request formats" is not defined (Active) would get REST and Views export in a great state, working out-of-the-box.

  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States smustgrave

    Can the issue summary be updated to include the proposed resolution?

    Also, a test showing the problem will be needed, please.

    Thanks!

  • 🇩🇰Denmark ressa Copenhagen

    Thanks for reviewing it @smustgrave. I would also be interested in a description of what the regex actually does. @RichardDavies: Perhaps you can help with this?

  • 🇩🇰Denmark ressa Copenhagen

    I also now see that the preview is still escaped, so we probably should do the same there? I'll add the tasks in the issue summary.

  • 🇺🇸United States RichardDavies Portland, Oregon

    @ressa The regex searches the $output string for all occurrences of \\uXXXX where X is a hexadecimal character consisting of a digit 0-9 or letter A-F (case insensitive). e.g. \\u0026

    For each match that it finds, it uses the mb_convert_encoding() function to convert that character from one encoding to another encoding. Then any double quote characters (") are prefixed with a backslash character (\) so that they're properly escaped according to the JSON string requirements.

  • 🇩🇰Denmark ressa Copenhagen

    Thanks @RichardDavies! Both for working on this solution, and explaining the regex. I have added it in the Issue Summary.

  • 🇫🇷France nicolasgraph Strasbourg

    Patch #86 causes malformed UTF-8 characters for emojis.
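
One plausible explanation for the emoji problem, offered here as an assumption rather than a confirmed diagnosis of the patch: JSON writes characters outside the Basic Multilingual Plane as a surrogate pair of two \uXXXX escapes, and converting each escape on its own yields two lone surrogates instead of one character. The Python sketch below illustrates the effect; both function names are hypothetical.

```python
import re

UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_units_separately(escaped: str) -> str:
    # Converts every \uXXXX escape on its own, one 16-bit unit at a time.
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


def decode_with_surrogate_pairs(escaped: str) -> str:
    # Round-trips through UTF-16 so that a high/low surrogate pair is
    # combined into the single code point it represents.
    units = decode_units_separately(escaped)
    return units.encode("utf-16", "surrogatepass").decode("utf-16")


emoji_escape = "\\ud83d\\ude00"  # JSON escape for U+1F600 (grinning face)

broken = decode_units_separately(emoji_escape)
try:
    broken.encode("utf-8")
    encodes_as_utf8 = True
except UnicodeEncodeError:
    # Lone surrogates are not valid in UTF-8.
    encodes_as_utf8 = False

fixed = decode_with_surrogate_pairs(emoji_escape)
print(encodes_as_utf8, fixed == "\U0001F600")
```

If this is indeed the cause, a fix would need to treat a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF) as one unit before converting.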