Module names on the Available Updates page are double-escaped

Created on 22 July 2019, over 5 years ago
Updated 28 July 2023, over 1 year ago

Problem/Motivation

Try, for example, with https://www.drupal.org/project/entityreference_dragdrop , whose info.yml file says:

name: Entity Reference Drag & Drop

As for tests, I can see a theme whose name contains special characters, presumably for testing this (modules/block/tests/modules/block_test/themes/block_test_specialchars_theme/block_test_specialchars_theme.info.yml), but we don't seem to have a module for this! So that means there's maybe no test coverage for the Extend page, even though that gets the escaping right.

Steps to reproduce

Proposed resolution

Decode the title in template_preprocess_update_project_status

Remaining tasks

Add Test
review
commit

User interface changes

Before

After

API changes

Data model changes

Release notes snippet

🐛 Bug report
Status

Needs review

Version

11.0 🔥

Component
Update 

Last updated about 6 hours ago

  • Maintained by
  • 🇺🇸United States @tedbow
  • 🇺🇸United States @dww
Created by

🇬🇧United Kingdom joachim

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • last update over 1 year ago
    29,885 pass, 1 fail
  • last update over 1 year ago
    29,886 pass
  • 🇳🇿New Zealand quietone

    I don't know why the fix is made in two places. In my testing I found that having it only in the preprocess function was sufficient. Therefor, I have removed the change from the update manager. I also added a simple unit test. Also, screenshots in the Issue Summary.

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States smustgrave

    Thanks @quietone test case seems to show the problem so will remove that tag

    Fix LGTM.

  • last update over 1 year ago
    29,909 pass
  • Status changed to Needs work over 1 year ago
  • 🇫🇮Finland lauriii Finland
    1. +++ b/core/modules/update/tests/src/Unit/ProjectStatusTest.php
      @@ -0,0 +1,39 @@
      +        'title' => Html::escape('This & that'),
      

      I think it would be better to write an integration test for this to make sure that this continues to work if the input changes.

    2. +++ b/core/modules/update/update.report.inc
      @@ -5,6 +5,7 @@
      +use Drupal\Component\Utility\Html;
      
      @@ -149,6 +150,7 @@ function template_preprocess_update_project_status(&$variables) {
      +  $variables['title'] = Html::decodeEntities($variables['title']);
      

      Why do we need to call \Drupal\Component\Utility\Html::decodeEntities here? Shouldn't we wrap the title with \Drupal\Component\Render\MarkupInterface where it gets escaped?

  • 🇳🇿New Zealand quietone

    It looks like it is escaped in the release xml.

    <?xml version="1.0" encoding="utf-8"?>
    <project xmlns:dc="http://purl.org/dc/elements/1.1/">
      <title>Save &amp;amp; Edit</title>
      <short_name>save_edit</short_name>
      <dc:creator>himerus</dc:creator>
  • Status changed to Needs review over 1 year ago
  • last update over 1 year ago
    29,891 pass, 2 fail
  • last update over 1 year ago
    29,912 pass
  • 🇳🇿New Zealand quietone

    The comment above is not correct.
    The XML from d.o is

    <project>
    <title>Save &amp; Edit</title>
    <short_name>save_edit</short_name>
    <dc:creator>himerus</dc:creator>
    <type>project_module</type>
    <supported_branches>8.x-1.</supported_bran
    

    where it is escaped but not double escape. The extra escape happens in the fetching process, in stream_get_contents().

    It took awhile to figure out how the update Functional tests work. Or, at least enough to add tests for problem.

    #24
    1. This patch adds a new test module and update XML with double escaped characters. The additions to \Drupal\Tests\update\Functional\UpdateContribTest::testUpdateContribBasic test the text on the updates page.
    2. I don't really understand this nor do I know about MarkupInterface. I opted to decode the title just after it is fetched.

    The interdiff is between the success and fail patch. There is no interdiff with a previous patch because this is a new approach. Testing confirms that the screenshots are still correct.

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States smustgrave

    Additional test coverage looks good.

    For #24
    1 looks like additional test coverage has been added
    2 decodeEntities is still there but a comment was added.

    LGTM!

  • last update over 1 year ago
    29,947 pass
  • last update over 1 year ago
    29,954 pass
  • last update over 1 year ago
    29,954 pass
  • last update over 1 year ago
    29,959 pass
  • last update over 1 year ago
    29,959 pass
  • last update over 1 year ago
    29,959 pass
  • last update over 1 year ago
    29,960 pass
  • last update over 1 year ago
    29,972 pass
  • last update over 1 year ago
    30,050 pass
  • last update over 1 year ago
    30,057 pass
  • last update over 1 year ago
    30,057 pass
  • 🇳🇿New Zealand quietone

    I'm triaging RTBC issues .

    After reading the IS and comments I don't see anything left to do. I guess there is still a question of whether adding the double escape to the title in source XML is the correct way to test this but I didn't see another way. See #26

    Leaving at RTBC.

  • last update over 1 year ago
    30,059 pass
  • last update over 1 year ago
    30,061 pass
  • last update over 1 year ago
    30,064 pass
  • last update over 1 year ago
    30,118 pass, 4 fail
  • last update over 1 year ago
    30,135 pass
  • Status changed to Needs review over 1 year ago
  • 🇳🇿New Zealand quietone

    Neither failure is related to the change here. Retesting.

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States smustgrave

    All green!

  • last update over 1 year ago
    30,136 pass
  • last update over 1 year ago
    30,137 pass
  • last update over 1 year ago
    30,137 pass
  • last update over 1 year ago
    30,147 pass
  • last update over 1 year ago
    30,147 pass
  • last update over 1 year ago
    30,151 pass
  • 26:11
    25:03
    Running
  • last update over 1 year ago
    30,162 pass
  • last update over 1 year ago
    30,169 pass
  • last update over 1 year ago
    30,157 pass, 2 fail
  • last update over 1 year ago
    30,169 pass
  • 🇺🇸United States smustgrave

    Odd I see all green.

  • last update over 1 year ago
    30,206 pass
  • last update over 1 year ago
    30,209 pass
  • last update over 1 year ago
    30,361 pass
  • last update over 1 year ago
    30,362 pass
  • Status changed to Needs work over 1 year ago
  • 🇺🇸United States xjm
    1. +++ b/core/modules/update/src/UpdateProcessor.php
      @@ -175,6 +176,9 @@ public function processFetchTask($project) {
      +        // The title is escaped in the XML from drupal.org and the fetch
      +        // process will encode it again. Therefor, it is decoded here.
      

      Two things:

      • Drupal.org is canonically capitalized when used as a sentence. (Or put otherwise, there is a website at https://drupal.org and it is called "Drupal.org".)
      • "Therefore" is misspelled.
    2. +++ b/core/modules/update/src/UpdateProcessor.php
      @@ -175,6 +176,9 @@ public function processFetchTask($project) {
      +        $available['title'] = Html::decodeEntities($available['title']) ?? '';
      

      Honestly, this makes me quite nervous. I read the explanation in the comment, but still.

      I mean, if the update data from d.o is compromised, the site has way worse problems than a little XSS in a title (i.e., full RCE). The same is true if the module author is malicious. But I'd still be inclined to harden it after the decode so that it's protected in case of future refactoring etc. This would also mean an additional test case for a

      Test module <script>confirm('DO YOU LIKE MY MODULE???');</script>
      

      or something.

    3. Has anyone tried the approach @lauriii suggested, Since @quietone confirmed it was single-escaping in the XML. That would also allow us to sanitize it for disallowed markup and mark it as safe at the same time.

  • 🇳🇿New Zealand quietone

    35.1. Fixed.

    I am not running tests because it is only a comment change.

    Still needs work for 35, 1 and 2.

  • 🇺🇸United States mfb San Francisco

    As far as I see, comment #25 correct, the bug appears to be on drupal.org

    https://updates.drupal.org/release-history/save_edit/current has <title>Save &amp;amp; Edit</title> - this is double escaped.

  • 🇺🇸United States mfb San Francisco

    I found the bug in project module release/project_release.drush.inc

        $xml = new SimpleXMLElement('<root></root>');
        $xml->release->name = check_plain($release->title);
    

    simplexml handles the escaping for you, so this should be $xml->release->name = $release->title;

    e.g.:

        $xml = new SimpleXMLElement('<root></root>');
        $xml->release->name = 'Save & Edit';
        echo $xml->asXml();
    

    Prints

    <?xml version="1.0"?>
    <root><release><name>Save &amp; Edit</name></release></root>

    which is correct.

  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 7.x + Environment: PHP 5.6 & MySQL 5.5
    last update over 1 year ago
    5 pass
  • 🇺🇸United States mfb San Francisco

    Here's a patch for project module.

  • 🇺🇸United States xjm

    Hmm, nervous about that approach as well. :)

  • 🇺🇸United States mfb San Francisco

    @xjm well, all I can do is assure you that the property overloading used by project_release module, e.g. $xml->title = $project->title; is how you instruct SimpleXML to escape some text and add it to the document.

    SimpleXML actually does have a way to add already-escaped text to a document, but this is not it. The code could be rewritten to use this method, but would be more verbose, and presumably slower since the escaping would be happening via the drupal function rather than C code.

  • 🇺🇸United States drumm NY, US

    My concern would be if any version of Drupal (or other clients) double-decoded this data, so if resetting this to be technically correct would open them up to XSS/etc issues.

  • 🇺🇸United States mfb San Francisco

    I would say it's best to just fix the bug at the source, because if a client is HTML-decoding the data and treating it as trusted markup with no filtering, they would also have a security issue today. The key is that clients need to treat this data like any other untrusted, third-party, user-generated content; i.e. these fields should be HTML-encoded when being rendered in markup.

    But if you really want to leave the XML as-is for backwards compatibility reasons, then yes you could specify that these fields have an extra round of HTML-encoding, meaning clients need to get the data from XML parser, do the extra round of HTML-decoding, then HTML-encode it again when rendering in markup. Seems silly (to me) but certainly works as a solution.

  • 🇺🇸United States dww

    Coming here from 🐛 Double encoded ampersand in /admin/reports/updates Active which I was pinged about in Slack for a subsystem maintainer review.

    From a little Git archeology, looks like this double-encoding was introduced in #1003764: Add an alter hook invoked while generating release-history XML files when @drumm converted this from manually putting the XML together into using SimpleXML. Apparently no one noticed that SimpleXML is doing another round of encoding for us.

    I just did this:

    % wget https://updates.drupal.org/release-history/save_edit/current -O save_edit.xml
    ...
    HTTP request sent, awaiting response... 200 OK
    Length: 9702 (9.5K) [text/xml]
    Saving to: ‘save_edit.xml’
    
    save_edit.xml             100%[=====================================>]   9.47K  --.-KB/s    in 0s
    
    2025-01-15 14:45:23 (23.8 MB/s) - ‘save_edit.xml’ saved [9702/9702]
    % grep title save_edit.xml
    <title>Save &amp;amp; Edit</title>
    

    So the source is definitely double-encoded. It seems wonky to say that core should have to decode twice to get the raw values.

    I don't understand the "BC" concern here. The primary client for these feeds, update.module, was written to assume properly encoded markup in the XML feeds, and is now "broken" by the double encoding. If other clients are already decoding twice, and we stop double encoding, what's the harm?

    Here's a little test script:

    
    $inputs = [
      'not_encoded' => '<title>Save & Edit</title>',
      'encoded_once' => '<title>Save &amp; Edit</title>',
      'encoded_twice' => '<title>Save &amp;amp; Edit</title>',
    ];
    
    foreach ($inputs as $key => $value) {
      echo "------\n$key\n";
      echo "Raw: $value\n";
      $decoded = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
      echo "Decoded: $decoded\n";
      echo "Double decoded: " . html_entity_decode($decoded, ENT_QUOTES | ENT_HTML5, 'UTF-8') . "\n";
    }
    

    If I run this, I get:

    ------
    not_encoded
    Raw: <title>Save & Edit</title>
    Decoded: <title>Save & Edit</title>
    Double decoded: <title>Save & Edit</title>
    ------
    encoded_once
    Raw: <title>Save &amp; Edit</title>
    Decoded: <title>Save & Edit</title>
    Double decoded: <title>Save & Edit</title>
    ------
    encoded_twice
    Raw: <title>Save &amp;amp; Edit</title>
    Decoded: <title>Save &amp; Edit</title>
    Double decoded: <title>Save & Edit</title>
    

    We're still going to turn around and re-encode this when printing it in the available updates report. That happens via core/modules/update/templates/update-project-status.html.twig right here:

    <div class="project-update__title">
      {%- if url -%}
        <a href="{{ url }}">{{ title }}</a>
      {%- else -%}
        {{ title }}
      {%- endif %}
      {{ existing_version }}
      {% if install_type == 'dev' and datestamp %}
        <span class="project-update__version-date">({{ datestamp }})</span>
      {% endif %}
    </div>
    

    So Twig is going to escape this for us, and I don't see how this could be used as an attack vector.

    IMHO, this is the right fix, not the extra decode being proposed at #3493742. I am a maintainer for project*, so I could just commit this, but I can't deploy the fix on d.o anymore. 😅

  • 🇺🇸United States mfb San Francisco

    Yeah only bad code interpreting the feed could be a danger here, because untrusted third party strings need to be just that, untrusted, and run thru something like HTML Purifier or the filter built into drupal core; i.e. security issues shouldn't be possible regardless of what's in a third party string.

  • 🇺🇸United States drumm NY, US

    Thanks for the research!

    I don't understand the "BC" concern here. The primary client for these feeds, update.module, was written to assume properly encoded markup in the XML feeds, and is now "broken" by the double encoding. If other clients are already decoding twice, and we stop double encoding, what's the harm?

    My only real concern is if core was ever double-decoding. And I hope every version of core does always filter on output, so this should be fine.

    • drumm committed b8bedd4a on 7.x-2.x
      Issue #3069491 by quietone, mfb, ameymudras, dww, lauriii: Module names...
  • 🇺🇸United States drumm NY, US

    This is now deployed and I rebuilt the XML for https://www.drupal.org/project/entityreference_dragdrop

    Before rebuilding the XML for every project with an & in its title, I'd appreciate a final confirmation that this is looking good in Drupal core now.

  • 🇺🇸United States dww

    Sweet, thanks!

    Looks great to me on a local site still running 10.3.10 core. Just did:

    composer require drupal/entityreference_dragdrop
    drush en entityreference_dragdrop
    

    Visited /admin/reports/updates on the site, and saw this:

    Please rebuild all the projects with &.

    Thanks so much!
    -Derek

  • 🇺🇸United States drumm NY, US

    That's now done

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024