Channel description of RSS feeds is double-escaped

Created on 29 February 2024, 11 months ago
Updated 19 March 2024, 10 months ago

Problem/Motivation

This is a followup from πŸ› [10.2 regression] RSS feeds invalid due to   Fixed .

RSS feeds are now valid but have a warning on the W3C feed validator:

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 6, column 42: description should not contain HTML: &

<description>Training &amp;amp; Events</description>

Steps to reproduce

Create a feed in views with an RSS channel description that contains an ampersand, e.g. "Training & Events". Channel description is a field in the Feed:Style options setting section in views.

Checking the feed output against https://validator.w3.org it prints a warning for the channel description line: "description should not contain HTML: &amp;". The RSS feed literally contains &amp;amp; which is parsed into human-readable text &amp;. It should contain &amp; which is parsed as human-readable text &.

Proposed resolution

The \Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute() method processes all RSS feed description elements as markup. However, RSS has two different kinds of description elements: item description elements, which according to the RSS specs are interpreted as markup, and channel description elements, which are interpreted as human-readable. So that method should skip channel description elements.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Fixed

Version

10.2 ✨

Component
BaseΒ  β†’

Last updated about 5 hours ago

Created by

πŸ‡¨πŸ‡¦Canada OMD

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @OMD
  • πŸ‡ΊπŸ‡ΈUnited States mfb San Francisco

    What I found when trying to reproduce this issue is that Views outputs <channel><description>Training &amp;amp; Events</description></channel> as the feed description.

    However, it is recommended to output <channel><description>Training &amp; Events</description></channel>

    My interpretation of what the feed validator is saying is that it's recommended that a feed description be semantically considered to be plain text, and thus a user-entered feed description should be encoded once to be rendered in the feed description XML element: <channel><description>Training &amp; Events</description></channel>

    An item description, on the other hand, could be semantically considered to be markup, thus a user-entered item description would be encoded once to render valid markup from the entered text, and a second time to be rendered in the item description XML element: <item><description>Training &amp;amp; Events</description></item>

  • πŸ‡ΊπŸ‡ΈUnited States mfb San Francisco

    This does seem to be pretty heavily related to πŸ› [10.2 regression] RSS feeds invalid due to   Fixed after all, although it's a separate bug in the same code.

    It appears that \Drupal\Core\EventSubscriber\RssResponseRelativeUrlFilter::transformRootRelativeUrlsToAbsolute() is operating on channel description elements, but it should not, as these are considered to be human-readable plain text. It should only be operating on item description elements, which are considered to be markup.

  • Pipeline finished with Success
    11 months ago
    Total: 492s
    #107660
  • Status changed to Needs review 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mfb San Francisco

    @OMD can you test my attempted fix in MR 6842? If it resolves the warning then we can update issue summary, add a test and reroll as a merge request on 11.x branch.

  • The & is being escaped twice?

  • πŸ‡ΊπŸ‡ΈUnited States mfb San Francisco

    @cilefen Yes, that's what I found when trying to reproduce the issue. If we confirm that the issue is basically the opposite of the title then I will update it :)

  • πŸ‡ΊπŸ‡ΈUnited States mfb San Francisco

    Added unit test coverage for &amp; in channel description element.

  • Pipeline finished with Success
    11 months ago
    Total: 612s
    #108515
  • Status changed to Needs work 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    From reading the issue summary provided I believe @mfb you were correct in your assumption.

    So can the issue summary be updated to match. Also MR should probably be pointed to 11.x

    Thanks.

  • Pipeline finished with Canceled
    11 months ago
    Total: 340s
    #111045
  • Status changed to Needs review 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States mfb San Francisco
  • Pipeline finished with Success
    11 months ago
    Total: 1804s
    #111048
  • Status changed to RTBC 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Thanks @mfb!

    Ran the test-only feature here https://git.drupalcode.org/issue/drupal-3424768/-/jobs/983276 which showed the test failure.

    The change to the loop makes sense and fixes the issue following the scenario described.

  • Status changed to Fixed 11 months ago
  • πŸ‡¬πŸ‡§United Kingdom longwave UK

    The findings here match with the comment in template_preprocess_views_view_rss():

      // The RSS 2.0 "spec" doesn't indicate HTML can be used in the description.
      // We strip all HTML tags, but need to prevent double encoding from properly
      // escaped source data (such as &amp becoming &amp;amp;).
      $variables['description'] = Html::decodeEntities(strip_tags($style->getDescription()));
    

    Committed and pushed e8db570e86 to 11.x and 3ff7664833 to 10.3.x and e851b33905 to 10.2.x. Thanks!

    • longwave β†’ committed e851b339 on 10.2.x
      Issue #3424768 by mfb, OMD, cilefen, smustgrave: Channel description of...
    • longwave β†’ committed 3ff76648 on 10.3.x
      Issue #3424768 by mfb, OMD, cilefen, smustgrave: Channel description of...
    • longwave β†’ committed e8db570e on 11.x
      Issue #3424768 by mfb, OMD, cilefen, smustgrave: Channel description of...
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024