@mfb Some good news at last!
I did some more digging into the RSS view setup and found that the rewrite results text was actually coming from a Twig template (the config in the Text field in the UI was being overridden). Hence, when I tried your suggestion in #25 to use "{{ body|raw }}" for the body expression by updating the rewrite results through the UI, it had no effect.
I went back and tried #25 again by using "{{ body|raw }}" in the Twig template and it fixed the problem. The output in rss.xml is now formatted correctly (and the double-encoding has gone).
Thank you for your help in resolving this issue.
We're now just waiting for your patch to be released to resolve the original RSS issue permanently. Hopefully it gets included in the next core release.
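For anyone landing here later, a rough sketch of why "{{ body|raw }}" made the difference. This is plain PHP, not Twig or Drupal code: the two helper functions are hypothetical stand-ins (Twig's default autoescape is modelled with htmlspecialchars(), and the feed's XML escaping with a second htmlspecialchars() call), and the body string is a shortened version of the example article.

```php
<?php
// Hypothetical stand-in for Twig's default HTML autoescape on "{{ body }}".
function sketch_twig_autoescape(string $value): string {
  return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical stand-in for the single round of escaping applied when the
// description is written into the feed as XML text.
function sketch_xml_escape(string $value): string {
  return htmlspecialchars($value, ENT_XML1 | ENT_NOQUOTES, 'UTF-8');
}

// Shortened example body: unescaped HTML containing a numeric entity.
$body = '<p>a company&#8217;s workforce</p>';

// "{{ body|raw }}": the markup reaches the feed untouched and is escaped once.
$with_raw = sketch_xml_escape($body);

// "{{ body }}": the template escapes first, then the feed escapes again.
$without_raw = sketch_xml_escape(sketch_twig_autoescape($body));

echo $with_raw, PHP_EOL;
echo $without_raw, PHP_EOL;
```

The second line printed is the double-encoded form: every "&" produced by the first round of escaping is escaped a second time.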
@mfb There is no "intentionally" double-encoded RSS feed.
The input article is not encoded (example already provided) and the view rewriting only uses simple, unencoded HTML markup.
If I get some time over the holiday I'll try to reproduce using a vanilla 10.2.0 install. Would that help?
@mfb I'm not sure where to go next with this. We're stuck with an upgrade issue and no idea where to look for the fix or even who to speak to next.
It occurs to me that the previous code in RssResponseRelativeUrlFilter was masking an underlying issue, and when you fixed it the underlying issue resurfaced.
Out of interest, why is the rss view preview formatted correctly, but the actual view output in rss.xml is incorrect? Aren't they meant to be the same?
@mfb If the issue isn't caused by RssResponseRelativeUrlFilter.php then why does reverting the patch in that file correct the HTML formatting issue?
Also, why does view preview look correct if the view has already generated the incorrect feed?
@mfb Tried adding the return to onResponse(), then dropped cache, and it made no difference.
Both the view preview and the rss.xml look the same as before.
@mfb Sorry, I'm not sure my knowledge of the module (or my PHP coding) is up to that. :(
I'd be happy to drop in a debug version of the php file and test it for you, or grant you access to the test instance so that you could do it yourself.
I'm not familiar with RssResponseRelativeUrlFilter.php, so let me know what debugging you need and I'd be happy to test it.
I had a look at the .theme file and couldn't find anything related to rss.
The rss feed has been in use for several years, so I'm not aware of any invalid markup. I sent you the link to the rss.xml earlier in case you could spot anything.
As far as I'm aware (and I'm the only developer), there is nothing altering the output after the view generates. It should just be core code.
Let me know if you need access to the test instance.
The preview of the view looks fine. Here is the example line from the view preview:
<p><p>Workforce management is the process of effectively managing and optimizing a company&#8217;s workforce to meet business goals and objectives. It involves a range of activities,
Does this mean that the escaping problem is happening AFTER the view is prepared?
Would it help if I gave you direct access to a test system where the problem exists?
Attaching screenshot of the body.
No, the body example I sent in a previous message actually came from drupal. The HTML is all unescaped.
<p>Workforce management is the process of effectively managing and optimizing a company’s workforce to meet business goals and objectives. It involves a range of activities, including staffing, scheduling, time and attendance tracking, performance management, and more. With the help of workforce management metrics and key performance indicators (KPIs), businesses can gain valuable insights into their workforce and make informed decisions to improve operational efficiency.</p>
The content is coming into drupal unescaped. The content has been coming into drupal using the Feeds module since 2019.
As I said, we don't use an editor. That's why the text format doesn't change the body at all.
I checked the "Raw HTML" text format and it has all filters disabled, so it shouldn't be escaping the content.
I tried changing the rewrite text to "body | raw" and it made no difference.
@mfb Sorry, I don't have a clean install that reproduces this.
The items in the rss feed come from a view. Specifically, the description for each rss item comes from "{{body}}".
The body of the rss item in my previous example is:
<p>Workforce management is the process of effectively managing and optimizing a company’s workforce to meet business goals and objectives. It involves a range of activities, including staffing, scheduling, time and attendance tracking, performance management, and more. With the help of workforce management metrics and key performance indicators (KPIs), businesses can gain valuable insights into their workforce and make informed decisions to improve operational efficiency.</p>
The Text format is "Raw HTML".
The line in the patch that causes the issue is:
$node->replaceChild($rss_dom->createTextNode(Html::transformRootRelativeUrlsToAbsolute($html_markup, $request->getSchemeAndHttpHost())), $node->firstChild);
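To show what I mean, here is a minimal standalone sketch of what that line appears to do. It uses only PHP's built-in DOMDocument, not the actual Drupal filter: the `<item>` wrapper and the shortened body string are made up for illustration, and the Html::transformRootRelativeUrlsToAbsolute() call is left out because it is not relevant to the escaping.

```php
<?php
// Illustration only: a tiny hand-made feed fragment, not Drupal's real output.
$rss_dom = new DOMDocument();
$rss_dom->loadXML('<item><description>placeholder</description></item>');
$node = $rss_dom->getElementsByTagName('description')->item(0);

// Hypothetical, shortened body: unescaped HTML plus a numeric entity.
$html_markup = '<p>a company&#8217;s workforce</p>';

// Same pattern as the line from the patch: replace the first child with a
// new text node holding the markup.
$node->replaceChild($rss_dom->createTextNode($html_markup), $node->firstChild);

// On serialization, "<", ">" and "&" inside a text node are escaped.
$xml = $rss_dom->saveXML($node);
echo $xml, PHP_EOL;
```

So whatever string is handed to createTextNode() gets escaped once more on output. If that string has already been escaped somewhere earlier, the result would be escaped twice.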
Does that help?
@mfb Thanks for looking into this issue.
Unfortunately I don't have a test or reproducible steps.
If it helps, here is a link to an rss feed that has the issue:
https://teamrelated.com/newsletter/rss.xml
(Search for "#8217" to see the first item with an HTML entity that has the problem.)
I did some more investigation and found the following:
- Checked the newsletter for the previous week and the HTML entities were not appearing. The only change we've made this week was to upgrade from 10.1.7 to 10.2.0.
- The offending HTML entities in the rss feed are appearing because they were present in the body of the source article (e.g. <p>) and have then been double-escaped in the rss xml file.
- I then took out the patch from #12 and the double-escaping is no longer present (but the rss feed does not validate with mailchimp).
In summary, it looks like the issue with the HTML entities is a side effect of the patch:
- With the patch: rss file validates with mailchimp but the feed contains unconverted HTML entities.
- Without the patch: rss file items look normal, but the feed file does not validate with mailchimp.
Here follows an example line from the rss file with and without the patch.
without patch:
<p><p>Workforce management is the process of effectively managing and optimizing a company&#8217;s workforce to meet business goals and objectives. It involves a range of activities,
with patch: (double-escaped)
<p>&lt;p&gt;Workforce management is the process of effectively managing and optimizing a company&amp;#8217;s workforce to meet business goals and objectives. It involves a range of activities,
Please let me know if there's anything else I can do to help.
We hit the same issue today after upgrading to 10.2.0. All our newsletters (based on RSS feeds) were broken.
Applied the fix to one site and the feed is now validating again (so all the RSS items now appear in mailchimp).
However, the problem seems bigger than that. The description fields for the RSS items now contain unconverted HTML escape codes (such as "&#8217;", "&#8212;", "&amp;") that weren't there before.
Please can this also be fixed?