- Issue created by @jurgenhaas
- Status changed to Needs review
7 months ago 7:33am 1 July 2024 - π©πͺGermany jurgenhaas Gottmadingen
MR!265 seems to be doing the job, please have a look.
- Status changed to Needs work
7 months ago 10:54am 1 July 2024 - πͺπΈSpain fjgarlin
Thanks for giving it a go at this. I left a review in the MR.
- Status changed to Needs review
7 months ago 12:44pm 1 July 2024 - π©πͺGermany jurgenhaas Gottmadingen
Oops, worked on that but not sure I caught it yet. Left a comment in the MR as well.
- Status changed to Needs work
7 months ago 1:26pm 1 July 2024 - Status changed to Needs review
7 months ago 1:40pm 1 July 2024 - Status changed to Needs work
7 months ago 2:00pm 1 July 2024 - πͺπΈSpain fjgarlin
Thanks for the changes.
I just tested and it seems to be adding the wrapping htm/body tags.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body> <p><strong>ECA is the n ... nauer</a></p> </body></html>
- Status changed to Needs review
7 months ago 2:10pm 1 July 2024 - π©πͺGermany jurgenhaas Gottmadingen
Ah, those are a problem? Better to be on the safe side, agreed. And thanks for the link. I've tested the usage of those 2 options and they're doing what it's saying.
- Status changed to Needs work
7 months ago 2:20pm 1 July 2024 - πͺπΈSpain fjgarlin
Tested the code on:
- https://fjgarlin-drupal.dev.devdrupal.org/project/eca (reading from https://git.drupalcode.org/project/eca/-/blob/2.0.x/README.md)
- https://fjgarlin-drupal.dev.devdrupal.org/project/config_notify (reading from https://git.drupalcode.org/project/config_notify)See the second one:
<h2>CONTENTS OF THIS FILE<ul> <li>Introduction</li> ...
and
If I try on the current www.drupal.org for that second module, I get valid markup:
<h2>CONTENTS OF THIS FILE</h2> <ul> <li>Introduction</li>
So, whatever it is, it is introduced by this MR. Maybe the regex or maybe something else, not sure.
If you want to test, this is the raw content of the file: https://git.drupalcode.org/project/config_notify/-/raw/8.x-1.x/README.md... - π©πͺGermany jurgenhaas Gottmadingen
That's strange, I ran this locally and received the expected output:
<h2><a href="#contents-of-this-file" aria-hidden="true" class="anchor" id="user-content-contents-of-this-file"></a>CONTENTS OF THIS FILE</h2><ul><li>Introduction</li>.....
Not sure where this went wrong.
The local script looks like this (3rd line starts the extra code):
// Html entities appearing in the markup are encoded. $html = html_entity_decode($html); // Remove data-sourcepos attributes. $html = preg_replace('/(<[^>]+) data-sourcepos=".*?"/si', '$1', $html); // Remove whitespace between tags. $html = preg_replace('~>\s+<~', '><', $html); // This code block replaces remaining linebreaks with spaces, except within // code or pre elements, where we need to keep the linebreaks. $html = str_replace("\n", '###TEMP###', $html); $document = new \DomDocument('1.0', 'UTF-8'); $document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); foreach (['code', 'pre'] as $tagName) { /** @var \DOMElement $element */ foreach ($document->getElementsByTagName($tagName) as $element) { $element->textContent = str_replace('###TEMP###', "\n", $element->textContent); } } $html = $document->saveHTML($document); $html = str_replace('###TEMP###', ' ', $html);
Don't know how to reproduce this. Have I missed something og the code in the MR? I don't see any differences, do you?
- πͺπΈSpain fjgarlin
My code with debugging:
... $html = str_replace("\n", '###TEMP###', $html); var_dump($html); $document = new \DomDocument('1.0', 'UTF-8'); $document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); foreach (['code', 'pre'] as $tagName) { /** @var \DOMElement $element */ foreach ($document->getElementsByTagName($tagName) as $element) { $element->textContent = str_replace('###TEMP###', "\n", $element->textContent); } } $html = $document->saveHTML($document); var_dump($html);
Results:
/var/www/dev/fjgarlin-drupal.dev.devdrupal.org/htdocs/sites/all/modules/drupalorg/drupalorg/drupalorg.module:3849:string '<h2>CONTENTS OF THIS FILE</h2><ul><li>Introduction</li><li>Requirements</li><li>Installation</li><li>Configuration</li><li>Maintainers</li></ul><h2>INTRODUCTION</h2><p>The module lets you trigger notifications depending on the current status of###TEMP###your configuration in production.</p><p>Notifications can be triggered via cron or instantly. Slack and email are###TEMP###currently supported.</p><ul><li><p>For a full description of the module visit:###TEMP###<a href="https://www.drupal.org/project/config_'... (length=1789) /var/www/dev/fjgarlin-drupal.dev.devdrupal.org/htdocs/sites/all/modules/drupalorg/drupalorg/drupalorg.module:3859:string '<h2>CONTENTS OF THIS FILE<ul> <li>Introduction</li> <li>Requirements</li>
As we can see, the closing
h2
disappears somehow in between loading and saving the HTML.I even went ahead and tested the following:
var_dump($html); $document = new \DomDocument('1.0', 'UTF-8'); $document->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); $html = $document->saveHTML($document); var_dump($html);
And the output is identical to the above one, without the closing
h2
.
I suspect that it might be due to the PHP version we're running on π (better don't ask which).I'm not sure about what to try now...
- π©πͺGermany jurgenhaas Gottmadingen
Looks like your test code is missing the first instruction:
// Remove whitespace between tags. $html = preg_replace('~>\s+<~', '><', $html);
Maybe that's the reason?
- πͺπΈSpain fjgarlin
No, it's there. I just removed the part that wasn't relevant with the
...
I just added the relevant part where we still have the closing "h2" and then we don't anymore.
- πͺπΈSpain fjgarlin
Oh wait, I just saw where the closing "h2" is...
... <h2>MAINTAINERS</h2> <ul> <li>Fran Garcia-Linares (fjgarlin) - <a href="https://www.drupal.org/u/fjgarlin">https://www.drupal.org/u/fjgarlin</a> </li> </ul> </h2>
I think I know why it's happening. Let me do a quick test.
- πͺπΈSpain fjgarlin
I made a suggestion in the MR that fixes the issue and explains why it was happening.
Code tested here: https://fjgarlin-drupal.dev.devdrupal.org/project/config_notify
I think once that change is in place it might be good for RTBC.
- Status changed to Needs review
7 months ago 3:33pm 1 July 2024 - π©πͺGermany jurgenhaas Gottmadingen
I've applied your suggestion, this is genius.
- Status changed to RTBC
7 months ago 3:56pm 1 July 2024 - πͺπΈSpain fjgarlin
Reviewed the MR and tested in "eca", "api" and "config_notify" modules and it all looks good.
Also, this feels pretty safe as it's just working on the suggested markup, but not saving anything.I'm marking it RTBC. Thanks so much for the work here!
- π©π°Denmark ressa Copenhagen
So great how fast you got this fixed, thanks! I was first thinking that this was for Drupal core, but then saw it's for he drupalorg module ...
Do you know if a similar issue exists for Drupal core? I ask because I am importing Markdown content with Migrate module, and the line breaks are stripped during import, so lists break down ... My work-around is to insert an extra line break, for each line, which is not ideal ... I am using the Markdown Easy module for Markdown support in CKEditor.
Here's the migration, if I don't have the extra line breaks, all the text is on one line, and the Markdown breaks down.
id: sub_nodes label: Subject migration, based on github.com/dinarcon/ud_migrations source: plugin: embedded_data data_rows: - nid: 7021 title: 'How to use Drupal' body: 'you can use Drupal for web site building. Here are some benefits: * open source * Big community * A third reason So as you can see, there are many reasons to use Drupal' ids: nid: type: integer process: nid: nid title: title body/value: body body/format: plugin: default_value default_value: markdown destination: plugin: entity:node default_bundle: article migration_dependencies: optional: []
I would hope that this could work, but it doesn't:
body: 'you can use Drupal for web site building. Here are some benefits: * open source * Big community * A third reason So as you can see, there are many reasons to use Drupal'
I'll create a Drupal core issue, but thought I'd ask you guys first, in case you perhaps have some ideas, or maybe seen it before ...
- πͺπΈSpain fjgarlin
What does the format "markdown" look for you? Do you perhaps have the
Convert line breaks into HTML (i.e. <br> and <p>)
filter enabled in your text format? Any other filters or processors linked to the "markdown" format? - First commit to issue fork.
-
drumm β
committed 0331f442 on 7.x-3.x authored by
jurgenhaas β
Issue #3458219: Improve linebreak handling for imported MD files
-
drumm β
committed 0331f442 on 7.x-3.x authored by
jurgenhaas β
- π©π°Denmark ressa Copenhagen
Thanks for the suggestion @jgarlin, it could have been the solution ... sadly, it made no difference, after I removed both filters
Limit allowed HTML tags and correct faulty HTML
andConvert line breaks into HTML (i.e. <br> and <p>
). It may be Migrate or Migrate Plus then? I'll create an issue in one those projects, and hope for the best :-) - π©πͺGermany jurgenhaas Gottmadingen
Thank you so much @fjgarlin for working together on this and thanks @drumm for merging and deploying it. I've tested it on the ECA module page and it makes a huge difference. The description looks so much better now.
- π©π°Denmark ressa Copenhagen
Great!
I created π Preserve line breaks, to support importing Markdown format Active for my Migrate / Markdown challenge.
-
drumm β
committed 0331f442 on owner-tools-user-credit-view authored by
jurgenhaas β
Issue #3458219: Improve linebreak handling for imported MD files
-
drumm β
committed 0331f442 on owner-tools-user-credit-view authored by
jurgenhaas β
-
drumm β
committed 0331f442 on drupalorg-3447400-3447400-clean-up-community authored by
jurgenhaas β
Issue #3458219: Improve linebreak handling for imported MD files
-
drumm β
committed 0331f442 on drupalorg-3447400-3447400-clean-up-community authored by
jurgenhaas β