PlainTextOutput::renderFromHtml() could better handle spaces

Created on 5 July 2023, about 2 years ago

Problem/Motivation

Many HTML tags add horizontal space or a line break between them when rendered.
PlainTextOutput::renderFromHtml() simply removes HTML tags but does not add spaces, which can lead to a result that is surprising for users.

Steps to reproduce

Call PlainTextOutput::renderFromHtml('<p>Foo</p><p>Bar</p>');
The result will be FooBar but it would make more sense to have Foo Bar.
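
The behaviour can be reproduced outside Drupal with plain PHP. This is only a minimal sketch: it assumes renderFromHtml() removes markup roughly the way the built-in strip_tags() does, followed by entity decoding, which is not confirmed in this issue.

```php
<?php
// Assumption: renderFromHtml() behaves roughly like strip_tags() plus
// entity decoding. Both functions used here are PHP built-ins.
$html = '<p>Foo</p><p>Bar</p>';
$plain = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// The two paragraphs are fused because no separator replaces the tags.
echo $plain, PHP_EOL; // FooBar
```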

We noticed this when using HTML fields as metatag tokens. Some sentences are joined together without a space between them.

Proposed resolution

The method could add a space between each tag before stripping the tags.
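
One way to read this proposal is sketched below. The helper name render_plain_with_spaces() is hypothetical and this is not the code from the merge request; it only illustrates inserting a space between adjacent tags, stripping the tags, and then collapsing the resulting whitespace.

```php
<?php
// Hypothetical helper for illustration only, not the merge request code.
function render_plain_with_spaces(string $html): string {
  // Insert a space wherever one tag directly follows another.
  $spaced = str_replace('><', '> <', $html);
  // Remove the markup and decode entities using PHP built-ins.
  $text = html_entity_decode(strip_tags($spaced), ENT_QUOTES | ENT_HTML5, 'UTF-8');
  // Collapse runs of whitespace into one space and trim both ends.
  return trim(preg_replace('/\s+/', ' ', $text));
}

echo render_plain_with_spaces('<p>Foo</p><p>Bar</p>'), PHP_EOL; // Foo Bar
```

Because this treats every tag boundary the same way, it cannot tell block-level elements from inline ones.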

Feature request
Status: Active
Version: 9.5
Component: Render

Last updated 1 day ago

Created by

🇫🇷France prudloff Lille

Comments & Activities

  • Issue created by @prudloff
  • Environment: PHP 8.1 & MySQL 5.7
    last update about 2 years ago
    29,727 pass, 101 fail
  • Status changed to Needs review about 2 years ago
  • Status changed to Needs work about 2 years ago
  • 🇮🇳India keshavv India

    I have tested the MR with this code snippet.

    print_r(\Drupal\Component\Render\PlainTextOutput::renderFromHtml('<div class="test">Some content</div><p>Foo</p><p>Bar</p>'));
    

    It results

  • Pipeline finished with Failed
    4 months ago
    Total: 573s
    #435001
  • Pipeline finished with Failed
    4 months ago
    Total: 568s
    #435008
  • Pipeline finished with Success
    4 months ago
    Total: 411s
    #435042
  • Status changed to Needs review 4 months ago
  • 🇫🇷France prudloff Lille

    I added some tests.

  • 🇺🇸United States smustgrave

    Fixed up summary just slightly.

    Left 1 comment on MR

    If you are another contributor eager to jump in, please allow the previous poster at least 48 hours to respond to feedback first, so they have the opportunity to finish what they started!

  • 🇺🇸United States smustgrave

    You are absolutely correct. That was the only feedback I had.

  • 🇫🇷France nod_ Lille

    I'll set this to at least Needs work (NW). It's possibly a won't-fix situation, depending on how much exists out there that could help.

    So the fix here is to add a space between all tags. This breaks pretty fast, for example:

    <strong>test</strong><sup>nospace!</sup>
    

    Here we would not expect a space to be added. Whitespace in HTML is very complex (see https://blog.dwac.dev/posts/html-whitespace/), so hand-crafting rules is not a reasonable solution. We could parse the string as HTML and use textContent property from DOMNode and hope it does things correctly with html5.

  • 🇫🇷France prudloff Lille

    "We could parse the string as HTML and use textContent property from DOMNode and hope it does things correctly with html5."

    I did a quick test like this:

    $dom = \Drupal\Component\Utility\Html::load('<p>Giraffes.</p><p>Wombats.</p>');
    echo $dom->textContent;
    

    But it seems it never adds any space.

    A more robust solution would be to use the html2text library, but it would be a bit overkill to add a new dependency just for this.

  • Status changed to Closed: won't fix about 2 months ago
  • 🇫🇷France prudloff Lille

    I did more tests and I agree we won't be able to handle the various scenarios without having complex rules or adding a new dependency, so I'm closing as wontfix.
    Sites that need this can use a library like html2text instead.
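
The two limitations raised in the comments above can be checked with plain PHP. This is a hedged sketch: naive_spaced_text() is a hypothetical stand-in for the "space between tags" idea, and PHP's built-in DOMDocument is used in place of Drupal's Html::load(), so the exact document wrapping may differ from the test described in the thread.

```php
<?php
// Hypothetical stand-in for the "add a space between all tags" idea.
function naive_spaced_text(string $html): string {
  return trim(strip_tags(str_replace('><', '> <', $html)));
}

// Inline elements: a browser renders no gap here, but the naive rule adds one.
$inline = naive_spaced_text('<strong>test</strong><sup>nospace!</sup>');
echo $inline, PHP_EOL; // test nospace!

// DOM route: textContent concatenates the text nodes without separators.
// Assumption: the built-in DOMDocument replaces Html::load() here.
$dom = new DOMDocument();
$dom->loadHTML('<p>Giraffes.</p><p>Wombats.</p>');
$text = $dom->documentElement->textContent;
echo $text, PHP_EOL; // Giraffes.Wombats.
```

Neither approach knows which elements are block-level, which is why the thread concludes that a dedicated library is the more realistic option for sites that need this.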
