Stripping HTML tags should leave one whitespace as separation

Created on 13 May 2023, over 1 year ago
Updated 25 June 2023, over 1 year ago

Problem/Motivation

When HTML tags are stripped (e.g. paragraphs, headings), words are no longer separated by a space.

Steps to reproduce

Trim a text with multiple parapgraphs and headings. Words get glued together where tehy shouldn't.

🐛 Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @AndersTwo
  • First commit to issue fork.
  • 🇺🇸United States ultimike Florida, USA

    Confirmed this is a bug. I added a (currently failing) functional test and then a potential solution.

    -mike

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    20 pass
  • @ultimike opened merge request.
  • Status changed to Needs review over 1 year ago
  • 🇺🇸United States ultimike Florida, USA

    Test is passing - needs review.

    -mike

  • 🇮🇪Ireland lostcarpark

    I ran the tests and verified that they pass. Also confirmed test fails with line 409 of SmartTrimFormatter.php commented.

    Have left a couple of small suggestions.

    I was wondering why you'd used a functional test rather than a unit test, but I see that the tag stripping is all done in the viewElements() function of SmartTrimFormatter, which would be tricky to capture in a unit test.

  • Status changed to RTBC over 1 year ago
  • 🇺🇸United States markie Albuquerque, NM

    Tested this locally and reviewed. I am ok with the extra node creation because it can be used on multiple tests. And I am fine with the extra line. I guess we better get a release out to fix this bug asap yes?

  • 🇮🇪Ireland lostcarpark

    I was only looking at the lines of the change when I suggested combining operations on $output. When I look at how many operations take place on $output, I think it makes sense to keep them separate, and my comment feels a bit silly.

    I think using the extra node in additional tests would definitely make it worth adding. Perhaps a follow-up issue to add it to as many of the existing tests as appropriate would be worthwhile.

    A release soon makes sense. The question is should it be a 2.0.1 release with just the bugfix, or a 2.1.0 containing other merged changes?

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    31 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    31 pass
  • Status changed to Fixed over 1 year ago
  • 🇺🇸United States markie Albuquerque, NM

    Great work on this. Merging and pushing out a new release.

  • Automatically closed - issue fixed for 2 weeks with no activity.

  • Status changed to Fixed over 1 year ago
  • 🇷🇴Romania claudiu.cristea Arad 🇷🇴

    This line added in formatter is changing the behaviour for some edge cases in an unexpected way:

    // Add space before each tag to ensure words don't run together.
    // Logic via https://stackoverflow.com/questions/12824899/strip-tags-replace-tags-by-space-rather-than-deleting-them
    $output = str_replace('<', ' <', $output);
    

    Consider this markup:

    Full stop should not be preceded by <em>space</em>.
    

    With this change, will become:

    Full stop should not be preceded by space .
    

    Note the space that is inserted before full stop.

Production build 0.71.5 2024