Unclear trimming behavior (whole words or not?) - missing setting

Created on 9 September 2022, about 2 years ago
Updated 20 September 2023, about 1 year ago

Problem/Motivation

I've classified this as a bug because the behavior is unexpected and isn't documented in the UI. Otherwise this would clearly be a feature request.

Truncating by characters currently truncates at a word boundary, not exactly at the given character count. That's unexpected and should be documented. It would be even better to add an option to enable or disable this functionality.

I think "Truncate at word boundary" should remain the default, as it makes sense in most cases, but there may be cases where you'd want to cut off exactly at character X. An option is better than unread documentation, so we shouldn't make that hard assumption.

The implementation of the current word-boundary truncation can be found in the following method:

  /**
   * Truncates a DOMNode by character count.
   *
   * @param \DOMNode $domnode
   *   Object to be truncated.
   */
  protected function domNodeTruncateChars(\DOMNode $domnode) {
    foreach ($domnode->childNodes as $node) {

      if ($this->foundBreakpoint == TRUE) {
        return;
      }

      if ($node->hasChildNodes()) {
        $this->domNodeTruncateChars($node);
      }
      else {
        $text = html_entity_decode($node->nodeValue, ENT_QUOTES, 'UTF-8');
        $length = mb_strlen($text);
        if (($this->charCount + $length) >= $this->limit) {
          // We have found our end point.
          $node->nodeValue = Unicode::truncate($text, $this->limit - $this->charCount, TRUE);
          $this->removeTrailingPunctuation($node);
          $this->removeProceedingNodes($node);
          $this->insertEllipsis($node);
          $this->foundBreakpoint = TRUE;
          return;
        }
        else {
          $this->charCount += $length;
        }
      }
    }
  }

In the line
$node->nodeValue = Unicode::truncate($text, $this->limit - $this->charCount, TRUE);
https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Util...

The third (TRUE) parameter is

bool $wordsafe: If TRUE, attempt to truncate on a word boundary. Word boundaries are spaces, punctuation, and Unicode characters used as word boundaries in non-Latin languages; see Unicode::PREG_CLASS_WORD_BOUNDARY for more information. If a word boundary cannot be found that would make the length of the returned string fall within length guidelines (see parameters $max_length and $min_wordsafe_length), word boundaries are ignored.
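
To make the quoted description concrete, here is a minimal standalone sketch of the word-safe behavior and its fallback. It is not the Drupal implementation: `sketch_truncate()` is a hypothetical helper, and word boundaries are simplified to whitespace and punctuation instead of `Unicode::PREG_CLASS_WORD_BOUNDARY`. It only illustrates how a limit that falls inside the first word can still produce a cut-off word even with `$wordsafe = TRUE`.

```php
<?php

/**
 * Hypothetical stand-in for a word-safe truncate function (illustration only).
 *
 * Assumption: a word boundary is any whitespace or punctuation character.
 */
function sketch_truncate(string $text, int $max_length, bool $wordsafe = FALSE): string {
  $max_length = max($max_length, 0);

  // Nothing to do if the text already fits.
  if (mb_strlen($text) <= $max_length) {
    return $text;
  }

  if ($wordsafe && $max_length > 1) {
    // Greedy match: the longest prefix of 1..$max_length characters that is
    // directly followed by a (simplified) word boundary.
    $pattern = '/^(.{1,' . $max_length . '})[\s\p{P}]/us';
    if (preg_match($pattern, $text, $matches)) {
      return $matches[1];
    }
  }

  // No boundary within the limit (or word-safe disabled): cut exactly.
  return mb_substr($text, 0, $max_length);
}

echo sketch_truncate('Hello World', 8, TRUE), "\n";  // Hello
echo sketch_truncate('Hello World', 8, FALSE), "\n"; // Hello Wo
echo sketch_truncate('Hello World', 3, TRUE), "\n";  // Hel
```

In this sketch, the last call returns a partial word because no boundary exists within the first three characters, which matches the "word boundaries are ignored" clause in the description above.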

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Active

Version

2.0

Component

Code

Created by

🇩🇪Germany Anybody Porta Westfalica


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇩🇪Germany Anybody Porta Westfalica

    Now I'm seeing that words are no longer trimmed at the end of the word, but there's still no option to select whether it *should* trim after whole words or at the exact character. Could a maintainer perhaps clarify what the expected behavior is?

    Would you agree an option for that makes sense? Then we'd be happy to provide a MR!

  • 🇺🇸United States ultimike Florida, USA

    I'm not sure I agree with this:

    Truncating by characters currently truncates at word boundary, not directly at given char count. That's unexpected and should be documented. It would be even better to add an option to enable / disable this functionality.

    One of the main features of Smart Trim is that it doesn't cut off words. If you want to cut off at a specific character count regardless of whether it is a word-boundary or not, why not use the core "Trim" formatter?

    -mike

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @ultimike! Well, I think smart_trim has many additional benefits that would be helpful even when trimming at an exact character count.

    The strange thing, and the primary reason for this issue, is that I have a case where words are cut off in the middle using smart_trim, and I was wondering how that happens. Thanks for the clarification. I'll have to look at the code and debug to find the reason; I was no longer sure whether smart_trim preserves whole words.

  • 🇮🇪Ireland lostcarpark

    This issue has been open nearly a year, and I don't think there's a clear description of what the problem is. @Anybody, if words are getting split, that does sound like a possible bug, as it's definitely not the described behaviour of the module.

    Could you provide an example of some text, with the output Smart Trim is producing, and the output you'd expect?

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks, yes, I'll add that here once I run into it again. Sadly, I no longer remember in which of the many projects I ran into this, so let's please leave this open.

    If nobody else runs into this, that's also a good indicator of another side effect. So let's see!

  • 🇩🇪Germany Grevil

    This issue has been open nearly a year, and I don't think there's a clear description of what the problem is. @Anybody, if words are getting split, that does sound like a possible bug

    I think that is not the problem he was facing. From the issue summary:

    Truncating by characters currently truncates at word boundary, not directly at given char count.

    So basically, if we have the sentence:
    Hello World
    and we want to trim the sentence at character 3, @Anybody expects the output to be:
    Hel
    Instead, the module resolves the sentence into:
    Hello World
    Which he thinks is unexpected and not documented enough.

    And this issue is dedicated to having a setting with which you can switch between both trimming behaviors. Hope that clears it up!

  • First commit to issue fork.
  • @markie opened merge request.