text_summary() should output valid HTML and Unicode text

Created on 13 February 2008, almost 17 years ago
Updated 10 September 2024, 5 months ago

Problem/Motivation

The autogenerated summary can produce invalid HTML if the summary cuts off in the middle of a closing tag.

Steps to reproduce

  1. Set trimmed length to 200.
  2. Remove the HTML corrector from the Full HTML input format (or create a new input format without HTML input corrector). (See http://drupal.org/node/221252.)
  3. Create a new page, Full HTML input format (or whichever you created), with this content:
    The maximum number of characters used in the trimmed version of a post. <!-- <p>Drupal will use this setting to determine at which offset long posts should be trimmed.</p> The trimmed version of a post is typically used as a teaser when displaying the post on the main page, in XML feeds, etc. To disable teasers, set to 'Unlimited'. --> Note that this setting will only affect new or updated content and will not affect existing teasers.
  4. Preview the page, look at the result and the source to see the broken HTML.
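
To illustrate why the sample content breaks, here is a minimal sketch in Python (Drupal itself is PHP; this is not Drupal's code). It assumes a naive trimmer that cuts at a fixed character offset. The helper names naive_trim and has_unclosed_comment are hypothetical. With a limit of 200, the cut lands inside the HTML comment, so the summary opens a comment that it never closes.

```python
# Body text from step 3 of the steps to reproduce.
BODY = (
    "The maximum number of characters used in the trimmed version of a post. "
    "<!-- <p>Drupal will use this setting to determine at which offset long "
    "posts should be trimmed.</p> The trimmed version of a post is typically "
    "used as a teaser when displaying the post on the main page, in XML "
    "feeds, etc. To disable teasers, set to 'Unlimited'. --> Note that this "
    "setting will only affect new or updated content and will not affect "
    "existing teasers."
)


def naive_trim(html: str, limit: int) -> str:
    """Hypothetical helper: cut at a fixed character offset, ignoring markup."""
    return html[:limit]


def has_unclosed_comment(html: str) -> bool:
    """Rough check: more comment openers than comment closers."""
    return html.count("<!--") > html.count("-->")


summary = naive_trim(BODY, 200)
print(has_unclosed_comment(BODY))     # the full body is balanced
print(has_unclosed_comment(summary))  # the 200-character cut is not
```

Everything that follows such a summary on the rendered page would be swallowed by the open comment, which matches the broken output described in step 4.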

Proposed resolution

Ensure that auto-generated summaries contain valid HTML.
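
One possible way to get there, sketched in Python for illustration only (this is an assumption about an approach, not the fix that was committed to Drupal): walk the markup with a parser, copy tags through, spend the character budget only on text, drop comments, and close whatever is still open at the cut. The names truncate_html, _Truncator and VOID_TAGS are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    """Copies markup to self.out until `limit` text characters are used."""

    def __init__(self, limit: int):
        super().__init__(convert_charrefs=False)
        self.remaining = limit
        self.done = limit <= 0
        self.out = []
        self.open_tags = []

    def _spend(self, text: str, cost: int) -> None:
        self.out.append(text)
        self.remaining -= cost
        if self.remaining <= 0:
            self.done = True

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # stray closing tags are dropped
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.done:
            piece = data[:self.remaining]
            self._spend(piece, len(piece))

    def handle_entityref(self, name):
        if not self.done:
            self._spend(f"&{name};", 1)  # an entity counts as one character

    def handle_charref(self, name):
        if not self.done:
            self._spend(f"&#{name};", 1)

    # Comments are dropped: HTMLParser.handle_comment does nothing by default.


def truncate_html(html: str, limit: int) -> str:
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(truncate_html("<p>Hello <strong>big world</strong></p>", 9))
```

Because the cut can only fall inside text, never inside a tag or comment, the result stays balanced. The trade-off is that the output is re-serialized rather than copied byte for byte, so sloppy input markup may come out normalized.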

Remaining tasks

  1. Add handling of body content to remove markup before counting characters to test against selected trimmed length.
  2. Review and test.
  3. Add / modify tests to coordinate with this change.
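
As a rough illustration of task 1, the sketch below (Python, hypothetical names, not Drupal code) measures only the text a reader would see, so that markup does not eat into the trimmed length. The filler sentence is the one used in the comments on this issue.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes only; tags and comments are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_length(html: str) -> int:
    """Hypothetical helper: number of characters outside of markup."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return len("".join(collector.parts))


item = ("This is some filler text that I am using to test "
        "the trimmed text function.")
markup = "<ol>" + "".join(f"<li>{item}</li>" for _ in range(3)) + "</ol>"

print(len(markup))             # raw length, tags included
print(visible_length(markup))  # text only
```

The gap between the two numbers grows with every list item or inline tag, which is consistent with the report that list-heavy content produces much shorter summaries for the same limit.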

User interface changes

TBD

API changes

TBD

Data model changes

TBD

Release notes snippet

TBD

Original report by gpk

Steps to reproduce

As requested here: http://drupal.org/node/220783#comment-728258.

Set trimmed length to 200.

Remove the HTML corrector from the Full HTML input format (or create a new input format without HTML input corrector). (See http://drupal.org/node/221252.)

Create a new page, Full HTML input format (or whichever you created), with this content:

The maximum number of characters used in the trimmed version of a post. <!-- <p>Drupal will use this setting to determine at which offset long posts should be trimmed.</p> The trimmed version of a post is typically used as a teaser when displaying the post on the main page, in XML feeds, etc. To disable teasers, set to 'Unlimited'. --> Note that this setting will only affect new or updated content and will not affect existing teasers.

Preview the page, look at the result and the HTML source ...

Also see #263

🐛 Bug report
Status

Closed: duplicate

Version

11.0 🔥

Component
Text

Last updated about 8 hours ago

Created by

🇬🇧 United Kingdom gpk

  • Needs backport to D7

    After being applied to the 8.x branch, it should be considered for backport to the 7.x branch. Note: This tag should generally remain even after the backport has been written, approved, and committed.

Comments & Activities

  • @larowlan Re #259

    In D 10.2.3, text_summary (trimmed) output is still much shorter than the selected number of characters in "Trimmed limit." It appears that it still counts markup within the character count.

    E.g. I set a Trimmed limit to 1500 characters but get this for trimmed output:

    [screenshot of trimmed output]

    This is only 900 characters.

    If the node contains something with more markup, such as bullet points, the trimmed value is even shorter. e.g. with the same filler text put into a numbered list, like this:

    Meeting recording:
    [video embed]

    Your questions -- answered!

    This is some filler text that I am using to test the trimmed text function.

    1. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function.
    2. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function.
    3. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function.
    4. This is some filler text that I am using to test the trimmed text function.
    5. This is some filler text that I am using to test the trimmed text function.
    6. This is some filler text that I am using to test the trimmed text function.

    And continues on with more paragraphs of text here.

    Then the trimmed output looks like this:

    [screenshot of trimmed output]

    With the same text as in the first screenshot, this time the trimmed value only has 139 characters.

  • 🇦🇺 Australia larowlan 🇦🇺.au GMT+10

    Thanks @leeksoup - can you update the issue summary with remaining tasks etc?

  • @larowlan - Do the remaining items need to be split off into a new / separate issue?

  • 🇦🇺 Australia larowlan 🇦🇺.au GMT+10

    I think this issue is fine, thank you for updating the issue summary!

  • 🇦🇺 Australia pameeela

    Should this be split into two issues? One for the valid HTML and one for excluding markup from character count? I think the character count part of it can't really be called a bug since it is explicitly tested to work that way, meaning it is intentional behaviour. I do agree that it makes sense to exclude it but that seems like a feature request.

    I also think this would need to be opt-in for existing sites because it will change what is displayed for some sites.

    Updated IS to be a bit more clear.

  • 🇦🇺 Australia pameeela
  • 🇩🇪 Germany Anybody Porta Westfalica

    @pameeela I agree the focus should be to fix the broken HTML. Not counting the HTML characters can be a less relevant follow-up feature!

  • Status changed to Postponed: needs info 5 months ago
  • 🇦🇺 Australia pameeela

    Updating this issue to reflect that the bug reported was about invalid markup. However, I'm unable to reproduce it on D11, so I think maybe it's fixed in CKE5? I can't reproduce it using basic or full HTML regardless of whether 'Correct faulty and chopped off HTML' is enabled.

    Marking postponed in case I'm missing something obvious. I will create a separate issue for excluding tags from trimming, as already noted.

  • Status changed to Closed: duplicate 5 months ago
  • 🇦🇺 Australia pameeela

    Actually, I just noticed #3067116: "text_summary() returns malformed (not normalized) HTML for basic_html and other formats that use filter_html instead of filter_htmlcorrector", so I think it was fixed there. But by then this issue had already expanded to include the trimming, so I'm going to close this one.
