text_summary() should output valid HTML and Unicode text, and not count markup characters as part of the text length

Created on 13 February 2008, over 16 years ago
Updated 26 April 2024, 2 months ago

Problem/Motivation

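The auto-generated summary can contain invalid HTML when the trim point falls inside markup, for example partway through an HTML comment or before an element's closing tag. Markup characters also count toward the trimmed length, so the visible text can be much shorter than the configured limit.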

Steps to reproduce

  1. Set trimmed length to 200.
  2. Remove the HTML corrector from the Full HTML input format (or create a new input format without HTML input corrector). (See http://drupal.org/node/221252.)
  3. Create a new page using the Full HTML input format (or whichever format you created) with this content:
    The maximum number of characters used in the trimmed version of a post. <!-- <p>Drupal will use this setting to determine at which offset long posts should be trimmed.</p> The trimmed version of a post is typically used as a teaser when displaying the post on the main page, in XML feeds, etc. To disable teasers, set to 'Unlimited'. --> Note that this setting will only affect new or updated content and will not affect existing teasers.
  4. Preview the page, look at the result and the source to see the broken HTML.
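
The failure in the steps above can be illustrated outside Drupal. The following is a minimal Python sketch, not Drupal's text_summary() code: `naive_trim` and `has_unclosed_comment` are hypothetical helpers, and the real function also looks for sentence and paragraph break points before cutting. The sketch only shows that a 200-character cut of the sample body lands inside the HTML comment, so the comment is never closed.

```python
# Sample body from step 3 of the steps to reproduce.
BODY = (
    "The maximum number of characters used in the trimmed version of a post. "
    "<!-- <p>Drupal will use this setting to determine at which offset long "
    "posts should be trimmed.</p> The trimmed version of a post is typically "
    "used as a teaser when displaying the post on the main page, in XML feeds, "
    "etc. To disable teasers, set to 'Unlimited'. --> Note that this setting "
    "will only affect new or updated content and will not affect existing teasers."
)


def naive_trim(html: str, limit: int) -> str:
    """Cut at a fixed character offset, ignoring markup (hypothetical helper)."""
    return html[:limit]


def has_unclosed_comment(html: str) -> bool:
    """Return True if the last '<!--' has no '-->' after it."""
    start = html.rfind("<!--")
    return start != -1 and html.find("-->", start) == -1


teaser = naive_trim(BODY, 200)
print(has_unclosed_comment(BODY))    # False: the full body closes its comment
print(has_unclosed_comment(teaser))  # True: the cut falls inside the comment
```

Without an HTML corrector in the filter chain, a browser treats everything after the unclosed `<!--` as part of the comment, which is why the rest of the page can disappear in the preview.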

Proposed resolution

Ensure that auto-generated summaries contain valid HTML.

Remaining tasks

  1. Strip markup from the body content before counting characters, so that only text is measured against the selected trimmed length.
  2. Review and test.
  3. Add / modify tests to coordinate with this change.
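
One way to approach the first task is to walk the markup with a parser, count only text characters against the limit, and close any elements that are still open when the limit is reached. The sketch below is an assumption-laden illustration in Python using the standard `html.parser` module; it is not a patch for text_summary(), and the names `TextLimitedSummary` and `summarize` are hypothetical. It drops HTML comments entirely and counts each character reference (such as `&amp;`) as one character, both of which are design choices rather than requirements stated in this issue.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextLimitedSummary(HTMLParser):
    """Copy tags through unchanged, but count only text characters."""

    def __init__(self, limit: int) -> None:
        super().__init__(convert_charrefs=False)
        self.remaining = limit
        self.done = limit <= 0
        self.out = []        # pieces of the summary, in order
        self.open_tags = []  # stack of elements still awaiting a close tag

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit as-is, nothing to close later.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return  # ignore stray close tags
        # Close inner elements first so the output stays properly nested.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]
        self.out.append(piece)
        self._consume(len(piece))

    def handle_entityref(self, name):
        self._reference(f"&{name};")

    def handle_charref(self, name):
        self._reference(f"&#{name};")

    def _reference(self, raw):
        # A character reference is kept whole and counted as one character.
        if not self.done:
            self.out.append(raw)
            self._consume(1)

    def _consume(self, count):
        self.remaining -= count
        if self.remaining <= 0:
            self.done = True


def summarize(html: str, limit: int) -> str:
    """Return at most `limit` text characters of `html`, with tags balanced."""
    parser = TextLimitedSummary(limit)
    parser.feed(html)
    parser.close()
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(summarize("<p>Hello <strong>big world</strong> again</p>", 9))
# <p>Hello <strong>big</strong></p>
```

The sketch cuts mid-word; a real implementation would presumably keep the existing preference for sentence and paragraph boundaries and apply it to the text-only count.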

User interface changes

TBD

API changes

TBD

Data model changes

TBD

Release notes snippet

TBD

Original report by gpk

Steps to reproduce

As requested here: http://drupal.org/node/220783#comment-728258.

Set trimmed length to 200.

Remove the HTML corrector from the Full HTML input format (or create a new input format without HTML input corrector). (See http://drupal.org/node/221252.)

Create a new page, Full HTML input format (or whichever you created), with this content:

The maximum number of characters used in the trimmed version of a post. <!-- <p>Drupal will use this setting to determine at which offset long posts should be trimmed.</p> The trimmed version of a post is typically used as a teaser when displaying the post on the main page, in XML feeds, etc. To disable teasers, set to 'Unlimited'. --> Note that this setting will only affect new or updated content and will not affect existing teasers.

Preview the page, look at the result and the HTML source ...

Also see #263

πŸ› Bug report
Status

Needs work

Version

11.0 🔥

Component
Text

Last updated 2 days ago

Created by

🇬🇧 United Kingdom gpk

  • Needs backport to D7

    After being applied to the 8.x branch, it should be considered for backport to the 7.x branch. Note: This tag should generally remain even after the backport has been written, approved, and committed.


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • @larowlan Re #259

    In Drupal 10.2.3, the trimmed output of text_summary() is still much shorter than the number of characters selected in "Trimmed limit". It appears that markup is still counted toward the character limit.

    E.g. I set the Trimmed limit to 1500 characters but get this as trimmed output:

    [screenshot not available]

    This is only 900 characters.

    If the node contains something with more markup, such as bullet points, the trimmed value is even shorter. e.g. with the same filler text put into a numbered list, like this:

    Meeting recording:
    [video embed]

    Your questions -- answered!

    This is some filler text that I am using to test the trimmed text function.

    1. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function.
    2. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function.
    3. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function. This is some filler text that I am using to test the trimmed text function.
    4. This is some filler text that I am using to test the trimmed text function.
    5. This is some filler text that I am using to test the trimmed text function.
    6. This is some filler text that I am using to test the trimmed text function.

    And continues on with more paragraphs of text here.

    Then the trimmed output looks like this:

    [screenshot not available]

    With the same text as in the first screenshot, this time the trimmed value only has 139 characters.
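
The effect described in this comment, where list markup eats into the character budget, can be measured with a small sketch. This is an illustrative Python example, not Drupal code: `visible_length` is a hypothetical helper built on the standard `html.parser` module, and the list below is a simplified stand-in for the editor output, which typically carries more markup. It does not attempt to reproduce the 900 or 139 character figures reported above.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text nodes and ignore all tags and comments."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def visible_length(html: str) -> int:
    """Number of text characters in `html`, excluding markup."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return len("".join(parser.parts))


filler = "This is some filler text that I am using to test the trimmed text function."
as_list = "<ol>" + "".join(f"<li>{filler}</li>" for _ in range(3)) + "</ol>"

raw = len(as_list)               # what a markup-inclusive count sees
text = visible_length(as_list)   # what a reader sees
print(raw, text, raw - text)     # the difference is pure markup overhead
```

Each list item here costs nine characters of markup (`<li>` plus `</li>`), so the gap between the two counts grows with every item, attribute, or embed in the body.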

  • 🇦🇺 Australia larowlan GMT+10

    Thanks @leeksoup - can you update the issue summary with remaining tasks etc?

  • @larowlan - Do the remaining items need to be split off into a new / separate issue?

  • 🇦🇺 Australia larowlan GMT+10

    I think this issue is fine, thank you for updating the issue summary!

  • 🇦🇺 Australia pameeela

    Should this be split into two issues? One for the valid HTML and one for excluding markup from character count? I think the character count part of it can't really be called a bug since it is explicitly tested to work that way, meaning it is intentional behaviour. I do agree that it makes sense to exclude it but that seems like a feature request.

    I also think this would need to be opt-in for existing sites because it will change what is displayed for some sites.

    Updated IS to be a bit more clear.

  • 🇦🇺 Australia pameeela