Filter has false positives

Created on 10 June 2024, 21 days ago
Updated 29 June 2024, 2 days ago

Problem/Motivation

Once engaged, the HighlightJs filter regex incorrectly matches any string of "language-*" in the node text as a code block to be highlighted.

For example, if a node with a code block also contains the text "language-specific" or "language-dependent" anywhere in the body, the filter regex matches those text strings as code blocks to be highlighted, with "specific" and "dependent" as the respective languages.

Additionally, the filter triggers on isolated <code> tags that are not inside a <pre> tag.

According to the highlight.js defaults and the help text for the filter (on the text format configuration page), the filter should only trigger on <pre><code> cases.

This creates a couple of problems:

  • Triggers the attempted loading of highlight.js language files for those invalid languages.
  • Results in a javascript error that prevents valid code blocks from being highlighted.
  • Results in unnecessary requests for pages that don't contain valid code blocks.

Steps to reproduce

  1. Create a new node containing a valid code block.
  2. Add the text "language-invalid" anywhere on the page.
  3. Publish and view the node.
  • Only the highlight.js language for the valid code block is loaded.
  • The valid code block is highlighted correctly.
  • A request is made for a language named "invalid", which causes a Javascript error.
  • The valid code block is not highlighted due to the Javascript error.
  1. Create a new node containing only a <code> tag that is not inside a <pre> tag, and give that tag a highlight.js language.
    For example:
    <code class="language-php">// some PHP code....
  2. Publish and view the node.

No highlight.js assets are loaded for the page.

highlight.js assets are loaded, even though there's no valid code block that highlight.js can highlight.

Proposed resolution

  • In HighlightJs.php, refine the logic in the filter that searches for code to be highlighted.
  • In highlightjs_input_filter.js, add a catch for the import to handle import errors. Refining the regex in the filter will mitigate false positives, but even with valid language imports there still may be Javascript import errors. Adding a catch will handle those errors more gracefully.
πŸ› Bug report
Status

Fixed

Version

1.1

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States kentr

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @kentr
  • Assigned to kentr
  • πŸ‡ΊπŸ‡ΈUnited States kentr
  • Merge request !14Issue #3453745: Fix filter false positives β†’ (Merged) created by kentr
  • Issue was unassigned.
  • Status changed to Needs review 18 days ago
  • πŸ‡ΊπŸ‡ΈUnited States kentr
  • Pipeline finished with Skipped
    6 days ago
    #207618
  • First commit to issue fork.
  • Status changed to Fixed 6 days ago
  • πŸ‡ΊπŸ‡ΈUnited States nmangold United States

    Everything was reproduced exactly as described and the solution looks good. I don't love the idea of the catching the error in the console log by default. I would like that more if it were an option that was disabled by default, and could be configured via the module settings page, e.g. debug mode. That said, I will merge this MR, but I might include that configuration at a later date.

    Thank you for your contribution.

  • πŸ‡ΊπŸ‡ΈUnited States kentr

    @nmangold

    I don't love the idea of the catching the error in the console log by default.

    Just to be sure, I wanted to clarify that the catch is what prevents fetch errors from breaking valid code blocks on the page. I defer to you on whether or not the error is then written to the console, but IMO the catch is necessary (the catch could absorb the error by default if you don't want the error written to the console).

  • πŸ‡ΊπŸ‡ΈUnited States nmangold United States

    Thanks for the clarification. So, it is essentially a more graceful failure. In that case, I like the idea of the console log by default for now. So, the content creator is not clueless about why the "invalid" code blocks are not functioning as expected. Especially, with the issue regarding non-aliased languages, #3401537.

Production build 0.69.0 2024