Duplicated HTML IDs

Created on 15 May 2023, about 1 year ago
Updated 24 November 2023, 7 months ago

Problem/Motivation

When heading ID is generated, ToC API only takes into account IDs it generates, but not those already present in the
source. There is a chance that user generated content would have a duplicating ID. The chances are higher for the IDs
being generated based on the heading title.

Steps to reproduce

Generate ToC for the following HTML:

<p><a id="documents"></a></p>

<h3>Documents</h3>

Proposed resolution

Take into account IDs from the source for ToC.

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇳🇿New Zealand RoSk0 Wellington

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @RoSk0
  • Status changed to Needs review about 1 year ago
  • 🇳🇿New Zealand RoSk0 Wellington

    Here is a patch which fixes and issue. Together with test. Test passes locally , but it can't be run in CI as all the other tests in the module are outdated and broken.

  • Status changed to RTBC about 1 year ago
  • 🇳🇿New Zealand ericgsmith

    Patch looks good - fixes the error with duplicate IDs.

    Test is good too - as noted by you it could be a unit test, I think given currently the tests are not running that this shouldn't prevent it being RTBC and a follow up issue can be created to get the tests running.

    +++ b/tests/src/Kernel/TocBuilderTest.php
    @@ -0,0 +1,68 @@
    +   * @todo Convert to unit test and include in TocTest, once that is fixed.
    
  • Status changed to Needs work about 1 year ago
  • 🇳🇿New Zealand ericgsmith

    Sorry, was a bit premature on setting RTBC.

    On closer look the test and code assumes that the ID precedes the generated ID, e.g

    <h3>Documents</h3>
    
    <p><a id="documents"></a></p>
    

    Will still produce duplicate IDs.

  • Status changed to Needs review about 1 year ago
  • 🇳🇿New Zealand RoSk0 Wellington

    Thanks for the review Eric and picking that issue up!

    Also spotted this.

    When ToC API faces this type of HTML:

    <p><a name="documents"></a></p>
    <h2>Documents</h2>
    

    with now deprecated, name attribute.

    It produces this:

    <h2 id="documents">Documents<a href="#documents" class="anchor">#</a></h2>
    <h2 id="documents">Documents<a href="#documents" class="anchor">#</a></h2>
    

    and throws this warning:

    Warning: Uninitialized string offset 1 in Drupal\toc_api\TocBuilder->renderContent() (line 75 of /app/web/modules/contrib/toc_api/src/TocBuilder.php) 
    

    So I added a workaround and test for that as well.

  • Status changed to RTBC about 1 year ago
  • 🇳🇿New Zealand ericgsmith

    Nice work - looks good to me.

    Ran test locally on D10 / PHP 8.1 and all green.

  • 🇳🇿New Zealand RoSk0 Wellington

    Nice , thanks!

    I was testing on D9.5, PHP 8.0 so we have a good test coverage!

  • Status changed to Needs review about 1 year ago
  • 🇳🇿New Zealand RoSk0 Wellington

    We've identified a huge performance hit with introduced ID deduplication code on larger HTML documents. For the context, the documents we were testing this are around 5 Mb of plain HTML and are very complex , with lots of tables , headings and ID attributes.

    Rendering the page in development environment was taking around 15 min to render and failing at times.

    With this new approach based on XPath filtering , rather than whole DOM processing, it is now way more performance and we saw results of under 60 seconds to render on a cold cache.

    Based on the XPath spec:

    The resulting node sequence is returned in document order.

    So we shouldn't loose heading order here. Tests to prove this would be nice to have surely, I don't have capacity for that at the moment.

  • Status changed to RTBC about 1 year ago
  • 🇳🇿New Zealand ericgsmith

    Awesome work! Significant improvement when dealing with large content.

  • 🇳🇿New Zealand RoSk0 Wellington

    We have this patch running in production since 2023-06-27 with no negative effects found to date.

  • Status changed to Fixed 7 months ago
  • 🇦🇺Australia VladimirAus Brisbane, Australia

    Thank you! Commited and released! 🍻

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.69.0 2024