Twig extension strips anchor tags from the processed text

Created on 6 July 2023, about 1 year ago
Updated 19 July 2023, about 1 year ago

Problem/Motivation

Calling the |spamspan twig filter on the content of text fields strips anchor tags from the text.
This is due to the SpamspanInterface::ALLOWED_HTML doesn't contain the a anchor tag and Xss::filter() filters it out.

Steps to reproduce

Apply the spamspan twig filter on an HTML text field's output with content having anchor tags as well inside.

Proposed resolution

Make sure Xss::filter() doesn't filter it out by adding anchor tag to SpamspanInterface::ALLOWED_HTML.

🐛 Bug report
Status

Fixed

Version

3.0

Component

Code

Created by

🇧🇪Belgium joevagyok

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @joevagyok
  • 🇧🇪Belgium joevagyok

    I wrote unit test to test the twig extension. This patch will fail because it doesn't include the fix in order to prove the problem stated by the issue.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    1 pass
  • 🇧🇪Belgium joevagyok

    This patch will fail because it doesn't include the fix in order to prove the problem stated by the issue. I also fix dependency injection in the extension which should always refer to renderer interface and not the rednerer object.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    1 pass, 1 fail
  • 🇧🇪Belgium joevagyok

    Uploading the final patch containing the fix.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    10 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    10 pass
  • @joevagyok opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    10 pass
  • Status changed to Needs review about 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    10 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    10 pass
  • 🇧🇪Belgium joevagyok

    Uploading the latest patch in which I removed an unused use statement and an extra line.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @joevagyok this also please needs tests to ensure it works as expected.

  • 🇧🇪Belgium joevagyok

    Did you check the patch? Please see the patch and TwigExtensionUnitTest case which asserts the problem.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @joevagyok, but while most of the tests make sense to me, some don't, like

    +    yield 'should maintain anchor classes' => [
    +      '<a class="someclass" data-before="before" href="mailto:email@example.com" id="someid" data-after="after"></a>',
    +      '<span class="spamspan"><span class="u">email</span> [at] <span class="d">example.com</span><span class="e">class="someclass" data-before="before" id="someid" data-after="after"</span></span>',
    +    ];

    Printing
    class="someclass" data-before="before" id="someid" data-after="after"
    as text is crazy. To preserve them, we'd then better at further spans to keep the attributes or change certain attributes into data-PREFIX- attributes.

    Any suggestions?

  • Status changed to Needs work about 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    PS: Could you please use MR's instead of patches? Makes life for developers and maintainers easier. :)

  • 🇧🇪Belgium joevagyok

    @Anybody I took the test cases from the existing module tests.

    The stuff that you mention is actually the problem I raised in #3372537 🐛 Extra attributes of the a tag are rendered in the obfuscated format Fixed where I proposed a simple solution without changing the whole obfuscation logic and spamspan JS to normalize it.

    I think we can leave them in the markup as is and it's enough to hide them from the end user, since the current JavaScript logic knows exactly what to do with it and normalize it back correctly.

  • 🇩🇪Germany Anybody Porta Westfalica

    @joevagyok thanks, we'll take a deeper look! @Grevil will reply.

    Let me say, I'm very happy about your help and the progress spamspan makes now. Has been dead and buggy for quite a while. That's really awesome.

  • 🇧🇪Belgium joevagyok

    Yeah I agree! Nice to have you guys on board!

    To eliminate any confusion regarding the obfuscation, I want to clarify the following, which relates to #3372537 🐛 Extra attributes of the a tag are rendered in the obfuscated format Fixed as well and it is vital to understand.

    '<span class="spamspan"><span class="u">email</span> [at] <span class="d">example.com</span><span class="e">class="someclass" data-before="before" id="someid" data-after="after"</span></span>'
    The markup above is when the twig/text filter obfuscates the email link into chunks of HTML markup. This step happens before the JavaScript would normalize it back into a correct mailto link. This would be visible only, to users without JavaScript enabled in the browser, so the JavaScript can't normalize it back into a correct mailto link. However, we need to take this into account when when we look at the text printed text, because there might be users who has no JS enabled, and to those visitors we need to keep the email readable, as well as for screen readers for disabled users. That is why I made the issue I mentioned above, which aims to hide the attributes fro mthe printed text keeping it clean and readable for users without JavaScript.

    This is what I assert in the twig unit test, the intermediate step before JavaScript kicks would kick in, that is why you see all that spamspan markup in the assertion. That has to be tested as well to make sure the JavaScript will have the markup in the format it expects. Not to only test the final normalized link by JavaScript.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @joevagyok, I agree, but I'd suggest we change the result of the PHP filter here to use additional spans for the attributes whiche are currently printed as text, so that this is readable instead. Based on that, the JS should convert that to back perfect output, like before.

    So from my perspective the (testd) result of the PHP filter should be from:
    <a class="someclass" data-before="before" href="mailto:email@example.com" id="someid" data-after="after"></a>
    turned into something like:
    <span class="spamspan"><span class="spamspan--ahref" class="someclass" data-before="before" id="someid" data-after="after"><span class="u">email</span> <span class="spamspan--at">[at]</span> <span class="d">example.com</span></span>

    Which is human-readable and not printed markup, like before.

    This can be combined and obfuscated as needed by things like: Allow custom spamspan class to make it harder for spambots to detect spamspan Active .

  • 🇧🇪Belgium joevagyok

    For me the approach is fine, however, I would not mix that into this issue. I will implement the discussed approach in #3372537 🐛 Extra attributes of the a tag are rendered in the obfuscated format Fixed since this issue is about the twig filter, which is something else.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    16 pass, 1 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    25 pass
  • 🇧🇪Belgium joevagyok

    Moving back to Needs review as the scope of the issue is covered in the MR.

  • Assigned to Grevil
  • Status changed to Needs review about 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica
  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    25 pass
  • Status changed to Needs work about 1 year ago
  • 🇩🇪Germany Grevil

    "SpamspanInterface::ALLOWED_HTML" is also used in "SpamspanTrait.php" in the "filterTagContents" method with the following comment:

    // Nested elements are illegal.

    So there is probably a reason, why it wasn't in the "ALLOWED_HTML" array. I'll have a look in #2988954: SpamSpan filter should only apply to HTML text nodes , where this comment got introduced together with the Trait itself.

  • Status changed to RTBC about 1 year ago
  • 🇩🇪Germany Grevil

    Ah, I see this is a general practice, to never nest anchor elements. But I'd say that shouldn't be handled by our filter, so RTBC!

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update about 1 year ago
    25 pass
  • Status changed to Fixed about 1 year ago
  • 🇩🇪Germany Grevil

    Only removed the unnecessary comments, so no need to wait for the tests to pass again.

  • Issue was unassigned.
  • 🇩🇪Germany Anybody Porta Westfalica
  • Assigned to joevagyok
  • Status changed to Active about 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    ArgumentCountError: Too few arguments to function Drupal\spamspan\TwigExtension\SpamspanExtension::__construct(), 1 passed in /web/core/lib/Drupal/Component/DependencyInjection/Container.php on line 259 and exactly 2 expected in Drupal\spamspan\TwigExtension\SpamspanExtension->__construct() (line 39 of /web/modules/contrib/spamspan/src/TwigExtension/SpamspanExtension.php).

    after upgrading to 3.1.0!
    Ideas?

  • Issue was unassigned.
  • Status changed to Fixed about 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    Seems correct code-wise. After clearing caches in the DB (because drush cr also threw this error), works fine again... Super strange, perhaps a side-effect of the other issue.

  • 🇧🇪Belgium joevagyok

    I was able to reproduce the problem, and it happens if you don't execute drush cr to clear the caches and rebuild the service container after composer update. Just make sure to clear the cache and it's good.

  • 🇩🇪Germany Anybody Porta Westfalica

    Follow-up as it now happens to me on another project where I just ran updates.
    So this definitely needs an update hook to clear the relevant cache(s).

    🐛 Rebuild container after update to avoid "ArgumentCountError" for the "SpamspanExtension" Fixed

  • 🇧🇪Belgium joevagyok

    Cache clear should be part of the deployment process which runs the composer update. Providing cache clear in update paths is not a good practice, because you can break other update paths after. I saw the changes and for me rebuilding the container is fine, however such consequences are something worth noting, and rely on solid deployment steps.

  • 🇩🇪Germany Anybody Porta Westfalica

    @joevagyok thanks!

    Do you know, if update.php runs cache flush?
    I think I tested via drush updb and needed the cache clear, otherwise I ran into the issue. That was the reason.

  • 🇧🇪Belgium joevagyok

    Nope, drush updb doesn't clear cache in general. IF cache clear is essential in update path, which is rare, we always clear the problematic cache directly, with scope, like entity storage cache for a given entity storage with the entity ID. Global site wide clear cache should come after drush updb. Such problem usually appears, when services introduced or replaced, or dependencies changed or modules being removed.

    The error above appeared consistently for me if I invalidated cache after composer update and before drush updb which usually result in such problems.

  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks, I had the issue on two sites even after drush updb, following composer update directly.
    10/12 projects didn't run into the issue. Really strange.

    So should we keep the cache clear?

  • 🇧🇪Belgium joevagyok

    Service container rebuild should be fine.

    However, I wonder about the effectiveness of it. If the drush updb cannot run, and spamspan update probably won't be the first to run, how does it work at all? Also, what is the failing update doing with the site? Answers to these questions are very different from project to project.

  • 🇩🇪Germany Anybody Porta Westfalica

    drush updb runs. For the two affected projects it worked fine and fixed the issue.

    So let's see what happens?

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024