Fix Email address being removed in the link text and fix every second link not being obfuscated

Created on 5 July 2023, over 1 year ago
Updated 13 July 2023, over 1 year ago

Problem/Motivation

I noticed that if I want to use an email address as the label of the link in text that address will be removed or replaced by the address in the href attribute. This is because of an old problem, if it is left there everything will be obfuscated in a duplicate manner and will be normalized badly. See parent issue attached to this issue. #305464

Now I believe that it is a loss of information to the end user and careless editors might miss this. We should not just remove data on rendering or replace based on the assumption that the user want to put the same email as in the href, even if it is wrong from usability point of view.

Steps to reproduce

Add the following markup to a WYSIWYG which has the spamspan filter enabled:
<a href="mailto:test@example.com">Send email to test@example.com</a>
You will see that the email from the link text is removed.

Proposed resolution

I implemented a patch that instead of removing the email completely from the link text it applies a basic level of obfuscation as follows:
Send email to testatexampledotcom
And afterwards the JS detects this obfuscated email in the text of the link and normalize it back to Send email to test@example.com that will be readable for screen readers for disabled users just fine.

🐛 Bug report
Status

Fixed

Version

3.0

Component

Code

Created by

🇧🇪Belgium joevagyok

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

  • Issue created by @joevagyok
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    1 pass
  • @joevagyok opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    1 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    1 pass, 1 fail
  • 🇧🇪Belgium joevagyok

    Uploading a patch for composer patching based on the opened MR.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    2 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    2 pass
  • Status changed to Needs review over 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    @joevagyok thanks! Also needs tests, please. Shouldn't break again.

  • 🇧🇪Belgium joevagyok

    @Anybody Tests are implemented, see patch and Spamspan FunctionalJavascript test class.

  • Assigned to Grevil
  • Status changed to Needs work over 1 year ago
  • 🇩🇪Germany Anybody Porta Westfalica

    Thanks @joevagyok this looks excellent to me and is indeed a major problem. So let's call this a bug.

    I think we need to be aware of certain cases and test that, especially with the "Convert URLs into links" filter! See https://www.drupal.org/project/spamspan/issues/3373844#comment-15145179 📌 Remove wrong filter order assumption in the README. Needs work for details.

    We need to ensure that the following text entered into the Text Editor and filtered by "Convert URLs into links" filter works as expected and is not double-obfuscated or something like that.

    First question is: How does the core "Convert URLs into links" filter handle the following examples:
    <a href="mailto:info@example.com">info@example.com</a>
    <a href="mailto:info@example.com">Contact us: info@example.com</a>
    Contact us: info@example.com

    (without using spamspan). Can someone try that, please?

    If one of them ends up in:
    <a href="mailto:info@example.com"><a href="mailto:info@example.com">info@example.com</a></a>
    I guess that would be a bug in that filter, and I hope (and think) it doesn't.

    Once that's safe, we can proceed with the examples from the issue summary.

  • 🇩🇪Germany Anybody Porta Westfalica

    I even think this is a major bug. The examples listed in #10 are very typical. We should also have tests for these with and without JS.

  • 🇧🇪Belgium joevagyok

    Normally you would set it up in a way that "Convert URLs into links" filter is executed first on the text, then the spamspan.
    We are testing this case in our platform. There is no double obfuscation happening. I can confirm it is working fine. I will send screenshot to @Grevil on slack.

  • 🇧🇪Belgium joevagyok

    Here I upload 3 screenshot with the examples mentioned in #10:

    1. The body field content in the WYSIWYG
    2. The result text without JavaScript enabled, in obfuscated format
    3. The result text with JavaScript enabled, in normalized format
  • 🇧🇪Belgium joevagyok

    This was actually a very good exercise because I have noticed another bug:
    Using the following example:

    <a data-link-style="default" href="mailto:first@example.com">first@example.com</a></p>
    
    <p><a data-link-style="default" href="mailto:second@example.com">second@example.com</a></p>
    
    <p><a data-link-style="default" href="mailto:third@example.com">third@example.com</a></p>
    
    <p><a data-link-style="default" href="mailto:fourth@example.com">fourth@example.com</a></p>

    Every second link a tag was not obfuscated, as you can see in the previous comment's image , the markup contains a link making the second example show as link even though JavaScript is not enabled. This should not happen!

    It is due to the SpamspanDomTrait::processAsDom() functions logic where replaceDomNode function replaces the existing DOM node with a new one. Because of the way PHP's DOMDocument works, this operation modifies the DOMDocument in place. So the next time the loop in processAsDom continues, it's iterating over a modified document, which may not contain the same elements in the same order as before resulting in the problem presented above.

    I will push a fix and a test covering this case in the MR. I will upload a patch too for composer.

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    16 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    16 pass
  • 🇩🇪Germany Grevil

    Nice find @joevagyok let me know when it is ready to review! :)

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    16 pass, 2 fail
  • 🇩🇪Germany Grevil

    Changes concerning the original email adress link text removal LGTM! Same with the iteration bug code! Let's see what the finished tests have to say, once @joevagyok finishes them!

    @Anybody are you fine with tackling the "iteration bug" mentioned in #14 in this issue (after renaming the issue)? Or would you prefer to add a follow-up issue?

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    16 pass
  • 🇧🇪Belgium joevagyok

    I pushed the fix for the iteration problem and ported the test into the new test structure.

  • Status changed to RTBC over 1 year ago
  • 🇩🇪Germany Grevil

    Very nice @joevagyok! RTBC from my side! Let's wait for @Anybody for the final approval, concerning the extra setting and whether or not to split this issue.

  • Open in Jenkins → Open on Drupal.org →
    Core: 10.0.7 + Environment: PHP 8.1 & MySQL 5.7
    last update over 1 year ago
    16 pass
  • 🇧🇪Belgium joevagyok

    In order to comply with the request to test this without JS too, I adapted the kernel test in my last commit.

  • Status changed to Fixed over 1 year ago
  • 🇩🇪Germany Grevil

    All green! And kernel test looks good. Just talked with @Anybody internally and he agrees with fixing both issues here!

    Thanks a LOT @joevagyok!

  • Issue was unassigned.
  • 🇩🇪Germany Anybody Porta Westfalica
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024