Colon identified as a spam character

Created on 15 March 2023, almost 2 years ago

The reject patterns that come pre-configured with this module include a series of spaces, followed by a colon. In our experience with using this default pattern, any colon in a form response is identified as a spam character, even if it isn't preceded by any spaces. One of our forms requests a text input that includes a time, so real responses like "5:30pm, notarize a form" are flagged as spam.
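The effect of losing the leading space can be shown with a minimal PHP sketch. It assumes the module checks each reject pattern with a case-insensitive substring search. `matches_reject_pattern()` is a hypothetical name, not part of the Protected Forms API, and the module's real matching logic may differ.

```php
<?php
// Hypothetical helper: returns true if $text contains any of $patterns.
// Assumes a case-insensitive substring search, which may differ from
// how protected_forms actually matches reject patterns.
function matches_reject_pattern(string $text, array $patterns): bool {
  foreach ($patterns as $pattern) {
    // Skip empty patterns; an empty needle would match everything.
    if ($pattern !== '' && stripos($text, $pattern) !== false) {
      return true;
    }
  }
  return false;
}

$response = '5:30pm, notarize a form';

// With the leading space kept, " :" does not occur in the response.
var_dump(matches_reject_pattern($response, [' :']));

// Once the space is lost, ":" matches every colon, including "5:30pm".
var_dump(matches_reject_pattern($response, [':']));
```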

Could the pre-configured reject patterns with spaces and colons be enclosed in single quotes to avoid this issue? It looks like the change would need to occur in the file protected_forms.settings.yml.

🐛 Bug report
Status

Active

Version

2.0

Component

Code

Created by

🇺🇸 United States sclsweb

Comments & Activities

  • Issue created by @sclsweb
  • Status changed to Postponed: needs info almost 2 years ago
  • 🇺🇸 United States AltaGrade

    Could you please try enabling the Special Characters language script and see if the problem still persists?

  • 🇺🇸 United States sclsweb

    Is the Special Characters language script available in 2.0.2? (That is the version I'm using, and I can't see an option for Special Characters. Do I need to be testing this with 2.0.x-dev?)

    On 2.0.2 I have the following language scripts enabled, and it doesn't allow a colon as in "5:30pm":
    - Currency Symbols
    - Latin
    - Miscellaneous Symbols

  • 🇺🇸 United States AltaGrade

    Miscellaneous Symbols it is. Please enable that one and give it another try, as semicolons and similar symbols are not part of the Latin language script.

  • 🇺🇸 United States sclsweb

    I can confirm that even with Miscellaneous Symbols enabled, a colon is still being flagged as spam, using the default reject patterns which include a series of spaces with a colon.

    When I use Latin as the only language script (WITHOUT Miscellaneous Symbols), and remove " :" from the reject patterns or put quotation marks around it, colons are not flagged as spam. To me this suggests that colons are part of Latin language script, unrelated to Miscellaneous Symbols, and the default reject pattern is the issue.

    If " :" is intended to be a pre-installed reject pattern, could it be enclosed in quotes to avoid over-blocking?

  • 🇩🇪 Germany Anybody Porta Westfalica
  • Status changed to Active over 1 year ago
  • 🇩🇪 Germany Anybody Porta Westfalica

    First, a (unit?) test should be added to ensure that only the expected words are blocked and no others. The colon should be one example.

    Then we can see the status and fix it :)
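A framework-free sketch of such a test might look like the following. It assumes a hypothetical `is_flagged()` helper that reads one pattern per line, keeps each pattern untrimmed, and does a case-insensitive substring match; the sample patterns are also hypothetical. A real test would live in the module's PHPUnit suite and call the module's own code instead.

```php
<?php
// Hypothetical stand-in for the module's spam check: one pattern per
// line (any line-break style), no trimming, case-insensitive match.
function is_flagged(string $text, string $reject_patterns): bool {
  foreach (preg_split('/\R/', $reject_patterns) as $pattern) {
    if ($pattern !== '' && stripos($text, $pattern) !== false) {
      return true;
    }
  }
  return false;
}

// Hypothetical configuration: a word and a space followed by a colon.
$patterns = "casino\n :";

// Each case: input text => whether it is expected to be flagged.
$cases = [
  '5:30pm, notarize a form' => false, // a bare colon must not be spam
  'Visit our CASINO today' => true,   // configured word, any case
  'Click here :)' => true,            // space followed by a colon
  'Meeting at 10:00' => false,
];

foreach ($cases as $text => $expected) {
  $actual = is_flagged($text, $patterns);
  printf("%s %s\n", $actual === $expected ? 'PASS' : 'FAIL', $text);
}
```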

  • Assigned to Grevil
  • 🇩🇪 Germany Anybody Porta Westfalica

    @Grevil will check this in the context of ✨ Needs schema check and tests for all functionality (Needs work) as soon as we have the time.

  • Issue was unassigned.
  • 🇩🇪 Germany Anybody Porta Westfalica

    @lrwebks: Could you check this please and add a test?

  • 🇩🇪 Germany lrwebks Porta Westfalica

    I have definitely run into this issue before, where not quoting strings in YAML led to unexpected behavior, so I will certainly take a look. I am also fairly confident that the colon is part of the Latin language character set, since it is part of ASCII, which is almost entirely contained within the Latin set. I'll check it out!

  • 🇩🇪 Germany lrwebks Porta Westfalica

    The culprit of the problem is the following, as far as I can see:
    In line 280 of protected_forms.module there is a helper function that turns the comma-separated string from the settings page into an array. Sensibly enough, it also strips whitespace from every array item, since some users put a space after each comma and others don't:

    // Trim white spaces of array values in php.
    $array = array_map('trim', $array);
    

    This, of course, also strips the intentional space in front of the colon, so the pattern " :" becomes ":" and every colon is flagged as spam.
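The parsing step described above can be reproduced in isolation. This sketch assumes the helper explodes the setting on commas and then applies the quoted `array_map('trim', ...)` call; the function name `parse_reject_patterns_csv()` and the sample value `'casino, :'` are hypothetical.

```php
<?php
// Hypothetical reconstruction of the comma-separated parsing step:
// split on commas, then trim every item (as in the quoted snippet).
function parse_reject_patterns_csv(string $setting): array {
  $array = explode(',', $setting);
  // Trim white spaces of array values in php.
  $array = array_map('trim', $array);
  // Drop empty items left over from trailing commas.
  return array_values(array_filter($array, fn($item) => $item !== ''));
}

// Sample value: a word, then a pattern meant to be " :".
$patterns = parse_reject_patterns_csv('casino, :');
// The " :" pattern has become ":", which now matches "5:30pm".
var_dump($patterns);
```

Because `trim()` cannot tell a separator space from a meaningful one, any pattern that is supposed to start or end with whitespace is lost at this step.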

    A simple but not very user-friendly solution would be to stop trimming whitespace and require users not to put a space after each comma.
    What do you think about this as a solution?

  • 🇩🇪 Germany Anybody Porta Westfalica

    A textarea with one pattern per line makes a lot of sense to me; then we can remove the trimming and keep the intended whitespace.
    This will need an update hook that replaces the commas with "\n" and trims the values.
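The two pieces described here can be sketched as plain PHP functions. Both names are hypothetical. In a Drupal module, the conversion would typically be called from a `hook_update_N()` implementation in the `.install` file, which then saves the converted value back to the module's configuration; the exact config key is not shown in this thread.

```php
<?php
// Hypothetical parser for the newline-separated textarea: split on line
// breaks only and keep each line exactly as entered (no trim), so a
// pattern like " :" keeps its leading space. Empty lines are skipped.
function parse_reject_patterns(string $setting): array {
  $lines = preg_split('/\R/', $setting);
  return array_values(array_filter($lines, fn($line) => $line !== ''));
}

// Hypothetical one-time conversion an update hook could run on the
// stored value: comma-separated string -> one trimmed pattern per line.
function convert_csv_to_lines(string $csv): string {
  $items = array_map('trim', explode(',', $csv));
  $items = array_filter($items, fn($item) => $item !== '');
  return implode("\n", $items);
}

var_dump(convert_csv_to_lines('casino, viagra,poker'));
var_dump(parse_reject_patterns("casino\n :\n"));
```

Because the conversion trims, as suggested, a legacy entry meant as " :" comes out as ":"; the default patterns would still need to be written back with their leading space afterwards.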

  • 🇩🇪 Germany Anybody Porta Westfalica

    Please ensure that the default values keep their leading space (e.g. " :"). Anyway, I'd even vote to remove them, because I think they are super risky in general?

    @AltaGrade: What was the idea behind these signs?

  • 🇩🇪 Germany Anybody Porta Westfalica

    I guess @lrwebks' proposed solution will also solve ✨ Possibility to set words as reject pattern (Needs review)?

  • 🇩🇪 Germany lrwebks Porta Westfalica

    Everything should be working now. I have also tested the update hook manually and the conversion from the comma separated list to the line separated list works absolutely fine.

  • 🇩🇪 Germany lrwebks Porta Westfalica