Exempt words if prefixed or suffixed with certain characters.

Created on 29 May 2025, about 1 month ago

Problem/Motivation

This might seem like an all-too-specific use case, but experience on our project has shown that it does crop up a fair amount.

Consider the word 'bit'.

When used alone it may mean (synonym): 'binary digit', but when used in 'bit-by-bit' it probably doesn't warrant a glossary pop up.

It's not just the simple En Dash (–) though; it can also be useful/desirable to exempt words if they are joined (either side) by any of these...

Colon (:)
Semicolon (;)
Apostrophe (’)
Slash (/)
Underscore (_)
Em Dash (—)

Might it be possible to allow a setting in the configuration UI where we could supply a string of possible prefix/suffix characters?

Just an idea, but a good one I think!

Feature request
Status

Active

Version

4.2

Component

User interface

Created by

🇬🇧United Kingdom SirClickALot Somerset

Comments & Activities

  • Issue created by @SirClickALot
  • 🇬🇧United Kingdom SirClickALot Somerset
  • 🇫🇷France mably

    Not sure I understand how it should work... 😉

  • 🇬🇧United Kingdom SirClickALot Somerset

    Apologies if I wasn't clear.

    Let me go through the example again.

    Imagine that 'bit' is a glossary term.

    If 'bit' appears in the main text then of course it SHOULD be flagged and be clickable.

    If 'bit-by-bit' or 'bit:by:bit' or 'bitbybit' appears in the main text, it SHOULD NOT be flagged.

    We wondered whether this could be achieved by providing a UI where we can specify 'delimiters' like these.

  • 🇫🇷France mably

    Ok, we are currently separating words in text using the following regexp when "full-word mode" is activated: "/\b(bit)\b/"

    You want to be able to replace it with something like "/(?<!\w|-)bit(?!\w|-)/", which removes the hyphen from the word separators.

    Do I understand it correctly?
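The difference between the two patterns quoted above can be tried out with a small sketch. This is an illustration in Python rather than the module's own code: the function name `build_term_pattern` and the `boundary_exceptions` argument are hypothetical, and a character class `[\w...]` stands in for the `\w|-` alternation so that any configured string of characters can be dropped in.

```python
import re

def build_term_pattern(term: str, boundary_exceptions: str = "") -> re.Pattern:
    """Compile a full-word pattern for `term` (hypothetical helper).

    Characters in `boundary_exceptions` are treated like word characters,
    so a term that touches one of them is not matched.
    """
    if not boundary_exceptions:
        # Plain full-word mode, equivalent to the quoted \b(bit)\b.
        return re.compile(r"\b(" + re.escape(term) + r")\b")
    # re.escape keeps characters such as "-" literal inside the class.
    extra = re.escape(boundary_exceptions)
    return re.compile(
        r"(?<![\w" + extra + r"])(" + re.escape(term) + r")(?![\w" + extra + r"])"
    )

text = "A bit is a binary digit; work through it bit-by-bit or bit:by:bit."

plain = build_term_pattern("bit")
strict = build_term_pattern("bit", "-:")

print(len(plain.findall(text)))   # 5: the lone "bit" plus four hyphen/colon-joined ones
print(len(strict.findall(text)))  # 1: only the standalone "bit"
```

With no exceptions configured, every "bit" next to a hyphen or colon still counts as a full word; once "-" and ":" are listed, only the standalone occurrence is left.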

  • 🇫🇷France mably

    @sirclickalot feel free to try this issue's MR.

    Still wondering which characters, besides possibly the hyphen, we could remove from the list of word separators.

  • 🇬🇧United Kingdom SirClickALot Somerset

    Hi @mably,

    Apologies for the delay, I was actually giving the new feature a good outing early this morning.

    Absolutely, the MR seems to do exactly what we were suggesting, but I was also hitting some extraordinarily long delays ending in timeouts on my local machine, which may or may not have been related to the new feature.

    I will experiment in more detail later today and report back.

  • 🇬🇧United Kingdom SirClickALot Somerset

    Right, I've just spent a good couple of hours experimenting again.

    I seem to have got into a bit of a mess with the various patches.

    I applied 43.patch and at that point I had the new backend configuration option, but the wording of the label was a bit different to the one you showed in #10. It's all a blur now in my mind, but I think that at that point the new feature seemed to be working (at least doing something!), though I couldn't say for sure because I was also having tremendously long page load times.

    I then noticed your change of wording in #10 so I applied commit_id=b3239a5d89cc051a913df860673e53fcfe8d1601 over the top and the new backend configuration disappeared!

    Thoroughly bamboozled at that point, I started over!

    I rolled back to a clean 4.2.0.

    I then patched using 43.patch

    I have switched OFF all other options just to avoid any confusion / conflict...

    I have the following text Tax term in the database...

    I have the following FOUR undesired word boundary characters in the configuration...

    That's -, , :, and finally /

    Multiple cache clears later (including browser) and I see...

    I have also tried using only one of the four characters (e.g. :) in the configuration but no joy there.

    I have also tried only one of the four characters in quotes (e.g. ":") in the configuration just in case but no joy there.

    So, contrary to what I might have said earlier, I really can't see that it's working at all.

    Probably not what you were expecting but I cannot see quite where I have gone wrong? ;-(

    Hope this is of some use at least.

  • 🇫🇷France mably

    Match full-word is required in fact.

    We should make that clearer in the configuration form.

    Could you give it another try with "match full-word" activated?

  • 🇫🇷France mably

    The "boundary exceptions" field will now be displayed only when "Match full-word" is activated:

  • 🇬🇧United Kingdom SirClickALot Somerset

    Hi @mably,

    Right, after another 90 mins or so playing, I have what I think is some useful and interesting feedback.

    I reinstated all of my other chosen options including the use of Synonyms because I wanted to test it all 'real world'.

    I applied the patch from...

    https://git.drupalcode.org/project/term_glossary/-/merge_requests/43/diffs?commit_id=f22b11c68d130c71396e3861a2795d35167c46f9

    ...over the top of where I was at the end of #12 and once again I had problems. For example, I ended up with a duplicate boundary_exceptions key in term_glossary.schema.yml, and the auto-hiding of the new feature's input form element (the JS) didn't work.

    I am wondering if I am perhaps mistaken in believing that I ought to be able to apply these versions of the patch additively like this?

    Anyway, back to clean 4.2.0, applied 43.patch again and it's all working as expected!

    HOWEVER...

    You may recall I alluded earlier to my suspicion that I was seeing some significant performance issues as I added more and more characters to the 'Characters that should not be...' input.

    So I did some more testing gradually increasing the number and found something interesting.

    Config value: - Result: Reasonable page load speed for local dev page.
    Config value: -— Result: Reasonable page load speed for local dev page.
    Config value: -—: Result: Reasonable page load speed for local dev page.
    Config value: -—:/ Result: PHP timeout!

    It seemed at first that tipping it over to 4 characters with that last / had caused this but...

    Config value: / Result: PHP timeout!

    It was the / character that was causing it all along.

    I retested this thesis several times over and it is 100% repeatable, for me at least.

    I also noted that the / character was not one that you had tried back in #6.

  • 🇫🇷France mably

    The "/" probably requires some specific escaping. Will have a look at it.
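The thread does not show the actual fix, so the following is only a sketch of one plausible reading: since the pattern quoted earlier is wrapped in "/" delimiters, a raw "/" among the configured characters could end the pattern early unless it is backslash-escaped. PHP's `preg_quote()` takes an optional delimiter argument for this purpose; the Python helpers below (`quote_for_delimiter` and `delimited_pattern`, both hypothetical names) mimic that idea for illustration and are not the module's code.

```python
import re

def quote_for_delimiter(text: str, delimiter: str = "/") -> str:
    """Escape regex metacharacters and, additionally, the pattern delimiter.

    Hypothetical helper: re.escape covers the metacharacters, and the
    delimiter gets a backslash on top if re.escape left it untouched.
    """
    quoted = []
    for ch in text:
        escaped = re.escape(ch)
        if ch == delimiter and not escaped.startswith("\\"):
            escaped = "\\" + ch
        quoted.append(escaped)
    return "".join(quoted)

def delimited_pattern(term: str, chars: str, delimiter: str = "/") -> str:
    """Build the pattern text as it would appear between delimiters."""
    extra = quote_for_delimiter(chars, delimiter)
    word = quote_for_delimiter(term, delimiter)
    body = r"(?<![\w" + extra + r"])" + word + r"(?![\w" + extra + r"])"
    return delimiter + body + delimiter

pattern_text = delimited_pattern("bit", "-:/")
print(pattern_text)

# Strip the delimiters to try the body with Python's own engine,
# which accepts the escaped "\/" as a literal slash.
body = pattern_text[1:-1]
print(re.findall(body, "bit and/or bit/byte"))
```

The only "/" characters left unescaped in `pattern_text` are the two delimiters themselves, and the body still treats "/" as an exception character: the "bit" in "bit/byte" is skipped while the standalone one is found.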

  • 🇬🇧United Kingdom SirClickALot Somerset

    Absolutely, working exactly as required.

    • mably committed ea2f5209 on 4.x
      Issue #3527331 by mably, sirclickalot: Exempt words if prefixed or...
  • 🇫🇷France mably

    Included in release 4.3.0-rc1.

  • 🇫🇷France mably

    Working with the following exception pattern :-/\:

  • 🇬🇧United Kingdom SirClickALot Somerset

    I can confirm that 4.3.0-rc1 completely solves this issue.

    Thanks

  • Automatically closed - issue fixed for 2 weeks with no activity.
