"Contains" operator doesn't know whether the content field contains HTML-escaped strings so you can't reliably include characters like ampersands in searches

Created on 5 August 2024, 6 months ago
Updated 23 August 2024, 5 months ago

Problem/Motivation

On Drupal 10.2.4, I added the body field (text format "Full HTML") to a view as an exposed filter in the filter criteria.
When the search term contains an "&" symbol, no results are returned.
Example body text: "jack & jill went up to the hill".

Searching for "jack & jill" returns no results, but searching for only "jack &" does.

Steps to reproduce

Step 1: Install Drupal 10.2.4

Step 2: Add some content of the "Article" content type, with text containing an "&" symbol in the body field.

Step 3: Create a view with a page display.

i) Format: Fields

ii) Fields: Title and Body fields added.

iii) Filter criteria:

        Published = yes
        Content type = "Article"
        Add "Body" field and expose it. Operator selected as "contains".

iv) Apply and save the view.

Proposed resolution

Searches should also return results when the search text contains an "&" symbol.

🐛 Bug report
Status

Closed: won't fix

Version

11.0 🔥

Component
Views 

Last updated 2 days ago

Created by

🇮🇳India sivagami Bangalore


Comments & Activities

  • Issue created by @sivagami
  • Status changed to Postponed: needs info 6 months ago
  • I am 99% sure the query syntax is ...LIKE "%Jack & Jill%"..., so what you see is the expected behavior. Views filters are not a search engine and they don't support search keywords.

    More information is needed for this to be taken as a bug report.

  • Status changed to Active 6 months ago
  • I am 100% sure what I wrote above is wrong. You are searching literally for "jack & jill", which should be found, because this works in a raw query.

  • This is because CKEditor 5 changes the input text to "jack &amp; jill went up to the hill" before you save the node. You can see this at the CKEditor 5 demo. In fact, searching for "jack &amp; jill" works.

    Something will have to inform Views that in some fields, an ampersand is escaped.

    If your need is urgent consider using a real search engine backend, like Solr.
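
The behavior described in this comment can be reproduced outside Drupal. The following is a minimal sketch in Python, assuming the body is stored with the ampersand encoded as `&amp;` (as stated above) and that the filter ends up as a plain `LIKE '%term%'` query. The table and column names are made up for the illustration, and SQLite stands in for the site's database.

```python
import sqlite3

# Hypothetical one-column table standing in for the body field storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_body (body_value TEXT)")
# The value as stored after the editor has encoded the ampersand.
conn.execute(
    "INSERT INTO node_body VALUES (?)",
    ("<p>jack &amp; jill went up to the hill</p>",),
)


def contains(term):
    """Count rows whose stored value contains the literal term."""
    row = conn.execute(
        "SELECT COUNT(*) FROM node_body WHERE body_value LIKE ?",
        (f"%{term}%",),
    ).fetchone()
    return row[0]


# Compare the three search terms discussed in this issue.
results = {
    term: contains(term)
    for term in ("jack & jill", "jack &", "jack &amp; jill")
}
print(results)
```

Under these assumptions only "jack & jill" finds nothing, because the stored text continues with `amp;` right after the ampersand, which matches the report in the issue summary.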

  • 🇮🇳India sivagami Bangalore

    Hi @cilefen,
    "In fact, searching for jack &amp; jill works." => Yes, it works when I search with &amp;, but in reality end users of the site will type just "&", not &amp;.

  • 🇮🇳India vinmayiswamy

    Hi @cilefen and @sivagami,

    Thanks for the insights. Given the challenge with HTML-escaped characters like &amp;, here are some potential approaches to address the issue:

    Custom Filter Handling: We could enhance the Views module with a custom filter to preprocess search terms and decode HTML entities before executing the search. This would involve using functions such as html_entity_decode() to ensure accurate matching. We’d need to be careful about performance and compatibility issues.

    Search Backend Integration: Another option could be integrating a search engine backend like Solr or Elasticsearch. These tools handle special characters more robustly and offer advanced search capabilities, though they would require additional setup and configuration.

    User Instructions: As a temporary measure, we might update user documentation to guide users to search with HTML-escaped characters (e.g., &amp;). While this doesn’t address the root cause, it could help manage expectations in the short term.

    I’d love to hear your thoughts on these approaches. Do you see any other solutions we should explore, or have any opinions on which of these might be the most practical and effective?

    Looking forward to your input and further discussion.

    Thanks!

  • 🇮🇳India vinmayiswamy

    Hi @cilefen,

    Thank you for your response. I want to clarify that my comment was based on my understanding and analysis, and I wasn’t using any AI tools for my contribution. I appreciate your feedback and am happy to discuss any of the suggestions further.

    Looking forward to your thoughts!

    Thanks!

  • Sorry! These days you can’t be sure. It is exceptionally well written.

  • “Custom Filter Handling” is the way to solve this in Drupal Core. We need a setting on the filter to indicate if the search field contains escaped HTML. I have no idea what to do with combined filters.
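
One way to picture the filter setting proposed in this comment is the sketch below. It is written in Python rather than as Views code, and `like_pattern` and `field_is_html_escaped` are hypothetical names, not part of any Drupal API. Note the direction of the conversion: the sketch encodes the typed term instead of decoding it, on the assumption that the user types a raw "&" while the stored text holds `&amp;`. Decoding with something like html_entity_decode() would apply to the stored value instead.

```python
import html


def like_pattern(term, field_is_html_escaped):
    """Build the LIKE pattern for a 'contains' filter (illustrative only)."""
    if field_is_html_escaped:
        # Encode &, < and > so the term looks like the stored, escaped text.
        term = html.escape(term, quote=False)
    # Escaping of the LIKE wildcards % and _ is omitted to keep the sketch short.
    return f"%{term}%"


escaped_pattern = like_pattern("jack & jill", True)
plain_pattern = like_pattern("jack & jill", False)
print(escaped_pattern)
print(plain_pattern)
```

A term that is already escaped would be encoded a second time by this sketch, which is one reason the setting would have to be opt-in per filter.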

  • 🇮🇳India vinmayiswamy

    Hi @cilefen,

    Thank you for the feedback. I agree that implementing a custom filter to handle HTML-escaped characters like "&" is a practical solution. This should address the issue with special characters in search queries.

    To implement this, we’ll need to introduce a configuration option in the Views module to specify whether the search field contains HTML-escaped content. This setting will be integrated into the Views filter configuration.

    Next, we’ll have to enhance the filtering logic to preprocess search terms using html_entity_decode(), ensuring that HTML entities are converted back to their respective characters before executing the search. This adjustment will help match special characters accurately against the content.

    We should also consider performance implications and ensure that this enhancement doesn’t introduce significant overhead, particularly with large datasets. Compatibility with existing Views functionalities is another important aspect to address.

    Regarding combined filters, we may need to explore additional strategies. For example, we should ensure that HTML decoding is consistently applied across all filters. We might also need to refine how filters interact with each other and consider more advanced query handling to manage different scenarios effectively.

    If there are any details I might have overlooked or additional considerations we should discuss, please let me know. I’d also appreciate any further suggestions or feedback you can provide.

    Thanks!

  • 🇳🇱Netherlands Lendude Amsterdam

    Wouldn't a filter setup like this also give 'wrong' results if the field value contains something like an HTML tag in the spot the user is searching? I don't think filtering on HTML containing fields not giving the results an end user might expect is really new, nor really a bug. The & thing might be new, but just a new symptom on an existing behaviour I think.
    Views searches in the literal value in the database, that is by design. Putting a 'contains' filter on a full HTML text field has always been dubious and unperformant as far as I know.
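
The point about HTML tags can be illustrated with a small Python sketch. The sample markup is invented for the example, and the regular-expression tag stripping is a crude stand-in for "what the reader sees", not how Drupal processes text.

```python
import html
import re

# Invented stored value: the ampersand is wrapped in a tag.
stored = "<p>jack <strong>&amp;</strong> jill went up to the hill</p>"


def visible_text(markup):
    """Crude approximation of the rendered text: drop tags, decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", markup))


term = "jack & jill"
# Even with the term escaped to match the stored encoding, the tag gets in the way.
matches_raw = html.escape(term, quote=False) in stored
# Against the text a reader actually sees, the phrase is present.
matches_visible = term in visible_text(stored)
print(matches_raw, matches_visible)
```

Escaping the search term does not help here: the literal stored value has markup between the words, so a "contains" match against the database value still fails even though the visible text contains the phrase.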

  • That's all true. This probably can't be "fixed".

  • 🇮🇳India sivagami Bangalore

    Hi @everyone, I have managed with html_entity_decode() to fix "&".
    Thank you all!!

  • Status changed to Closed: won't fix 5 months ago
  • 🇮🇳India prashant.c Dharamshala

    @sivagami

    Great you found the solution.
    I think we can close this issue, as it is not an issue with Drupal core (see #13).
