Better support for query filtering

Created on 16 March 2024, 9 months ago

Problem/Motivation

This issue is related to [3415624]. In the current design, each storage client implementation decides whether or not to support query filtering, and a caller has no way to know when a client does not. For instance, with Views (xnttviews), an admin can set filters that have no effect, and no error or warning is displayed to tell the admin or end user that the storage client does not support filtering.

Another filtering problem comes from the field mapper system introduced last year. While the simple field mapper maps fields 1-to-1, other field mappers (JSONField, Field Processors, string processing + JSON Path mappers, and future mappers) can associate one Drupal field with more than one external source field. Even worse, a mapping may not be reversible (think of a hash function that converts a source field into a hash value for a Drupal field).

The query() (and countQuery()) method of storage clients receives filter operations on Drupal field values. From the storage client's perspective, the task is to filter an already mapped value that may come from one or more source fields. Filtering directly on the source values (on the source side) would be more efficient, but it is not always possible. Think of a database external entity: it is more efficient to filter fields in a SQL query, but if the Drupal field comes from a non-reversible mapping (like a hash or a mix of multiple fields), the filter cannot be expressed in the SQL query.
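To make the non-reversible case concrete, here is a minimal sketch in Python (the module itself is PHP, and every name below is hypothetical rather than part of the External Entities API). It maps one source field 1-to-1 and another through a hash, and shows that only a filter on the first can be rewritten in terms of a source field.

```python
import hashlib

# Hypothetical mapping table: Drupal field -> (source field, transform).
# A transform of None stands for a plain 1-to-1 copy of the source value.
MAPPING = {
    "title": ("name", None),
    "email_hash": ("email", lambda v: hashlib.sha256(v.encode()).hexdigest()),
}

def map_entity(raw):
    """Build Drupal-side values from a raw source record."""
    mapped = {}
    for drupal_field, (source_field, transform) in MAPPING.items():
        value = raw[source_field]
        mapped[drupal_field] = transform(value) if transform else value
    return mapped

def to_source_filter(drupal_filter):
    """Rewrite a Drupal-side filter for the source, or return None if impossible."""
    source_field, transform = MAPPING[drupal_filter["field"]]
    if transform is not None:
        # The hash cannot be inverted, so no source-side condition exists.
        return None
    return {**drupal_filter, "field": source_field}

raw = {"name": "Alice", "email": "alice@example.org"}
mapped = map_entity(raw)

title_filter = to_source_filter(
    {"field": "title", "operator": "=", "value": "Alice"}
)
hash_filter = to_source_filter(
    {"field": "email_hash", "operator": "=", "value": mapped["email_hash"]}
)
```

Here title_filter targets the source field "name" and could be sent to the source, while hash_filter is None: that filter can only be evaluated once the entities have been fetched and mapped.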

However, it is always possible to filter values once they are mapped. The cost is that all the external entities must first be fetched from the source and then filtered, which is not very efficient.
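Filtering on mapped values could look roughly like the Python sketch below. The operator table, the filter dictionaries and the function names are assumptions made for illustration, and the case-insensitive LIKE mimics common SQL defaults rather than any behaviour stated in this issue.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _) into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")
        elif char == "_":
            parts.append(".")
        else:
            parts.append(re.escape(char))
    # Assumption: LIKE is case-insensitive, as in many SQL collations.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

# Hypothetical default operators evaluated on already mapped values.
OPERATORS = {
    "=": lambda value, expected: value == expected,
    "!=": lambda value, expected: value != expected,
    "<=": lambda value, expected: value <= expected,
    "LIKE": lambda value, expected: bool(
        like_to_regex(expected).fullmatch(str(value))
    ),
}

def post_filter(entities, filters):
    """Keep the mapped entities that satisfy every filter (AND semantics)."""
    return [
        entity
        for entity in entities
        if all(
            OPERATORS[f["operator"]](entity[f["field"]], f["value"])
            for f in filters
        )
    ]

# All entities have to be fetched and mapped before this step can run.
entities = [
    {"title": "Alpha", "weight": 1},
    {"title": "Beta", "weight": 5},
    {"title": "Alphabet", "weight": 9},
]
matches = post_filter(entities, [
    {"field": "title", "operator": "LIKE", "value": "alpha%"},
    {"field": "weight", "operator": "<=", "value": 5},
])
```

Every entity is visited once per filter, which is the inefficiency mentioned above: the work grows with the size of the whole source, not with the size of the result.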

Proposed resolution

The idea would be to let storage clients pre-filter what they can on the source (raw) fields, i.e. on Drupal-side fields that are directly mapped 1-to-1 to a source field, using an operator that the storage client supports. After pre-filtering, the set of matching entities is returned to the base storage client class, which post-filters the set with the remaining filters and warns about unsupported filters. The base class would thus provide an (inefficient but available) default set of supported filter operators for all storage clients, while each storage client could also provide its own more efficient filter operators when available. In other words, all REST clients would support the 'LIKE' or '<=' operators, processed on the Drupal side, while a database storage client would be able to declare that it supports the LIKE operator directly and efficiently on the source data.
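The split between pre-filtering and post-filtering can be sketched as follows in Python. The function name, the shape of the filter dictionaries and the one_to_one lookup are hypothetical; the real field mapper would have to expose an equivalent of that lookup.

```python
def split_filters(filters, one_to_one, native_operators):
    """Split Drupal-side filters into (pre, post) lists.

    one_to_one: hypothetical dict of Drupal field -> source field, holding
        only the fields that the field mapper maps 1-to-1.
    native_operators: operators the storage client can apply on the source.
    """
    pre, post = [], []
    for f in filters:
        source_field = one_to_one.get(f["field"])
        if source_field is not None and f["operator"] in native_operators:
            # Rewritten with the source field name for the storage client.
            pre.append({**f, "field": source_field})
        else:
            # Kept on the Drupal field name for Drupal-side post-filtering.
            post.append(f)
    return pre, post

filters = [
    {"field": "title", "operator": "LIKE", "value": "a%"},
    {"field": "email_hash", "operator": "=", "value": "abc"},
]
one_to_one = {"title": "name"}  # email_hash is not mapped 1-to-1

# A database-like client that handles LIKE natively.
db_pre, db_post = split_filters(filters, one_to_one, {"=", "!=", "LIKE"})
# A REST-like client that only handles equality natively.
rest_pre, rest_post = split_filters(filters, one_to_one, {"="})
```

With these inputs the database-like client receives the LIKE filter on the source field "name" and leaves only the hash filter for post-filtering, whereas the REST-like client receives nothing and both filters are handled on the Drupal side.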

  1. Add a public method to storage clients that tells whether a given field can be filtered using a given operator. Something like "getSupportedFilters(optional_drupal_field_name) : array", which would return an associative array like ['=' => TRUE, '!=' => TRUE, 'LIKE' => TRUE,...] when called with a Drupal-side field name, and ['=' => [field1, field2], '!=' => [field1, field2], 'LIKE' => [field2],...] when called with no field name. This method would use a protected method getClientSupportedFilters(), implemented by clients to tell which filters they support natively. For instance, a 'LIKE' filter would be natively supported by a database client but not by a REST client. However, the REST client could still support a 'LIKE' filter if it is processed on the Drupal side on the fetched entities.
  2. Change the way the query() method works on the base class: storage clients would no longer need to override it, but would override a preFilterQuery() method instead, which would basically do the same job as the previous query() version (so existing client implementations would in fact just need to rename their current method). The query() method would call 2 methods: preFilterQuery() and postFilterQuery(). query() would also call getSupportedFilters() to know which filters to pass to pre-filtering, and pass the remaining ones to post-filtering.
  3. Change the way the countQuery() method works on the base class: storage clients would no longer need to override it, but would override a preFilterCountQuery() method instead, which would basically do the same job as the previous countQuery() version (so existing client implementations would in fact just need to rename their current method). The countQuery() method would call getSupportedFilters() and, if all filters are supported, call preFilterCountQuery(). Otherwise, it would use the (new) query() method to get the count.
  4. Add a protected method "preFilterQuery()" that would receive the same arguments as the current query() method but with "filtered" filters: only the filters reported as supported by getSupportedFilters() would be passed. The other filters would be processed by "postFilterQuery()". However, the field names to filter on would not be Drupal field names but their corresponding source field names, provided by the field mapper for 1-to-1 mapped fields. Other mapped fields would be processed in post-filtering.
  5. Add a protected method "postFilterQuery()", which client storages would not need to override, that would process the remaining filters on mapped Drupal fields. It would not be very efficient, but it would be functional.
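The steps above can be sketched as a base class plus a toy client. This is a Python illustration under stated assumptions, not the planned PHP implementation: method names are snake_case renderings of the ones proposed, ListClient and the one_to_one lookup are hypothetical, the reporting array of getSupportedFilters() is left out, and skipping an unsupported filter after a warning is only one possible policy.

```python
import warnings

class StorageClientBase:
    """Hypothetical base class following the five steps above."""

    # Operators the base class can always evaluate on mapped values.
    DEFAULT_OPERATORS = {
        "=": lambda a, b: a == b,
        "!=": lambda a, b: a != b,
    }

    def __init__(self, one_to_one):
        # Drupal field -> source field, for 1-to-1 mapped fields only.
        self.one_to_one = one_to_one

    def get_client_supported_filters(self):
        """Operators handled natively on the source; clients override this."""
        return set()

    def pre_filter_query(self, source_filters):
        raise NotImplementedError  # clients override (former query())

    def pre_filter_count_query(self, source_filters):
        raise NotImplementedError  # clients override (former countQuery())

    def _split(self, filters):
        """Separate natively supported filters from the remaining ones."""
        native = self.get_client_supported_filters()
        pre, post = [], []
        for f in filters:
            source = self.one_to_one.get(f["field"])
            if source is not None and f["operator"] in native:
                pre.append({**f, "field": source})
            else:
                post.append(f)
        return pre, post

    def post_filter_query(self, entities, filters):
        """Apply the remaining filters on mapped entities, Drupal side."""
        for f in filters:
            test = self.DEFAULT_OPERATORS.get(f["operator"])
            if test is None:
                # Assumed policy: warn and skip the unsupported filter.
                warnings.warn(f"Unsupported filter operator: {f['operator']}")
                continue
            entities = [e for e in entities if test(e[f["field"]], f["value"])]
        return entities

    def query(self, filters):
        pre, post = self._split(filters)
        return self.post_filter_query(self.pre_filter_query(pre), post)

    def count_query(self, filters):
        pre, post = self._split(filters)
        if not post:
            return self.pre_filter_count_query(pre)
        # Some filters need post-filtering: fall back to a full query.
        return len(self.query(filters))

class ListClient(StorageClientBase):
    """Toy client over an in-memory list; natively supports '=' only."""

    def __init__(self, one_to_one, records):
        super().__init__(one_to_one)
        self.records = records
        self.native_count_calls = 0

    def get_client_supported_filters(self):
        return {"="}

    def _matching(self, source_filters):
        return [r for r in self.records
                if all(r[f["field"]] == f["value"] for f in source_filters)]

    def pre_filter_query(self, source_filters):
        # Filter on source fields first, then map to Drupal-side fields.
        return [{d: r[s] for d, s in self.one_to_one.items()}
                for r in self._matching(source_filters)]

    def pre_filter_count_query(self, source_filters):
        self.native_count_calls += 1
        return len(self._matching(source_filters))

records = [
    {"name": "Alice", "city": "Paris"},
    {"name": "Bob", "city": "Lyon"},
    {"name": "Carol", "city": "Paris"},
]
client = ListClient({"title": "name", "town": "city"}, records)
in_paris = {"field": "town", "operator": "=", "value": "Paris"}
not_alice = {"field": "title", "operator": "!=", "value": "Alice"}

mixed = client.query([in_paris, not_alice])
native_count = client.count_query([in_paris])
fallback_count = client.count_query([in_paris, not_alice])
```

In this sketch the equality filter is resolved by the client on the source field "city", the '!=' filter is applied afterwards on the mapped "title" values, and the count only takes the native path when no filter is left for post-filtering.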

Remaining tasks

That's a first thought to share. It needs more thinking and discussion.
I'll soon provide a fork with some implementations.

User interface changes

None.

API changes

A couple of new methods on the base storage client class and on the field mapper, plus some minor modifications to current plug-ins.

Data model changes

None.

Feature request
Status

Active

Version

2.0

Component

Code

Created by

🇫🇷France guignonv
