Better support for query filtering and data saving

Created on 16 March 2024, 3 months ago
Updated 20 May 2024, about 1 month ago

Problem/Motivation

In current version of external entities, storage clients can not filter external entities using the $parameter in query() and countQuery() methods. They may not save external data properly as well, depending on how it is mapped, which may lead to external data corruption.

This issue is related to 📌 FieldMapper design issues to address Postponed . In current design, it's up to the storage client implementation to support or not query filtering. If a client does not support filtering, the caller does not know. For instance, with views (xnttviews), one could set filters but they may not be effective and no error or warning would be displayed, to let the admin or end user know that the storage client does not support filtering.

The storage client methods query() and countQuery() receive filter operations on Drupal field names and values with "Drupal operators". From the storage end perspective, those fields don't exist and those operators can be specific to Drupal. The storage client has to make some sort of mapping to translate Drupal filtering into end storage filtering.

A simple case:
For instance, consider a REST endpoint with an external entity that has a "raw" field name "altitude" which is mapped (simple mapper) to a Drupal field "field_point_altitude" (of an external entity "MyLocation") and you want to get all external entities with an altitude higher than 200. The query() method will receive a $parameter:

[[
  'field' => 'field_point_altitude',
  'value' => 200,
  'operator' => '>',
]]

But "field_point_altitude" has no existence on the endpoint while "altitude" does. Furthermore, the operator '>' does not mean anything to the endpoint (ie. the URL can't use a query string like "?altitude>200&..."). But we could have another way of filtering that if the endpoint has a query parameter "min-altitude": then our query could be turned into "?min-altitude=201&...". But what if there's no way available to apply that filter? Would the storage client return entities without considering the filter, would it return nothing even if some entities could match, should it raise an error instead as the filter is not supported? The current behavior will just return all entities with an altitude of 200 without warning.

Another filtering problem comes from the field mapper system introduced last year. While, simple field mapper maps fields 1-to-1, other field mappers (JSONField, Field Processors and string processing+JSON Path mappers and future new mappers) can associate one Drupal field to more than one external source field. Or even worse: a mapping may not be reversible (think of a hash function that converts a source field into a hash value for a Drupal field).

Proposed resolution

In StorageClientBase class, add 7 methods:

  • public querySource() (+interface): queries the storage source with its own known fields and operators (given already "translated"). It would correspond to the current storage client implementations of the query() method in fact (they may just need to rename the method).
  • public countQuerySource() (+interface): similarly, queries the storage source with its own known fields and operators (given already "translated") to count the results. It would correspond to the current storage client implementations of the countQuery() method in fact (they may just need to rename the method). A default implementation will be the current implementation of the countQuery() method.
  • protected postFilterQuery(): filters in a result set, raw entities that fulfill the given parameters (with source/raw field names but Drupal operators). Default implementation will support all the default Drupal operators ('=', '<>', '>', '>=', '<', '<=', 'STARTS_WITH', 'CONTAINS', 'ENDS_WITH', 'IN', 'NOT IN', 'IS NULL', 'IS NOT NULL', 'BETWEEN' and 'NOT BETWEEN' ref) and apply those to raw data.
  • Too complicated with limited use cases.
  • Too complicated with limited use cases.
  • protected getFieldMappings() helper functions that returns an array of Drupal field names as keys and arrays with their corresponding mappings as values. This method will just forward the call to its field mapper through ->externalEntityType->getFieldMapper()->getFieldMappings(). This method is needed to know which field are directly mapped and therefore could be filtered (but still may not if the operator is not supported).
  • protected getFieldDefinitions()helper function that returns field definitions from the entity field manager getFieldDefinitions() method.
  • public transliterateDrupalFilters() given an array of Drupal filters (query $parameters), returns a translated array (translated field names, translated operators and translated values) that can be use to directly query the storage source and an array with not translatable filters.
  • protected testDrupalFilter() used to check if a field value passes a filter condition

ExternalEntityStorageClientBase class will provide a default implementation for the query() method that will select and translate parameters supported by the storage source, call querySource() and if there were unsupported filters, filter the results with those unsupported filters by calling postFilterQuery(). If some fields cannot be filtered (because of the way they are mapped or because the operator is unknown), they will be ignored but corresponding warnings will be issued. This default implementation will help upgrade of existing storage clients as they would just have to rename their public query() method into a public querySource() method with minor changes.

ExternalEntityStorageClientBase class countQuery() will need to be rewritten a little. The current implementation should be copied into public countQuerySource(). If all the filters are supported by the source, it will just call countQuerySource(), otherwise it will do the same as currently: count the query() result. It looks like it does the same job in both cases but the difference will come in clients that will just need to override the countQuerySource() method and change that default behavior when all can be filtered on the storage source side.

ExternalEntity class will have an additional protected property: array $rawData, that will hold the original raw data (provided by ExternalEntityStorage::mapFromRawStorageData()) and set through public setSourceRawData() and retrieved through public getSourceRawData(). This is required for saving operations since fields without mapping would be lost otherwise (see FieldMapperInterface interface changes below). It will be used in its toRawData() method when calling the field mapper createRawDataFromEntityValues().

FieldMapperInterface interface will have:

  • new constants: MAPPING_CONSTANT, MAPPING_DIRECT (one-to-one field mapping without alteration), MAPPING_SIMPLE (mapping a Drupal field from a subset of unaltered raw source fields using simple mapping), MAPPING_EXPRESSION, MAPPING_ALIAS (mapping from another field mapping definition, a kind of "alias" where a mapping is defined at one place and used in several others, may be used by file/image fields, fields with multiple properties and automated mapping)
  • modified methods getFieldMapping() and getFieldMappings() that returns mapping as arrays. Each property mapping will be defined by an array of 2 keys: 'type' which holds the type of mapping (one of the above constants) and 'mapping' which holds the corresponding mapping definition (a constant, a source field name, a JSON Path for multiple source fields or a complex expression).
  • new method setFieldMapping(string $drupal_field_machinename, string $mapping, int $mapping_type = static::MAPPING_DIRECT) allows to change current config and alter current field mapping (dynamically). This function will be very useful for xnttmulti to support joins between storage sources, for xnttmanager to inspect and test mappings and maybe for other future applications.
  • Not needed as it can be done through extractFieldPropertyValuesFromRawData() (now public)
  • Not needed as it can be done through createRawDataFromEntityValues() (public).
  • createRawDataFromEntityValues() method changed to include an additional parameter for original raw data. Rational: when raw data is loaded, all its fields may not be mapped on the "Drupal side"; but when saving then, the fields not mapped would be lost as createRawDataFromEntityValues() would ignore them. We can't transfer the responsibility of adding missing field values to the external entity type has it has no knowledge on were modified fields were mapped in the raw array. So it's the field mapper responsibility to start from the original raw data and make modifications on top of it before it is sent to the storage source for saving. Therefore, the new method signature would be:
    public function createRawDataFromEntityValues(array $entity_values, array $original_values);
  • new method public getMappedFieldName() that returns the raw field names if directly mapped or NULL if not available or mapped directly. Note: for the JSONPath field mapper, a field mapped with the expression "$.somefield" would return "somefield" in this case.
  • Replaced by a new helper method on StorageClientBase: public getSourceIdFieldName() that calls field mapper public getMappedFieldName()

The idea behind those changes would be to allow a complete external entities filtering support and enable safe external entities editing operations without loosing non-mapped field values.
It would let storage clients pre-filter what they can on the source (raw) fields when possible (ie. Drupal-side fields that a directly mapped 1-to-1 to a source field and filtering using an operator that is supported by the storage client). After pre-filtering, a set of matching entities are returned to the base storage client class that can provide post-filtering on the entity set with the reminding filters and warn for unsupported filters. It would provide an (inefficient but available) default set of supported filter operators for all storage clients while the storage clients could also provide their more efficient filter operators when available. In other words, all clients (REST,...) would support all Drupal operators, more or less efficiently depending on what is available on the source.

Remaining tasks

It's implemented in current dev v3 (2024/05/06).

User interface changes

None.

API changes

A couple of new methods on base storage client class, changes on field mapper interface and external entity class.
Current storage clients will have to rename their current query() method into querySource() and countQuery() method into countQuerySource() (if it was implemented).
Field mapper plugins will have to implement new methods getFieldMapping(), getFieldMappings() and setFieldMapping() unless we provide a default implementation in FieldMapperBase class that would return an empty array and issue a warning or something like that (I am not in favor of that; it's only if we want to simplify the upgrade on existing plugins).

While those changes would maintain most of the thing working, it would be better for plugin maintainers to fully take into account the changes in their plugins. As a maintainer of several of such plugins (if not most of them), I would easily perform the upgrade (currently in process and planning to provide PR to some other plugins I don't manage since there are not so many).

Data model changes

None.

Feature request
Status

Fixed

Version

3.0

Component

Code

Created by

🇫🇷France guignonv

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Production build 0.69.0 2024