Allow indexing empty fields

Created on 15 March 2023, over 1 year ago
Updated 19 August 2024, 3 months ago

Setup

  • Solr version: 8.11.1
  • Drupal Core version: 9.4.9
  • Search API version: 8.x-1.28
  • Search API Solr version: 4.2.10
  • Configured Solr Connector: standard

Problem/Motivation

A Solr server and a view can be configured so that results are retrieved directly from the index without loading the entity. However, when a field is empty, Drupal\search_api\Plugin\views\field::preLoadResultItems() loads the entity anyway, apparently on the assumption that a value missing from the index can still be found on the loaded entity. There should be a way to rely only on indexed values, whether they are empty or not.

There is an "Index empty Fulltext fields" checkbox on the server configuration, but no equivalent currently exists for other field types, such as normal text fields or numbers.
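
The fallback described above can be modeled in a few lines. This is a language-neutral Python sketch, not Search API code: `resolve_row`, `load_entity` and the field names are hypothetical, and the only assumption is that a field absent from the indexed result is treated as "not retrieved yet".

```python
from typing import Any, Callable, Dict, List


def resolve_row(
    indexed: Dict[str, Any],
    wanted: List[str],
    load_entity: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Fill a result row from the index, loading the entity for missing fields."""
    row = {name: indexed[name] for name in wanted if name in indexed}
    missing = [name for name in wanted if name not in indexed]
    if missing:
        # An empty field is not present in the indexed result, so it looks
        # "missing" and triggers the expensive entity load.
        entity = load_entity()
        for name in missing:
            row[name] = entity.get(name)
    return row
```

In this model, a row whose fields are all present in the index never calls `load_entity`, while a single empty field is enough to trigger the load.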

Steps to reproduce

Given a set of entities with some empty fields:

  • Set up a Search API Solr server, and enable "Retrieve result data from Solr" in the "Advanced" section.
  • Create an index attached to this server. Select your entity as a datasource. Choose some fields to index, including at least one that can be empty (I tested with normal text fields). Launch indexing.
  • Create a view for this index that displays only indexed fields. Check "Skip item access checks" in the query options and make sure you do not check the entity field rendering option (so that neither setting causes the entity to be loaded).

When a row of this view contains empty fields, the corresponding entity is loaded.

Proposed resolution

Some ideas briefly mentioned during a discussion on Slack:

  • Use a dummy value. This might not work for some types.
  • Add NULL handling in Search API to avoid loading the entity.
  • Add a kill switch for entity loading.
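
The kill switch idea can be sketched as a flag that turns the fallback off. This is a hypothetical Python model, not the Search API implementation: `resolve_row_with_switch`, `skip_entity_load` and `load_entity` are made-up names, and it assumes that a field missing from the index should simply be treated as empty (`None`) when the flag is set.

```python
from typing import Any, Callable, Dict, List


def resolve_row_with_switch(
    indexed: Dict[str, Any],
    wanted: List[str],
    load_entity: Callable[[], Dict[str, Any]],
    skip_entity_load: bool = False,
) -> Dict[str, Any]:
    """Build a result row; with the switch on, missing fields stay None."""
    # Start from the index alone: absent fields default to None.
    row = {name: indexed.get(name) for name in wanted}
    missing = [name for name in wanted if name not in indexed]
    if missing and not skip_entity_load:
        # Current behavior in this model: fall back to the loaded entity.
        entity = load_entity()
        for name in missing:
            row[name] = entity.get(name)
    return row
```

When the missing fields are genuinely empty on the entity, both code paths return the same row; the switch only removes the entity load.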

Remaining tasks

  • Decide on an implementation.
  • Add some settings in the admin form (should they be attached to the server or the index?).
✨ Feature request
Status

Needs review

Version

1.0

Component

Framework

Created by

🇫🇷 France fmb Perpinyà, Catalonia, EU

Comments & Activities

  • Issue created by @fmb
  • 🇺🇸 United States tedfordgif

    I'm not sure there is a general solution here, but this probably belongs on the Search API project. For example, Elasticsearch supports null value replacement when indexing, notably with the same-type restriction mkalkbrenner discussed on Slack (can't index a "NULL" string for a long int field).

    An alternative general approach would be to create an additional field for each nullable field in the index. For example, for Solr, create is_field_number and null_field_number, the latter being a type="boolean" field added to the schema.xml. Search API would add the general support, and each backend would declare support and implement it differently depending on the type of the field. I'm not sure this actually works in practice, and it is complex.

    Perhaps instead of the general/framework component, this issue should be scoped for only the Views integration? Is there another place where empty values cause extra entity loads? Maybe just spin out a separate ticket for that approach.

  • 🇩🇪 Germany mkalkbrenner 🇩🇪

    I think the simplest solution would be the "kill switch". If a backend sets it, search_api should treat a missing field as NULL and not try to load it from the entity. The result should be the same, just faster.

  • First commit to issue fork.
  • Status changed to Needs review 3 months ago
  • 🇳🇱 Netherlands bojan_dev

    Added setting/kill switch as suggested, please review.

  • Pipeline finished with Success
    3 months ago
    Total: 314s
    #258279
  • Pipeline finished with Success
    3 months ago
    Total: 313s
    #258288