Prevent entity loading by retrieving index field values from Database index

Created on 13 December 2023, 7 months ago
Updated 24 February 2024, 4 months ago

Problem/Motivation

I dont see a way in search_api_db module to retrieve index field values directly from query result. At present, an additional entity load queries are run on each result.

Steps to reproduce

$result = Index::load('my_index')->query()->execute();
$resultItem = reset($result->getResultItems());
$resultItemFields = $resultItem->getFields(FALSE); // pass FALSE to skip load field values via additional query
dump($resultItemFields);

This dump shows empty array, if I didn't pass FALSE - it will do additional query to load values, that can be available directly in index search result.

Proposed resolution

We can add values directly to search query result.

✨ Feature request
Status

Needs work

Version

1.0

Component

Database backend

Created by

🇮🇳India gaurav_manerkar Vasco Da Gama, Goa

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

  • Issue created by @gaurav_manerkar
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for suggesting this feature.
    I’d definitely be open for patches (please no MRs, as testing doesn’t work for those) implementing this, but it does seem a bit tricky. Especially for fulltext fields, which are a) heavily preprocessed and b) not stored verbatim in the DB backend. And just retrieving non-fulltext fields seems of limited use.

    Also, we of course have to take into account that caching the entity field values in this way won’t work properly in all cases, and cannot really be used at all when using “entity field rendering” when displaying the view (or additional access checks, etc. – see here → ). So, anything we implement should probably be optional, at least if it increases disk usage.

    Still, there could be solutions that make sense for this. As said, I’m open for patches or discussions.

    PS: It seems you (like many others – it's really easy to misinterpret) are confused by the "Issue tags" field. As the guidelines → state, they aren't meant for free text tags related to the issue, but only for specific categorization purposes, usually by module maintainers.
    So, if you aren't sure your current usage is correct, please just leave the field empty.

  • Status changed to Needs work 4 months ago
  • 🇳🇱Netherlands ekes

    There are two parts to this I think. One loading the data that has been stored as single field data so it can be used rather than having to load entities to get it. The second to be able to store longer string data, as you can't retrieve the fulltext fields from the database being stored just as the token parts. So two patches to start to unpick this. Both at the least require tests. But the former, loading the field data is simpler. The later making a storage field probably needs more thinking. I'll push both as a branch on the issue too though, as it might be easier for others to also work from there? Finally I'd say documentation of how, and when, these should work will also be needed.

    Loading field data

    The first patch 3408375-03-load-index-field-data.patch adds an option, in much the same was as solr does (the configuration form stuff is c&p), to include the field data in the results. After which point if you do manage to configure the views fields correctly, and only use stored fields (yes not fulltext ones), the entities do not get loaded.

    Storage field

    The second patch 3408375-03-string-storage.patch adds a type only for storage. Nothing special is done, so it gets added as both a column on the search base table, and it's own table, I'm guessing this is underdesirable. It won't be searched, and if I read correctly, creating temporary tables with 'TEXT' types is (a bit, not as much as an entity load) slower. So it maybe only wants to be the additional table? I've not delved into what the queries look like when it's there. The other thing that would be handy is if solr and db implemented the same search api type (rather than having solr_ and db_ versions).

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks a lot for your start on this!
    The general direction definitely looks good. And you’re right, maybe we should move the StorageStringDataType class to the Search API itself (while keeping it still optional for backends to support, of course) so the Solr module might be able to transition to using that, too, in the future. (Would need feedback from Markus first, to make sure this is really in his interest.)

    +++ b/modules/search_api_db/src/Plugin/search_api/backend/Database.php
    @@ -1820,6 +1853,17 @@ class Database extends BackendPluginBase implements AutocompleteBackendInterface
    +      foreach ($retrieved_field_names as $field_name) {
    +        if (!$this->getDataTypeHelper()->isTextType($indexed_fields[$field_name]->getType())) {
    +          $db_query->addField('t', $field_name);
    +        }
    +      }
    

    I think this will not also fail for fulltext fields but also for all multi-valued fields. Retrieving all others within the initial DB query might actually work well enough, yes, but we’ll definitely need separate queries for all multi-valued fields we want to retrieve. (Or at least it would be needlessly complicated to implement this with just a single DB query.)

    In my mind, the search_api_retrieved_field_values query option was also optional and I’d have thought we’d default to retrieving all field values if it was missing. However, I now see that the Solr backend doesn’t do that, so I guess it makes sense to mimick that behavior. Probably the right thing to do most of the time, and should at least work well with Views, which is the main thing.

    +++ b/modules/search_api_db/src/Plugin/search_api/backend/Database.php
    @@ -907,6 +909,10 @@ class Database extends BackendPluginBase implements AutocompleteBackendInterface
    +    if ($db_field['type'] === 'text') {
    +      // Do not index text only for storage.
    +      return;
    +    }
    

    I think this is a misunderstanding in the purpose of the columns added to the shared table for fulltext fields: these are definitely not for storage (which we’re not doing yet) but for sorting, so they’re still needed.

Production build 0.69.0 2024