🇺🇸 United States @pdcarto

Account created on 28 September 2003, about 21 years ago

Recent comments

🇺🇸 United States pdcarto

Paging @Wim Leers - the issue branch has still not been merged (there is no MR).

🇺🇸 United States pdcarto

This issue is marked "Closed (fixed)" but it has not been tagged into a release. That forces anyone who wants to use this fix to use a dev branch of this module which, by definition, is not recommended for production sites.

The remaining open related issue, 🐛 TypeError: 'WidgetBase::errorElement(): Argument #1 ($element) must be of type array, null given, called in WidgetBase.php' (status: Active), is weakly described (no steps to reproduce, and some commenters could not reproduce it) and should not, in my opinion, block getting this work out to a tagged release.

@johnv, would you be willing to get this merged/tagged into a release?

🇺🇸 United States pdcarto

The updated patch works for me: it fixes this issue with group v2.2.2.

🇺🇸 United States pdcarto

Anybody reading all the way to the bottom of this issue thread may wonder why it's marked as "Fixed" when the last comment describes a bunch of manual steps. That's because those steps were added to the group.install file in 2.0.x: https://www.drupal.org/project/group/issues/3355034#comment-15075342 (📌 Improve upgrade path from 8.x-1.x to 2.0.x, Fixed)

🇺🇸 United States pdcarto

I'm not sure that this actually fixed the problem, or possibly I'm seeing a different problem. I see a `deleteItems` task in `search_api_task` with 1680 ids. Solr fails with a "too many boolean clauses" message.

In my case, a parent object is being deleted (a pdf file), spawning the deletion of the indexed hocr text for each of its 1680 children (pages).

I tried setting `maxBooleanClauses` to a very large number (the default is 1024). Initially I edited and re-installed `solrconfig_query.xml` and restarted solr, which had no impact. Then I found search_api_solr's `search_api_solr.solr_cache.cache_queryresult_default_7_0_0` configuration and changed it there, which again had no impact.

It seems to me that there is one problem here with two possible solutions:

  1. Figure out how to make solr use and honor the `maxBooleanClauses` setting.
  2. Actually chunk the solr queries (notwithstanding the changes in 8e3cf13e, solr doesn't seem to actually be splitting the huge number of booleans into separate queries)
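
Option 2 can be sketched as follows. This is only an illustration of the batching idea, written in Python for brevity; it is not search_api_solr's code. The helper names `chunk_ids` and `build_delete_queries`, the `id` field name, and the `field:("a" OR "b")` query shape are all assumptions made for the example.

```python
from typing import Iterator, List, Sequence


def chunk_ids(ids: Sequence[str], max_clauses: int = 1024) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each at most max_clauses long."""
    if max_clauses < 1:
        raise ValueError("max_clauses must be at least 1")
    for start in range(0, len(ids), max_clauses):
        yield list(ids[start:start + max_clauses])


def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_delete_queries(
    ids: Sequence[str], max_clauses: int = 1024, field: str = "id"
) -> List[str]:
    """Build one boolean query string per chunk (hypothetical query shape)."""
    queries = []
    for batch in chunk_ids(ids, max_clauses):
        terms = " OR ".join(quote_term(item_id) for item_id in batch)
        queries.append(f"{field}:({terms})")
    return queries


# 1680 placeholder ids, mirroring the page count in the comment above.
page_ids = [f"page-{n}" for n in range(1680)]
batch_sizes = [len(batch) for batch in chunk_ids(page_ids)]
delete_queries = build_delete_queries(page_ids)
```

With a limit of 1024, the 1680 ids split into two batches of 1024 and 656, so no single query exceeds the clause limit.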

🇺🇸 United States pdcarto

I can also confirm that the patch fixes the issue.

🇺🇸 United States pdcarto

Thanks for picking this up, adel-by!

I have tested merge request !10 with a search api solr view with entity reference display, and I get good OAI results without errors.

I also tested with a standard drupal content view with entity reference display, and likewise saw expected results and did not see any errors.

I have not tested using a non-solr search_api indexed view. It seems to me that if someone can do that test, then we will have covered all the likely places where this MR could introduce a problem.

🇺🇸 United States pdcarto

MR created.

The main points:

  • In `\Drupal\rest_oai_pmh\Plugin\QueueWorker\RestOaiPmhViewsCacheBase::processItem`, it checks whether `$view->getBaseEntityType()` returns a non-empty value before it tries to get the id. If it returns false, it does not try to fill in `$this->member_entity_type` or `$this->member_entity_storage`.
  • In `\Drupal\rest_oai_pmh\Plugin\QueueWorker\RestOaiPmhViewsCacheBase::indexViewRecords`, it checks whether `$this->member_entity_type` is empty.
  • If it is empty, it checks whether the member `$id` matches the view result identifier pattern used by search_api views: `^entity:(?<entity_type>[a-z\-_]+)\/(?<entity_id>[a-z0-9]+):(?<entity_language>[a-z]+)$`
  • If the `$id` matches that pattern, it sets the entity type and entity id from that string, then tries to load the entity storage and then the entity.
  • If the pattern does not match, the entity is not loaded, so the rest of the loop body is skipped and processing moves on to the next member. Should we do more here, e.g. log the skipped item?
  • The entity loading is inside a try/catch. If there is an error (e.g. the entity storage cannot be loaded), it logs an error and leaves the entity undefined, thereby skipping this record and moving on to the next member. I'm not sure if this is the desired behavior.
  • If `$this->member_entity_type` is not empty, it proceeds as normal, loading the entity storage and then the entity.
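
The fallback described in the points above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the MR's PHP. The names `ItemRef`, `parse_item_id` and `resolve_member` are hypothetical, and the pattern is the one quoted above, rewritten with Python's `(?P<name>...)` named-group syntax.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Same pattern as quoted in the list above, in Python named-group syntax.
SEARCH_API_ITEM_ID = re.compile(
    r"^entity:(?P<entity_type>[a-z\-_]+)/"
    r"(?P<entity_id>[a-z0-9]+):"
    r"(?P<entity_language>[a-z]+)$"
)


class ItemRef(NamedTuple):
    entity_type: str
    entity_id: str
    language: str


def parse_item_id(item_id: str) -> Optional[ItemRef]:
    """Split a search_api style id into its parts, or return None."""
    match = SEARCH_API_ITEM_ID.match(item_id)
    if match is None:
        return None
    return ItemRef(
        match["entity_type"], match["entity_id"], match["entity_language"]
    )


def resolve_member(
    item_id: str, base_entity_type: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return (entity_type, entity_id) to load, or None to skip the row."""
    if base_entity_type:
        # Normal path: the view has a base entity type, the id is used as-is.
        return (base_entity_type, item_id)
    ref = parse_item_id(item_id)
    if ref is None:
        # No base entity type and no recognizable id: skip this member.
        return None
    return (ref.entity_type, ref.entity_id)


example = parse_item_id("entity:node/109229:en")
```

One thing the sketch makes visible: because the language part of the pattern is `[a-z]+`, an id whose language code contains a hyphen (for example `zh-hans`) would not match and the row would be skipped.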

🇺🇸 United States pdcarto

Thinking about this further, I feel I must be missing something, or else there is some fundamental disconnect between views, search_api, search_api_solr, and how this module operates. I'm further confused because other issues suggest that search_api_solr does work with this module, so I wonder whether mine is an edge case. Could it be that `$view->getBaseEntityType()` only returns false if your search api index is configured to index multiple entity types (data sources)?

That said, even if it's not usual, a given solr index can contain records for any number of entity types: nodes, media, and files, for example. When you create a view from that search_api_solr index, nowhere are you required to specify an entity type, though you can filter by datasource. The datasource for a given record in the solr index may not correspond to an entity type at all, and the datasource can differ from one search api view result record to the next. So the assumption in `$this->member_entity_type = $view->getBaseEntityType()->id();` seems wrong to me.

On the other hand, the `$id` of a search_api views result row (example: "entity:node/109229:en") provides all the information needed to identify the entity, if it exists. So if we must have a loaded entity in `\Drupal\rest_oai_pmh\Plugin\QueueWorker\RestOaiPmhViewsCacheBase::indexViewRecords`, we have the information needed to get it when `$view->getBaseEntityType()` returns false.

I'll be testing some code to do this.
