Account created on 12 March 2007, over 17 years ago

Recent comments

🇩🇪Germany mkalkbrenner 🇩🇪

The main issue is that the tests require different Solr versions. Using GitHub Actions we could simply use Docker to start them.
That seems way more complicated with GitLab CI and the Drupal templates.
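As an illustration, a minimal sketch of how tests could start multiple Solr versions via Docker (image tags, core name, and ports are examples, not the project's actual CI config):

```shell
# Start two Solr versions side by side; "solr-precreate" creates a core
# before starting the server (supported by the official Solr images).
docker run -d --name solr8 -p 8983:8983 solr:8.11 solr-precreate drupal
docker run -d --name solr9 -p 8984:8983 solr:9 solr-precreate drupal

# Wait until both respond before running the test suite.
curl --retry 10 --retry-connrefused -s http://localhost:8983/solr/ > /dev/null
curl --retry 10 --retry-connrefused -s http://localhost:8984/solr/ > /dev/null
```

With GitHub Actions this is a few lines in a job; reproducing the same multi-version setup in the GitLab CI Drupal templates is the part that is harder.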

🇩🇪Germany mkalkbrenner 🇩🇪

It seems there's an issue where the *.bin files aren't picked up anymore when creating the zip. It may be caused by a third-party library we use. I need to verify this.

🇩🇪Germany mkalkbrenner 🇩🇪

Nesting something in a Drupal entity field is different from nested documents in Solr.
Or did you write custom code to index that data as a nested document?

🇩🇪Germany mkalkbrenner 🇩🇪

The patch in #20 can't be applied. It would break the Drupal 11 compatibility that is already declared in composer.json and info.yml.

🇩🇪Germany mkalkbrenner 🇩🇪

I tried commenting out DependencySerializationTrait as per #4, but then I still receive

Fatal error: Declaration of Drupal\search_api_solr\Plugin\search_api\backend\SearchApiSolrBackend::__sleep() must be compatible with Drupal\search_api\Backend\BackendPluginBase::__sleep(): array in /var/www/html/web/modules/contrib/search_api_solr/src/Plugin/search_api/backend/SearchApiSolrBackend.php on line 5077

You're mixing two different issues here. Removing DependencySerializationTrait fixes __wakeup(), not __sleep()!

🇩🇪Germany mkalkbrenner 🇩🇪

Not all patches mentioned here are the correct ones for Search API Solr.
I'm preparing a new Search API Solr release.

🇩🇪Germany mkalkbrenner 🇩🇪

I think I would be faster if people stopped asking over different channels and I had to answer fewer questions ;-)

🇩🇪Germany mkalkbrenner 🇩🇪

I'll leave this issue open for others to find. But the fix is wrong!

The correct fix is to not include DependencySerializationTrait anymore, as it is already included by PluginBase.
This way the implementation in Search API works, and we have already committed that change to Search API Solr.

🇩🇪Germany mkalkbrenner 🇩🇪

Thanks for your feedback. We should add a note to the settings form about that.

🇩🇪Germany mkalkbrenner 🇩🇪

I thought Search API Attachments included a field type to add to your entities to store the extracted text.

🇩🇪Germany mkalkbrenner 🇩🇪

I verified the code again. The Processor adds a boost factor. But such boost factors only get applied to the query if the first sort criterion is score. It is named "Search API Relevance" in Views.

It works perfectly well.
I checked the debug output in https://www.drupal.org/project/search_api_solr/issues/3450331#comment-15... again. It contains no sort parameter, and that must be your issue.

🇩🇪Germany mkalkbrenner 🇩🇪

BTW, I'm considering removing support for this manually set query parser as it causes a lot of issues. Usually edismax has to be set per field in case of multiple values, not for the entire query.

I think I will start by writing warnings to the log.
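For illustration only: edismax can be scoped to a single clause via Solr's local-params syntax instead of forcing defType for the whole request (core name and field names here are hypothetical):

```shell
# Hypothetical example: only the first clause is parsed by edismax;
# the rest of the query string uses the default parser.
curl "http://localhost:8983/solr/drupal/select" \
  --data-urlencode 'q=_query_:"{!edismax qf=tm_X3b_en_title}eros" AND ss_type:article'
```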

🇩🇪Germany mkalkbrenner 🇩🇪

$solarium_query->addParam('defType', 'edismax');

Good luck with that one. If it works for you, you could use it. But it has many side effects.

🇩🇪Germany mkalkbrenner 🇩🇪
"fl":"ss_search_api_id,ss_search_api_language,score,hash",
"fq":["+index_id:testing_solr_boost",
        "ss_search_api_language:(\"en\" \"und\")"],
"q":"(tcedgem_X3b_en_body:(+\"eros\")^1 tcedgem_X3b_und_body:(+\"eros\")^1 tcngramm_X3b_en_title:(+\"eros\")^2 tcngramm_X3b_und_title:(+\"eros\")^2)",

The created field is neither on the list of fields to be returned nor part of the query or filters.
I agree that there's a difference in the behaviour compared to the database backend, even if the database backend is "cheating".
Boosts are not blindly added to each query, only if the query has something to do with that field.

We need to debug this step by step and then decide whether to fix something in the module or not.

1. Try to add a filter like created > 0 in the view.

2. Enable retrieve results from Solr on the server edit page.

🇩🇪Germany mkalkbrenner 🇩🇪
+++ b/modules/search_api_solr_devel/src/Controller/DevelController.php
@@ -203,7 +203,7 @@ public function entitySolr(RouteMatchInterface $route_match) {
+                      $summary_row['object_size'] = \Drupal\Component\Utility\DeprecationHelper::backwardsCompatibleCall(\Drupal::VERSION, '10.2.0', fn() => \\Drupal\Core\StringTranslation\ByteSizeMarkup::create(strlen(json_encode($fields))), fn() => format_size(strlen(json_encode($fields))));

Parse error. (Note the stray double backslash before Drupal\Core\StringTranslation\ByteSizeMarkup.)

🇩🇪Germany mkalkbrenner 🇩🇪

It is way more readable if you format the scoring:

0.221677 <= weight(tcedgem_X3b_en_body:eros in 0) [SchemaSimilarity], result of:
... 0.221677 <= score(freq=2.0), computed as boost * idf * tf from:
... ... 0.280615 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 35.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.789968 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 2.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 536.000000 <= dl, length of field (approximate)
... ... ... 2081.565200 <= avgdl, average length of field

 5.033529 <= weight(tcngramm_X3b_en_title:eros in 0) [SchemaSimilarity], result of:
... 5.033529 <= score(freq=1.0), computed as boost * idf * tf from:
... ... 2.000000 <= boost
... ... 3.444683 <= idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
... ... ... 1.000000 <= n, number of documents containing term
... ... ... 46.000000 <= N, total number of documents with field
... ... 0.730623 <= tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
... ... ... 1.000000 <= freq, occurrences of term within document
... ... ... 1.200000 <= k1, term saturation parameter
... ... ... 0.750000 <= b, length normalization parameter
... ... ... 3.000000 <= dl, length of field
... ... ... 39.304348 <= avgdl, average length of field

I can't see that you queried any date.
Can you post the entire query sent to Solr?
Use search_api_solr_devel to get it.

🇩🇪Germany mkalkbrenner 🇩🇪
+++ b/tests/src/Functional/ViewsTest.php
@@ -29,6 +29,7 @@ class ViewsTest extends SearchApiViewsTest {
+    parent::setUp();

This is wrong. I'll remove it on commit.

🇩🇪Germany mkalkbrenner 🇩🇪

I already spent some time migrating our GitHub-based tests to GitLab. But it isn't easy as we require different Solr versions to test against.
Any help is welcome!

🇩🇪Germany mkalkbrenner 🇩🇪

I recommend that you read the Solr documentation.
But because security.json has nothing to do with this module directly, I'll close the issue.

🇩🇪Germany mkalkbrenner 🇩🇪

Sorry, but this recommendation is stupid.
The direct query parser uses the stop words, too. That depends on the fields that are queried, not the parser.

The problem is that they still force edismax internally in the connector. I explained multiple times why this is a bad idea, but they're ignoring it.

🇩🇪Germany mkalkbrenner 🇩🇪

Thanks for breaking the Solr tests again ;-)

🇩🇪Germany mkalkbrenner 🇩🇪

I don't think that it is the module's fault. I assume you didn't do your config management right and overwrote updated configs with old ones at some point in the past.

🇩🇪Germany mkalkbrenner 🇩🇪

Your security.json activates basic auth. So you need to configure the Basic Auth connector in Search API Solr, not the Standard connector.

🇩🇪Germany mkalkbrenner 🇩🇪

You need to properly install search_api_solr again and save your views. I assume that the dependencies of the views aren't correct.
Then uninstall the module again.

🇩🇪Germany mkalkbrenner 🇩🇪

I think we should adopt the approach of https://www.drupal.org/project/tmgmt and https://www.drupal.org/project/search_api_clir .

TMGMT offers a centralized component for translations and provides the plugin infrastructure. There are plugins for DeepL, Google Translate, etc.
Search API CLIR leverages this service to get machine translations at index time.

So there should be a centralized service (module) that can provide vectors for text fields. It should provide a plugin infrastructure to connect to different services (remote or locally).

Search API Solr should then leverage that service to get the vectors.

Having a vectors field on entities doesn't help, as we need to build vectors from the search phrase, right?

So it has to be a "real-time" service with its own caching, somehow similar to search_api_clir.

🇩🇪Germany mkalkbrenner 🇩🇪

Thanks for all these investigations. In order to be able to discuss the best approach, I need to dive into dense vector searches myself first.
I already had a lot of comments in mind when reading your posts, but I want to avoid replying too quickly.

I suggest focusing on producing the vectors first. How should we do that in Drupal? How do we leverage an external service?
Maybe we can take https://www.drupal.org/project/search_api_clir as an example. It is able to index machine translations created by external services.

🇩🇪Germany mkalkbrenner 🇩🇪

Stopwords are adjustable. They're managed as Drupal configs and get applied if you generate and deploy a configset.
But if Acquia doesn't allow updating the configset, I assume that they're using the default. And in the default, "and" and "of" are declared as stopwords.

AFAIK Acquia still forces the dismax query parser, which also leads to different results and a different interpretation of "all words" and "any word".

🇩🇪Germany mkalkbrenner 🇩🇪

It should be possible to run commands on Solr with basic authentication enabled, but I could not find how.

@Yanvis, this is documented:
https://solr.apache.org/guide/8_11/basic-authentication-plugin.html#usin...

You have to set environment variables.
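Per the linked guide, the control script reads these variables when basic auth is enabled (credentials and collection name below are placeholders):

```shell
# Tell bin/solr to authenticate with basic auth.
export SOLR_AUTH_TYPE="basic"
export SOLR_AUTHENTICATION_OPTS="-Dbasic.auth=user:pass"

# Commands now authenticate, e.g.:
bin/solr healthcheck -c my_collection
```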

🇩🇪Germany mkalkbrenner 🇩🇪

I just refactored the ddev integration. The permissions to be set in security.json changed in Solr 9:
https://github.com/ddev/ddev-solr/pull/32

🇩🇪Germany mkalkbrenner 🇩🇪

If it happens with both backends, it can't be a Search API Solr issue.

🇩🇪Germany mkalkbrenner 🇩🇪

Newlines or breaks?

And why do you need them? Can you describe the use-case?

🇩🇪Germany mkalkbrenner 🇩🇪

Yes, I did it and the tests passed :-)

Thanks for your help.

🇩🇪Germany mkalkbrenner 🇩🇪

The patch is already committed to github.

🇩🇪Germany mkalkbrenner 🇩🇪

Even if Solr itself doesn't provide any specific support for Welsh, I'm fine to add this config.

🇩🇪Germany mkalkbrenner 🇩🇪

4.2.x is not supported anymore. I assume that the error is already fixed in 4.3.x. What happens if you update the module?

🇩🇪Germany mkalkbrenner 🇩🇪

Nice. SolrMultisiteDocument wasn't a subclass in the early implementation, but now it is.

🇩🇪Germany mkalkbrenner 🇩🇪

Thanks for the patch. But as explained on the project page, it doesn't get tested on drupal.org CI.
You need to create a PR for https://github.com/mkalkbrenner/search_api_solr to run the entire test suite. And I want to see whether this patch breaks anything in the still supported versions.

🇩🇪Germany mkalkbrenner 🇩🇪

If we (optionally) use igbinary to serialize this array, I expect that we'll save 80% of this memory.
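A quick sketch of the kind of saving to expect (requires PHP with the igbinary extension; the array content is made up — igbinary deduplicates repeated strings, which is where most of the saving comes from):

```shell
# Compare the native PHP serializer against igbinary for a large array
# of repeated field names (illustrative data, not the module's array).
php -r '
$map = array_fill(0, 100000, "tm_X3b_en_body");
printf("serialize(): %d bytes\n", strlen(serialize($map)));
printf("igbinary_serialize(): %d bytes\n", strlen(igbinary_serialize($map)));'
```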

🇩🇪Germany mkalkbrenner 🇩🇪

A quick explanation about the old algorithm to fix references via ID:

First run:
Import new and updated entities, but skip path aliases. Store old ID and new ID of newly created entities in a mapping array. Store a list of newly created entities that reference other entities via IDs in another array (NEW).

Second run:
Run updates on all entities stored in array (NEW) and correct the reference IDs according to the mapping array. Still skip the path aliases.

Third run:
Import path aliases. In case of new aliases, adjust the referenced entity IDs according to the mapping array.

I think that this algorithm could be kept. The first batch has to create a second batch of newly created entities that reference other entities via IDs and skip path aliases. For path aliases it has to create the third batch.

🇩🇪Germany mkalkbrenner 🇩🇪
+++ b/src/Importer.php
@@ -291,93 +369,98 @@ class Importer {
-    // All entities with entity references will be imported two times to ensure
-    // that all entity references are present and valid. Path aliases will be
-    // imported last to have a chance to rewrite them to the new ids of newly
-    // created entities.
-    for ($i = 0; $i <= 2; $i++) {

I can just repeat what I commented on the MR.
The current patch removed an essential feature:
All entities with entity references will be imported two times to ensure that all entity references are present and valid. Path aliases will be imported last to have a chance to rewrite them to the new IDs of newly created entities.

This strategy is essential to correct references via IDs (not everything works with UUIDs yet).
Especially path aliases are special and break with the proposed patch.
The patch only works for the content in total.
But exporting from A and importing into B breaks as the ID collisions in references aren't corrected anymore.

🇩🇪Germany mkalkbrenner 🇩🇪

We can discuss to move it into search_api_solr_admin. Otherwise I consider it to be too dangerous.

So if you prepare a patch, I'll review it.

🇩🇪Germany mkalkbrenner 🇩🇪

This has been corrected in git already before the release. We have tests, at least some ;-)

🇩🇪Germany mkalkbrenner 🇩🇪

+1 to add smustgrave as co-maintainer. Two BEF issues have blocked the facets module for more than a year, and we need to advise people to install patches.

🇩🇪Germany mkalkbrenner 🇩🇪

If I understand correctly, solr.DateRangeField uses the spatial functionality.
So in your example it would return it, since they intersect, unless I'm mistaken?

You could create the query using a Search API query and it will be executed (like before with NULL instead of *) if you use Solr as backend. And you simply get the result Solr returns.

But it is undefined in Search API itself as the database backend doesn't support it.

🇩🇪Germany mkalkbrenner 🇩🇪
+++ b/src/EventSubscriber/ConfigSubscriber.php
@@ -65,7 +65,9 @@ class ConfigSubscriber implements EventSubscriberInterface {
+        $saved_config->isNew())

The check doesn't work.
isNew() never returns true in the event subscriber even if the config is "new".

🇩🇪Germany mkalkbrenner 🇩🇪

date range queries vs. date range fields vs. Search API DateRange Processor

These are all different things even if they all contain data range in their name.

So I understand that [2024-03-04T00:00:00Z TO *] is perfectly valid, which would filter on >= 2024-03-04T00:00:00Z

Correct, but >= * isn't.

I'm confused. From Solr's doc:

Solr’s DateRangeField supports the same point in time date syntax described above (with date math described below) and more to express date ranges.

Yes, it supports the same point-in-time date syntax when writing values to these fields at index time!
If you index the date range [2003 TO 2005] you could perfectly query it like my_date_range_field:2004.
But if you query it like my_date_range_field:[2000 TO 2004], the result is kind of undefined as 2003 is in that range while 2005 is not.
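To illustrate, against a hypothetical local core that has a solr.DateRangeField called my_date_range_field:

```shell
# Index a document whose field value is itself a range.
curl -X POST 'http://localhost:8983/solr/drupal/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id": "1", "my_date_range_field": "[2003 TO 2005]"}]'

# A point-in-time query matches, because 2004 lies inside the stored range.
curl 'http://localhost:8983/solr/drupal/select' \
  --data-urlencode 'q=my_date_range_field:2004'

# A range query like [2000 TO 2004] only partially overlaps the stored
# range (2003 is inside, 2005 is not), so what "matching" means here is
# ambiguous from the user's point of view.
curl 'http://localhost:8983/solr/drupal/select' \
  --data-urlencode 'q=my_date_range_field:[2000 TO 2004]'
```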

🇩🇪Germany mkalkbrenner 🇩🇪

You're right, the array needs to be initialized as empty per document.

🇩🇪Germany mkalkbrenner 🇩🇪

After reviewing the code, I'm convinced we don't have an issue here, but a support request. '*' was never supported in our queries, but NULL.

So I converted this issue into a feature request, to additionally support the wildcard character '*' for range queries as input.
And I added some more tests.
