- Issue created by @sokru
- 🇫🇮Finland sokru
Lowering the priority since the feature is the same as in 8.x-7.x.
- 🇨🇦Canada mparker17 UTC-4
According to the ElasticSearch 8.12 Guide, in the REST APIs section, under Index API -> Update mapping API, it sounds like:
- You can add new fields to an index anytime
- You can add new properties to an existing object field anytime
- You can add multi-fields (i.e.: index the same field in different ways) anytime
- You can change supported mapping parameters for an existing field anytime
- You have to reindex to change the mapping of an existing field
- To rename a field, the suggested way to do so is to create an alias field, which you can do anytime
- (the docs don't describe how to delete/remove a field, but I checked with our client's search team lead, and that also requires reindexing)
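As a rough illustration of the rules above, here is a minimal Python sketch that compares two flat field-to-type mappings and reports whether the change could go through the Update mapping API or would need a reindex. It is a simplification under stated assumptions: it only looks at top-level field types (ignoring object properties, multi-fields, and mapping parameters), and the function name requires_reindex is hypothetical, not part of elasticsearch_connector or Search API.

```python
def requires_reindex(old_fields: dict[str, str], new_fields: dict[str, str]) -> bool:
    """Return True if moving from old_fields to new_fields needs a reindex.

    Both arguments map a field name to its Elasticsearch field type,
    e.g. {"title": "text", "nid": "integer"}. This hypothetical helper
    only inspects top-level field types.
    """
    for name, old_type in old_fields.items():
        if name not in new_fields:
            # Removing a field cannot be done in place.
            return True
        if new_fields[name] != old_type:
            # Changing the type of an existing field cannot be done in place.
            return True
    # Only additions remain, which the Update mapping API can apply.
    return False
```

For example, adding a new integer field returns False, while changing an existing text field to an integer returns True.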
To quote from that page, in the section about chang[ing] the mapping of an existing field: "If you need to change the mapping of a field in other indices, create a new index with the correct mapping and reindex your data into that index." (This is similar to what's described for OpenSearch in #3285438-15: The whole index gets cleared/deleted when any change in the search index configuration is imported/synced.)
So it looks like we're in a similar situation to the Search API OpenSearch maintainers.
- 🇨🇦Canada mparker17 UTC-4
Thinking about how to implement Phase 2 (i.e.: rebuild the index)...
In #3285438-9: The whole index gets cleared/deleted when any change in the search index configuration is imported/synced, @longwave suggests using a blue/green deployment method. That is to say, for each Search API Index defined in Drupal's configuration (e.g.: machine name foo), we would need to work with (at least) 2 ElasticSearch Indexes (e.g.: foo_blue and foo_green). We'd initially pick one of them to be "active" (e.g.: foo_green), create it, and work with it normally. Later, if there was a configuration change that required us to re-index, then we would...
- create the other index (e.g.: foo_blue) with the changed configuration
- reindex the old index (foo_green) to the new one (foo_blue)
- set the new index (foo_blue) as the "active" index
- delete the old index (foo_green)
To signify which is the "active" index, we should create at least 1 Index Alias that points to the currently-"active" index.
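The four blue/green steps could map onto Elasticsearch REST calls roughly as follows. This sketch only builds the requests (HTTP method, path, JSON body) instead of sending them, so it runs without a cluster. The function name blue_green_plan and the <alias>_blue / <alias>_green naming convention are assumptions based on the example names in this comment, not existing elasticsearch_connector code. It uses the Create index, Reindex, Aliases, and Delete index APIs, and moves the alias with a single _aliases call, whose actions Elasticsearch applies atomically.

```python
from typing import Any, Optional

# (HTTP method, path, JSON body or None)
Request = tuple[str, str, Optional[dict[str, Any]]]


def blue_green_plan(alias: str, active: str, mappings: dict[str, Any]) -> list[Request]:
    """Build the REST calls that move `alias` from `active` to the other index."""
    # Hypothetical naming convention: <alias>_blue and <alias>_green.
    blue, green = f"{alias}_blue", f"{alias}_green"
    if active not in (blue, green):
        raise ValueError(f"{active!r} is neither {blue!r} nor {green!r}")
    new = green if active == blue else blue
    return [
        # 1. Create the other index with the changed configuration.
        ("PUT", f"/{new}", {"mappings": mappings}),
        # 2. Copy documents from the old index into the new one.
        ("POST", "/_reindex", {"source": {"index": active}, "dest": {"index": new}}),
        # 3. Point the alias at the new index (both actions apply atomically).
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": active, "alias": alias}},
            {"add": {"index": new, "alias": alias}},
        ]}),
        # 4. Delete the old index.
        ("DELETE", f"/{active}", None),
    ]
```

For example, blue_green_plan("foo", "foo_green", {...}) creates foo_blue, reindexes foo_green into it, moves the foo alias to foo_blue, then deletes foo_green.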
Aside: ElasticSearch allows you to create readable Index Aliases that point to 1..* Indexes, but only allows you to create writeable Index Aliases that point to 1 Index. The simplest approach for elasticsearch_connector might be to create 1 Index Alias (e.g.: foo, named after the Search API Index) that points to the active Index; but if that doesn't work, we might have to create 2 (foo_read and foo_write).
While I think a blue/green method is a reasonable approach, I can see it being a source of confusion: admins/DevOps might (reasonably) ask questions like...
- "When I created 1 Search API Index, I expected 1 ElasticSearch Index, so why did I get 2-3 things: 1-2 ElasticSearch Index Aliases, and an ElasticSearch Index; and why don?"
- "Why the sudden spike in CPU / disk usage when I change a text field to an integer in the index settings?"
- "Why were there 2 indexes for a short time?" (especially if the reindex operation resulted in a spike in the disk/memory/CPU monitoring logs)
- "Why did the old index that I was referring to by name get deleted?"
... so if we take this approach, then we need to be pretty clear about what to expect, and how to interact with it (e.g.: if you need to read/write to ElasticSearch directly, read/write to the Index Alias, not the Index), in the Search API Index UI, in the module's README, and/or in other documentation.
I'd be interested in hearing feedback from other maintainers about whether they think this is the right approach, and whether we should create 1 or 2 aliases.
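To make the 1-alias vs 2-alias question concrete, here is a sketch of the POST /_aliases request body for each option. Elasticsearch only accepts writes through an alias that resolves to a single index, or to several indexes where exactly one is flagged with is_write_index; with separate read and write aliases, foo_read could span both indexes during a reindex while foo_write stays on exactly one. The function alias_actions and the _read/_write suffixes are hypothetical illustrations, not existing module code.

```python
from typing import Any


def alias_actions(alias: str, index: str, split: bool) -> dict[str, Any]:
    """Build a POST /_aliases body that points the alias(es) for `alias` at `index`.

    split=False: one alias (e.g. "foo") used for both reads and writes.
    split=True: separate "<alias>_read" and "<alias>_write" aliases.
    """
    if not split:
        return {"actions": [
            # is_write_index makes `index` the target of writes sent to the alias.
            {"add": {"index": index, "alias": alias, "is_write_index": True}},
        ]}
    return {"actions": [
        # The read alias could later be added to a second index as well.
        {"add": {"index": index, "alias": f"{alias}_read"}},
        {"add": {"index": index, "alias": f"{alias}_write", "is_write_index": True}},
    ]}
```

With the 2-alias option, a client reading or writing directly would send searches to /foo_read/_search and documents to /foo_write/_doc, never to foo_green or foo_blue by name.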
- 🇫🇮Finland sokru
I agree with the Phase 2 approach; the only small detail is that we also need to set the old index as read-only during step two, "reindex the old index (foo_green) to the new one (foo_blue)".
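One way to do the read-only step is to put a write block on the old index before reindexing. Setting index.blocks.write to true through the Update index settings API rejects document writes while still allowing reads, so the Reindex API can still copy from it. The sketch below only builds the two requests; the function name reindex_with_write_block is hypothetical, and how Drupal-side indexing would pause or retry during this window is an open assumption.

```python
from typing import Any, Optional

# (HTTP method, path, JSON body or None)
Request = tuple[str, str, Optional[dict[str, Any]]]


def reindex_with_write_block(old: str, new: str) -> list[Request]:
    """Build REST calls that block writes on `old`, then copy it into `new`."""
    return [
        # Reject document writes to the old index; searches still work.
        ("PUT", f"/{old}/_settings", {"settings": {"index.blocks.write": True}}),
        # Reading from a write-blocked source index is allowed.
        ("POST", "/_reindex", {"source": {"index": old}, "dest": {"index": new}}),
    ]
```

If the swap has to be aborted, the block can be lifted by setting index.blocks.write back to false on the old index.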
I think we should use this issue to cover only phase 1, and use "Support Aliases API and zero downtime mapping updates" for phase 2.
- 🇨🇦Canada mparker17 UTC-4
Sounds great! Thank you, @sokru!
I have started writing tests, but they're still in the early stages of implementation. I will push them to this issue soon, then start on implementing phase 1.
- Merge request !48 "Issue #3429647: The whole index gets cleared when any change in the search index configuration is imported" (Open) created by mparker17
- Test run on Drupal.org: Core 10.2.1 + Environment: PHP 8.1 & MySQL 5.7, last update 8 months ago: Waiting for branch to pass
- Status changed to Needs work 8 months ago (1:57am 27 March 2024)
- 🇨🇦Canada mparker17 UTC-4
Still working on this; but I've made some good progress.
- Status changed to Needs review 3 months ago (1:12am 24 August 2024)
- 🇨🇦Canada mparker17 UTC-4
When I copy the changes to .gitlab-ci.yml from "Support for Search API Spellcheck" (i.e.: disabling _PHPUNIT_CONCURRENT and increasing the memory resources for Elasticsearch), the tests pass.
Reviewing our goals from the issue summary and comments, I think this is ready for review now.
I will update the issue summary to better reflect the new scope of this ticket in relation to "Support Aliases API and zero downtime mapping updates".