- 🇨🇦 Canada mparker17 UTC-4
I've been thinking about how to implement this ticket...
- Assumption: the scope of this issue only covers Elasticsearch Index Aliases, which we use when changing the mapping of an existing field or deleting an existing field. A separate ticket should handle creating an Elasticsearch Field Alias when renaming a field.
- Currently (before this ticket):
  - When you create a Search API Index, it reserves 1 "real" Elasticsearch Index (e.g.: named prefix_indexname_suffix) and 0 aliases.
    - This Elasticsearch Index is always active.
  - When you make a Search API query to this Search API Index, it queries the active/"real" Elasticsearch Index.
  - If you do something that requires the Elasticsearch Index to be cleared (i.e.: change the mapping of an existing field or delete an existing field), then the Elasticsearch Index is cleared and you have to reindex its contents. This is undesirable, and it is why this ticket exists.
- In this ticket, we change the behavior so that:
  - When you create a Search API Index, it reserves 2 "real" Elasticsearch Indexes (e.g.: named prefix_indexname_suffix_blue and prefix_indexname_suffix_green), and 1 Elasticsearch Index Alias (e.g.: named prefix_indexname_suffix).
    - Question: What do we use at the end of the real Elasticsearch Index names? I chose _blue and _green in the example above, as a reference to Blue-green deployments... but _a and _b as in A/B testing could work. I'm open to suggestions.
    - Question: Do we create both real Elasticsearch Indexes when we first create the Search API Index? This could be confusing without messaging in the UI explaining what we're doing. Or do we only create one Elasticsearch Index at first, and defer creating the other Elasticsearch Index until we change the mapping of an existing field or remove a field? This risks someone creating another index with a conflicting name before we need to create the second index.
  - Only 1 of the 2 "real" Elasticsearch Indexes is active at a time.
  - When you make a Search API query to this Search API Index, it queries the Elasticsearch Index Alias.
    - My understanding is that Elasticsearch Index Aliases act exactly like the Elasticsearch Indexes that they are connected to, so we shouldn't have to change any query code.
  - If you do something that requires the Elasticsearch Index to be cleared (i.e.: change the mapping of an existing field or delete an existing field), then:
    - The change is made to the non-active Elasticsearch Index (causing the non-active Index to be cleared),
    - The active Elasticsearch Index's contents are migrated to the non-active Elasticsearch Index,
    - The non-active Elasticsearch Index becomes the active index
    - Question: after changing the active index, do we delete the non-active index, empty it, or leave it with (old) data inside it?
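To make the naming idea above concrete, here is a minimal Python sketch of deriving the two real index names from the alias name, and of working out which one is currently inactive. The helper names are hypothetical (nothing like them exists in the module yet), and the _blue/_green suffixes are just the ones from the example above.

```python
# Hypothetical helpers for illustration only; the suffixes follow the example above.
COLORS = ("blue", "green")


def real_index_names(alias: str) -> dict[str, str]:
    """Map each color to the name of its real Elasticsearch index."""
    return {color: f"{alias}_{color}" for color in COLORS}


def inactive_color(active: str) -> str:
    """Return the color that is not currently behind the alias."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return COLORS[1] if active == COLORS[0] else COLORS[0]


names = real_index_names("prefix_indexname_suffix")
print(names["blue"], names["green"], inactive_color("green"))
```

Whichever suffixes we settle on, keeping them in one place like this would make it easy to change them later.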
@sokru do you have any thoughts?
Also, should we reach out to the Search API OpenSearch maintainers for input?
- 🇫🇮 Finland sokru
3.1.1: What do we use at the end of the real Elasticsearch Index names?
I'd go with blue/green suffix.
3.1.2: Do we create both real Elasticsearch Indexes when we first create the Search API Index?
I'd say it makes sense to create them when creating the index. But I think this should be configurable somehow, e.g. the Search API index settings at /admin/config/search/search-api/index/INDEX_NAME/edit could let one opt out of this feature. Opting out could be left as a follow-up issue.
3.2.1: How/where should we store the currently-active Elasticsearch Index?
I think the state makes more sense, because the alias changes should be temporary.
3.4.4: After changing the active index, do we delete the non-active index, empty it, or leave it with (old) data inside it?
I'd say we would need a UI for this and let users manually trigger Reindex API to refresh the non-active index.
Mockup of how the UI form could look:
- 🇫🇮 Finland sokru
And surely we could benefit from the insights of the OpenSearch maintainers.
- 🇨🇦 Canada mparker17 UTC-4
I've done some prototyping with PhpStorm's HTTP Client and Elasticsearch 8.10.2. If you have an IntelliJ IDE you can add all the code snippets below to a .http file, modify the variables, and try it out for yourself... but I'm going to break it up so I can explain what each section does.
A brief note on these listings... the JSON is used in the request bodies. For the sake of brevity, I am not showing the response bodies, but you can test it for yourself if you want to see the results.
The following code sets up some variables we will use throughout the demo... you'll probably want to modify them for your environment...
```http
### Variables
@host = https://elasticsearch:9200
@index = zerodowntime
```
I usually start by running a connection test to see if everything's okay (which is only useful for this demo).
```http
### Connection test
GET {{host}}/_cluster/health
```
Set up an index for the first time
Now, let's pretend that we're creating an index in the Search API settings...
We start by setting up the indexes that will store the data.
```http
### Index setup: Reserve the green index namespace
PUT {{host}}/{{index}}_green

### Index setup: Reserve the blue index namespace
PUT {{host}}/{{index}}_blue
```
Let's arbitrarily pick the "green" index to start using (i.e.: the "active" index)...
```http
### Index setup: Close the green index so we can set mappings
POST {{host}}/{{index}}_green/_close

### Index setup: Set mappings on the green index
PUT {{host}}/{{index}}_green/_mappings
Content-Type: application/json

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "author": { "type": "keyword", "ignore_above": 256 },
    "release_date": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
    "page_count": { "type": "integer" }
  }
}
```
Next, we create an alias that points to the "green" index...
```http
### Alias setup: Open the green index so we can set an alias
POST {{host}}/{{index}}_green/_open

### Alias setup: Create an alias for _green
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "add": { "index": "{{index}}_green", "alias": "{{index}}", "is_write_index": true } }
  ]
}
```
Normal usage 1
Now, let's use the index normally with the original configuration... I'm assuming "normal" usage is creating documents (i.e.: with Search API's tracker) and searching (i.e.: with a Search API front-end of some kind).
```http
### Usage: Add Data 1 into the active index via its alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Ansible for DevOps", "author": "Jeff Geerling", "release_date": "2011-01-01", "page_count": 452}

### Usage: Add Data 2 into the active index via its alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "The Design of Everyday Things", "author": "Don Norman", "release_date": "2013-01-01", "page_count": 180}

### Usage: Add Data 3 into the active index via its alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Drupal 8 Module Development", "author": "Daniel Sipos", "release_date": "2017-01-01", "page_count": 547}

### Test-only usage: Flush data after writing documents
POST {{host}}/{{index}}/_flush

### Usage: Query the active index alias (pointing to green) for Data 1: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "DevOps" } } }
```
Changing field mappings 1
Now, let's say an administrator changes some field settings that would normally require reindexing all the data (in this case, "author" changes from type Keyword to type Text)...
We start by deleting and re-creating the inactive (blue) index, then set the new mappings on it. Note that we don't strictly have to delete and re-create the "blue" index before setting mappings in this particular case, because we didn't set any mappings on it during the setup. If we had used it before (as we do with "green" in the "Changing field mappings 2" section), then we would have to delete and re-create it.
```http
### Change settings: Delete the blue index
DELETE {{host}}/{{index}}_blue

### Change settings: Create the blue index
PUT {{host}}/{{index}}_blue

### Change settings: Close the blue index so we can set mappings
POST {{host}}/{{index}}_blue/_close

### Change settings: Set new mappings on the blue index
POST {{host}}/{{index}}_blue/_mappings
Content-Type: application/json

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "author": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "release_date": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
    "page_count": { "type": "integer" }
  }
}

### Change settings: Open the blue index for reindexing
POST {{host}}/{{index}}_blue/_open
```
Now we can reindex from the old-active index to the new-active index...
```http
### Change settings: Reindex data from green to blue
POST {{host}}/_reindex
Content-Type: application/json

{
  "source": { "index": "{{index}}_green" },
  "dest": { "index": "{{index}}_blue" }
}
```
Now we can update the alias...
```http
### Change settings: Update the (active) index alias to point to the blue index
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "remove": { "index": "{{index}}_green", "alias": "{{index}}" } },
    { "add": { "index": "{{index}}_blue", "alias": "{{index}}", "is_write_index": true } }
  ]
}

### Change settings: Close the (now-inactive) green index for usage
POST {{host}}/{{index}}_green/_close
```
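The alias update is the step that actually flips traffic, so here is a rough Python sketch of building that request body for any pair of indexes. The function name is made up for illustration; the body has the same shape as the _aliases request in this walkthrough, with the remove and add actions sent together in one request so the alias never points at nothing.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build an _aliases body that moves `alias` from old_index to new_index.

    Hypothetical helper: it only builds the JSON body, it sends nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


body = build_alias_swap("zerodowntime", "zerodowntime_green", "zerodowntime_blue")
print(json.dumps(body, indent=2))
```

The same helper would cover the second swap (blue back to green) by reversing the last two arguments.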
Normal usage 2
Now, let's use the index normally with the new configuration...
```http
### Usage: Add Data 4 into the active index via its alias (pointing to blue)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Linux Kernel in a Nutshell", "author": "Greg Kroah-Hartman", "release_date": "2007-01-01", "page_count": 182}

### Test-only usage: Flush data after writing documents
POST {{host}}/{{index}}/_flush

### Usage: Query the active index via its alias (pointing to blue) for Data 1: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "DevOps" } } }

### Usage: Query the active index via its alias (pointing to blue) for Data 4: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "author": "kroah" } } }
```
Changing field mappings 2
Now, let's say an administrator makes some more field settings changes that, again, would require reindexing all the data (in this case, we change "author" from Text back to Keyword)...
We start by deleting and re-creating the inactive (green) index, then set the new mappings on it (note that, this time, we must delete the green index first, otherwise we will get an error).
```http
### Change settings: Delete the (inactive) green index
DELETE {{host}}/{{index}}_green

### Change settings: Re-create the green index
PUT {{host}}/{{index}}_green
```
Set the new mappings, and reindex to green again...
```http
### Change settings: Close the green index so we can set mappings
POST {{host}}/{{index}}_green/_close

### Change settings: Set new mappings on the green index
POST {{host}}/{{index}}_green/_mappings
Content-Type: application/json

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "author": { "type": "keyword", "ignore_above": 256 },
    "release_date": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
    "page_count": { "type": "integer" }
  }
}

### Change settings: Open the green index for reindexing
POST {{host}}/{{index}}_green/_open

### Change settings: Reindex data from blue to green
POST {{host}}/_reindex
Content-Type: application/json

{
  "source": { "index": "{{index}}_blue" },
  "dest": { "index": "{{index}}_green" }
}

### Change settings: Update the alias to point to the green index
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "remove": { "index": "{{index}}_blue", "alias": "{{index}}" } },
    { "add": { "index": "{{index}}_green", "alias": "{{index}}", "is_write_index": true } }
  ]
}

### Change settings: Close the (now-inactive) blue index for usage
POST {{host}}/{{index}}_blue/_close
```
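Since both "Changing field mappings" sections follow the same steps with the colors reversed, the procedure could be written once. Below is a minimal Python sketch that only plans the requests as (method, path, body) tuples in the order used in this walkthrough; it does not send anything. The function name and the tuple shape are my own assumptions for illustration, not existing module code.

```python
from typing import Optional

# (HTTP method, path relative to the host, optional JSON body)
Request = tuple[str, str, Optional[dict]]


def plan_mapping_change(alias: str, active: str, inactive: str, properties: dict) -> list[Request]:
    """List the requests that move `alias` from the active to the inactive index."""
    old, new = f"{alias}_{active}", f"{alias}_{inactive}"
    swap = {
        "actions": [
            {"remove": {"index": old, "alias": alias}},
            {"add": {"index": new, "alias": alias, "is_write_index": True}},
        ]
    }
    return [
        ("DELETE", f"/{new}", None),           # drop the stale inactive index
        ("PUT", f"/{new}", None),              # re-create it empty
        ("POST", f"/{new}/_close", None),      # close it before setting mappings
        ("POST", f"/{new}/_mappings", {"properties": properties}),
        ("POST", f"/{new}/_open", None),       # open it for reindexing
        ("POST", "/_reindex", {"source": {"index": old}, "dest": {"index": new}}),
        ("POST", "/_aliases", swap),           # flip the alias in one request
        ("POST", f"/{old}/_close", None),      # close the now-inactive index
    ]


steps = plan_mapping_change("zerodowntime", "blue", "green", {"author": {"type": "keyword"}})
for method, path, _body in steps:
    print(method, path)
```

Calling it with active="green" and inactive="blue" gives the "Changing field mappings 1" sequence instead.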
Normal usage 3
Now, let's use the index normally with the new-new configuration...
```http
### Usage: Add Data 5 into the index alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Drupal 7 Module Development", "author": "Matt Butcher", "release_date": "2010-01-01", "page_count": 394}

### Test-only usage: Flush data after writing documents
POST {{host}}/{{index}}/_flush

### Usage: Query the index alias (pointing to green) for Data 1: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "DevOps" } } }

### Usage: Query the index alias (pointing to green) for Data 4: expect 0 results ("author" is a keyword again, so "kroah" no longer matches "Greg Kroah-Hartman")
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "author": "kroah" } } }

### Usage: Query the index alias (pointing to green) for Data 3 and Data 5: expect 2 results
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "Drupal" } } }
```
Deleting the index
If you want to re-run this test, then you'll have to clean up the alias and both indexes afterwards.
Search API indexes also get deleted sometimes; we can use the same procedure when that happens too...
```http
### Teardown: Delete the alias
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "remove": { "index": "{{index}}_green", "alias": "{{index}}" } }
  ]
}

### Teardown: Delete the blue index
DELETE {{host}}/{{index}}_blue

### Teardown: Delete the green index
DELETE {{host}}/{{index}}_green
```
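The teardown above hard-codes green as the index holding the alias, which is only true at the end of this particular demo. As a rough sketch (hypothetical function name, same request shapes as the listing), the teardown could take the currently-active color as an input:

```python
def plan_teardown(alias: str, active: str, colors=("blue", "green")) -> list:
    """List (method, path, body) tuples that remove the alias and both real indexes."""
    remove_alias = {"actions": [{"remove": {"index": f"{alias}_{active}", "alias": alias}}]}
    steps = [("POST", "/_aliases", remove_alias)]
    # Delete every real index, whichever one happened to be active.
    steps += [("DELETE", f"/{alias}_{color}", None) for color in colors]
    return steps


for method, path, _body in plan_teardown("zerodowntime", "green"):
    print(method, path)
```

This assumes we store the active color somewhere (state, as suggested earlier in the thread) so it can be passed in.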
- 🇨🇦 Canada mparker17 UTC-4
Briefly, I found this worked on Elasticsearch 7 and OpenSearch 2, so I created "Support Index Aliases and zero downtime mapping updates" in the Search API OpenSearch queue.