- Issue created by @mparker17
- 🇨🇦 Canada mparker17 UTC-4
(copy of comment from #3248665-8: Support Aliases API and zero downtime mapping updates, but lightly edited to mention the version of OpenSearch I used and to use an opensearch URL in the variables; note that the request syntax is unchanged and therefore might not take full advantage of OpenSearch-specific features)
I've done some prototyping with PhpStorm's HTTP Client and OpenSearch 2.17.1. If you have an IntelliJ IDE, you can add all the code snippets below to a .http file, modify the variables, and try it out for yourself... but I'm going to break it up so I can explain what each section does.
A brief note on these listings: the JSON is used in the request bodies. For the sake of brevity, I am not showing the response bodies, but you can run the requests yourself if you want to see the results.
The following code sets up some variables we will use throughout the demo... you'll probably want to modify them for your environment...
### Variables
@host = https://opensearch:9200
@index = zerodowntime
I usually start by running a connection test to see if everything's okay (which is only useful for this demo).
### Connection test
GET {{host}}/_cluster/health
Set up an index for the first time
Now, let's pretend that we're creating an index in the Search API settings...
We start by setting up the indexes that will store the data.
### Index setup: Reserve the green index namespace
PUT {{host}}/{{index}}_green

### Index setup: Reserve the blue index namespace
PUT {{host}}/{{index}}_blue
Let's arbitrarily pick the "green" index to start using (i.e.: the "active" index)...
### Index setup: Close the green index so we can set mappings
POST {{host}}/{{index}}_green/_close

### Index setup: Set mappings on the green index
PUT {{host}}/{{index}}_green/_mappings
Content-Type: application/json

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "author": { "type": "keyword", "ignore_above": 256 },
    "release_date": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
    "page_count": { "type": "integer" }
  }
}
Next, we create an alias that points to the "green" index...
### Alias setup: Open the green index so we can set an alias
POST {{host}}/{{index}}_green/_open

### Alias setup: Create an alias for _green
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "add": { "index": "{{index}}_green", "alias": "{{index}}", "is_write_index": true } }
  ]
}
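Since the alias is now the only record of which colour is "active", code that automates this would need to read it back (for example with GET {{host}}/_alias/{{index}}). Here is a minimal Python sketch of that lookup. It assumes the response is a JSON object keyed by index name, each with an "aliases" object; the function name active_index is made up for illustration, and nothing here talks to the server.

```python
from typing import Optional


def active_index(alias_response: dict, alias: str) -> Optional[str]:
    """Return the index that `alias` currently writes to, or None if unknown.

    `alias_response` is assumed to look like:
    {"zerodowntime_green": {"aliases": {"zerodowntime": {"is_write_index": true}}}}
    """
    # Collect every index that carries the alias.
    candidates = [
        name
        for name, info in alias_response.items()
        if alias in info.get("aliases", {})
    ]
    # Prefer the index explicitly flagged as the write index.
    for name in candidates:
        if alias_response[name]["aliases"][alias].get("is_write_index"):
            return name
    # Otherwise fall back to the single index behind the alias, if unambiguous.
    return candidates[0] if len(candidates) == 1 else None
```

If the alias does not exist yet (an empty response object in this sketch), the function returns None, which a caller could treat as "index not set up yet".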
Normal usage 1
Now, let's use the index normally with the original configuration... I'm assuming "normal" usage is creating documents (i.e.: with Search API's tracker) and searching (i.e.: with a Search API front-end of some kind).
### Usage: Add Data 1 into the active index via its alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Ansible for DevOps", "author": "Jeff Geerling", "release_date": "2011-01-01", "page_count": 452}

### Usage: Add Data 2 into the active index via its alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "The Design of Everyday Things", "author": "Don Norman", "release_date": "2013-01-01", "page_count": 180}

### Usage: Add Data 3 into the active index via its alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Drupal 8 Module Development", "author": "Daniel Sipos", "release_date": "2017-01-01", "page_count": 547}

### Test-only usage: Flush data after writing documents
POST {{host}}/{{index}}/_flush

### Usage: Query the active index alias (pointing to green) for Data 1: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "DevOps" } } }
Changing field mappings 1
Now, let's say an administrator changes some field settings that would normally require reindexing all the data (in this case, "author" changes from type Keyword to type Text)...
We start by deleting and re-creating the inactive (blue) index, then set the new mappings on it. Note that we don't strictly have to delete and re-create the "blue" index before setting mappings in this particular case, because we didn't set any mappings on it during the setup. If we had used it before (as we do with "green" in the "Changing field mappings 2" section below), then we would have to delete and re-create it.
### Change settings: Delete the blue index
DELETE {{host}}/{{index}}_blue

### Change settings: Create the blue index
PUT {{host}}/{{index}}_blue

### Change settings: Close the blue index so we can set mappings
POST {{host}}/{{index}}_blue/_close

### Change settings: Set new mappings on the blue index
POST {{host}}/{{index}}_blue/_mappings
Content-Type: application/json

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "author": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "release_date": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
    "page_count": { "type": "integer" }
  }
}

### Change settings: Open the blue index for reindexing
POST {{host}}/{{index}}_blue/_open
Now we can reindex from the old-active index (green) to the new-active index (blue)...
### Change settings: Reindex data from green to blue
POST {{host}}/_reindex
Content-Type: application/json

{
  "source": { "index": "{{index}}_green" },
  "dest": { "index": "{{index}}_blue" }
}
Now we can update the alias...
### Change settings: Update the (active) index alias to point to the blue index
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "remove": { "index": "{{index}}_green", "alias": "{{index}}" } },
    { "add": { "index": "{{index}}_blue", "alias": "{{index}}", "is_write_index": true } }
  ]
}

### Change settings: Close the (now-inactive) green index for usage
POST {{host}}/{{index}}_green/_close
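The alias update is the step that makes the switch-over happen: the remove and add go in a single _aliases request, which, as far as I understand, is applied as one atomic operation, so the alias never points at nothing. A minimal Python sketch of building that request body follows; the function name swap_alias_actions is hypothetical, and it only produces the JSON, it does not send it.

```python
import json


def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST /_aliases that moves `alias` to `new_index`."""
    return {
        "actions": [
            # Detach the alias from the index that is going out of use...
            {"remove": {"index": old_index, "alias": alias}},
            # ...and attach it to the freshly reindexed one in the same request.
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


# Example: the green -> blue switch from the listing above.
body = swap_alias_actions("zerodowntime", "zerodowntime_green", "zerodowntime_blue")
print(json.dumps(body, indent=2))
```

Swapping the last two arguments gives the body for the blue -> green switch used later on.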
Normal usage 2
Now, let's use the index normally with the new configuration...
### Usage: Add Data 4 into the active index via its alias (pointing to blue)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Linux Kernel in a Nutshell", "author": "Greg Kroah-Hartman", "release_date": "2007-01-01", "page_count": 182}

### Test-only usage: Flush data after writing documents
POST {{host}}/{{index}}/_flush

### Usage: Query the active index via its alias (pointing to blue) for Data 1: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "DevOps" } } }

### Usage: Query the active index via its alias (pointing to blue) for Data 4: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "author": "kroah" } } }
Changing field mappings 2
Now, let's say an administrator makes some more field settings changes that, again, would require reindexing all the data (in this case, we change "author" from Text back to Keyword)...
We start by deleting and re-creating the inactive (green) index, then set the new mappings on it (note that, this time, we must delete the green index first, otherwise we will get an error).
### Change settings: Delete the (inactive) green index
DELETE {{host}}/{{index}}_green

### Change settings: Re-create the green index
PUT {{host}}/{{index}}_green
Set the new mappings, reindex from blue to green, and point the alias back at green...
### Change settings: Close the green index so we can set mappings
POST {{host}}/{{index}}_green/_close

### Change settings: Set new mappings on the green index
POST {{host}}/{{index}}_green/_mappings
Content-Type: application/json

{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 },
        "suggest": { "type": "completion" }
      }
    },
    "author": { "type": "keyword", "ignore_above": 256 },
    "release_date": { "type": "date", "format": "strict_date_optional_time||epoch_second" },
    "page_count": { "type": "integer" }
  }
}

### Change settings: Open the green index for reindexing
POST {{host}}/{{index}}_green/_open

### Change settings: Reindex data from blue to green
POST {{host}}/_reindex
Content-Type: application/json

{
  "source": { "index": "{{index}}_blue" },
  "dest": { "index": "{{index}}_green" }
}

### Change settings: Update the alias to point to the green index
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "remove": { "index": "{{index}}_blue", "alias": "{{index}}" } },
    { "add": { "index": "{{index}}_green", "alias": "{{index}}", "is_write_index": true } }
  ]
}

### Change settings: Close the (now-inactive) blue index for usage
POST {{host}}/{{index}}_blue/_close
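Both "Changing field mappings" rounds follow the same sequence with the colours swapped, so the procedure can be written once. The Python sketch below is only an illustration of that pattern: the names other_colour and mapping_change_steps are made up, it assumes index names end in _green or _blue as in this demo, and it returns the list of requests rather than sending them.

```python
from typing import List, Tuple


def other_colour(index_name: str) -> str:
    """Return the sibling index name: ..._green <-> ..._blue."""
    base, _, colour = index_name.rpartition("_")
    if not base or colour not in ("green", "blue"):
        raise ValueError(f"not a blue/green index name: {index_name!r}")
    return f"{base}_{'blue' if colour == 'green' else 'green'}"


def mapping_change_steps(active: str) -> List[Tuple[str, str]]:
    """List the (method, path) requests for one zero-downtime mapping change.

    Request bodies (mappings, reindex source/dest, alias actions) are omitted.
    """
    inactive = other_colour(active)
    return [
        ("DELETE", f"/{inactive}"),          # drop the stale inactive index
        ("PUT", f"/{inactive}"),             # re-create it empty
        ("POST", f"/{inactive}/_close"),     # close it, as in the demo
        ("POST", f"/{inactive}/_mappings"),  # set the new mappings
        ("POST", f"/{inactive}/_open"),      # open it for reindexing
        ("POST", "/_reindex"),               # copy active -> inactive
        ("POST", "/_aliases"),               # move the alias to the new index
        ("POST", f"/{active}/_close"),       # close the now-inactive index
    ]


# Example: the second round above, where blue is active and green is rebuilt.
for method, path in mapping_change_steps("zerodowntime_blue"):
    print(method, path)
```

Keeping the colour toggle in one small function means the rest of the procedure never needs to know which colour is currently active.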
Normal usage 3
Now, let's use the index normally with the new-new configuration...
### Usage: Add Data 5 into the index alias (pointing to green)
POST {{host}}/{{index}}/_doc
Content-Type: application/json

{"name": "Drupal 7 Module Development", "author": "Matt Butcher", "release_date": "2010-01-01", "page_count": 394}

### Test-only usage: Flush data after writing documents
POST {{host}}/{{index}}/_flush

### Usage: Query the index alias (pointing to green) for Data 1: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "DevOps" } } }

### Usage: Query the index alias (pointing to green) for Data 4: expect 1 result
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "author": "kroah" } } }

### Usage: Query the index alias (pointing to green) for Data 3 and Data 5: expect 2 results
GET {{host}}/{{index}}/_search
Content-Type: application/json

{ "query": { "match": { "name": "Drupal" } } }
Deleting the index
If you want to re-run this test, then you'll have to clean up the alias and both indexes afterwards.
Search API indexes also get deleted sometimes; we can use the same procedure when that happens too...
### Teardown: Delete the alias
POST {{host}}/_aliases
Content-Type: application/json

{
  "actions": [
    { "remove": { "index": "{{index}}_green", "alias": "{{index}}" } }
  ]
}

### Teardown: Delete the blue index
DELETE {{host}}/{{index}}_blue

### Teardown: Delete the green index
DELETE {{host}}/{{index}}_green
- 🇦🇺 Australia kim.pepper 🇦🇺 Sydney, Australia
Looks great. I assume the re-index from one to another is relatively fast?
- 🇨🇦 Canada mparker17 UTC-4
@kim.pepper: to be honest, I haven't tested it with large data sets, so I don't know for sure.
Both operations in my proof-of-concept took only a few milliseconds, but they're also only working with 3-4 pieces of very very simple data.
That being said, I would assume that the OpenSearch _reindex operation would be faster than what we have to do now, which is to: (a) clear the index, and (b) walk through all the content in Drupal and re-post it into the now-empty index in OpenSearch. I would expect it to be faster because...
- the _reindex operation only involves one system (OpenSearch, vs. what we have to do now with MySQL+PHP+OpenSearch); and
- the data doesn't have to be transformed during the _reindex operation (OpenSearch internal format -> OpenSearch internal format; vs. what we have to do now with RDBMS internal -> SQL result -> (network) -> PHP memory -> JSON -> (network) -> OpenSearch internal)
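For larger data sets, where _reindex might take longer than one HTTP request should, one untested option would be to start it as a background task with the wait_for_completion=false query parameter and check on it later through the tasks API. The Python sketch below only builds the request; the function name reindex_request is hypothetical, and whether this helps in practice is an assumption that would need measuring.

```python
from typing import Tuple


def reindex_request(source: str, dest: str, background: bool = True) -> Tuple[str, str, dict]:
    """Build (method, path, body) for a _reindex call.

    With background=True the path asks OpenSearch not to wait for completion,
    so the response should carry a task id instead of the final result.
    """
    path = "/_reindex"
    if background:
        path += "?wait_for_completion=false"
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return "POST", path, body


# Example: the green -> blue copy from the demo, started in the background.
method, path, body = reindex_request("zerodowntime_green", "zerodowntime_blue")
print(method, path, body)
```

The alias switch would then have to wait until the task reports that it has finished, otherwise searches could hit a partially filled index.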