Support Aliases API and zero downtime mapping updates

Created on 10 November 2021, about 3 years ago
Updated 22 March 2024, 10 months ago

Problem/Motivation

- The Aliases API is a useful feature for production environments: it allows mapping updates with zero downtime.
- When the index mappings change via the UI or a config import, BackendClient::updateIndex() clears the index so that the new settings/mappings take effect.
- Currently, neither elasticsearch_connector nor search_api_opensearch supports aliases; instead, using them requires manual work with the Aliases API and indexing.

Proposed resolution

- Use getAlias(), putAlias(), updateAliases(), and deleteAlias() on Elastic\Elasticsearch\Endpoints\Indices: https://github.com/elastic/elasticsearch-php/blob/main/src/Endpoints/Ind...
- Create a form to manage Elasticsearch aliases, OR use aliases transparently with Elastic\Elasticsearch + Search API.

Remaining tasks

- Gather input on whether this kind of feature should be in Drupal.
- Create an MR for this feature.
- Create documentation for production use: how and when to use this feature.

User interface changes

- New form to manage Elasticsearch Aliases(?)

✨ Feature request

Status: Active
Version: 8.0
Component: Code
Created by: 🇫🇮 Finland sokru


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇫🇮 Finland sokru
  • 🇨🇦 Canada mparker17 UTC-4

    I've been thinking about how to implement this ticket...

    1. Assumption: the scope of this issue only covers Elasticsearch Index aliases, which we would use when changing the mapping of an existing field or deleting an existing field. A separate ticket should handle creating an Elasticsearch Field Alias when renaming a field.
    2. Currently (before this ticket):
      1. When you create a Search API Index, it reserves 1 "real" Elasticsearch Index (e.g.: named prefix_indexname_suffix) and 0 aliases.
      2. This Elasticsearch Index is always active.
      3. When you make a Search API query to this Search API Index, it queries the active/"real" Elasticsearch Index.
      4. If you do something that requires the Elasticsearch Index to be cleared (i.e.: changing the mapping of an existing field or deleting an existing field), then the Elasticsearch Index is cleared and you have to reindex its contents (this is undesirable, and is why this ticket exists).
    3. In this ticket, we change the behavior so that:
      1. When you create a Search API Index, it reserves 2 "real" Elasticsearch Indexes (e.g.: named prefix_indexname_suffix_blue and prefix_indexname_suffix_green), and 1 Elasticsearch Index Alias (e.g.: named prefix_indexname_suffix).
        1. Question: What do we use at the end of the real Elasticsearch Index names?
          I chose _blue and _green in the example above as a reference to blue-green deployments... but _a and _b, as in A/B testing, could also work. I'm open to suggestions.
        2. Question: Do we create both real Elasticsearch Indexes when we first create the Search API Index? Without messaging in the UI explaining what we're doing, this could be confusing.
          Or do we only create one Elasticsearch Index at first, and defer creating the other Elasticsearch Index until we change the mapping of an existing field or remove a field? This risks someone creating another index with a conflicting name before we need to create the second index.
      2. Only 1 of the 2 "real" Elasticsearch Indexes is active at a time.
        1. Question: how/where should we store the currently-active Elasticsearch Index?
          Config makes sense to me because the Field definitions are also stored in config (but State could be another option).
      3. When you make a Search API query to this Search API Index, it queries the Elasticsearch Index Alias.
        1. My understanding is that Elasticsearch Index Aliases act exactly like the Elasticsearch Indexes that they are connected to, so we shouldn't have to change any query code.
      4. If you do something that requires the Elasticsearch Index to be cleared (i.e.: change the mapping of an existing field or delete an existing field), then:
        1. The change is made to the non-active Elasticsearch Index (causing the non-active Index to be cleared),
        2. The active Elasticsearch Index's contents are migrated to the non-active Elasticsearch Index,
        3. The non-active Elasticsearch Index becomes the active index
        4. Question: after changing the active index, do we delete the non-active index, empty it, or leave it with (old) data inside it?
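    A minimal Python sketch of the naming and "flip" logic in point 3, assuming the _blue/_green suffixes from the example and that the active color is stored somewhere (config or State, which is still an open question). The helper names are hypothetical and not part of elasticsearch_connector.

```python
from typing import Dict

# Suffixes taken from the blue/green example above (an assumption, not final naming).
COLORS = ("blue", "green")


def physical_index_names(alias: str) -> Dict[str, str]:
    """Map each color to the "real" index name that sits behind `alias`."""
    return {color: f"{alias}_{color}" for color in COLORS}


def inactive_color(active: str) -> str:
    """Return the color that is not currently active."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return COLORS[1] if active == COLORS[0] else COLORS[0]


if __name__ == "__main__":
    names = physical_index_names("prefix_indexname_suffix")
    active = "green"  # e.g. read from config or State
    target = inactive_color(active)
    # A mapping change is applied to names[target]; the alias then flips to it.
    print(f"{names[active]} -> {names[target]}")
```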

    @sokru do you have any thoughts?

    Also, should we reach out to the Search API OpenSearch maintainers for input?

  • 🇫🇮 Finland sokru

    3.1.1: What do we use at the end of the real Elasticsearch Index names?

    I'd go with blue/green suffix.

    3.1.2: Do we create both real Elasticsearch Indexes when we first create the Search API Index?

    I'd say it makes sense to create them when creating the index. But I think this should be configurable somehow: e.g., on the Search API index settings page (/admin/config/search/search-api/index/INDEX_NAME/edit), one could choose to opt out of this feature. Opting out could be left as a follow-up issue.

    3.2.1: How/where should we store the currently-active Elasticsearch Index?

    I think State makes more sense, because the alias changes should be temporary.

    3.4.4: After changing the active index, do we delete the non-active index, empty it, or leave it with (old) data inside it?

    I'd say we would need a UI for this, letting users manually trigger the Reindex API to refresh the non-active index.

    Mockup of how the UI form could look:

  • 🇫🇮 Finland sokru

    And surely we could benefit from the insights of the OpenSearch maintainers.

  • 🇨🇦 Canada mparker17 UTC-4

    Updated the issue summary

  • 🇨🇦 Canada mparker17 UTC-4

    Updated the issue summary with more details from comments #3 and #4

  • 🇨🇦 Canada mparker17 UTC-4

    I've done some prototyping with PhpStorm's HTTP Client and Elasticsearch 8.10.2. If you have an IntelliJ-based IDE, you can add all the code snippets below to a .http file, modify the variables, and try it out for yourself... but I'm going to break it up so I can explain what each section does.

    A brief note on these listings... the JSON is used in the request bodies. For the sake of brevity, I am not showing the response bodies, but you can test it for yourself if you want to see the results.

    The following code sets up some variables we will use throughout the demo... you'll probably want to modify them for your environment...

    ### Variables
    @host = https://elasticsearch:9200
    @index = zerodowntime
    

    I usually start by running a connection test to see if everything's okay (which is only useful for this demo).

    ### Connection test
    GET {{host}}/_cluster/health
    

    Set up an index for the first time

    Now, let's pretend that we're creating an index in the Search API settings...

    We start by setting up the indexes that will store the data.

    ### Index setup: Reserve the green index namespace
    PUT {{host}}/{{index}}_green
    
    ### Index setup: Reserve the blue index namespace
    PUT {{host}}/{{index}}_blue
    

    Let's arbitrarily pick the "green" index to start using (i.e.: the "active" index)...

    ### Index setup: Close the green index so we can set mappings
    POST {{host}}/{{index}}_green/_close
    
    ### Index setup: Set mappings on the green index
    PUT {{host}}/{{index}}_green/_mappings
    Content-Type: application/json
    
    {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "author": {
                "type": "keyword",
                "ignore_above": 256
            },
            "release_date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second"
            },
            "page_count": {"type": "integer"}
        }
    }
    

    Next, we create an alias that points to the "green" index...

    ### Alias setup: Open the green index so we can set an alias
    POST {{host}}/{{index}}_green/_open
    
    ### Alias setup: Create an alias for _green
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "add": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}",
                    "is_write_index": true
                }
            }
        ]
    }
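    For reference, the same alias body expressed as data in a hypothetical Python helper (`add_alias_body`); a module implementation would send this to POST /_aliases, for example through updateAliases() in elasticsearch-php.

```python
import json
from typing import Any, Dict


def add_alias_body(alias: str, index: str) -> Dict[str, Any]:
    """Build a POST /_aliases body pointing `alias` at `index` as its write index."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias, "is_write_index": True}},
        ]
    }


if __name__ == "__main__":
    # Equivalent to the "Create an alias for _green" request with @index = zerodowntime.
    print(json.dumps(add_alias_body("zerodowntime", "zerodowntime_green"), indent=4))
```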
    

    Normal usage 1

    Now, let's use the index normally with the original configuration... I'm assuming "normal" usage is creating documents (i.e.: with Search API's tracker) and searching (i.e.: with a Search API front-end of some kind).

    ### Usage: Add Data 1 into the active index via its alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Ansible for DevOps", "author": "Jeff Geerling", "release_date": "2011-01-01", "page_count": 452}
    
    ### Usage: Add Data 2 into the active index via its alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "The Design of Everyday Things", "author": "Don Norman", "release_date": "2013-01-01", "page_count": 180}
    
    ### Usage: Add Data 3 into the active index via its alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Drupal 8 Module Development", "author": "Daniel Sipos", "release_date": "2017-01-01", "page_count": 547}
    
    ### Test-only usage: Flush data after writing documents
    POST {{host}}/{{index}}/_flush
    
    ### Usage: Query the active index alias (pointing to green) for Data 1: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "DevOps"
            }
        }
    }
    

    Changing field mappings 1

    Now, let's say an administrator changes some field settings that would normally require reindexing all the data (in this case, "author" changes from type Keyword to type Text)...

    We start by deleting and re-creating the inactive (blue) index, then set the new mappings on it. Strictly speaking, we don't have to delete and re-create the "blue" index in this particular case, because we didn't set any mappings on it during setup; but if it had been used before (as "green" is below, in the "Changing field mappings 2" section), then we would have to delete and re-create it.
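    Before running these requests, the module has to decide whether a mapping change needs this procedure at all. A rough Python sketch, assuming (per the issue summary) that only changing or deleting an existing field forces a rebuild, and that newly added fields can go into the existing mapping. `fields_needing_rebuild` is a hypothetical helper; it compares whole field definitions, so it errs on the side of rebuilding.

```python
from typing import Any, Dict, List


def fields_needing_rebuild(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return top-level fields whose definition changed or that were removed.

    Fields that only exist in `new` are ignored: under the assumption above,
    adding a field does not require the blue/green procedure.
    """
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    return sorted(
        name
        for name, definition in old_props.items()
        if name not in new_props or new_props[name] != definition
    )


if __name__ == "__main__":
    green = {"properties": {"author": {"type": "keyword", "ignore_above": 256},
                            "page_count": {"type": "integer"}}}
    blue = {"properties": {"author": {"type": "text"},
                           "page_count": {"type": "integer"}}}
    print(fields_needing_rebuild(green, blue))  # ['author']
```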

    ### Change settings: Delete the blue index
    DELETE {{host}}/{{index}}_blue
    
    ### Change settings: Create the blue index
    PUT {{host}}/{{index}}_blue
    
    ### Change settings: Close the blue index so we can set mappings
    POST {{host}}/{{index}}_blue/_close
    
    ### Change settings: Set new mappings on the blue index
    POST {{host}}/{{index}}_blue/_mappings
    Content-Type: application/json
    
    {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "author": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "release_date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second"
            },
            "page_count": {"type": "integer"}
        }
    }
    
    ### Change settings: Open the blue index for reindexing
    POST {{host}}/{{index}}_blue/_open
    

    Now we can reindex from the old-active index to the new-active index

    ### Change settings: Reindex data from green to blue
    POST {{host}}/_reindex
    Content-Type: application/json
    
    {
      "source": {
        "index": "{{index}}_green"
      },
      "dest": {
        "index": "{{index}}_blue"
      }
    }
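    The reindex source and destination follow directly from which color is active. A hypothetical Python helper that builds the body above from the alias name and the active color:

```python
import json
from typing import Any, Dict

COLORS = ("blue", "green")


def reindex_body(alias: str, active: str) -> Dict[str, Any]:
    """Build a POST /_reindex body copying the active index into the inactive one."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    inactive = "green" if active == "blue" else "blue"
    return {
        "source": {"index": f"{alias}_{active}"},
        "dest": {"index": f"{alias}_{inactive}"},
    }


if __name__ == "__main__":
    # Green is active in "Changing field mappings 1", so data flows green -> blue.
    print(json.dumps(reindex_body("zerodowntime", "green"), indent=2))
```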
    

    Now we can update the alias...

    ### Change settings: Update the (active) index alias to point to the blue index
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "remove": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}"
                }
            },
            {
                "add": {
                    "index": "{{index}}_blue",
                    "alias": "{{index}}",
                    "is_write_index": true
                }
            }
        ]
    }
    
    ### Change settings: Close the (now-inactive) green index for usage
    POST {{host}}/{{index}}_green/_close
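    Both alias actions are sent in a single POST /_aliases request; Elasticsearch applies the actions of one request atomically, so searches through the alias never see it pointing at neither index. A hypothetical Python helper producing that body:

```python
import json
from typing import Any, Dict


def swap_alias_body(alias: str, old_index: str, new_index: str) -> Dict[str, Any]:
    """Build one POST /_aliases body that moves `alias` from old_index to new_index.

    Keeping "remove" and "add" in the same request is what makes the swap atomic.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias, "is_write_index": True}},
        ]
    }


if __name__ == "__main__":
    body = swap_alias_body("zerodowntime", "zerodowntime_green", "zerodowntime_blue")
    print(json.dumps(body, indent=4))
```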
    

    Normal usage 2

    Now, let's use the index normally with the new configuration...

    ### Usage: Add Data 4 into the active index via its alias (pointing to blue)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Linux Kernel in a Nutshell", "author": "Greg Kroah-Hartman", "release_date": "2007-01-01", "page_count": 182}
    
    ### Test-only usage: Flush data after writing documents
    POST {{host}}/{{index}}/_flush
    
    ### Usage: Query the active index via its alias (pointing to blue) for Data 1: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "DevOps"
            }
        }
    }
    
    ### Usage: Query the active index via its alias (pointing to blue) for Data 4: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "author": "kroah"
            }
        }
    }
    

    Changing field mappings 2

    Now, let's say an administrator makes more field settings changes that, again, would require reindexing all the data (in this case, we change "author" from Text back to Keyword)...

    We start by deleting and re-creating the inactive (green) index, then set the new mappings on it (note that, this time, we must delete the green index first, otherwise we will get an error).

    ### Change settings: Delete the (inactive) green index
    DELETE {{host}}/{{index}}_green
    
    ### Change settings: Re-create the green index
    PUT {{host}}/{{index}}_green
    

    Set the new mappings, and reindex to green again...

    ### Change settings: Close the green index so we can set mappings
    POST {{host}}/{{index}}_green/_close
    
    ### Change settings: Set new mappings on the green index
    POST {{host}}/{{index}}_green/_mappings
    Content-Type: application/json
    
    {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "author": {
                "type": "keyword",
                "ignore_above": 256
            },
            "release_date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second"
            },
            "page_count": {"type": "integer"}
        }
    }
    
    ### Change settings: Open the green index for reindexing
    POST {{host}}/{{index}}_green/_open
    
    ### Change settings: Reindex data from blue to green
    POST {{host}}/_reindex
    Content-Type: application/json
    
    {
      "source": {
        "index": "{{index}}_blue"
      },
      "dest": {
        "index": "{{index}}_green"
      }
    }
    
    ### Change settings: Update the alias to point to the green index
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "remove": {
                    "index": "{{index}}_blue",
                    "alias": "{{index}}"
                }
            },
            {
                "add": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}",
                    "is_write_index": true
                }
            }
        ]
    }
    
    ### Change settings: Close the (now-inactive) blue index for usage
    POST {{host}}/{{index}}_blue/_close
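    Both mapping changes use the same eight requests; only the active color differs. A Python sketch (hypothetical helper, assuming the request order used in this demo) that derives the sequence from the active color. It only lists the requests without sending them; the request bodies are noted in comments.

```python
from typing import List, Tuple

COLORS = ("blue", "green")
Request = Tuple[str, str]  # (HTTP method, path relative to {{host}})


def mapping_change_plan(alias: str, active: str) -> List[Request]:
    """Return, in order, the requests used above to move `alias` off `active`."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    inactive = "green" if active == "blue" else "blue"
    old_index, new_index = f"{alias}_{active}", f"{alias}_{inactive}"
    return [
        ("DELETE", f"/{new_index}"),          # drop stale documents and mappings
        ("PUT", f"/{new_index}"),             # re-create it empty
        ("POST", f"/{new_index}/_close"),
        ("POST", f"/{new_index}/_mappings"),  # body: the new mappings
        ("POST", f"/{new_index}/_open"),
        ("POST", "/_reindex"),                # body: source=old_index, dest=new_index
        ("POST", "/_aliases"),                # body: remove old_index, add new_index
        ("POST", f"/{old_index}/_close"),     # park the now-inactive index
    ]


if __name__ == "__main__":
    # "Changing field mappings 2": blue is active, so green is rebuilt.
    for method, path in mapping_change_plan("zerodowntime", "blue"):
        print(method, path)
```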
    

    Normal usage 3

    Now, let's use the index normally with the new-new configuration...

    ### Usage: Add Data 5 into the index alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Drupal 7 Module Development", "author": "Matt Butcher", "release_date": "2010-01-01", "page_count": 394}
    
    ### Test-only usage: Flush data after writing documents
    POST {{host}}/{{index}}/_flush
    
    ### Usage: Query the index alias (pointing to green) for Data 1: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "DevOps"
            }
        }
    }
    
    ### Usage: Query the index alias (pointing to green) for Data 4: expect 0 results, because "author" is a Keyword field again and no longer matches the partial value "kroah"
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "author": "kroah"
            }
        }
    }
    
    ### Usage: Query the index alias (pointing to green) for Data 3 and Data 5: expect 2 results
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "Drupal"
            }
        }
    }
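    Why the last query should find both Drupal books: documents written before each mapping change are copied into the rebuilt index before the alias flips. A toy in-memory model in Python (hypothetical, not Elasticsearch; it ignores mappings and analysis) that replays the demo and tracks only where documents live:

```python
from typing import Dict, List


class ToyCluster:
    """Hypothetical stand-in: named indexes holding docs, plus one alias pointer."""

    def __init__(self) -> None:
        self.indexes: Dict[str, List[dict]] = {}
        self.alias_target = ""

    def create(self, name: str) -> None:
        self.indexes[name] = []

    def delete(self, name: str) -> None:
        del self.indexes[name]

    def point_alias(self, name: str) -> None:
        self.alias_target = name

    def write(self, doc: dict) -> None:
        # Writes through the alias land in the index it currently points to.
        self.indexes[self.alias_target].append(doc)

    def reindex(self, source: str, dest: str) -> None:
        self.indexes[dest].extend(dict(doc) for doc in self.indexes[source])

    def search_alias(self) -> List[dict]:
        return list(self.indexes[self.alias_target])


def run_demo() -> ToyCluster:
    """Replay the demo: setup, two mapping changes, and five documents."""
    c = ToyCluster()
    c.create("zerodowntime_green")
    c.create("zerodowntime_blue")
    c.point_alias("zerodowntime_green")
    for n in (1, 2, 3):
        c.write({"data": n})
    # Changing field mappings 1: rebuild blue, copy green into it, flip.
    c.delete("zerodowntime_blue")
    c.create("zerodowntime_blue")
    c.reindex("zerodowntime_green", "zerodowntime_blue")
    c.point_alias("zerodowntime_blue")
    c.write({"data": 4})
    # Changing field mappings 2: rebuild green, copy blue into it, flip back.
    c.delete("zerodowntime_green")
    c.create("zerodowntime_green")
    c.reindex("zerodowntime_blue", "zerodowntime_green")
    c.point_alias("zerodowntime_green")
    c.write({"data": 5})
    return c


if __name__ == "__main__":
    cluster = run_demo()
    print(cluster.alias_target, [d["data"] for d in cluster.search_alias()])
```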
    

    Deleting the index

    If you want to re-run this test, then you'll have to clean up the alias and both indexes afterwards.

    Search API indexes also get deleted sometimes; we can use the same procedure when that happens too...

    ### Teardown: Delete the alias
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "remove": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}"
                }
            }
        ]
    }
    
    ### Teardown: Delete the blue index
    DELETE {{host}}/{{index}}_blue
    
    ### Teardown: Delete the green index
    DELETE {{host}}/{{index}}_green
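    When a Search API index is deleted, the module needs to know which physical index the alias currently points at. A Python sketch with a hypothetical helper that produces the teardown requests above, in order:

```python
from typing import List, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]  # (method, path, JSON body or None)


def teardown_requests(alias: str, active_index: str, indexes: List[str]) -> List[Request]:
    """Requests used above to remove the alias and then every backing index."""
    reqs: List[Request] = [
        ("POST", "/_aliases",
         {"actions": [{"remove": {"index": active_index, "alias": alias}}]}),
    ]
    reqs.extend(("DELETE", f"/{name}", None) for name in indexes)
    return reqs


if __name__ == "__main__":
    # At the end of the demo, the alias points at green.
    for method, path, body in teardown_requests(
            "zerodowntime", "zerodowntime_green",
            ["zerodowntime_blue", "zerodowntime_green"]):
        print(method, path, body or "")
```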
    
  • 🇨🇦 Canada mparker17 UTC-4

    Briefly, I found this worked on Elasticsearch 7 and OpenSearch 2, so I created the issue "✨ Support Index Aliases and zero downtime mapping updates" (Active) in the Search API OpenSearch queue.
