Support Index Aliases and zero downtime mapping updates

Created on 31 October 2024, 3 months ago

Problem/Motivation

In "The whole index gets cleared/deleted when any change in the search index configuration is imported/synced" (Needs work), there was an effort to only clear the Index when necessary. Early in that issue (comment #9), @longwave proposed a method based on blue-green deployments... but a different solution was committed to fix that issue.

I'm a maintainer of the Elasticsearch Connector module, and when we were working on our own version of that problem ("The whole index gets cleared when any change in the search index configuration is imported", Active), @longwave's proposal really intrigued @sokru and me, so we created "Support Aliases API and zero downtime mapping updates" (Active) to try out @longwave's idea for Elasticsearch 8.

I'm happy to report that I've come up with a proof-of-concept of this for Elasticsearch 8 in "Support Aliases API and zero downtime mapping updates"... and I was able to run my proof-of-concept on OpenSearch 2.17.1! I'll put a copy of the proof-of-concept in a comment below.

I'm filing this issue in the Search API OpenSearch issue queue so both projects can collaborate on the idea!

Proposed resolution

At a high level:

  1. For each Search API Index defined in Drupal's configuration (e.g.: machine name foo), we would need to work with (at least) 2 OpenSearch Indexes (e.g.: foo_blue and foo_green). We'd initially pick one of them to be "active" (e.g.: foo_green) with an alias, create it, and work with it normally. Later, if there was a configuration change that required us to re-index, then we would...
    1. create the other index (e.g.: foo_blue) with the changed configuration
    2. reindex the old index (foo_green) to the new one (foo_blue)
    3. set the new index (foo_blue) as the "active" index with an alias
    4. delete the old index (foo_green)
  2. To signify which is the "active" index, we should create at least 1 Index Alias, that points to the currently-"active" index.
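
The steps above can be sketched as a small planning function. This is only an illustration under stated assumptions: the names `other_color` and `plan_swap` are made up for this example, the `_blue` / `_green` suffixes follow the `foo_blue` / `foo_green` example above, and the mappings are passed when the new index is created (a simplification; the proof-of-concept below sets them in a separate request). It only builds the list of requests; it does not send anything.

```python
from __future__ import annotations

from typing import Any

COLORS = ("blue", "green")  # suffixes from the foo_blue / foo_green example


def other_color(active: str) -> str:
    """Return the inactive color for the given active color."""
    if active not in COLORS:
        raise ValueError(f"unknown color: {active!r}")
    return COLORS[1] if active == COLORS[0] else COLORS[0]


def plan_swap(index: str, active: str, mappings: dict[str, Any]) -> list[tuple[str, str, Any]]:
    """List the (method, path, body) requests for one blue-green swap."""
    old = f"{index}_{active}"
    new = f"{index}_{other_color(active)}"
    return [
        # 1. create the other index with the changed configuration
        ("PUT", f"/{new}", {"mappings": mappings}),
        # 2. reindex the old index into the new one
        ("POST", "/_reindex", {"source": {"index": old}, "dest": {"index": new}}),
        # 3. point the alias at the new index, in a single _aliases request
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old, "alias": index}},
            {"add": {"index": new, "alias": index, "is_write_index": True}},
        ]}),
        # 4. delete the old index
        ("DELETE", f"/{old}", None),
    ]
```

For example, `plan_swap("foo", "green", {"properties": {}})` yields four requests, the first being a `PUT` to `/foo_blue` and the last a `DELETE` of `/foo_green`.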

Remaining tasks

  1. Discuss the proposed solution for both projects
  2. Write a patch
  3. Review and feedback
  4. RTBC and feedback
  5. Commit

User interface changes

To be determined.

API changes

To be determined.

Data model changes

To be determined.

✨ Feature request
Status

Active

Version

3.0

Component

Code

Created by

🇨🇦 Canada mparker17 UTC-4

Comments & Activities

  • Issue created by @mparker17
  • 🇨🇦 Canada mparker17 UTC-4

    (copy of comment from #3248665-8: Support Aliases API and zero downtime mapping updates, lightly edited to mention the version of OpenSearch I used and to use an opensearch URL in the variables; note that the request syntax is unchanged and therefore might not take full advantage of OpenSearch-specific features)

    I've done some prototyping with PhpStorm's HTTP Client and OpenSearch 2.17.1. If you have an IntelliJ IDE you can add all the code snippets below to a .http file, modify the variables, and try it out for yourself... but I'm going to break it up so I can explain what each section does.

    A brief note on these listings... the JSON is used in the request bodies. For the sake of brevity, I am not showing the response bodies, but you can test it for yourself if you want to see the results.

    The following code sets up some variables we will use throughout the demo... you'll probably want to modify them for your environment...

    ### Variables
    @host = https://opensearch:9200
    @index = zerodowntime
    

    I usually start by running a connection test to see if everything's okay (which is only useful for this demo).

    ### Connection test
    GET {{host}}/_cluster/health
    

    Set up an index for the first time

    Now, let's pretend that we're creating an index in the Search API settings...

    We start by setting up the indexes that will store the data.

    ### Index setup: Reserve the green index namespace
    PUT {{host}}/{{index}}_green
    
    ### Index setup: Reserve the blue index namespace
    PUT {{host}}/{{index}}_blue
    

    Let's arbitrarily pick the "green" index to start using (i.e.: the "active" index)...

    ### Index setup: Close the green index so we can set mappings
    POST {{host}}/{{index}}_green/_close
    
    ### Index setup: Set mappings on the green index
    PUT {{host}}/{{index}}_green/_mappings
    Content-Type: application/json
    
    {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "author": {
                "type": "keyword",
                "ignore_above": 256
            },
            "release_date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second"
            },
            "page_count": {"type": "integer"}
        }
    }
    

    Next, we create an alias that points to the "green" index...

    ### Alias setup: Open the green index so we can set an alias
    POST {{host}}/{{index}}_green/_open
    
    ### Alias setup: Create an alias for _green
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "add": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}",
                    "is_write_index": true
                }
            }
        ]
    }
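
    As a rough illustration outside the HTTP client, the alias request above could also be built in Python using only the standard library. This is a sketch, not a finished client: the helper names build_request and add_alias_request are made up for this example, HOST is the same placeholder as the @host variable, and nothing is sent until the request is passed to urllib.request.urlopen (which assumes the cluster is reachable and any TLS or authentication is already sorted out).

```python
import json
import urllib.request

HOST = "https://opensearch:9200"  # same placeholder as the @host variable above


def build_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build (but do not send) a request against the cluster."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if body is None else {"Content-Type": "application/json"}
    return urllib.request.Request(HOST + path, data=data, headers=headers, method=method)


def add_alias_request(index: str, color: str) -> urllib.request.Request:
    """Request that points the alias `index` at `<index>_<color>` as the write index."""
    action = {"add": {"index": f"{index}_{color}", "alias": index, "is_write_index": True}}
    return build_request("POST", "/_aliases", {"actions": [action]})


req = add_alias_request("zerodowntime", "green")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```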
    

    Normal usage 1

    Now, let's use the index normally with the original configuration... I'm assuming "normal" usage is creating documents (i.e.: with Search API's tracker) and searching (i.e.: with a Search API front-end of some kind).

    ### Usage: Add Data 1 into the active index via its alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Ansible for DevOps", "author": "Jeff Geerling", "release_date": "2011-01-01", "page_count": 452}
    
    ### Usage: Add Data 2 into the active index via its alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "The Design of Everyday Things", "author": "Don Norman", "release_date": "2013-01-01", "page_count": 180}
    
    ### Usage: Add Data 3 into the active index via its alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Drupal 8 Module Development", "author": "Daniel Sipos", "release_date": "2017-01-01", "page_count": 547}
    
    ### Test-only usage: Flush data after writing documents
    POST {{host}}/{{index}}/_flush
    
    ### Usage: Query the active index alias (pointing to green) for Data 1: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "DevOps"
            }
        }
    }
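
    Since the response bodies are not shown here, this is a small sketch of how the "expect 1 result" check could be done programmatically. The function name hit_count is made up for this example, and the sample response is hand-written to resemble the shape of a _search response; it was not captured from a real cluster.

```python
def hit_count(response: dict) -> int:
    """Return the number of matching documents reported by a _search response."""
    total = response["hits"]["total"]
    # Recent versions report {"value": N, "relation": "eq"}; tolerate a bare integer too.
    return total["value"] if isinstance(total, dict) else int(total)


# Hand-written sample in the shape of a _search response (illustrative only).
sample = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [{"_source": {"name": "Ansible for DevOps"}}],
    }
}
print(hit_count(sample))
```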
    

    Changing field mappings 1

    Now, let's say an administrator changes some field settings that would normally require reindexing all the data (in this case, "author" changes from type Keyword to type Text)...
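
    One way to decide whether a configuration change needs the blue-green procedure is to compare the old and new "properties" sections. The sketch below is deliberately conservative and rests on an assumption: it treats any change to (or removal of) an existing field as requiring a new index, and treats purely added fields as safe to apply in place. The function names changed_fields and needs_reindex are made up for this example.

```python
from __future__ import annotations


def changed_fields(old: dict, new: dict) -> list[str]:
    """Names of fields whose definition differs between two "properties" dicts."""
    names = sorted(set(old) | set(new))
    return [name for name in names if old.get(name) != new.get(name)]


def needs_reindex(old: dict, new: dict) -> bool:
    """Assumption: changing or removing an existing field needs a new index; adding one does not."""
    return any(name in old for name in changed_fields(old, new))


old = {"author": {"type": "keyword", "ignore_above": 256}, "page_count": {"type": "integer"}}
new = {"author": {"type": "text"}, "page_count": {"type": "integer"}}
print(changed_fields(old, new), needs_reindex(old, new))
```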

    We start by deleting and re-creating the inactive (blue) index, then set the new mappings on it. Note that we don't strictly have to delete and re-create the "blue" index before setting mappings in this particular case, because we didn't set any mappings on it during the setup... but if we had used it before (as we do with "green" below in the "Changing field mappings 2" section), then we would have to delete and re-create it.

    ### Change settings: Delete the blue index
    DELETE {{host}}/{{index}}_blue
    
    ### Change settings: Create the blue index
    PUT {{host}}/{{index}}_blue
    
    ### Change settings: Close the blue index so we can set mappings
    POST {{host}}/{{index}}_blue/_close
    
    ### Change settings: Set new mappings on the blue index
    POST {{host}}/{{index}}_blue/_mappings
    Content-Type: application/json
    
    {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "author": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "release_date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second"
            },
            "page_count": {"type": "integer"}
        }
    }
    
    ### Change settings: Open the blue index for reindexing
    POST {{host}}/{{index}}_blue/_open
    

    Now we can reindex from the old-active index to the new-active index...

    ### Change settings: Reindex data from green to blue
    POST {{host}}/_reindex
    Content-Type: application/json
    
    {
      "source": {
        "index": "{{index}}_green"
      },
      "dest": {
        "index": "{{index}}_blue"
      }
    }
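
    For larger data sets, the reindex request may take a while to return. One option is to start it with the wait_for_completion=false query parameter, which returns a task ID that can be polled through the tasks API. The sketch below only builds the request and the polling path; the function names are made up for this example, and the task ID in the usage line is an invented placeholder.

```python
from __future__ import annotations

from urllib.parse import urlencode


def reindex_call(source: str, dest: str, wait: bool = True) -> tuple[str, str, dict]:
    """Return (method, path, body) for a _reindex request."""
    params = {} if wait else {"wait_for_completion": "false"}
    path = "/_reindex" + ("?" + urlencode(params) if params else "")
    return "POST", path, {"source": {"index": source}, "dest": {"index": dest}}


def task_status_path(reindex_response: dict) -> str:
    """Path to poll for a reindex that was started with wait_for_completion=false."""
    return "/_tasks/" + reindex_response["task"]


print(reindex_call("zerodowntime_green", "zerodowntime_blue", wait=False))
print(task_status_path({"task": "node1:42"}))  # "node1:42" is an invented task ID
```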
    

    Now we can update the alias...

    ### Change settings: Update the (active) index alias to point to the blue index
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "remove": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}"
                }
            },
            {
                "add": {
                    "index": "{{index}}_blue",
                    "alias": "{{index}}",
                    "is_write_index": true
                }
            }
        ]
    }
    
    ### Change settings: Close the (now-inactive) green index so it is no longer used
    POST {{host}}/{{index}}_green/_close
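
    After a swap, the alias itself is the record of which index is "active". A module could read that back with GET {{host}}/_alias/{{index}} rather than storing the color separately. The sketch below parses such a response; the function name active_index is made up for this example, and the sample response is hand-written to resemble the shape of that endpoint's output.

```python
def active_index(alias_response: dict, alias: str) -> str:
    """Return the single index that `alias` points to, given a GET /_alias/<alias> response."""
    targets = [
        name for name, info in alias_response.items()
        if alias in info.get("aliases", {})
    ]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one index behind {alias!r}, found {targets}")
    return targets[0]


# Hand-written sample in the shape of a GET /_alias/zerodowntime response (illustrative only).
sample = {"zerodowntime_blue": {"aliases": {"zerodowntime": {"is_write_index": True}}}}
print(active_index(sample, "zerodowntime"))
```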
    

    Normal usage 2

    Now, let's use the index normally with the new configuration...

    ### Usage: Add Data 4 into the active index via its alias (pointing to blue)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Linux Kernel in a Nutshell", "author": "Greg Kroah-Hartman", "release_date": "2007-01-01", "page_count": 182}
    
    ### Test-only usage: Flush data after writing documents
    POST {{host}}/{{index}}/_flush
    
    ### Usage: Query the active index via its alias (pointing to blue) for Data 1: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "DevOps"
            }
        }
    }
    
    ### Usage: Query the active index via its alias (pointing to blue) for Data 4: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "author": "kroah"
            }
        }
    }
    

    Changing field mappings 2

    Now, let's say an administrator makes some more field settings changes that, again, would require reindexing all the data (in this case, we change "author" from Text back to Keyword)...

    We start by deleting and re-creating the inactive (green) index, then set the new mappings on it (note that, this time, we must delete the green index first, otherwise we will get an error).

    ### Change settings: Delete the (inactive) green index
    DELETE {{host}}/{{index}}_green
    
    ### Change settings: Re-create the green index
    PUT {{host}}/{{index}}_green
    

    Set the new mappings, and reindex to green again...

    ### Change settings: Close the green index so we can set mappings
    POST {{host}}/{{index}}_green/_close
    
    ### Change settings: Set new mappings on the green index
    POST {{host}}/{{index}}_green/_mappings
    Content-Type: application/json
    
    {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    },
                    "suggest": {
                        "type": "completion"
                    }
                }
            },
            "author": {
                "type": "keyword",
                "ignore_above": 256
            },
            "release_date": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_second"
            },
            "page_count": {"type": "integer"}
        }
    }
    
    ### Change settings: Open the green index for reindexing
    POST {{host}}/{{index}}_green/_open
    
    ### Change settings: Reindex data from blue to green
    POST {{host}}/_reindex
    Content-Type: application/json
    
    {
      "source": {
        "index": "{{index}}_blue"
      },
      "dest": {
        "index": "{{index}}_green"
      }
    }
    
    ### Change settings: Update the alias to point to the green index
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "remove": {
                    "index": "{{index}}_blue",
                    "alias": "{{index}}"
                }
            },
            {
                "add": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}",
                    "is_write_index": true
                }
            }
        ]
    }
    
    ### Change settings: Close the (now-inactive) blue index so it is no longer used
    POST {{host}}/{{index}}_blue/_close
    

    Normal usage 3

    Now, let's use the index normally with the new-new configuration...

    ### Usage: Add Data 5 into the index alias (pointing to green)
    POST {{host}}/{{index}}/_doc
    Content-Type: application/json
    
    {"name": "Drupal 7 Module Development", "author": "Matt Butcher", "release_date": "2010-01-01", "page_count": 394}
    
    ### Test-only usage: Flush data after writing documents
    POST {{host}}/{{index}}/_flush
    
    ### Usage: Query the index alias (pointing to green) for Data 1: expect 1 result
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "DevOps"
            }
        }
    }
    
    ### Usage: Query the index alias (pointing to green) for Data 4: expect 0 results, because "author" is a keyword again and only matches the exact value
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "author": "kroah"
            }
        }
    }
    
    ### Usage: Query the index alias (pointing to green) for Data 3 and Data 5: expect 2 results
    GET {{host}}/{{index}}/_search
    Content-Type: application/json
    
    {
        "query": {
            "match": {
                "name": "Drupal"
            }
        }
    }
    

    Deleting the index

    If you want to re-run this test, then you'll have to clean up the alias and both indexes afterwards.

    Search API indexes also get deleted sometimes; we can use the same procedure when that happens too...

    ### Teardown: Delete the alias
    POST {{host}}/_aliases
    Content-Type: application/json
    
    {
        "actions": [
            {
                "remove": {
                    "index": "{{index}}_green",
                    "alias": "{{index}}"
                }
            }
        ]
    }
    
    ### Teardown: Delete the blue index
    DELETE {{host}}/{{index}}_blue
    
    ### Teardown: Delete the green index
    DELETE {{host}}/{{index}}_green
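
    The teardown listing above hard-codes "green" as the index behind the alias, which is where this demo ends up. A sketch of the same three requests, parameterized on whichever color is currently active, could look like the following. The function name teardown_calls is made up for this example, and it only builds the requests without sending them.

```python
from __future__ import annotations

from typing import Any


def teardown_calls(index: str, active: str) -> list[tuple[str, str, Any]]:
    """Return the (method, path, body) requests that remove the alias and both indexes."""
    if active not in ("blue", "green"):
        raise ValueError(f"unknown color: {active!r}")
    remove_alias = {"actions": [{"remove": {"index": f"{index}_{active}", "alias": index}}]}
    return [
        ("POST", "/_aliases", remove_alias),   # remove the alias from the active index
        ("DELETE", f"/{index}_blue", None),    # delete the blue index
        ("DELETE", f"/{index}_green", None),   # delete the green index
    ]


for call in teardown_calls("zerodowntime", "green"):
    print(call)
```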
    
  • 🇦🇺 Australia kim.pepper 🏄‍♂️🇦🇺 Sydney, Australia

    Looks great. I assume the re-index from one to another is relatively fast?

  • 🇨🇦 Canada mparker17 UTC-4

    @kim.pepper: to be honest, I haven't tested it with large data sets, so I don't know for sure.

    Both operations in my proof-of-concept took only a few milliseconds, but they were also only working with 3-4 pieces of very simple data.

    That being said, I would assume that the OpenSearch _reindex operation would be faster than what we have to do now, which is to: (a) clear the index, and (b) walk through all the content in Drupal and re-post it into the now-empty index in OpenSearch. I would expect it to be faster because...

    1. the _reindex operation only involves one system (OpenSearch, vs. what we have to do now with MySQL+PHP+OpenSearch); and
    2. the data doesn't have to be transformed during the _reindex operation (OpenSearch internal format -> OpenSearch internal format; vs. what we have to do now with RDBMS internal -> SQL result -> (network) -> PHP memory -> JSON -> (network) -> OpenSearch internal)