Adding GIS and GIST indexes to PostgreSQL databases

Issue created by @daffie
Comment almost 2 years ago →
🇳🇱Netherlands daffie
For the bikeshedding part see 🐛 [PP-1] Performance issues with path alias generated queries on PostgreSQL Postponed
Comment almost 2 years ago →
🇺🇸United States bradjones1 Digital Nomad Life
Minor typo fix in IS and title.
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life

🇺🇸United States bradjones1 Digital Nomad Life

A bit of rubber-ducking on the possibility of using an object implementing ArrayAccess for a value object that would allow for additional data on the index (beyond just an array of keys.) As proposed by @alexpott.

If I understand this properly, the idea is to be able to specify indexes something like (pseudocode):

$schema[$table]['indexes']['indexName'] = new DbIndexSpec($keys, $postgresIndexType);

If DbIndexSpec implements both ArrayAccess and Traversable, then it could be crafted in a BC-compatible way for any instances where it is foreach()'ed over currently. There is an instance of the Postgres driver doing a typecheck with is_array(), which would not be compatible here, but that code would be changed anyway because this is the whole reason for this conversation.

Setting the is_array() issue aside, the value object would need to be BC-compatible for nested foreach loops such as found in the mysql driver:

  protected function getNormalizedIndexes(array $spec) {
    $indexes = $spec['indexes'] ?? [];
    foreach ($indexes as $index_name => $index_fields) {
      foreach ($index_fields as $index_key => $index_field) {
        // Get the name of the field from the index specification.
        $field_name = is_array($index_field) ? $index_field[0] : $index_field;
        // Check whether the field is defined in the table specification.
        if (isset($spec['fields'][$field_name])) {
          // Get the MySQL type from the processed field.
          $mysql_field = $this->processField($spec['fields'][$field_name]);
          if (in_array($mysql_field['mysql_type'], $this->mysqlStringTypes)) {
            // Check whether we need to shorten the index.
            if ((!isset($mysql_field['type']) || $mysql_field['type'] != 'varchar_ascii') && (!isset($mysql_field['length']) || $mysql_field['length'] > 191)) {
              // Limit the index length to 191 characters.
              $this->shortenIndex($indexes[$index_name][$index_key]);
            }
          }
        }
        else {
          throw new SchemaException("MySQL needs the '$field_name' field specification in order to normalize the '$index_name' index");
        }
      }
    }
    return $indexes;
  }

That said, since there are contrib DB drivers out there, is this just "too much" to try and attempt?

Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Got some feedback and background in Slack on "ArrayPI" - the name appears to be a Drupalism (and a fun one at that). There are similar discussions around trying to chip away at particularly complex or DX-unfriendly structures, e.g. #3380145-19: ViewsData should not cache by language → . Seems @catch is optimistic about being able to refactor some of these with limited BC impact.

While poking around the related code I also discovered that an index's "key column specifier" can either be an array of column names OR arrays which contain a column name and prefix length. Database drivers can elect to respect the prefix or just use the full column value. (IOW, they must support the syntax but not necessarily create a functional index as a result.) I _think_ this potentially opens us up to using this array syntax to also express driver-specific options, e.g. index type. I don't love it, but it is potentially better than the BC concerns around a value object.

So we could rephrase that docblock to say something like:

* A key column specifier is either a string naming a column or an array of two * elements, column name and length, specifying a prefix of the named column. * Note that some DBMS drivers may opt to ignore the prefix length configuration * and still use the whole field value for the key. Code should therefore not * rely on this functionality. * A key column specifier may also contain a key, 'driver_options', which is an array * keyed by database module name (e.g., 'pgsql') which is an associative array of * keys and values to set driver-specific index options. Drivers supporting additional * options should respect a value of NULL for the prefix length, if it is unnecessary or * incompatible given the driver-specific options.
This shouldn't BC-break any code that accesses the numeric-indexed members.
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Merge request !5839Postgres - Support GIN and GIST → (Open) created by bradjones1
Pipeline finished with Failed
over 1 year ago
Total: 201s
#64634
Pipeline finished with Success
over 1 year ago
#64638
Status changed to Needs review over 1 year ago8:24am 16 December 2023
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
So I _think_ I might have found a way to implement a value object for the index specification that will allow for this kind of configuration to be conveyed. The only issue would be if a non-core database driver does an explicit is_array() check on the index definitions. This would only pass that kind of very strict (and not very useful) type checking if it changed to is_iterable(), which the value object would pass.

Given the talk in Slack and on related issues on this subject, I think that should be OK? And we really, really don't seem to have any other way to include this outside of sticking it in a very awkward new schema definition key like indexes_config or something, which is not very ergonomic and shards this information from the index definitions.
Pipeline finished with Success
over 1 year ago
#64818
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
I've added a draft CR → to help explain how this works and start the conversation on how to message this, since it's the first time we're doing a value object inside ArrayPI.

I think it could be interesting to encapsulate this into the coding standards re: is_array() is is_iterable() on type checks for ArrayPI members.
Pipeline finished with Success
over 1 year ago
Total: 690s
#64822
Comment over 1 year ago →
🇮🇹Italy mondrake 🇮🇹
A comment on the general approach here if I may.

I think the idea of having value objects to represent DB objects is a great step forward. Doctrine DBAL has done something similar and dropped usage of arrays to describe DB structures.

That said, and given this will set a precedent in Drupal DB API, I would suggest:

1. To have a Schema subfolder where to store both the value object and the enum.
2. With that, the value object name could easily be Index instead of IndexSpecification.
3. Why final? The value object could be extended by drivers exactly to cover DB specific features like here.
4. Testing of DB specifics should be done in the driver extensions of DriverSpecificSchemaTestBase, not in the base class.
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Thanks for the vote of confidence and feedback. On points 1-3 that's the kind of feedback I needed because there is no existing pattern to follow so I was just winging it.

As for #4, I do think that we need to have testing of this in the base class, to detect compatibility breaks. Basically, we need to ensure that all drivers work when they don't necessarily support the extra data and successfully just treat it as an array. Perhaps annotating the test to state as such could make it clearer why it's there. The alternative would be having a base test that has just the object and the pgsql version has the pgsql-specific semantics.
Pipeline finished with Success
over 1 year ago
#65073
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
I realize I missed addressing the question about making the value object final. My justification is that the index definition object itself is driver-agnostic; it's the configuration that is specific to each flavor. So there is no legitimate reason to fork this off as PostgresIndex or whatever because it's not a postgres index - it's an index definition with some postgres metadata attached.
Comment over 1 year ago →
🇮🇹Italy mondrake 🇮🇹
See #3410312-14: Flood database backend ::isAllowed() should call ::ensureTableExists() →
Status changed to Needs work over 1 year ago9:50am 28 December 2023
Comment over 1 year ago →
🇳🇱Netherlands daffie
@bradjones1: The solution looks good to me.

We are missing some CRUD operations with the GIN/GIST indexes. We should also add functionality and testing to:
- to check that the GIN/GIST index exists;
- to add GIN/GIST index is created on an existing field;
- to remove a GIN/GIST index;
- when we change an existing field we can also add/remove a GIN/GIST index.

Add some testing that when we create a new table with a GIN/GIST index that the index exists and is a GIN/GIST index. Please create a query with the explain to check that we have created a GIN/GIST index and that the query is using it.
Pipeline finished with Failed
over 1 year ago
#157889
Pipeline finished with Success
over 1 year ago
Total: 560s
#157892
Status changed to Needs review over 1 year ago11:07pm 26 April 2024
Comment over 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Rebased.

Re: The requests in #16 -

to check that the GIN/GIST index exists

GIN/GIST indexes are cataloged by Postgres similar to other indexes, so checking they exist is already possible. To determine their type, you must dig a little deeper, so there is now an additional column of information about indexes (using pg_get_indexdef()) which we're also using in the new tests.

to add GIN/GIST index is created on an existing field
to remove a GIN/GIST index

Test coverage added.

when we change an existing field we can also add/remove a GIN/GIST index

I'm not sure we need to do this, and if we do, this amounts to a new feature for the driver/DBAL overall and nothing specific to GIN/GIST. There is a lot of logic already in ::changeField() but none of it currently touches indexes. In that respect GIN/GiST aren't special, and I think it's incumbent upon the caller to determine if the existing indexes will be compatible. I suggest addressing this in a separate issue. (GIN/GIST do introduce some interesting new considerations around index type support for changing field types, but I still don't think that means we must build a new feature into ::changeField() to get support in.)

Add some testing that when we create a new table with a GIN/GIST index that the index exists and is a GIN/GIST index.

This is in the test coverage. Note that we are in a cart-and-horse situation with respect to testing GIN - its test is almost identical to GIST, we just need a supported field type to test with. So I put in a skipped test as a placeholder and once we commit this, we can enable that test coverage with/after JSON data types go in.
Pipeline finished with Failed
over 1 year ago
Total: 166s
#157937
Pipeline finished with Canceled
over 1 year ago
Total: 54s
#157942
Pipeline finished with Failed
over 1 year ago
Total: 199s
#157944
Pipeline finished with Success
over 1 year ago
Total: 475s
#157946
Pipeline finished with Success
about 1 year ago
Total: 736s
#164372
Status changed to Needs work about 1 year ago3:06pm 8 May 2024
Comment about 1 year ago →
🇳🇱Netherlands daffie
The testbot is failing on PostgreSQL on your added code.

The CR needs examples on how the use the added GIN and GIST indexes.
Status changed to Needs review about 1 year ago3:57pm 8 May 2024
Comment about 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Ah, how embarassing, I forget you need to manually trigger the PgSQL tests. Fixed with some updated coverage of the introspection method.

Updating the CR now.
Pipeline finished with Canceled
about 1 year ago
Total: 521s
#167594
Status changed to Needs work about 1 year ago4:15pm 8 May 2024
Comment about 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Missed that there are also new MR comments.
Pipeline finished with Success
about 1 year ago
Total: 708s
#167604
Comment about 1 year ago →
🇸🇰Slovakia poker10
Thanks for great work @bradjones1! I have added some other comments to the MR.
Comment about 1 year ago →
🇮🇹Italy mondrake 🇮🇹
Some comments inline.
Pipeline finished with Failed
about 1 year ago
Total: 175s
#167781
Pipeline finished with Failed
about 1 year ago
Total: 176s
#167795
Pipeline finished with Failed
about 1 year ago
Total: 181s
#167806
Pipeline finished with Success
about 1 year ago
Total: 725s
#167810
Status changed to Needs review about 1 year ago8:26pm 8 May 2024
Comment about 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Replied to MR comments and made a few changes as a result. Back to NR.
Pipeline finished with Success
about 1 year ago
Total: 3103s
#168371
Pipeline finished with Success
about 1 year ago
Total: 568s
#168396
Pipeline finished with Success
about 1 year ago
Total: 708s
#169135
Comment about 1 year ago →
🇺🇸United States mradcliffe USA
I think this pretty good to commit. Everything makes sense based on the changes from daffie's review.

I updated the change record.

I think we can update the issue summary with the proposed resolution now. Maybe moving some of the initial resolution into the problem/motivation?
Comment about 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Comment about 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
Comment about 1 year ago →
🇺🇸United States bradjones1 Digital Nomad Life
@mradcliffe would you mind RTBC'ing per your review?
Status changed to RTBC about 1 year ago11:32pm 9 May 2024
Comment about 1 year ago →
🇺🇸United States mradcliffe USA
Agreed. We'll see if we need to do anything else :-)
Status changed to Needs work about 1 year ago5:10am 23 May 2024
Comment about 1 year ago →
🇳🇿New Zealand quietone
I read the Issue Summary, the CR and then the MR. I reviewed the comments and the MR. I left comments in the MR and suggestions. At least one of the suggestions is incomplete, that is it still needs work.

The change record reads well but I was surprised to find that new enum is only mentioned at the end. I think that is important information and should be at the start. There is a code example for the 'after' situation. Can one be added for the 'before' case? And I don't follow this sentence, "The only backwards-compatibility break would be if a driver is explicitly checking the index definition with is_array(), which is currently unnecessary.' It is not clear to me when this is unnecessary, is it before this change or after. Maybe it is better to change this to something like this, "Sites that are using a driver that is explicitly checking the index definition with is_array() will need to ....'..
Comment 8 months ago →
🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
Comment about 1 month ago →
🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10

Adding GIS and GIST indexes to PostgreSQL databases

Problem/Motivation

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Merge Requests

!5839Adding GIS and GIST indexes to PostgreSQL databases
Open

Comments & Activities

Adding GIS and GIST indexes to PostgreSQL databases

Problem/Motivation

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Merge Requests

!5839Adding GIS and GIST indexes to PostgreSQL databasesOpen

Comments & Activities

!5839Adding GIS and GIST indexes to PostgreSQL databases
Open