[PP-1] Performance issues with path alias generated queries on PostgreSQL

Created on 25 July 2018, over 6 years ago
Updated 16 May 2024, 7 months ago

Problem/Motivation

Queries generated by AliasStorage::preloadPathAlias() taking huge amount of time to execute. A good example is the /admin/content page which takes 10 seconds and more to load for around 75k nodes.

SQL-Query example below:

2018-07-24 11:02:11 NZST [4923-1] drupal-site@drupal-site LOG:  duration: 10990.918 ms  statement: SELECT url_alias.source AS source, url_alias.alias AS alias, langcode AS langcode, pid AS pid
        FROM 
        url_alias url_alias
        WHERE ((source::text ILIKE '/node/28') OR (source::text ILIKE '/node/29') OR (source::text ILIKE '/node/30') OR (source::text ILIKE '/node/31') OR (source::text ILIKE '/node/24') OR (source::text ILIKE '/node/25') OR (source::text ILIKE '/node/26') OR (source::text ILIKE '/node/27') OR (source::text ILIKE '/node/23') OR (source::text ILIKE '/node/211996') OR (source::text ILIKE '/user/13') OR (source::text ILIKE '/node/87382') OR (source::text ILIKE '/node/134983') OR (source::text ILIKE '/user/25222') OR (source::text ILIKE '/node/216888') OR (source::text ILIKE '/node/149705') OR (source::text ILIKE '/node/216524') OR (source::text ILIKE '/node/209767') OR (source::text ILIKE '/user/24341') OR (source::text ILIKE '/node/216536') OR (source::text ILIKE '/node/135065') OR (source::text ILIKE '/user/53277') OR (source::text ILIKE '/node/173142') OR (source::text ILIKE '/user/30085') OR (source::text ILIKE '/node/188349') OR (source::text ILIKE '/user/69670') OR (source::text ILIKE '/node/147160') OR (source::text ILIKE '/node/147245') OR (source::text ILIKE '/node/174863') OR (source::text ILIKE '/user/59491') OR (source::text ILIKE '/node/216527') OR (source::text ILIKE '/node/174530') OR (source::text ILIKE '/node/178984') OR (source::text ILIKE '/node/154121') OR (source::text ILIKE '/node/172192') OR (source::text ILIKE '/node/216869') OR (source::text ILIKE '/user/93926') OR (source::text ILIKE '/node/216876') OR (source::text ILIKE '/node/216881') OR (source::text ILIKE '/user/93929') OR (source::text ILIKE '/node/215571') OR (source::text ILIKE '/user/58627') OR (source::text ILIKE '/taxonomy/term/50463') OR (source::text ILIKE '/node/202776') OR (source::text ILIKE '/user/47599') OR (source::text ILIKE '/node/210401') OR (source::text ILIKE '/node/215338') OR (source::text ILIKE '/node/87208') OR (source::text ILIKE '/node/190312') OR (source::text ILIKE '/node') OR (source::text ILIKE '/user/1')) AND (langcode IN ('en', 'und'))
        ORDER BY langcode ASC NULLS FIRST, pid ASC NULLS FIRST

Not sure what part of Drupal this issue is actually coming from path alias or PostgreSQL driver.

Proposed resolution

Add an index on the field "path" and add an index on the field "alias". The index should be created with the option "USING GIN" or "USING GIST". The "pg_trgm" extension should be installed and created, before the indexes can be created.

For more info over "GIN" vs "GIST" see: https://www.postgresql.org/docs/10/textsearch-indexes.html
For more info over this solution see: https://ferfebles.github.io/2018/04/16/Improving-large-Drupal-Postgres-p...

Remaining tasks

User interface changes

None

API changes

None

Data model changes

🐛 Bug report
Status

Postponed

Version

11.0 🔥

Component
PostgreSQL driver 

Last updated 6 days ago

No maintainer
Created by

🇳🇿New Zealand RoSk0 Wellington

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • PostgreSQL

    Particularly affects sites running on the PostgreSQL database.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Status changed to RTBC almost 2 years ago
  • Status changed to Needs review almost 2 years ago
  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    Why did we chose GIST over GIN when the docs say that GIN is preferred? I read all the comments from when the GSIT functionality was introduced and I can't see it.

    I wish there was a way to implement this without using magic index names. Magic like this always results in surprises. Can't we add something like another key to the schema that has the same structure as 'indexes' but is 'gists'?

    Regardless of whether we do the magic index naming or add the new key to the schema we need to document this somewhere.

    Actually I have a really good reason to add a new key to schema for this. If you create an index in a module that expects all databases to perform the same and it has _gist it will do different things if we make the change as is in #93. Whereas if we add a special 'gist' key that is only supported by postgres then a module can add a gist that will, as expected, only affect postgres and does not have to have a special postgres adapter module to override it's own storage implementation. It could just have something like:

        // Add GIST indexes to make queries on the columns 'alias' and 'source'
        // faster on PostgreSQL.
        // @see https://www.postgresql.org/docs/12/textsearch-indexes.html
        $schema[$this->storage->getBaseTable()]['gists'] += [
          'path_alias__alias' => ['alias'],
          'path_alias__path' => ['path'],
        ];
        if ($revision_table = $this->storage->getRevisionTable()) {
          $schema[$revision_table]['gists'] += [
            'path_alias_revision__alias' => ['alias'],
            'path_alias_revision__path' => ['path'],
          ];
        }
    
  • 🇳🇱Netherlands daffie

    @alexpott: Moving the GIST/GIN indexes to their own key in the schema API, will result in PostgreSQL to have to override the class Drupal\Core\Entity\Sql\SqlContentEntityStorageSchema. As that class creates the entity tables and indexes. If we do that we will break contrib module on PostgreSQL that have content entities that override the base storage schema class.
    What we can do is create a new service called Drupal\Core\Entity\Sql\SqlContentEntityStorageSchemaService that do everything that the class Drupal\Core\Entity\Sql\SqlContentEntityStorageSchem is now doing. The old class will than become a thin wrapper on top of the service. Database driver modules can than override the service and do what they need to do. Every contrib/custom module that changes content entity storage schema class will keeps working and there will be no BC break.

  • 🇬🇧United Kingdom alexpott 🇪🇺🌍

    I think we might not need to override SqlContentEntityStorageSchema. Looking at addIndex we have the full table spec there - so maybe

        // Add GIST indexes to make queries on the columns 'alias' and 'source'
        // faster on PostgreSQL.
        // @see https://www.postgresql.org/docs/12/textsearch-indexes.html
        $schema[$this->storage->getBaseTable()]['indexes'] += [
          'path_alias__alias' => ['alias'],
          'path_alias__path' => ['path'],
        ];
        $schema[$this->storage->getBaseTable()]['gists'] = array_merge($schema[$this->storage->getBaseTable()]['gists'] ?? [], ['path_alias__alias', 'path_alias__path']);
        if ($revision_table = $this->storage->getRevisionTable()) {
          $schema[$revision_table]['indexes'] += [
            'path_alias_revision__alias' => ['alias'],
            'path_alias_revision__path' => ['path'],
          ];
          $schema[$revision_table]['gists'] = array_merge($schema[$revision_table]['gists'] ?? [], ['path_alias_revision__alias', 'path_alias_revision__path'];
        }
    

    And we can use this info in the addIndex() in postrgres. At least it is explicit and not magical. In order to make this less painful we could implement a helper static method in Postgres's schema to add a gist index. So the code becomes something like:

    $schema[$this->storage->getBaseTable()] = Schema::addGistsToSchema([
          'path_alias__alias' => ['alias'],
          'path_alias__path' => ['path'],
        ], $schema[$this->storage->getBaseTable()]);
    
    // Add then in the \Drupal\pgsql\Driver\Database\pgsql\Schema class...
    
    public static function addGistsToSchema(array $gists, array $table_spec) {
      $table_spec['indexes'] += $gists;
      $table_spec['gists'] = array_merge($table_spec['gists'] ?? [], array_keys($gists));
      return $table_spec;
    }
    

    Yet another way would be to use an object with array access to denote it is a gist index - but this is a way bigger API change. That said at some point table specifications should move away from ArrayPI and this at least gives us a requirement.

  • Status changed to Needs work over 1 year ago
  • The Needs Review Queue Bot tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".

    This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.

    Consult the Drupal Contributor Guide to find step-by-step guides for working with issues.

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.0 & MySQL 5.7
    last update over 1 year ago
    Patch Failed to Apply
  • 🇮🇹Italy kopeboy Milan

    Is this only related to Postgres? I see the same type of (non-performant) path alias queries in Views with MariaDB

  • 🇳🇱Netherlands daffie

    @kopeboy: To test out if you are using an index, you should add "EXPLAIN " in front of the query and run it in something like "phpMyAdmin". For more info: https://mariadb.com/kb/en/explain/

  • 🇮🇹Italy kopeboy Milan

    Not sure.. are you saying I should add an index to taxonomy term tables that is not already in core?!

  • 🇳🇱Netherlands daffie

    @kopeboy Something like:

    EXPLAIN SELECT "base_table"."path" AS "path", "base_table"."alias" AS "alias"
    FROM path_alias base_table
    WHERE (base_table.status = 1) AND (base_table.path LIKE '/taxonomy/term/1423' ESCAPE '\')
    
  • 🇺🇸United States bradjones1 Digital Nomad Life

    Dropping in here from Add "json" as core data type to Schema and Database API Needs work , where we need GIN index creation support for JSON path query-based indexes.

    I hope this helps sway the discussion here regarding whether to use GIN or GiST; the former also allows for JSON data and would enable broader functionality in the DB API.

    Updating the issue title to better reflect that this is an index-creation issue first, and also includes a specific performance improvement in path aliasing.

    For what it's worth, GIN support for JSON is limited to certain operators, mostly when it comes to object/array/scalar "containment" and key-exists type queries. But, if we're going to support these types of indexes in Postgres, it would be great to also leverage GIN for power users/if we end up supporting containment queries at some point in the future.

  • 🇳🇱Netherlands daffie

    As the GIN indexes do to working for path aliases, the solution will be adding a GIST index. As the best solution for JSON fields is a GIN index, we should add them both to Drupal. Each having its own use case.

  • 🇺🇸United States bradjones1 Digital Nomad Life

    Re: #105 yes that's a good plan, however I had renamed this issue with the intent that a similar index-creation logic could be extended to GIN as well. I don't think it makes sense to have a separate issue that addresses pgsql index creation if the approach (special case by name) is going to be the same?

  • Open in Jenkins → Open on Drupal.org →
    Environment: PHP 8.1 & pgsql-14.1
    11:42
    8:35
    Running
  • 🇺🇸United States KoolAidGuy

    I've attached a patch re-rolled for Drupal 10.1.5 as the other patches on this thread weren't applying.

  • 🇸🇰Slovakia poker10

    @KoolAidGuy The patch #107 is missing important parts from the last working patch #93 and it does not seems to be based on the patch #93 either. When doing rerolls please be carefull that the new patch does not loose any code, otherwise such rerolls cause only distraction. Also it is preffered to use merge requests these days (which could be a good idea also here, as it seems that this issue will require some additional changes based on #97). Thanks!

    I am hiding the patch #107.

  • Status changed to Postponed about 1 year ago
  • 🇺🇸United States bradjones1 Digital Nomad Life

    Index creation (for both GIN and GIST, as there is now a concrete use case for each) is broken out into 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC .

  • 🇺🇸United States bradjones1 Digital Nomad Life

    I have an MR for handling creation of both GIN and GIST indexes over at 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC - please take a look.

  • 🇺🇸United States bradjones1 Digital Nomad Life

    📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC has test coverage and is in Needs Review if anyone here wants to move that along to unblock this.

  • 🇨🇦Canada mparker17 UTC-4

    I've attempted to re-roll the patch from #93 onto 11.x at commit f560c83a1c (from 2024-05-15).

  • 🇨🇦Canada mparker17 UTC-4

    @bradjones1 would it be helpful to update this patch to use the API in 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC , given that it is RTBC?

    Anything else I could help with?

  • 🇺🇸United States mradcliffe USA

    @mparker17 thanks for asking. I think the best thing is to start an issue fork and apply the patch you re-rolled into the issue fork branch and start a merge request.

    Merge requests will soon be the only way to test and accept changes to Drupal core and contributed projects on drupal.org

  • 🇺🇸United States bradjones1 Digital Nomad Life

    Re: #13 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.

  • Pipeline finished with Failed
    7 months ago
    Total: 201s
    #174571
  • 🇨🇦Canada mparker17 UTC-4

    > Re: #113 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.

    I merged in the commits, but that idea sounds be a lot easier to review, so I'm going to try that shortly.

  • Pipeline finished with Failed
    7 months ago
    Total: 727s
    #174577
  • Pipeline finished with Success
    7 months ago
    Total: 693s
    #174589
  • 🇨🇦Canada mparker17 UTC-4

    Re: #113 you could apply the patch from that other issue to core using composer-patches and then use it in this issue, yes.

    @bradjones1 Hmm... could you explain how to do this, or point me towards some documentation?

    Aside: in order to change this for !8101, I'd have to drop all the commits in the 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC branch (or re-create the branch), update composer.json to apply the patch, then re-apply the commits specific to this branch... which is fine, but I don't want to force-push until I know it's going to work, and local testing doesn't seem to work. For simplicity below, the logs will assume I'm re-creating the branch.

    I've set up a dev environment using justafish/ddev-drupal-core-dev...

    $ git clone --branch 11.x https://git.drupalcode.org/project/drupal.git d10-core
    $ cd d10-core
    $ ddev config --project-type=drupal10
    $ ddev start
    $ ddev corepack enable
    $ ddev get justafish/ddev-drupal-core-dev
    $ ddev restart
    $ ddev composer install
    

    ... as I expected, it didn't seem like cweagans/composer-patches was required by anything...

    $ ddev composer depends 'cweagans/composer-patches'
    
    In BaseDependencyCommand.php line 100:
    
      Could not find package "cweagans/composer-patches" in your project  
    
    

    ... so I tried installing cweagans/composer-patches, but that failed because, I guess, core references itself in its own composer.json?

    $ ddev composer require 'cweagans/composer-patches:^1'
    ./composer.json has been updated
    Running composer update cweagans/composer-patches
    > Drupal\Composer\Composer::ensureComposerVersion
    Loading composer repositories with package information
    Updating dependencies
    Your requirements could not be resolved to an installable set of packages.
    
      Problem 1
        - Root composer.json requires drupal/core 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core[dev-main] but it does not match the constraint.
      Problem 2
        - Root composer.json requires drupal/core-project-message 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-project-message[dev-main] but it does not match the constraint.
      Problem 3
        - Root composer.json requires drupal/core-vendor-hardening 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-vendor-hardening[dev-main] but it does not match the constraint.
    
    Use the option --with-all-dependencies (-W) to allow upgrades, downgrades and removals for packages currently locked to specific versions.
    
    Installation failed, reverting ./composer.json and ./composer.lock to their original content.
    Composer [require cweagans/composer-patches:^1] failed, composer command failed: exit status 2. stderr=
    
    

    So I tried simply adding the patch to composer.json...

    $ ddev composer config --json extra.patches.'drupal/core' '{"#3397622, !5839 Adding GIN and GIST indexes to PostgreSQL databases": "https://git.drupalcode.org/project/drupal/-/merge_requests/5839.diff"}'
    $ git diff
    diff --git a/composer.json b/composer.json
    index b56c8cec3e..e30ad42bdf 100644
    --- a/composer.json
    +++ b/composer.json
    @@ -96,6 +96,11 @@
                     "               and not intended to be used for production sites.",
                     "               See: https://www.drupal.org/node/3082474"
                 ]
    +        },
    +        "patches": {
    +            "drupal/core": {
    +                "#3397622, !5839 Adding GIN and GIST indexes to PostgreSQL databases": "https://git.drupalcode.org/project/drupal/-/merge_requests/5839.diff"
    +            }
             }
         },
         "autoload": {
    

    ...seems fine so far, so lets apply the patch...

    $ ddev composer update drupal/core
    > Drupal\Composer\Composer::ensureComposerVersion
    Loading composer repositories with package information
    Updating dependencies
    Your requirements could not be resolved to an installable set of packages.
    
      Problem 1
        - Root composer.json requires drupal/core 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core[dev-main] but it does not match the constraint.
      Problem 2
        - Root composer.json requires drupal/core-project-message 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-project-message[dev-main] but it does not match the constraint.
      Problem 3
        - Root composer.json requires drupal/core-vendor-hardening 1.0.0 (exact version match: 1.0.0 or 1.0.0.0), found drupal/core-vendor-hardening[dev-main] but it does not match the constraint.
    
    Use the option --with-all-dependencies (-W) to allow upgrades, downgrades and removals for packages currently locked to specific versions.
    Composer [update drupal/core] failed, composer command failed: exit status 2. stderr=
    

    ... same as before.

    What about just updating the lock file?

    $ ddev composer update --lock            
    > Drupal\Composer\Composer::ensureComposerVersion
    Loading composer repositories with package information
    Updating dependencies
    Your requirements could not be resolved to an installable set of packages.
    
      Problem 1
        - Root composer.json requires drupal/core == 11.9999999.9999999.9999999-dev (exact version match), it is satisfiable by drupal/core[11.x-dev] from composer repo (https://repo.packagist.org) but drupal/core[dev-main] from path repo (core) has higher repository priority. The packages from the higher priority repository do not match your constraint and are therefore not installable. That repository is canonical so the lower priority repo's packages are not installable. See https://getcomposer.org/repoprio for details and assistance.
      Problem 2
        - Root composer.json requires drupal/core-project-message == 11.9999999.9999999.9999999-dev (exact version match), it is satisfiable by drupal/core-project-message[11.x-dev] from composer repo (https://repo.packagist.org) but drupal/core-project-message[dev-main] from path repo (composer/Plugin/ProjectMessage) has higher repository priority. The packages from the higher priority repository do not match your constraint and are therefore not installable. That repository is canonical so the lower priority repo's packages are not installable. See https://getcomposer.org/repoprio for details and assistance.
      Problem 3
        - Root composer.json requires drupal/core-vendor-hardening == 11.9999999.9999999.9999999-dev (exact version match), it is satisfiable by drupal/core-vendor-hardening[11.x-dev] from composer repo (https://repo.packagist.org) but drupal/core-vendor-hardening[dev-main] from path repo (composer/Plugin/VendorHardening) has higher repository priority. The packages from the higher priority repository do not match your constraint and are therefore not installable. That repository is canonical so the lower priority repo's packages are not installable. See https://getcomposer.org/repoprio for details and assistance.
    
    Composer [update --lock] failed, composer command failed: exit status 2. stderr=
    

    ... apparently not!

    What if I try running GitLab CI tests locally ?

    $ alias drupal-ci-local='gitlab-ci-local --remote-variables git@git.drupal.org:project/gitlab_templates=includes/include.drupalci.variables.yml=main --variable="_GITLAB_TEMPLATES_REPO=project/gitlab_templates"'
    $ drupal-ci-local
    ...
     FAIL  📦️ Composer                                   
      >   Invalid version string "HEAD-dev"  
      >                                      
      > validate [--no-check-all] [--check-lock] [--no-check-lock] [--no-check-publish] [--no-check-version] [-A|--with-dependencies] [--strict] [--] [<file>]
    

    ... so I'm not sure what to do next.

  • 🇨🇦Canada mparker17 UTC-4

    On the plus side, though, it seems like tests are passing in the merge request as it is now.

Production build 0.71.5 2024