Improve database performance when computing ModerationStateFieldItemList

Created on 14 May 2024, 6 months ago
Updated 21 May 2024, 6 months ago

Problem/Motivation

\Drupal\content_moderation\Plugin\Field\ModerationStateFieldItemList provides a computed field for content entities with the current moderation state of a content entity. When a piece of code calls $moderatedRevisionableTranslatableEntity->get('moderation_state'), Drupal performs a database query similar to the following (note the following snippet uses PostgreSQL syntax, but this issue applies to all database engines)...

SELECT
  "base_table"."revision_id" AS "revision_id",
  "base_table"."id" AS "id"
FROM
  "content_moderation_state_revision" "base_table"
    INNER JOIN "content_moderation_state_field_revision" "content_moderation_state_field_revision" ON "content_moderation_state_field_revision"."revision_id" = "base_table"."revision_id"
WHERE
  ("content_moderation_state_field_revision"."content_entity_type_id"::text ILIKE :entity_type)
  AND ("content_moderation_state_field_revision"."content_entity_id" = :entity_id)
  AND ("content_moderation_state_field_revision"."content_entity_revision_id" = :revision_id)
  AND ("content_moderation_state_field_revision"."workflow" = :workflow_name)
  AND ("content_moderation_state_field_revision"."langcode" = :current_langcode)
ORDER BY
  "base_table"."revision_id" DESC NULLS LAST;

If you ask MySQL, PostgreSQL, and/or SQLite to EXPLAIN how they would run the query, they all report that they need to perform what PostgreSQL refers to as a "Seq[uential] Scan" (SQLite calls it a "SCAN"; MySQL/MariaDB calls it a "data row scan") on the content_moderation_state_field_revision table: essentially, the query engine has to loop through every row in the table at query-run-time. A sequential scan is necessary because the schema for content_moderation_state_field_revision doesn't define any indexes on the content_entity_revision_id column by itself (even though it does define other indexes on other combinations of columns).

Anyway, the performance impact of performing a sequential scan on the table rows is magnified when there is a lot of content on the site: for example, a client site running PostgreSQL with ~90,000 nodes and ~5 revisions per node has nearly ~450,000 rows in content_moderation_state_field_revision, taking nearly 500ms (half a second) on an optimized, load-balanced environment, and nearly 11 seconds on local environments.

Proposed resolution

Add an index to the content_moderation_state_field_revision table, on the content_entity_revision_id column.

For the aforementioned client site with ~450,000 rows in content_moderation_state_field_revision, adding an index reduced the time needed to complete the query on an unoptimized local environment running PostgreSQL from around ~11 seconds to around ~17 milliseconds, a performance improvement of around ~647%.

Remaining tasks

  1. Review and feedback
  2. RTBC and feedback
  3. Commit

User interface changes

None.

API changes

None.

Data model changes

Add an index named content_moderation_state__content_entity_revision_id to the content_moderation_state_field_revision table, on the content_entity_revision_id column

Release notes snippet

To be determined.

✨ Feature request
Status

Needs work

Version

11.0 πŸ”₯

Component
Content moderationΒ  β†’

Last updated 2 days ago

  • Maintained by
  • πŸ‡¦πŸ‡ΊAustralia @Sam152
Created by

πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • PostgreSQL

    Particularly affects sites running on the PostgreSQL database.

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @mparker17
  • Status changed to Needs review 6 months ago
  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    I don't think it's possible to write tests for this, but if you have ideas, I would very much appreciate some mentorship on how to write them!

    Going to leave this as "Needs review". Feedback is welcome!

  • Pipeline finished with Success
    6 months ago
    Total: 653s
    #172600
  • Status changed to Needs work 6 months ago
  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    I should add an update hook for existing sites...

  • Status changed to Needs review 6 months ago
  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    Added an update hook. Feedback welcome!

    Tagging with "Performance" because it affects performance. Tagging with "PostgreSQL" because it particularly affects sites running on the PostgreSQL database. See the Issue tags -- special tags documentation β†’ for more information.

  • Pipeline finished with Success
    6 months ago
    Total: 674s
    #173375
  • Status changed to Needs work 6 months ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Believe we will need test coverage for the update hook/change.

    Number for the update probably needs to be updated as well but not sure what the standard is now with 11 and 10.3 out, TBD.

  • πŸ‡¬πŸ‡§United Kingdom catch
  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    Believe we will need test coverage for the update hook/change.

    @smustgrave, are there any docs or examples on how to do this?

    Number for the update probably needs to be updated as well but not sure what the standard is now with 11 and 10.3 out, TBD.
    [...]
    Related issue: πŸ“Œ Handling update path divergence between 11.x and 10.x Fixed

    FWIW, I expect to have to re-roll this patch before committing, i.e.: to keep up with the latest hook_update_N() number.

    Perhaps I misread, but I didn't see anything in #3108658 that I can act on for this patch at this time?

    Thanks for the feedback, everyone!

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Believe the update hook number will have to be 10401 according to @catch in slack https://drupal.slack.com/archives/C03L6441E1W/p1716139451323549

  • Pipeline finished with Failed
    6 months ago
    Total: 959s
    #178212
  • πŸ‡¨πŸ‡¦Canada mparker17 UTC-4

    @smustgrave, updated!

    If anyone has some examples or docs on how to write tests for update hooks for the content_moderation module, I'd appreciate it! Asking because...

    1. I don't see any fixtures inside core/modules/content_moderation
    2. The fixtures in core/modules/system/tests/fixtures/update/ don't have content_moderation pre-installed
    3. Importing a fixture from core/modules/system/tests/fixtures/update/ and then installing content_moderation in the setUp create the table with the new index (i.e.: wouldn't actually test the update hook)
    4. Copying-and-modifying a content-fixture blob seems like it could be hard for reviewers to review, and would add 584 MB to the size of core, roughly equivalent to 7.3 minified jQuery libraries (80 MB as reported by du -h core/assets/vendor/jquery/jquery.min.js at time-of-writing)

    (aside, is measuring filesize in terms of "minified jQuery libraries" the Drupal equivalent of describing lengths in "football fields"?)

Production build 0.71.5 2024