Performance: Consider custom table rather than key_value for pathauto state storage

Created on 17 February 2024, over 1 year ago

Problem/Motivation

Currently, pathauto stores its `pathauto_state.*` collection data in the key_value table. On our site, that means that we have:

collection | key count
pathauto_state.media | 1931155
pathauto_state.node | 1137720

That is over 3 million rows in the key_value table used by pathauto alone. Since key_value is heavily used by Drupal, this degrades performance for any meaningfully sized website.
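
For anyone who wants to check their own site, per-collection counts like these can be produced with a GROUP BY over the key_value table. Below is a minimal, self-contained sketch in Python that uses an in-memory SQLite database as a stand-in for Drupal's key_value table. The sample rows and serialized values are made up for illustration; only the SQL in COUNT_SQL is meant to carry over to a real database.

```python
import sqlite3

# Toy stand-in for Drupal's key_value table (collection, name, value).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE key_value ("
    " collection TEXT NOT NULL, name TEXT NOT NULL, value BLOB NOT NULL,"
    " PRIMARY KEY (collection, name))"
)

# Illustrative sample data only; real sites hold millions of such rows.
sample_rows = [
    ("pathauto_state.node", "1", b"i:1;"),
    ("pathauto_state.node", "2", b"i:0;"),
    ("pathauto_state.media", "7", b"i:1;"),
    ("state", "system.cron_last", b"i:1708128000;"),
]
conn.executemany("INSERT INTO key_value VALUES (?, ?, ?)", sample_rows)

# Count rows per pathauto_state.* collection, largest first.
COUNT_SQL = (
    "SELECT collection, COUNT(*) AS key_count FROM key_value"
    " WHERE collection LIKE 'pathauto_state.%'"
    " GROUP BY collection ORDER BY key_count DESC"
)
counts = conn.execute(COUNT_SQL).fetchall()
for collection, key_count in counts:
    print(collection, key_count)
```

On a real site the same SELECT can be run directly against the database, for example through a database client.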

Steps to reproduce

1. Create a Drupal site. Install pathauto.
2. Use devel:generate to generate 10,000 items of content. Measure page load speed both cached and uncached.
3. Use devel:generate to generate 3,000,000 items of content (this will take a very long time). Measure page load speeds again, both cached and uncached.

Proposed resolution

Give pathauto its own data table so that other processes using key_value have less work to do. This would also benefit pathauto directly: with control over the schema of the new table, we would no longer need serialization to read and write the data, and could store it in discrete, normalized rows instead, which could substantially reduce overhead within pathauto itself.

Remaining tasks

Design a pathauto table.
Add an update hook to pathauto.install to install the new table and convert the data in key_value to rows in that table.
Update the code that reads from or writes to the pathauto_state.* collections to use the new table instead.
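
The tasks above could be prototyped roughly as follows. This is only a sketch under assumptions, not pathauto's actual code: the table name pathauto_state, its columns, and the helpers parse_serialized_int and migrate are hypothetical. It assumes each stored value is a PHP-serialized integer flag such as i:1;, and it uses Python with in-memory SQLite in place of a Drupal update hook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE key_value (
  collection TEXT NOT NULL, name TEXT NOT NULL, value BLOB NOT NULL,
  PRIMARY KEY (collection, name)
);
-- Hypothetical dedicated table; name and columns are assumptions.
CREATE TABLE pathauto_state (
  entity_type TEXT NOT NULL,
  entity_id TEXT NOT NULL,
  state INTEGER NOT NULL,
  PRIMARY KEY (entity_type, entity_id)
);
""")

# Illustrative sample data only.
conn.executemany("INSERT INTO key_value VALUES (?, ?, ?)", [
    ("pathauto_state.node", "1", b"i:1;"),
    ("pathauto_state.node", "2", b"i:0;"),
    ("pathauto_state.media", "7", b"i:1;"),
    ("state", "system.cron_last", b"i:1708128000;"),
])
conn.commit()

PREFIX = "pathauto_state."


def parse_serialized_int(value):
    """Parse a PHP-serialized integer such as b'i:1;' (assumed value format)."""
    text = value.decode("ascii")
    if not (text.startswith("i:") and text.endswith(";")):
        raise ValueError(f"unexpected serialized value: {text!r}")
    return int(text[2:-1])


def migrate(conn, batch_size=1000):
    """Move pathauto_state.* rows into the new table in batches.

    Each batch is copied, removed from key_value, and committed, so a long
    migration can be interrupted and resumed.
    """
    moved = 0
    while True:
        batch = conn.execute(
            "SELECT collection, name, value FROM key_value"
            " WHERE collection LIKE ? LIMIT ?",
            (PREFIX + "%", batch_size),
        ).fetchall()
        if not batch:
            return moved
        for collection, name, value in batch:
            conn.execute(
                "INSERT OR REPLACE INTO pathauto_state VALUES (?, ?, ?)",
                (collection[len(PREFIX):], name, parse_serialized_int(value)),
            )
            conn.execute(
                "DELETE FROM key_value WHERE collection = ? AND name = ?",
                (collection, name),
            )
        conn.commit()
        moved += len(batch)


moved = migrate(conn, batch_size=2)
print("rows moved:", moved)
```

Batching matters here because, with millions of rows, a single-pass conversion could exceed time or memory limits during an update. A real implementation would presumably use Drupal's batch/sandbox support in the update hook rather than a loop like this.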

User interface changes

None.

Data model changes

New database table to store pathauto_state.* data.

✨ Feature request
Status

Active

Version

1.0

Component

Code

Created by

🇺🇸 United States apotek


Comments & Activities

  • Issue created by @apotek
  • 🇺🇸 United States apotek
  • 🇨🇭 Switzerland berdir Switzerland

    I've been pondering whether using key value was the right decision, but the performance impact of this shouldn't be that big.

    key_value queries should be well optimized, and pathauto state should be accessed only when editing or generating aliases.

    Also, have a look at 📌 [PP-1] Use cache collector for state. I've been working on that for years and I'm hopeful that it will finally land soon.

    This is a fairly big change with a possibly slow migration path, so it would need to be really worth doing.

  • 🇨🇳 China lawxen

    📌 [PP-1] Use cache collector for state has landed in Drupal 10.3.
    How can we use it to solve the performance problem in this issue?

  • 🇨🇳 China lawxen

    From my understanding, we just need to set $settings['state_cache'], and Drupal will then cache the whole key_value table.
    That would solve the performance problem, with nothing needing to be done in this module.
    So, can we mark this issue as fixed?
    @berdir

  • 🇨🇭 Switzerland berdir Switzerland

    No, pathauto does not use state.

  • 🇺🇸 United States moshe weitzman Boston, MA

    I'm seeing these entries taking up quite a lot of space in the key_value table. Are these entries a sort of cache? If so, maybe we could communicate that better by using key_value_expirable? I don't care how long the expiration time is, just that these rows are safe to truncate.

    SELECT SUM(LENGTH(value)) AS sum, collection FROM key_value GROUP BY collection ORDER BY sum DESC;
    
    1689040	pathauto_state.user
    1225681	entity.definitions.installed
    952891	entity.storage_schema.sql
    883792	pathauto_state.media
    607280	pathauto_state.node
    158688	media
    133320	pathauto_state.taxonomy_term
    76369	entity.update_backup
    48466	state
    46668	update_fetch_task
    36336	entity.definitions.bundle_field_map
    31820	config.entity.key_store.field_config
    31354	post_update
    
  • 🇨🇭 Switzerland berdir Switzerland

    They are not a cache; this is the canonical storage. A separate table or proper field storage would have advantages, but field storage in particular would come with major headaches too, as would the migration path.
