Timeout when removing a webform field

Created on 28 October 2024, 3 months ago

Problem/Motivation

I have a Webform with over 200k submissions. When attempting to remove a field from this form, a timeout error occurs. Upon reviewing the code, I observed that when a field is removed from the Webform, the associated data in the `webform_submission_data` table is also deleted. This operation takes a significant amount of time, hitting either the PHP max execution time or the proxy request timeout.

Steps to reproduce

  1. Install the `webform` and `devel_generate` modules
  2. Generate 200k submissions on the Contact webform with the command: `drush webform:generate contact 200000`
  3. Go to manage the Contact webform and delete the `message` field

Proposed resolution

  • Add a table index on the `webform_id` and `name` columns;
  • Replace the `IN` clause in the `DELETE` query with a `foreach` loop;

Original Execution Time: 140 seconds

With Index + `IN` Query: 38 seconds

With Index + Foreach Loop: 25 seconds

πŸ› Bug report
Status

Active

Version

6.2

Component

Code

Created by

πŸ‡§πŸ‡·Brazil hfernandes

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @hfernandes
  • Pipeline finished with Failed
    3 months ago
    Total: 314s
    #323511
  • Pipeline finished with Failed
    3 months ago
    Total: 1221s
    #324085
  • Pipeline finished with Success
    3 months ago
    Total: 1964s
    #324103
  • πŸ‡ΊπŸ‡ΈUnited States jrockowitz Brooklyn, NY

    I would lean towards committing ✨ [Performance] Add an index on webform_submission_data name and property Needs review and then make this performance improvement.

  • πŸ‡§πŸ‡·Brazil hfernandes

    @jrockowitz, would it make sense to merge both developments into a single MR? The ✨ [Performance] Add an index on webform_submission_data name and property Needs review issue introduces an index on the 'name' and 'property' values, while this MR adds an index on 'name' and 'webform_id.'.

  • πŸ‡ΊπŸ‡ΈUnited States jrockowitz Brooklyn, NY

    It is very possible the creating the index proposed in ✨ [Performance] Add an index on webform_submission_data name and property Needs review will address this issue. My understanding is the name part of the name,property will be used by the delete query

    I am not sure removing the SQL IN condition will be a significant improvement.
    @see https://stackoverflow.com/questions/31500/do-indexes-work-with-in-clause

  • Pipeline finished with Success
    about 2 months ago
    Total: 358s
    #352916
  • πŸ‡¨πŸ‡¦Canada Liam Morland Ontario, CA πŸ‡¨πŸ‡¦
  • Pipeline finished with Failed
    8 days ago
    Total: 732s
    #395397
  • Pipeline finished with Success
    8 days ago
    #395408
  • Issue was unassigned.
  • Status changed to Needs review 7 days ago
  • πŸ‡§πŸ‡·Brazil hfernandes

    Hi @jrockowitz,

    Thank you for your suggestion regarding the index on name,property. However, I believe that a composite index including both webform_id and name would provide better performance for the query in question. Here’s my reasoning:

    Query Condition Analysis:
    Based on the query you referenced, the WHERE clause includes both webform_id and name. This means that both columns are involved in filtering the results.
    While an index on name,property may help in some scenarios, it does not fully cover the webform_id filter condition.
    A composite index on webform_id and name allows the database engine to use both columns together for filtering. This reduces the need for additional scans and increases the speed of the query, especially in large datasets where webform_id is often a primary filter.
    Without this composite index, the database would either:

    • Perform a less efficient scan using a partial index (e.g., on name or property alone).
    • Fall back to a full table scan if no suitable index exists.

    Performance Testing Evidence:
    To demonstrate the impact, I tested the query on a sample dataset. Below are the results:

    Query without a composite index:

    Query with a composite index on name and property:

    Query with a composite index on webform_id and name:

    These tests show a clear improvement in query performance when using the composite index on webform_id and name and no real impact - the performance got worse - when using a composite index on name and property.

    Database Optimization Best Practices:
    It’s a best practice to align indexes with the columns used in the WHERE clause to minimize row scans.
    Adding the composite index on webform_id and name would allow the query planner to leverage it effectively and deliver faster results.

    I also agree that the use of the IN clause in this case is unlikely to cause a significant impact on query performance. However, I’d like to point out that the post you referenced pertains to SQL Server, which uses a different query engine than MySQL. While some general indexing principles may overlap, it’s important to account for differences in how these databases optimize queries and use indexes. In MySQL, aligning indexes with the columns in the WHERE clause is a crucial factor in achieving optimal performance on large datasets.

  • Pipeline finished with Failed
    7 days ago
    Total: 2117s
    #396866
  • Pipeline finished with Failed
    7 days ago
    Total: 524s
    #396964
  • Pipeline finished with Failed
    7 days ago
    Total: 431s
    #396977
Production build 0.71.5 2024