Field check out of memory

Created on 28 June 2021, over 3 years ago
Updated 23 January 2023, almost 2 years ago

When running a review of content, the batch process fails with a batch error but nothing is written to the error log. This happens only on live environments, not on local development environments. The cause was narrowed down to the fetchAll() call used in /src/Checks/Field.php to load the entire field table into an array. Some of our tables have 300k+ rows containing long formatted text fields, so the assumption is that this fetchAll() exceeds the memory limits of the live environments.
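
For illustration, a minimal sketch of the kind of pattern being described; the table and column names (node__body, body_value) are assumed for the example and are not taken from the module's actual code:

$query = \Drupal::database()
  ->select('node__body', 'f')
  ->fields('f', ['entity_id', 'body_value']);

// fetchAll() copies every row into a PHP array at once; with 300k+ rows of
// long formatted text this can exhaust the PHP memory limit.
$rows = $query->execute()->fetchAll();

foreach ($rows as $row) {
  // ... scan $row->body_value for dangerous markup ...
}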

Removing fetchAll() does not change the check's behaviour, since the code can iterate over the result set instead of over a result array, although it may run more slowly. It also appears to use less memory, as this modification allows the check to complete on the live environments.
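
A minimal sketch of that change (not the actual patch), reusing the same assumed table and columns; the statement object is traversable, so rows are consumed one at a time rather than being copied into a single large array:

$result = \Drupal::database()
  ->select('node__body', 'f')
  ->fields('f', ['entity_id', 'body_value'])
  ->execute();

foreach ($result as $row) {
  // ... scan $row->body_value for dangerous markup ...
}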

Patch to follow.

πŸ› Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

πŸ‡¬πŸ‡§United Kingdom dippers


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Status changed to Needs review almost 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States ednark

    I wanted to try a different approach than the 3221065-7 patch.

    Problem: This type of check cannot rely on Drupal's internal dynamically built queries. Drupal core's Database StatementPrefetch relies on a full download of all result data from the database, which means a resource limit imposed by the Drupal host will always apply to this one call in core:

    $this->data = $statement->fetchAll(\PDO::FETCH_ASSOC);

    Possible Solutions: In this type of situation you either need to run a separate small query per item you want to check, or keep the result set stored in the database itself and download only one result at a time.

    Solution: Running a separate query to load each node in the system seems wasteful, so I will instead generate one query per field and attempt to leave the data on the storage server for as long as possible. To do this, I will use a static query instead of a dynamic one. A drawback of this method is that, since the table and column names are dynamic, we have to construct a new query for every field we want to check. Normally you should avoid string concatenation when building queries, but in this specific situation all table and column names are pulled directly from Drupal in the first place, so in practice they are safe from corruption, even though it remains a bad practice.

    This patch reorders the loops so the id and data columns are calculated first, which allows us to generate a new query for each field that needs to be checked. We also need to be sure to unset the $query variable on each loop iteration to free any temporary memory reserved by the underlying libraries.
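
    A rough sketch of that structure (not the actual patch). The table and column naming (node__FIELD, FIELD_value) and the use of the entity_field.manager service are assumptions for illustration only:

    $connection = \Drupal::database();
    $definitions = \Drupal::service('entity_field.manager')
      ->getFieldStorageDefinitions('node');

    foreach ($definitions as $field_name => $definition) {
      // Assumed default SQL storage naming; the real check would derive
      // these names from the entity type's table mapping.
      $table = 'node__' . $field_name;
      $id_column = 'entity_id';
      $value_column = $field_name . '_value';

      // Static query per field. The concatenated names come from Drupal's
      // own field definitions, which is why this is treated as safe here.
      $query = $connection->query(
        'SELECT ' . $id_column . ', ' . $value_column . ' FROM {' . $table . '}'
      );

      // Stream the rows rather than fetching them all into an array.
      foreach ($query as $row) {
        // ... check $row->{$value_column} for dangerous markup ...
      }

      // Free any temporary memory held for this statement before moving on.
      unset($query);
    }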

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Had to tweak the patch in #8 as it wasn't picking up nodes that I added a script tag to.

  • Status changed to RTBC almost 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States ednark

    I have tested this and it functions as expected without errors.

    Tested on 10.1.x-dev with a standard install
    module drupal/security_review:2.0.x-dev@dev
    patch 3221065-9.patch

    Test procedure:
    start with base 2.0.x module code
    add an unreasonable amount of content items via devel
    drush devel-generate-content --bundles=page 200001
    run test to see Memory error generated
    apply patch
    run test to see it completes and has no Fields errors
    add script tag to any piece of content
    run test to see a Field error in the piece of content
    add the errored content's reported Hash to the ignore list
    run test to see no Field errors

  • @smustgrave opened merge request.
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave
  • Status changed to Fixed almost 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave
  • Automatically closed - issue fixed for 2 weeks with no activity.
