Cron update either exceeding time-out or exhausting server memory

Created on 1 May 2023, over 1 year ago
Updated 13 June 2023, over 1 year ago

Problem/Motivation

We've set our site to the Conservative caching strategy; however, cron was crashing, which left it locked. I ran cron via drush without a timeout while watching top in a separate session, and saw server memory fill at what looked like an exponential rate until it exhausted all 64 GB. I can only presume a memory leak in which the memory consumed doubles in size as it goes.

Steps to reproduce

Not clear. Make a repo with lots of items, set the Conservative cache strategy, then run cron via drush.

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

🇺🇸 United States seth.e.shaw


Comments & Activities

  • Issue created by @seth.e.shaw
  • 🇺🇸 United States seth.e.shaw

    Oddly enough, this OOME happened on dev while testing, but not on production. Running drush cron on production has memory holding steady at 10.3 GB, with the CPU sitting at ~85% of one core for ~44 minutes so far.

  • 🇳🇿 New Zealand code_brown

    We are also experiencing this issue on a site. It appears to run out of memory at this line:

    $entities = $entity_storage->loadMultiple(array_unique($flat_ids));
    

    when trying to load ~65k node ids, in web/core/modules/views/src/Plugin/views/query/Sql.php:1625

    Digging into it a bit further, it seems to happen when processing field values for an entity_reference field that references other nodes. In our case it runs out of memory when $table equals node__field_member_of, in the loop at:

    web/core/lib/Drupal/Core/Entity/Sql/SqlContentEntityStorage.php:1229

    I don't have a solution to this yet, just letting you know I'm also seeing it and this is where it borks for me. This also started happening after upgrading Drupal from 9.4.14 to 9.5.9.

    This happens when:
    * REST OAI-PMH Queue worker fires on cron
    * Viewing the feed at /oai/request?verb=ListRecords&metadataPrefix=oai_dc
    * Rebuilding the feed at /admin/config/services/rest/oai-pmh/queue

  • 🇳🇿 New Zealand code_brown

    It looks to me like a limit should be set when executing the view inside RestOaiPmhViewsCacheBase::processItem($data), similar to how it is done in rest_oai_pmh_cache_views(). When processItem() is called we already know the offset and limit we're working with, so we shouldn't have to load all results for the view, if I'm understanding this correctly?

    I've got a small patch I'll upload shortly that gets things working again for me. I'd be keen to get some feedback on whether I've understood this correctly or am missing how the queue/processItem system works.

    Steps I used to reproduce the issue

    1. Create a Drupal 10 Lando development environment
    2. Composer install drupal/rest_oai_pmh, drupal/devel_generate
    3. Generate 100,000 nodes of type Article
    4. Create an entity reference source view of those articles
    5. Configure rest_oai_pmh to use that view and set cache strategy to Conservative (also set metadata format to oai_dc)
    6. Rebuild feed and observe out of memory error in SqlContentEntityStorage
    7. Apply patch, rebuild feed and it rebuilds successfully
  • @code_brown opened merge request.
  • 🇳🇿 New Zealand code_brown

    I think the reason the entity reference view display tries to load all results by default is that the entity_reference_options has limit set to 0 in https://git.drupalcode.org/project/drupal/-/blob/9.5.x/core/modules/view... and so it has to be limited by a call to:

        $view->getDisplay()->setOption('entity_reference_options', ['limit' => $view->getItemsPerPage()]);
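
    Putting the two observations from this thread together (the offset and limit are already known when a queue item is processed, and the entity reference display defaults to a limit of 0), a rough sketch of executing only one slice of a view might look like the following. This is not the merged patch: the function name and its parameters are hypothetical, it assumes a bootstrapped Drupal site with the Views module enabled, and whether the offset is honoured may depend on the display's pager.

    ```php
    <?php

    use Drupal\views\Views;

    /**
     * Hypothetical helper: run one slice of a view rather than the full result set.
     *
     * The name and signature are assumptions for illustration only; they are
     * not part of rest_oai_pmh.
     */
    function example_execute_view_slice(string $view_id, string $display_id, int $offset, int $limit): array {
      $view = Views::getView($view_id);
      if ($view === NULL || !$view->setDisplay($display_id)) {
        // Unknown view or display: nothing to load.
        return [];
      }

      // Ask Views for only the rows this queue item is responsible for.
      $view->setItemsPerPage($limit);
      $view->setOffset($offset);

      // Entity reference displays read their limit from this option, so set
      // it explicitly while keeping any other keys already present.
      $display = $view->getDisplay();
      $options = $display->getOption('entity_reference_options') ?: [];
      $options['limit'] = $limit;
      $display->setOption('entity_reference_options', $options);

      $view->execute();

      // Rows for this slice only.
      return $view->result;
    }
    ```

    The point of the sketch is that each call asks for at most $limit rows, so the entity load that follows is bounded by the batch size instead of the size of the whole repository.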
    
  • Status changed to RTBC over 1 year ago
  • 🇺🇸 United States seth.e.shaw

    I applied the MR to our site and cron is no longer crashing, and I can see the record and set-membership counts increasing each time. 👍

    Can we get this merged?

  • Status changed to Fixed over 1 year ago
  • 🇺🇸 United States joecorall

    Thank you Seth for reporting and @code_brown for the patch. The patch has been merged and beta12 released with the fix.

  • Automatically closed - issue fixed for 2 weeks with no activity.
