Function yieldItem in SimplesitemapQueue allocates too much memory

Created on 3 September 2021, over 3 years ago
Updated 7 March 2022, about 3 years ago

Problem/Motivation

When the queue holds a huge number of items after a full regeneration for simple_sitemap, the PHP memory limit can be reached.
In our case this happens with 300k+ items in the queue.

SELECT count(*) FROM queue where name LIKE '%simple_sitemap%' 
======
count(*)
334491

Steps to reproduce

1. Generate enough content to produce 300k+ records in the queue
2. Run drush ssr to regenerate entities for sitemap
3. Run drush ssg

Proposed resolution

Load queue items in chunks
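
A minimal sketch of what chunked loading could look like, written in Python against an in-memory SQLite table so that it runs on its own. It is not the module's code. The table only loosely mirrors a queue table with item_id, name and data columns, and yield_items, build_demo_queue, chunk_size and the queue name simple_sitemap_demo are hypothetical names chosen for the illustration. Each chunk is selected by "item_id greater than the last one seen" rather than by OFFSET, so deleting processed rows between chunks does not shift the window and skip items.

```python
import sqlite3
from typing import Iterator, Tuple

QUEUE_NAME = "simple_sitemap_demo"  # hypothetical queue name for the demo


def yield_items(conn: sqlite3.Connection, name: str,
                chunk_size: int = 1000) -> Iterator[Tuple[int, str]]:
    """Yield (item_id, data) pairs, holding at most chunk_size rows in memory."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    last_id = 0  # assumes item_id is a positive, increasing key
    while True:
        rows = conn.execute(
            "SELECT item_id, data FROM queue "
            "WHERE name = ? AND item_id > ? "
            "ORDER BY item_id LIMIT ?",
            (name, last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        for item_id, data in rows:
            yield item_id, data
        # Remember where this chunk ended; the next query starts after it.
        last_id = rows[-1][0]


def build_demo_queue(total: int) -> sqlite3.Connection:
    """Create an in-memory queue table filled with `total` demo items."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queue ("
        "item_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT NOT NULL, data TEXT)"
    )
    conn.executemany(
        "INSERT INTO queue (name, data) VALUES (?, ?)",
        [(QUEUE_NAME, f"link-{i}") for i in range(total)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_queue(2500)
    processed = 0
    for item_id, _data in yield_items(conn, QUEUE_NAME, chunk_size=1000):
        # Deleting a processed item does not disturb the next chunk query.
        conn.execute("DELETE FROM queue WHERE item_id = ?", (item_id,))
        processed += 1
    print(processed)
```

Whether this reduces peak memory in practice depends on how the database driver buffers result sets, so it should be measured rather than assumed.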

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status: Needs review
Version: 4.0
Component: Code
Created by: 🇵🇱Poland dmitry.korhov Poland, Warsaw


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Status changed to Closed: cannot reproduce 5 days ago
  • 🇷🇺Russia walkingdexter

    This is not an issue anymore (tested locally with performance_test.php). Yeah, the memory usage grows with the amount of data in the sitemap, but I couldn't find any correlation with yieldItem(), even if we select one element at a time from the queue. It must be related to something else.

  • 🇵🇱Poland dmitry.korhov Poland, Warsaw

    I disagree with closing.
    The problem is still there - loading all items at once.
    It should be split into chunks or batches.

  • 🇷🇺Russia walkingdexter

    > I disagree with closing.
    > The problem is still there - loading all items at once.
    > It should be split into chunks or batches.

    @dmitry.korhov Ok. Please provide a detailed explanation of how to see the benefit of the proposed changes. Also keep in mind that with these changes half of the links will not be included in the sitemap.

    If you look closely at the current implementation of yieldItem(), you will see that the items are not loaded all at once, because we use fetchObject(). In the case of MySQL, the important thing is buffered queries. However, in my testing I found no correlation with memory consumption (Drupal 10.4, PHP 8.1, MySQL 8.0). The proposed changes also have no effect.

    It's possible that the improvement has already been implemented at the core level. See the core issue "🐛 Implement statement classes using \Iterator to fix memory usage regression and prevent rewinding" (status: Fixed) for details.
