Massively improve batch performance by reducing the number of queries

Created on 29 January 2025

Problem/Motivation

In \Drupal\entity_usage\EntityUsageBatchManager::doBulkRevisionable and \Drupal\entity_usage\EntityUsageBatchManager::doBulkNonRevisionable we select the entity revision IDs on every iteration of the batch. On massive tables these queries turn out to be the bulk of the time spent. Let's just add a massive array of IDs to the batch context instead of repeating the queries.

Proposed resolution

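Load the relevant entity (revision) IDs once, store them in the batch context, and work through that list on each batch iteration instead of re-querying. A minimal sketch of the idea, not the code from the merge request - the function name, entity type, and chunk size are illustrative:

  <?php
  // Batch operation callback: query the revision IDs once, stash them in
  // the batch sandbox, and process a slice per iteration.
  function entity_usage_bulk_sketch(array &$context): void {
    $storage = \Drupal::entityTypeManager()->getStorage('paragraph');

    if (!isset($context['sandbox']['revision_ids'])) {
      // One query up front instead of one per batch iteration.
      $context['sandbox']['revision_ids'] = array_keys($storage->getQuery()
        ->allRevisions()
        ->accessCheck(FALSE)
        ->execute());
      $context['sandbox']['total'] = count($context['sandbox']['revision_ids']);
    }

    // Process the next chunk of pre-loaded IDs; no query needed here.
    foreach (array_splice($context['sandbox']['revision_ids'], 0, 50) as $revision_id) {
      $revision = $storage->loadRevision($revision_id);
      // ... record usage for this revision ...
    }

    $remaining = count($context['sandbox']['revision_ids']);
    $context['finished'] = $context['sandbox']['total']
      ? 1 - $remaining / $context['sandbox']['total']
      : 1;
  }
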
Remaining tasks

User interface changes

None

API changes

None

Data model changes

None

πŸ“Œ Task
Status

Active

Version

2.0

Component

Code

Created by

πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

Comments & Activities

  • Issue created by @alexpott
  • Merge request !105 Improved batch processing (Merged) created by alexpott
  • Pipeline finished with Failed
    24 days ago
    Total: 441s
    #409244
  • Pipeline finished with Success
    24 days ago
    Total: 357s
    #409247
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    This has a massive impact - processing 2,000 paragraph revisions on the site went from 50+ seconds to 1.3 seconds. There are 2,000,000+ paragraph revisions on the site.

  • πŸ‡©πŸ‡ͺGermany chr.fritsch πŸ‡©πŸ‡ͺπŸ‡ͺπŸ‡ΊπŸŒ

    Just incredible @alexpott πŸ₯³

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    So we need to be careful here: 4 million integers in a PHP array take up about 64 MB, and 1 million takes about 16 MB. We should assume that Drush is being run in an environment with 512 MB of memory, so I think we should load 500,000 in one go and maybe make it configurable.
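
    A rough way to sanity-check that estimate, assuming a packed PHP array (a plain list of integers; hashed arrays such as keyed entity query results cost more per element):

      <?php
      // Each element of a packed PHP array is a 16-byte zval, so one
      // million integers should come in around 16 MB (~64 MB for 4 million).
      $before = memory_get_usage();
      $ids = range(1, 1000000);
      $after = memory_get_usage();
      printf("%.1f MB for 1,000,000 integers\n", ($after - $before) / 1024 / 1024);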

  • Pipeline finished with Success
    23 days ago
    Total: 216s
    #409796
  • Pipeline finished with Canceled
    23 days ago
    Total: 74s
    #409798
  • Pipeline finished with Success
    23 days ago
    Total: 249s
    #409800
  • Pipeline finished with Success
    23 days ago
    Total: 216s
    #409823
  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ
  • First commit to issue fork.
  • Pipeline finished with Skipped
    22 days ago
    #410956
  • πŸ‡ͺπŸ‡ΈSpain marcoscano Barcelona, Spain

    This is indeed a great idea. Thanks for contributing! πŸ‘

      /**
       * The number of IDs to load when in bulk mode.
       */
      const BULK_ID_LOAD = 1000000;
    

    @alexpott in your testing, did you see a significant impact from changing this value? I did a quick check locally and it does seem to have a meaningful effect on how fast the batch processes, so I am wondering if it makes sense to make this configurable. I don't think we need to expose this in the UI, but a settings value that people could define/override per environment (falling back to the constant) makes sense to me. In any case, we can make that improvement in a follow-up.

    For now this looks good to go from me. Thanks again!
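
    One possible shape for that settings override, as a sketch (the key 'entity_usage_bulk_id_load' is a hypothetical name, not an existing API):

      <?php
      use Drupal\Core\Site\Settings;
      use Drupal\entity_usage\EntityUsageBatchManager;

      // In settings.php: $settings['entity_usage_bulk_id_load'] = 500000;
      // Fall back to the class constant when no override is set.
      $bulk_id_load = Settings::get('entity_usage_bulk_id_load', EntityUsageBatchManager::BULK_ID_LOAD);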

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    I selected defaults that worked in an environment with millions of revisions and only 512 MB available to PHP, and ran Drush to rebuild the table. I agree that these could be configurable - it'd be nice if it were both config and a Drush option, because then you could tweak it when rebuilding via the UI as well as on the command line. Feel free to open an issue about that - I'll get round to it, but I'm not sure when as it won't be a priority :)

  • πŸ‡¬πŸ‡§United Kingdom alexpott πŸ‡ͺπŸ‡ΊπŸŒ

    @marcoscano I created the follow-up - see 📌 Make batch performance constants configurable (Active)

  • Automatically closed - issue fixed for 2 weeks with no activity.
