Parallel indexing using concurrent drush processes

Created on 23 July 2024, 9 months ago
Updated 8 August 2024, 8 months ago

I was annoyed by how long it took to index a complex site with a lot of custom PHP code that affects the indexing of each item. Each run took about 2 hours, and I had to do it multiple times.
Unfortunately, Search API is not designed to perform multiple indexing tasks in parallel. But my test system has enough CPU cores, and Solr itself could accept a much higher indexing load.
So I created a new drush command to run multiple index processes in parallel:
drush search-api-solr:index-parallel --threads=10
drush search-api-solr:index-parallel YOUR_INDEX_ID --threads=10 --batch-size=100

The approach is a bit hackish, but it works 🙂
Using 10 “threads”, the same indexing run now finishes in 20 minutes! I think this is a significant improvement 😁
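
To put the numbers above in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes the "about 2 hours" is exactly 120 minutes, which is only an approximation, and the function names are made up for this illustration.

```python
def speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel run is than the serial run."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, workers):
    """Fraction of the ideal linear speedup achieved per worker."""
    return speedup(serial_minutes, parallel_minutes) / workers


if __name__ == "__main__":
    # Assumed inputs: ~120 minutes serial, 20 minutes with 10 processes.
    print(speedup(120, 20))
    print(efficiency(120, 20, 10))
```

Under these assumptions the run is roughly 6 times faster, i.e. about 60% of what perfectly linear scaling across 10 processes would give. Some shared overhead is to be expected, so that still seems like a good result.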

By default, this new command indexes different indexes in parallel.
But if you additionally change the tracker of an index to the new “Index parallel” tracker, the individual items of that index are indexed in parallel as well!
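
To illustrate the general idea of item-level parallelism, here is a minimal Python sketch of how several concurrent processes could split the items of one index without overlapping. This is not the code from the commit: the round-robin sharding, the `shard` and `index_in_parallel` names, and the dummy worker are all assumptions made for this example, and the real tracker may hand out items quite differently.

```python
import subprocess
import sys

# Hypothetical worker: stands in for one indexing process. It only
# counts the item IDs it was handed and prints that number.
WORKER_CODE = "import sys; print(len(sys.argv[1:]))"


def shard(item_ids, workers):
    """Deal item IDs round-robin into `workers` disjoint shards."""
    return [item_ids[i::workers] for i in range(workers)]


def index_in_parallel(item_ids, workers=10):
    """Start one child process per non-empty shard and wait for all."""
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, *map(str, part)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for part in shard(list(item_ids), workers)
        if part
    ]
    # All children are already running concurrently at this point;
    # communicate() collects each one's output and waits for it to exit.
    return sum(int(proc.communicate()[0]) for proc in procs)


if __name__ == "__main__":
    print(index_in_parallel(range(1050), workers=10))
```

The point of the sketch is only that every item ends up in exactly one shard, so no two processes work on the same item. A real implementation would also have to cope with items that change while the run is in progress.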

I think I’ll merge it into search_api_solr soon, but if you’re already interested:
https://github.com/mkalkbrenner/search_api_solr/commit/bdd44de65270d3e0e...

Feature request
Status: Fixed
Version: 4.0
Component: Code
Created by: mkalkbrenner (🇩🇪 Germany)
