Improve performance of LinkExtractorBatch & allow to skip unpublished content

Created on 8 March 2024, 3 months ago
Updated 14 May 2024, 19 days ago

Problem/Motivation

I have a request to skip unpublished content in linkchecker for entity types implementing EntityPublishedInterface. That might be a nice extension. I also noticed that LinkExtractorBatch::processEntities() can be improved, which might have an impact on performance.

Proposed resolution

I'd like to propose the following changes:

  1. Skip checking links in unpublished content. If enabled in the linkchecker configuration, links in unpublished content (with status 0) will be ignored.
  2. The batch size is currently set to 20 by default. I'd like to propose using Settings::get('entity_update_batch_size', 50) to determine the batch size. This is the setting core uses, and it makes the default batch size 50. This should be applied to LinkExtractorBatch and LinkCleanUp.
  3. Improve the performance of LinkExtractorBatch::processEntities() by loading all entities at once, i.e. changing:
    $storage = $this->entityTypeManager->getStorage($entityType->id());
    foreach ($ids as $id) {
      $entity = $storage->load($id);
      ...

    to:

    $entities = $this
      ->entityTypeManager
      ->getStorage($entityType->id())
      ->loadMultiple($ids);
    foreach ($entities as $entity) {
      ...
  4. LinkExtractorBatch::processEntities() uses an SQL LEFT JOIN, which is an expensive query, and the function is called on every cron run. So it's better to first check whether there is anything to process (which is fast) and to run the LEFT JOIN query only when there is data to process (because it's slow). To do that, add this at the beginning of the function:
    public function processEntities($numberOfItems = NULL) {
      $numberOfProcessedItems = 0;
      // This function is used in batch to extract all links on demand and it's
      // also called on every cron run (see linkchecker_cron()). Because it uses
      // SQL LEFT JOIN, it's quite expensive. So, first check if there is anything
      // to process. If yes, then use query with LEFT JOIN to retrieve entities to
      // be processed.
      if ($this->getTotalEntitiesToProcess() <= $this->getNumberOfProcessedEntities()) {
        return $numberOfProcessedItems;
      }
    
      // Run the SQL LEFT JOIN query and process entities.
      ...
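
Items 1 and 2 of the list above come without code, so here is a minimal sketch of how they might look. It is not the actual patch. To keep it runnable outside Drupal, PublishedInterface stands in for Drupal\Core\Entity\EntityPublishedInterface (only isPublished() is used), and DemoNode, should_extract_links(), batch_size() and the $skipUnpublished flag are hypothetical names made up for illustration. In the module itself the batch size would come from Settings::get('entity_update_batch_size', 50) rather than from a plain array.

```php
<?php

declare(strict_types=1);

// Stand-in for Drupal\Core\Entity\EntityPublishedInterface so the sketch runs
// without Drupal. Only isPublished() is needed here.
interface PublishedInterface {

  public function isPublished(): bool;

}

// Hypothetical minimal entity, used only for this illustration.
final class DemoNode implements PublishedInterface {

  public function __construct(public int $id, private bool $published) {}

  public function isPublished(): bool {
    return $this->published;
  }

}

/**
 * Hypothetical helper: decides whether links of an entity get extracted.
 *
 * $skipUnpublished represents the proposed configuration option (assumption:
 * the option's real name and storage are not defined in this issue).
 */
function should_extract_links(object $entity, bool $skipUnpublished): bool {
  if ($skipUnpublished && $entity instanceof PublishedInterface) {
    return $entity->isPublished();
  }
  // Entity types without a published state are always processed.
  return TRUE;
}

/**
 * Hypothetical helper: reads the batch size from a settings array.
 *
 * Mirrors Settings::get('entity_update_batch_size', 50): use the configured
 * value if present, otherwise fall back to 50.
 */
function batch_size(array $settings): int {
  return (int) ($settings['entity_update_batch_size'] ?? 50);
}
```

With the option enabled, filtering a loaded set of entities through should_extract_links() would leave only the published ones, while entity types that do not implement the interface pass through unchanged.
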
✨ Feature request
Status

RTBC

Version

2.0

Component

Code

Created by

🇵🇱 Poland gugalamaciek
