[3.x] Duplicates report page does not scale

Created on 18 August 2021, over 3 years ago
Updated 23 May 2023, almost 2 years ago

Problem/Motivation

The reports page page controller, ReportController does this:

    $checksum_results = $this->checksumStatistics->checksumsWithDuplicates();

    // SNIP

    $media_results = $this->database
      ->select('media_field_data', 'mfd')
      ->fields('mfd', ['mid', 'name', 'duplicates_checksum'])
      ->condition('duplicates_checksum', array_keys($checksum_results), 'IN')
      ->execute();

This query won't scale with large number of checksums, because the concatenated checksums in the IN condition will make the query exceed mysql's max_allowed_packet.

The fix would be a single query that self-joins the table. Something like this:

SELECT m2.mid 

FROM media_field_data m2 
JOIN

(SELECT duplicates_checksum, count(mid) FROM media_field_data
GROUP BY duplicates_checksum HAVING count(*) > 1) AS m1

USING (duplicates_checksum)

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Postponed

Version

2.0

Component

Code

Created by

🇬🇧United Kingdom joachim

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

Production build 0.71.5 2024