admin/content/files is slow with many files

Created on 6 November 2015, about 9 years ago
Updated 7 August 2024, 5 months ago

Problem/Motivation

admin/content/files is very slow for us on a site that has lots of files.

mysql> explain SELECT file_managed.filename AS file_managed_filename, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid, SUM(file_usage_file_managed.count) AS file_usage_file_managed_count, MIN(file_managed.fid) AS fid_1
    -> FROM 
    -> file_managed file_managed
    -> LEFT JOIN file_usage file_usage_file_managed ON file_managed.fid = file_usage_file_managed.fid
    -> GROUP BY file_managed_filename, file_managed_created, file_managed_changed, file_managed.fid
    -> ORDER BY file_managed_created DESC
    -> LIMIT 51 OFFSET 0;
+----+-------------+-------------------------+------+------------------------------+---------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table                   | type | possible_keys                | key     | key_len | ref                         | rows   | Extra                           |
+----+-------------+-------------------------+------+------------------------------+---------+---------+-----------------------------+--------+---------------------------------+
|  1 | SIMPLE      | file_managed            | ALL  | NULL                         | NULL    | NULL    | NULL                        | 184075 | Using temporary; Using filesort |
|  1 | SIMPLE      | file_usage_file_managed | ref  | PRIMARY,fid_count,fid_module | PRIMARY | 4       | file_managed.fid |      1 | NULL                            |
+----+-------------+-------------------------+------+------------------------------+---------+---------+-----------------------------+--------+---------------------------------+

mysql>  SELECT /* sql_no_cache */ file_managed.filename AS file_managed_filename, file_managed.type AS file_managed_type, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid, SUM(file_usage_file_managed.count) AS file_usage_file_managed_count, MIN(file_managed.fid) AS fid_1 FROM  file_managed file_managed LEFT JOIN file_usage file_usage_file_managed ON file_managed.fid = file_usage_file_managed.fid GROUP BY file_managed_filename, file_managed_type, file_managed_created, file_managed_changed, file_managed.fid ORDER BY file_managed_created DESC LIMIT 51 OFFSET 0;
...
51 rows in set (2.77 sec)

I thought it could be the order by on created which doesn't have an index, but changing to changed, which has doesn't change anything.

Proposed resolution

I guess it's join to the file usage table and then the optimization gets confused. IIRC, we had a similar problem with authors on admin/content that now loads the users separately. I think we should do the same with file usage. Stop trying to load the info but have a custom field plugin that just uses the file usage API to load that information for the files that are being displayed.

There is no API to get that information for multiple files, so we either have to live with doing 50 fast queries or querying the table directly (or extending the API). Since we already join the table directly, doing a direct query won't be a regression in terms of DB abstraction..

Without the SUM and LEFT join, the query is 0.06s instead of 2.7 even on created, with changed it's obviously 0.00. so not a big issue anymore. Explain:

mysql> explain SELECT file_managed.filename AS file_managed_filename, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid FROM  file_managed file_managed ORDER BY file_managed_created DESC LIMIT 51 OFFSET 0;
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
|  1 | SIMPLE      | file_managed | ALL  | NULL          | NULL | NULL    | NULL | 184075 | Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------+

mysql> explain SELECT file_managed.filename AS file_managed_filename, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid FROM  file_managed file_managed ORDER BY file_managed_changed DESC LIMIT 51 OFFSET 0;
+----+-------------+--------------+-------+---------------+---------------------+---------+------+------+-------+
| id | select_type | table        | type  | possible_keys | key                 | key_len | ref  | rows | Extra |
+----+-------------+--------------+-------+---------------+---------------------+---------+------+------+-------+
|  1 | SIMPLE      | file_managed | index | NULL          | file_field__changed | 4       | NULL |   51 | NULL  |
+----+-------------+--------------+-------+---------------+---------------------+---------+------+------+-------+
1 row in set (0.00 sec)

Remaining tasks

User interface changes

API changes

Data model changes

๐Ÿ› Bug report
Status

Needs work

Version

11.0 ๐Ÿ”ฅ

Component
File moduleย  โ†’

Last updated 1 day ago

Created by

๐Ÿ‡จ๐Ÿ‡ญSwitzerland berdir Switzerland

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Triaged core major

    There is consensus among core maintainers that this is a major issue. Only core committers should add this tag.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • ๐Ÿ‡ฆ๐Ÿ‡บAustralia acbramley

    This popped up as part of BSI.

    Is this still an issue on 11.x? If so, how many files does it take? Some steps to reproduce would be good

  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia sukr_s

    The issue persists in 11.x
    1. updated the steps the reproduce
    2. created file_managed with 524288 and file_usage with 482306 records
    3. initial page load takes upto 6 seconds and every pagination click takes about the same time.

  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia sukr_s
  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia sukr_s

    further experimentation. Creating a index
    alter table file_managed add index groupidx(fid, filename, filemime,filesize,status, created, changed);
    reduces the initial load to 3 seconds and subsequent pagination to 1.5secs

  • Merge request !9129Resolve #2610126 "Admincontentfiles is slow" โ†’ (Open) created by sukr_s
  • Pipeline finished with Failed
    5 months ago
    Total: 2105s
    #247611
  • Pipeline finished with Success
    5 months ago
    Total: 590s
    #247668
  • Status changed to Needs review 5 months ago
  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia sukr_s
  • Status changed to Needs work 5 months ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States smustgrave

    Major bug seems like some form of test coverage will be needed

  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia sukr_s

    @smustgrave: I thought about this and did not add a test case as adding a test case would take up at least 6-8 secs of test time on each run, which would increase the gitlab cost without a significant benefit. There is no logic change per se. It's a typical database admin task. Not a major issue in my perspective. So is a test coverage really needed?

  • Status changed to Needs review 5 months ago
  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia sukr_s
  • Status changed to Needs work 4 months ago
  • ๐Ÿ‡ฌ๐Ÿ‡งUnited Kingdom catch

    The update hook is updating the entity to whatever the schema definition is of the current code base, whereas it should use the API in https://www.drupal.org/node/2554097 โ†’ to update just the thing it actually wants to change so that there aren't side-effects.

  • ๐Ÿ‡ง๐Ÿ‡ทBrazil dungahk Balneรกrio Camboriรบ

    For what is worth it, I've tried the index mentioned in #34 โ†’ and it improved a lot the performance of the query.

    Before the index:

    51 rows in set (57.57 sec)

    After the index:

    51 rows in set (17.26 sec)

    The index query for reference: ALTER TABLE file_managed ADD INDEX groupidx(fid, filename, filemime, filesize, status, created, changed);

    And this is the select query: SELECT file_managed.fid AS fid, file_managed.filename AS file_managed_filename, file_managed.filemime AS file_managed_filemime, file_managed.filesize AS file_managed_filesize, file_managed.status AS file_managed_status, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, SUM(file_usage_file_managed.count) AS file_usage_file_managed_count, MIN(file_managed.fid) AS fid_1 FROM file_managed file_managed LEFT JOIN file_usage file_usage_file_managed ON file_managed.fid = file_usage_file_managed.fid GROUP BY file_managed.fid, file_managed_filename, file_managed_filemime, file_managed_filesize, file_managed_status, file_managed_created, file_managed_changed ORDER BY file_managed_changed DESC LIMIT 51 OFFSET 0;

  • ๐Ÿ‡ฌ๐Ÿ‡ทGreece vensires
Production build 0.71.5 2024