admin/content/files is slow with many files

Created on 6 November 2015, about 9 years ago
Updated 14 August 2024, 3 months ago

Problem/Motivation

admin/content/files is very slow for us on a site that has lots of files.

mysql> explain SELECT file_managed.filename AS file_managed_filename, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid, SUM(file_usage_file_managed.count) AS file_usage_file_managed_count, MIN(file_managed.fid) AS fid_1
    -> FROM 
    -> file_managed file_managed
    -> LEFT JOIN file_usage file_usage_file_managed ON file_managed.fid = file_usage_file_managed.fid
    -> GROUP BY file_managed_filename, file_managed_created, file_managed_changed, file_managed.fid
    -> ORDER BY file_managed_created DESC
    -> LIMIT 51 OFFSET 0;
+----+-------------+-------------------------+------+------------------------------+---------+---------+-----------------------------+--------+---------------------------------+
| id | select_type | table                   | type | possible_keys                | key     | key_len | ref                         | rows   | Extra                           |
+----+-------------+-------------------------+------+------------------------------+---------+---------+-----------------------------+--------+---------------------------------+
|  1 | SIMPLE      | file_managed            | ALL  | NULL                         | NULL    | NULL    | NULL                        | 184075 | Using temporary; Using filesort |
|  1 | SIMPLE      | file_usage_file_managed | ref  | PRIMARY,fid_count,fid_module | PRIMARY | 4       | file_managed.fid |      1 | NULL                            |
+----+-------------+-------------------------+------+------------------------------+---------+---------+-----------------------------+--------+---------------------------------+

mysql>  SELECT /* sql_no_cache */ file_managed.filename AS file_managed_filename, file_managed.type AS file_managed_type, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid, SUM(file_usage_file_managed.count) AS file_usage_file_managed_count, MIN(file_managed.fid) AS fid_1 FROM  file_managed file_managed LEFT JOIN file_usage file_usage_file_managed ON file_managed.fid = file_usage_file_managed.fid GROUP BY file_managed_filename, file_managed_type, file_managed_created, file_managed_changed, file_managed.fid ORDER BY file_managed_created DESC LIMIT 51 OFFSET 0;
...
51 rows in set (2.77 sec)

I thought it could be the order by on created which doesn't have an index, but changing to changed, which has doesn't change anything.

Steps to reproduce

1. Upload one file using any node type
2. run the following SQL insert into file_managed (uuid, langcode, uid, filename, uri, filemime, filesize, status, created, changed) select UUID(), langcode, uid, filename, uri, filemime, filesize, status, created, changed from file_managed; couple of times
3. run the following SQL once insert into file_usage (fid,module, type, id,count) select fid, 'file','node',fid,1 from file_managed where fid > 1;
4. create some missing entries in file_usage to make sure LEFT JOIN makes sense. use SQL delete from file_usage where fid > 23323 and fid < 89880;

Proposed resolution

Create index used in the Group by clause.

Older Proposed resolution

I guess it's join to the file usage table and then the optimization gets confused. IIRC, we had a similar problem with authors on admin/content that now loads the users separately. I think we should do the same with file usage. Stop trying to load the info but have a custom field plugin that just uses the file usage API to load that information for the files that are being displayed.

There is no API to get that information for multiple files, so we either have to live with doing 50 fast queries or querying the table directly (or extending the API). Since we already join the table directly, doing a direct query won't be a regression in terms of DB abstraction..

Without the SUM and LEFT join, the query is 0.06s instead of 2.7 even on created, with changed it's obviously 0.00. so not a big issue anymore. Explain:

mysql> explain SELECT file_managed.filename AS file_managed_filename, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid FROM  file_managed file_managed ORDER BY file_managed_created DESC LIMIT 51 OFFSET 0;
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table        | type | possible_keys | key  | key_len | ref  | rows   | Extra          |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------+
|  1 | SIMPLE      | file_managed | ALL  | NULL          | NULL | NULL    | NULL | 184075 | Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+--------+----------------+

mysql> explain SELECT file_managed.filename AS file_managed_filename, file_managed.created AS file_managed_created, file_managed.changed AS file_managed_changed, file_managed.fid AS fid FROM  file_managed file_managed ORDER BY file_managed_changed DESC LIMIT 51 OFFSET 0;
+----+-------------+--------------+-------+---------------+---------------------+---------+------+------+-------+
| id | select_type | table        | type  | possible_keys | key                 | key_len | ref  | rows | Extra |
+----+-------------+--------------+-------+---------------+---------------------+---------+------+------+-------+
|  1 | SIMPLE      | file_managed | index | NULL          | file_field__changed | 4       | NULL |   51 | NULL  |
+----+-------------+--------------+-------+---------------+---------------------+---------+------+------+-------+
1 row in set (0.00 sec)

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Needs work

Version

11.0 🔥

Component
File module 

Last updated 1 day ago

Created by

🇨🇭Switzerland berdir Switzerland

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

  • Triaged core major

    There is consensus among core maintainers that this is a major issue. Only core committers should add this tag.

  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇦🇺Australia acbramley

    This popped up as part of BSI.

    Is this still an issue on 11.x? If so, how many files does it take? Some steps to reproduce would be good

  • 🇮🇳India sukr_s

    The issue persists in 11.x
    1. updated the steps the reproduce
    2. created file_managed with 524288 and file_usage with 482306 records
    3. initial page load takes upto 6 seconds and every pagination click takes about the same time.

  • 🇮🇳India sukr_s

    further experimentation. Creating a index
    alter table file_managed add index groupidx(fid, filename, filemime,filesize,status, created, changed);
    reduces the initial load to 3 seconds and subsequent pagination to 1.5secs

  • Merge request !9129Resolve #2610126 "Admincontentfiles is slow" → (Open) created by sukr_s
  • Pipeline finished with Failed
    4 months ago
    Total: 2105s
    #247611
  • Pipeline finished with Success
    4 months ago
    Total: 590s
    #247668
  • Status changed to Needs review 4 months ago
  • Status changed to Needs work 4 months ago
  • 🇺🇸United States smustgrave

    Major bug seems like some form of test coverage will be needed

  • 🇮🇳India sukr_s

    @smustgrave: I thought about this and did not add a test case as adding a test case would take up at least 6-8 secs of test time on each run, which would increase the gitlab cost without a significant benefit. There is no logic change per se. It's a typical database admin task. Not a major issue in my perspective. So is a test coverage really needed?

  • Status changed to Needs review 3 months ago
  • Status changed to Needs work 3 months ago
  • 🇬🇧United Kingdom catch

    The update hook is updating the entity to whatever the schema definition is of the current code base, whereas it should use the API in https://www.drupal.org/node/2554097 to update just the thing it actually wants to change so that there aren't side-effects.

Production build 0.71.5 2024