Saving media entities very slow after update to rc1

Created on 24 October 2023, about 1 year ago
Updated 15 April 2024, 9 months ago

After updating the module from beta7 to rc1, we noticed a severe slowdown in saving media entities. This is most noticeable on Pantheon, where saving (creating or editing) a media entity can take up to 30 seconds. It's not noticeable on local dev environments, so may have to do with the distributed file system.

Slow query logs point to `webp_flush_webp_derivatives()`, and in particular the use of `file_exists()`.

The rc1 release added the `webp_flush_webp_derivatives()` function and calls it on hook_insert() and hook_update().

https://git.drupalcode.org/project/webp/-/compare/8.x-1.0-beta7...8.x-1....

If I understand correctly, the function is looping through every single image style and calling `file_exists()` to see if a webp derivative exists. This can be a significant performance hit on sites with lots of image styles.

Thoughts on how we can optimize this?

πŸ› Bug report
Status

Needs review

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States aasarava

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @aasarava
  • πŸ‡©πŸ‡ͺGermany spuky

    I landed here after seeing webp_flush_webp_derivatives in my performance traces after complaints about timeout messages when editing Paragraphs with media Entitys ... site has about 60 Imagestyles.

    I linked the issue that introduced that function ... my site does not use any focus Plugins...

    did a Rollback to beta-7 to have happy editors

  • First commit to issue fork.
  • πŸ‡ΊπŸ‡ΈUnited States generalredneck

    Hey there,
    I was working with aasarava to get through this one. So here's the deal:

    The solution in πŸ› Using Image Widget Crop in responsive images does not refresh webp image Fixed is even inadequate, and here is why. if there are any changes made to specific images using methods that DON'T alter an entity with a file field, then we still have issues. In addition, ANY changes made to the media entity (even so much as just hitting the save button) rebuilds all the webp derivatives for that image.

    That results into a call like this on some filesystems:

    Another example where it falls short, lets say any specific module calls ImageStyle::flush("1234.jpg"); Well then in that instance, webp is out of date and shows up incorrectly, because while all the 1234.jpg images got removed, 1234.webp still exists.

    This potentially happens when crop entities are updated as well independently of the media entity.

    So in order to handle that, we need to be using hook_image_style_flush.

    The challenge here though is πŸ› hook_image_style_flush doesn't get the $path passed to Drupal\image\Entity\ImageStyle::flush() method Fixed which is committed to Drupal Core 11, and only immediately available by applying the patch β†’ .

    That said, I'm still pushing a MR for this.

  • Status changed to Needs review 11 months ago
  • πŸ‡ΊπŸ‡ΈUnited States generalredneck
  • Status changed to Needs work 11 months ago
  • A couple comments on the MR.

  • Status changed to Needs review 11 months ago
  • Comments from @generalredneck address concerns. Moving this back to needs review, but I don't know if this can go to RTBC without the core commit in place (until there's a 10.3 release). End even at that point, perhaps a minimum core constraint in .info.yml is needed as well, but will leave that up to maintainers.

  • πŸ‡ΊπŸ‡ΈUnited States generalredneck

    I do want to point out that there are likely ways to make this even more performant, but I'm kinda at a loss. The biggest kicker is that currently the crop module calls ImageStyle::flush() every time a crop entity is saved... including created.

    See /media/add/image call on this site on pantheon here

    I'm assuming this is because you have things like IMCE where you can browse for existing files on the disk and there MAY be existing derivatives for the styles that exist from the before times or something, and this ensures that we have a clean slate.

    I wish however there was a way we could check without it being murder to performance on filesystems like Pantheon's, or S3fs, as running a file_stat call is costly. With the 125 image styles we are using, running this hook on Image Create is costing something like 15 seconds. It was worse with the other code at twice that (which you can see in the new relic in #5.

  • πŸ‡ΊπŸ‡ΈUnited States generalredneck

    Adding a related issue on Crop that's been open for a while.

    That doesn't mean that this patch isn't still needed. In order for Webp to correctly and always remove derivatives at the same time the other image derivatives are removed, besides when a full image style is nuked, This is the way to go.

    but #3200311: Performance issues due to multiple calls of the image_path_flush() β†’ may help others that are facing a similar problem I'm hitting that is compounding this issue.

  • πŸ‡ΊπŸ‡ΈUnited States generalredneck

    So to follow up. The patch I suggested we needed to add was actually released in the 10.2.0 release of drupal. see https://www.drupal.org/node/3373248 β†’

  • Status changed to RTBC 10 months ago
  • +1 for RTBC since the updated core hook is available in 10.2.

  • First commit to issue fork.
  • Status changed to Needs review 10 months ago
  • πŸ‡ͺπŸ‡ΈSpain Peacog

    Thanks for this fix. I've found a problem with it when the image file is stored in a subdirectory. Currently the webp derivative is stored in the root of the style directory, even if the original file is stored in a subdirectory. For example if the directory is configured as [date:custom:Y]-[date:custom:m] the uploaded file will be files/2024-03/image.jpg but the webp derivative will be files/styles/my-style/public/2024-03image.jpg. The code in webp_image_style_flush looks for the derivative in the subdirectory but doesn't find it, so it doesn't get refreshed.

    We can solve the problem by storing the derivative in the same directory structure as the original file. To do that change the template in calls to getWebpDestination() to "@directory/@filename"

  • πŸ‡ͺπŸ‡ΈSpain Peacog

    I've reverted my commit because the patch would not apply. I think the MR branch needs to be rebased.

  • Status changed to Needs work 10 months ago
  • πŸ‡ΊπŸ‡ΈUnited States generalredneck

    So just to be clear, the issue isn't with the changes that were made with this patch, and there's yet another issue that's compounding this one which is πŸ› Do not regenerate the webp file each time it is requested Needs review ?

    It looks like to me the code you are adding is throwing another bug on the fire for this issue (which I'm not opposed to if it gets this out the door), as we didn't introduce or make changes to that code previously.

    what you have changed in b8811c60 doesn't seem to be a bad change, but also highlights the need that godotislate mentioned at consolidating to the service method.

    I say for now though we revert the revert and put it back into review. We can fix the issue with the merge conflict of #3177103 if it gets merged... or they can fix the conflict with ours if ours gets merged first.

  • πŸ‡ͺπŸ‡ΈSpain Peacog

    That's right @generalredneck. The bug I found is a new one that's not directly related to speeding up the image flush. The fix could be included in this patch, or in #3177103, or even in a new issue of it's own if you think that's appropriate.

  • πŸ‡ΊπŸ‡ΈUnited States generalredneck

    Most appropriate would be it's own issue. That would allow this one to go back to the state it was in prior. It is good to know though.

  • πŸ‡ͺπŸ‡ΈSpain Peacog

    I've create a separate issue for the directory structure problem: πŸ› Webp derivatives not refreshed when files stored in sub-directory Needs review

  • Status changed to Needs review 9 months ago
  • πŸ‡©πŸ‡ͺGermany spuky

    putting this back to needs review.. after reading through #17 to #20

Production build 0.71.5 2024