Aggregated URL hashes for assets can mismatch due to different order of library assets

Created on 30 October 2023, about 1 year ago

Problem/Motivation

While working on some Webform admin pages, I noticed that the JS seems slow to load. Turns out that one of the JS aggregates is not saved to disk. Drupal is regenerating it on-demand every page request and not storing it.

This seems similar to 🐛 Aggregation URL hashes should be built from normalized list of libraries Fixed but the solution there doesn't fix this problem.

Some context:
The new asset aggregator generates a hash for each aggregate file when a page is loaded and uses it in the filename for the asset. When the file is requested, Drupal's AssetControllerBase receives it attempts to recreate the hash based on the encoded library data passed in the query string. It then creates a new hash from this decoded library data and if it doesn't match the original hash from the filename, the aggregate CSS or JS is created and delivered, but it's not written to the filesystem.

Steps to reproduce

  1. Clean install of Drupal 10.1.5 using Standard profile
  2. Ensure JS aggregation is turned on
  3. Install Webform 6.2.0
  4. Create a simple webform (just add a text field to it) and submit an entry
  5. As an admin, view the webform submission
  6. Observe that the JS can be slow to load. You can see that one of the footer aggregates is always hitting the asset controller

To help debug this, I disabled the cache bins for the render, dynamic page, and data caches. I added xdebug breakpoint to the AssetGroupSetHashTrait::generateHash() method's return statement. The AssetControllerBase will hit this when it's trying to generate the hash of the JS aggregate it receives a request for. I used xdebug to save the var_export of $normalized. I then stepped through this same process but for the generation side of things (when the filename hash is created on the main page request). I saved the output and compared the values.

The assets are the same, but the order of some of them is slightly different, resulting in a different generated hash value for the asset group. Attached are text files containing the noramalized assets array from the main page request generation, and from request the controller receives for the library, as well as the diff output of them.

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

🐛 Bug report
Status

Active

Version

11.0 🔥

Component
Asset library 

Last updated 1 day ago

No maintainer
Created by

🇺🇸United States bkosborne New Jersey, USA

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @bkosborne
  • 🇬🇧United Kingdom catch

    The diff shows webform.states.js, webform.filter.js and webform.form.js out of order.

    Took a look at the library definitions:

    webform.filter:
      version: VERSION
      css:
        theme:
          css/webform.filter.css: {}
      js:
        js/webform.filter.js: {}
      dependencies:
        - core/jquery
        - core/drupal
        - core/drupal.announce
        - core/drupal.debounce
    
    
    webform.states:
      version: VERSION
      js:
        js/webform.states.js: {}
      dependencies:
        - webform/webform.behaviors
        - core/drupal.states
    
    webform.form:
      version: VERSION
      css:
        component:
          css/webform.form.css: {}
      js:
        js/webform.form.js: {}
      dependencies:
        - core/drupal
        - core/drupal.form
        - core/jquery
        - core/once
        - webform/webform.states
    

    So webform.form always depends on webform.states and looks like they're in the right order each time in respect to each other, but there is no dependency with webform.filter.

    So the problem is that from the point of view of dependencies, it doesn't matter what order webform.filter and webform.form appear in, but from the point of view of the aggregate validation it does. On top of this, even if we didn't do the hash validation, or cached despite it failing, the order being different each time would mean an extra aggregate for no reason.

    I think the actual, robust, fix for this would be 📌 Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering Needs work .

    In 🐛 Aggregation URL hashes should be built from normalized list of libraries Fixed I briefly tried ordering the libraries in the main request (and by extension, the aggregate request) alphabetically, and that massively broke things, but maybe that's worth exploring again too.

  • 🇺🇸United States bkosborne New Jersey, USA

    I'm also finding a possibly related issue where a CSS aggregate URL 404s every time. I haven't yet figure out if it's the same issue as this. Here's my rough notes for the issue so far from debugging the affected page:

    HtmlResponseAttachmentSubscriber:

    1. Gets 36 libraries to load from the page response object
    2. Creates the "styles" twig template variable by asking AssetResolver->getCssAssets for the actual CSS assets to include in the page
    3. AssetResolver reduces the libraries to a miminal representative subset via ->getLibrariesToLoad(). This reduces the list from 36 libraries to 32. Then it expands it back out to include all the library dependencies, which expands the set to 62.
    4. Next, AssetResolver gets all the CSS assets that should be loaded from those libraries.
    5. Finally, since CSS aggregation is on, AssetResolver asks CssCollectionOptimizerLazy->optimize() to optimize the set of CSS files into aggregates.
    6. CssCollectionOptimizerLazy groups the assets into aggregate groups via CssCollectionGrouper, then returns a replacement list of aggregate filenames. In my example, this creates 4 separate "groups". The 3rd group is just a single external asset. (external assets force a "break" in a group). Each aggregate file gets an "include" query string param which lists out all the libraries the page needs. Note the list of 62 libraries is not used. Instead it's passed though LibraryDependencyResolver->getMinimalRepresentativeSubset(). This reduces it down to 38 libraries.
    7. Note that this list of 38 libraries is different than the original list of 36 libraries that the page needed. This is because the original page's libraries have been reduced, expanded, and then reduced again, which did not produce the same list.

    So, at this point, the HTML page has 4 CSS aggregate files output on the page. Deltas 0 through 3.

    Now, the request is made for the aggregate files when the page loads. It passes the list of 38 libraries for each aggregate. The request for the last aggregate fails with a 404.

    1. CssAssetController tries to validate the request. It does this by expanding the list of assets into groups and verifying the requested group delta is good.
    2. This is done by first asking AssetResolver->getCssAssets() for the assets associated with the libraries. AssetResolver reduces the list the list via ->getMinimalRepresentativeSubset(), which first reduces the list from 38 to 31, then expands it to include all dependencies for those to 62.
    3. AssetResolver then finds all the CSS files needed for all 62 libraries, sorts them by weight.
    4. CssAssetController takes the sorted assets then groups them via CssCollectionGrouper. It verifies the group delta requests actually exists. But the number of asset groups that CssCollectionGrouper returns is 3, not 4!
    5. Request fails for the last aggregate as Drupal thinks that only 3 aggregate groups should be needed for the requested list of libraries.
  • 🇬🇧United Kingdom catch

    @bkosborne this sounds like an ordering/weights issue - are you able to check whether the external library is somehow at the begining or end of the groups in CssAssetController?

    Or is it that the external asset is somehow not counted when calculating the delta?

  • 🇺🇸United States effulgentsia

    @bkosborne: I've heard reports from coworkers seeing something very similar to #3, but haven't found steps to reproduce it. If you (or anyone else seeing this) can figure out a way to get from a newly installed Standard Profile site into a configuration (which magic combo of contrib/custom modules is needed?) that puts you into that 404-every-time state, that would be super helpful to debugging and fixing it!

  • 🇺🇸United States effulgentsia

    Actually, when I wrote #5, I misunderstood what my coworkers were seeing. In fact, they're seeing the same as the original issue summary, where each response is a 200 (not a 404), but it's a PHP request every time, because the hash in the src attribute of the script tag on the page doesn't match the hash that gets generated in AssetControllerBase::deliver, so in this case each page view keeps requesting the former hash, and each request hits PHP instead of a file on disk.

    I can also now confirm that I'm able to reproduce this with the same steps as in the issue summary and for the same reason as explained in #2.

    Retitling for clarity (I missed that this was the pattern when I first read the issue, despite the issue summary actually explaining it quite clearly), and bumping to Major (frankly, I think Critical could even be justified since this can be awful for high traffic sites).

  • Status changed to Closed: duplicate about 1 year ago
  • 🇺🇸United States effulgentsia

    I can also now confirm that I'm able to reproduce this with the same steps as in the issue summary

    That's because I followed the exact steps, including installing Drupal 10.1.5. I think this got fixed in 10.1.7 and 10.2.0 in 🐛 LibraryDependencyResolver::getMinimalRepresentativeSubset() calculates dependencies incorrectly Fixed . Please re-open this if there's still a bug related to this in those newer versions.

  • Status changed to Active about 1 year ago
  • 🇬🇧United Kingdom catch

    I just tried to reproduce this with 10.2.x and webform 6.2.2 and was unable to - can the steps to reproduce be updated please?

  • Status changed to Closed: duplicate about 1 year ago
  • 🇬🇧United Kingdom catch

    Crossposted - OK that confirms what I'm seeing locally too.

  • 🇺🇸United States bkosborne New Jersey, USA
  • Status changed to Needs work 6 months ago
  • 🇺🇸United States djdevin Philadelphia

    Seeing this in 10.3.1.

    Adding a breakpoint on AssetGroupSetHashTrait::generateHash() shows a differently weighted list when it is called on the current page vs. from the .js file, so the hash_equals() in AssetControllerBase fails and it doesn't cache the file.

    I don't think generateHash() is the problem because the weights are already wrong for some reason.

    Trying to reproduce outside of our specific site, but it's a list of 56 JS files and the middle of the list is out of order, resulting in a different hash.

    We do have a lot of library dependencies so something is not taking those into account.

    Reverting to 10.2.6 caches it to disk as expected.

  • 🇺🇸United States djdevin Philadelphia

    I just added a print_r(array_column($group['items'], 'data')); right before it gets passed to generateHash and the order is wrong causing the hash mismatch.

  • First commit to issue fork.
  • 🇩🇪Germany harkonn

    We are also having problems since the change in 🐛 LibraryDependencyResolver::getMinimalRepresentativeSubset() calculates dependencies incorrectly Fixed .

    TLDR: We have a fix for sorting the AssetResolver so that the groups and hashes in the initial page request and the subsequent asset requests are the same.

    Details:
    We experience HTTP 400 errors from js-assets with the error:

    Symfony\Component\HttpKernel\Exception\BadRequestHttpException: Invalid filename. in Drupal\system\Controller\AssetControllerBase->getGroup() (line 225 of /var/www/projectname/docroot/core/modules/system/src/Controller/AssetControllerBase.php)
    

    The problem in our cases is that some of the libraries have preprocess = false in their libraries description which leads to different sizes of JsCollectionOptimizerLazy->optimize on the page request and AssetControllerBase->getGroups on asset requests.

    The different sizes are a result of the sorting in AssetResolver. Currently this class sorts the same weights slightly different, depending on their place in getLibrariesToLoad. Our approach is ordering by weight as before but ignoring the array index and ordering by name if weight is the same.

    I opened an issue-fork for this change:
    https://git.drupalcode.org/issue/drupal-3397713

  • 🇩🇪Germany harkonn

    Sadly the fix in my issue-fork is having other side-effects.

    Our main problem is the different sorting of groups between requests, leading to different array sizes. The underlaying method explicitly warns about the different sortings in AssetResolver->getLibrariesToLoad().

    The problem is also descriped in #3370930 and not fixed for us. It is really hard to reproduce. We use a lot of modules, including field_group, paragraphs, gin, ckeditor_abbreviation and lots more. The problem cant be dumped to a specific set for me.

    We helped ourself with switching the weight of modernizr from -21 to -22 in drupal 10.3, but i have the feeling it is just a lucky fix for us. Both modernizr and touchevents-test have weight -21 in core.libraries.yml and they switched indexes and then array sizes this way in our projects.

    Since modernizr is gone in Drupal 11, i cant apply this change to the issue fork. This is the patch we use for drupal 10.3:

    10.3.x-core-reorder_modernizr_library.patch

    Can someone remove my issue-fork for 11.x please? :)

  • 🇬🇧United Kingdom catch

    Two files with the same weight is worth investigating. We might need to explicitly sort files that have identical weight between themselves to make it reliable.

  • 🇬🇧United Kingdom catch

    Also this is likely to be fixed by 📌 Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering Needs work which removes weights entirely and relies on dependencies.

  • 🇺🇸United States Nuuou Lincoln, NE

    Our team started encountering this after updating to 10.3.1 (and 10.3.2).
    It seems fairly sporadic on when it happens so it's been tough to narrow down what combination of conditions happens to trigger this.

    Reverting #3400485 seems to have fixed this for us. Hoping someone smarter than I can figure out what's happening here!

    In any case, here's a patch reverting #3400485.

  • 🇬🇧United Kingdom catch

    🐛 LibraryDependencyResolver::getMinimalRepresentativeSubset() calculates dependencies incorrectly Fixed was introduced to fix the same symptom reported here, so it looks like it fixed some cases, but broke others in the meantime.

    The underyling reason for this is that library/asset ordering is unpredictable. To make it more predictable, we should do 🐛 LibraryDependencyResolver::getMinimalRepresentativeSubset() calculates dependencies incorrectly Fixed . It would be great if someone who can consistently reproduce this issue (it looks like @djdevin can) could try that MR locally and see whether it helps or not. There's still some work to do over there, but the main logic change is done.

    If it doesn't fix the problem, it's possible that the modules declaring weights in their libraries that are causing the mismatch, would also need those weights removed and switch to before/after + dependencies as well.

  • 🇧🇪Belgium swentel

    Bitten by this too - and I thought 🐛 Logic error in Drupal's lazy load for asset aggregation Active was the actual fix, but it ain't (yet). Reverting fixes some things for us (in our case, gin admin theme which breaks stuff in some cases for webmasters on node edit forms where the JS completely broken) .. will try to figure out if I can create a patch for gin - hints welcome as I'm not to familiar yet with the aggregation system)

  • 🇧🇪Belgium swentel

    Follow up: changing the weight of modernizer in 10.3.x fixes (as in #15) it for us as well, but also not sure if this is the 'right' fix :)

  • 🇺🇸United States djdevin Philadelphia

    I was unable to test the MR because we haven't updated to 11.x yet and it doesn't apply to 10.3.x The modernizr tweak didn't seem to work for me, but we have a ton of libraries so it could easily be another library with weights.

    Something else I'm also seeing related to performance is a ton of these queries (1000s) under heavy traffic:

    Very likely old cached JS/CSS calls.

    Seems I can just make up some random hash and pass all the query parameters, and it will make several DB calls. Could easily DDOS the site as 4xx errors are usually not cached. Since no JS file is ever created it keeps crunching.

    https://drupal/sites/default/files/css/anything.css?delta=0&language=en&...

  • 🇬🇧United Kingdom catch

    @swentel given modernizer isn't in 11.x at all, it seems fine to adjust the weight of it in 10.x to mitigate this even if it doesn't fix it for everyone. It might help us narrow down the remaining cases too. We probably won't be able to backport 📌 Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering Needs work to 10.x so a less intrusive fix would be good.

  • 🇧🇪Belgium swentel

    @catch : I spoke to soon (again), weirdly enough, the fix works on my local installation, but not on production, so back to debugging :)

  • 🇧🇪Belgium swentel

    Looks like the modernizer 'fix' was just a red herring. It's becoming weirder if I revert the weight change because I now have a scenario locally where I can consistently break and fix the node edit screen (gin admin theme with gin toolbar, field groups)

    - clear cache
    - load node/x/edit as user 1: everything is fine
    - load node/x/edit as webmaster: everything is fine
    - clear cache
    - load node/x/edit as webmaster: js broken (drupalsettings not found at all)
    - load node/x/edit as user 1: everything ok
    - load node/x/edit as webmaster again: everything is fine! (WUT!)

    It's crazy. I suspected 🐛 Library information alter should not be context-aware Closed: outdated at first, but returning FALSE in _gin_toolbar_gin_is_active() as a test doesn't do the tricky, so now looking at gin theme itself

  • 🇬🇧United Kingdom catch

    - load node/x/edit as webmaster: js broken (drupalsettings not found at all)

    This is very different to a hash/ordering mis-match in the assert controller, is it on top of that?

  • 🇧🇪Belgium swentel

    Not sure if it's a mismatch or not, but the call to an aggregated JS URL (which is identical for the webmaster whether it fails or not) throws the 'Invalid filename. in Drupal\system\Controller\AssetControllerBase->getGroup() (lne 225)) exception as webmaster, but not anymore after reloading, AFTER I reloaded as user 1. Fun right? :)

    If I can set a breakpoint somewhere to compare things, let me know, I can try and dig further then!

  • 🇬🇧United Kingdom catch

    hmm maybe compare the results of

    JsCollectionGrouper::group() in the parent/HTML request.

    with JsAssetController::getGroups() in the asset controller request.

    If there's a group missing in the js asset controller, then ought to be different assets arrays being passed in between the two groups.

    The best bet then is to dump those to one file each in PHP export format of choice, and compare the contents. Something would either be missing, or be in a different order.

    Also would be good to know if you're on Drupal 10.2 or 10.3. Drupal 10.3 has 📌 Only send libraries with aggregate URLs that have the aggregate type included Needs review which excludes CSS-only libraries when calculating the js library hash, which ought to at least partly help with situations like this.

  • 🇧🇪Belgium swentel

    Ok, I'll be honest, I'm a bit lost at what exactly I'm looking at, but at some point I started looking at modules which have dependencies on modernizr and field_group still has one for horizontal tabs. Now, given that Modernizr is deprecated in 10.3 (this project runs on 10.3.0) and is removed in 11, I tried removing the references and everything started working.

    I've opened 📌 Remove dependency on Modernizr Active to start removing that. The part which confuses me is that everything is fine when using Claro as admin theme, but I'm lost while debugging, and I'm fine fixing it in field group :)

  • 🇺🇸United States effulgentsia

    Adding 🐛 Aggregated asset generation causes uncacheable assets Active as a related issue. It doesn't fix the root cause being discussed in this issue, but it possibly provides a workaround that changes repeated hits to origin into just one extra hit to origin. For anyone encountering this issue, if you test out the MR in that issue, please comment over there with what you find.

  • 🇩🇪Germany jocowood Kamp-Lintfort

    I am a colleague of harkonn . For us, the new field_group version 3.6 fixes the problem, because it drops the dependency on modernizr with the issue 📌 Automated Drupal 11 compatibility fixes for field_group - Fixed PHPUnit on GitlabCI RTBC .

    We were able to break the problem down to the fact that JS library definitions with the same weight lead to the following behavior:

    1. the function AssetResolver::getJsAssets uses the function AssetResolver::getLibrariesToLoad to determine the list of libraries to be loaded in this request

    2. using debug and the documentation, it becomes clear that the returned list does not have a stable sort order. This means that the order can vary per request.

    3. in the further course of AssetResolver::getJsAssets, a small value is added to the later weight of the library, which depends on the position in the original list:

    // Always add a tiny value to the weight, to conserve the insertion
    // order.
    $options['weight'] += count($javascript) / 30000;

    4. as a result, the libraries that previously had the same value now have a slightly different value, which is a good idea at first. The whole thing becomes problematic if the original list now has different sequences in different requests and the final sorting is therefore not stable.

    5. later in AssetResolver::getJsAssets, JsCollectionOptimizerLazy::optimize is called which calls JsCollectionGrouper::group at the very beginning. The division into groups is in turn dependent on the order of the input list, as the current entry is always compared with the previous entry. If an entry now jumps to a different position, this changes the groups and, in the worst case, the number of groups. (Please see the *example)

    6. it can happen that a URL parameter is used in a subsequent request to access a js asset group (index) that does not even exist there.

    7. broken

    (*) An example.
    The documentation for the grouping states the following

    // Group file items if their 'preprocess' flag is TRUE.
    // Help ensure maximum reuse of aggregate files by only grouping
    // together items that share the same 'group' value.

    If we now have the following sequence of JS assets for the first request
    A (preprocess: true)
    B: (preprocess: false)
    C: (preprocess: true)
    then we have three groups.

    If we now have the following changed sequence in the subsequent request
    A (preprocess: true)
    C: (preprocess: true)
    B: (preprocess: false)
    we only have two groups.

  • 🇬🇧United Kingdom catch

    @jocowood so 📌 Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering Needs work changes all this logic significantly and allows us to deprecate asset weights altogether. However even if we're able to backport that to Drupal 10.4, it won't be backported to 10.3 (and it might only get into 11.1 or later).

    I think the question for this issue is whether we can find a shorter-term bugfix within the existing logic.

  • 🇩🇪Germany jocowood Kamp-Lintfort

    Yes, I completely agree with you @catch. We have an eye on that issue, too :-)

  • Pipeline finished with Canceled
    5 months ago
    Total: 254s
    #267649
  • Pipeline finished with Failed
    5 months ago
    #267653
  • Pipeline finished with Failed
    5 months ago
    Total: 502s
    #267658
  • Pipeline finished with Failed
    5 months ago
    #267669
  • 🇬🇧United Kingdom catch

    I tried a new branch based on what you would think would be the easiest fix here - just order the incoming list of libraries alphabetically so that page insertion order doesn't matter. The problem is that because we don't already do that, page insertion order actually does matter, even just in core.

    There was only one actual test failure in js tests, but it shows the problem with the current logic: https://git.drupalcode.org/project/drupal/-/jobs/2583199

    drupal.vertical-tabs specifies a weight so that it comes before drupal.collapse but that in turn can place it before drupal.form which it depends on. Adding a negative weight to drupal.form in turn sets off a whole chain of failures https://git.drupalcode.org/project/drupal/-/jobs/2583239

    Removing the weight allows functional js tests to pass, but still fails AssetOptimizationTestUmami which is designed to catch (at least some of) the problems described here https://git.drupalcode.org/project/drupal/-/jobs/2583358

    I wouldn't want to discourage anyone from working on a quick fix here, but:

    1. If contrib modules like field_group dropping modernizer dependencies fixes this for a lot of sites that sounds good.

    2. I think we should just try to get 📌 Replace custom weights with dependencies in library declarations; introduce "before" and "after" for conditional ordering Needs work in, it's in decent shape and mostly needs reviews and a good outline of next steps.

  • Pipeline finished with Failed
    5 months ago
    Total: 549s
    #267688
  • 🇬🇧United Kingdom catch

    🐛 Logic error in Drupal's lazy load for asset aggregation Active is working on the same logic and a bit further along than where I was above, marking postponed on that issue.

  • Status changed to Postponed 5 months ago
  • 🇬🇧United Kingdom catch
  • 🇬🇧United Kingdom catch

    There is a green MR on 🐛 Logic error in Drupal's lazy load for asset aggregation Active now and it would be great if people experiencing the issue reported here could test it.

  • Status changed to Closed: duplicate 3 months ago
  • 🇺🇸United States moshe weitzman Boston, MA

    That got merged into 11 and 10.4 so closing this. Please reopen if I am mistaken.

Production build 0.71.5 2024