[Meta] Allow StreamWrapper's to provide cacheability metadata

Created on 4 May 2023, over 1 year ago
Updated 25 July 2023, over 1 year ago

Problem/Motivation

TLDR

TL;DR: The lack of cacheability metadata in StreamWrappers make it impossible to create links to files that are not either "static unsafe URLs without access-time check" or "static URLs with mandatory access-time check". Hybrid systems with expirable URLs are useful in many circumstances but create conflicts with Drupal's page caches because a URL currently can't have cacheability metadata.

Background

By default Drupal forces installations to choose between a public file system or a private file system. The public file system has as advantage that links can be served by a webserver rather than Drupal but this offers no protection. The private file system uses static links for the files which requires Drupal to check access for every request, making the files uncacheable.

For sites that have a large amount of users and served files that require protection, the public file system is undesirable because any leaked link never expires, but the private file system is also undesirable because it can put a huge strain on Drupal. This can be seen in community platforms like built with Open Social where a majority of the traffic is authenticated but will still see similar pages.

StreamWrappers provide a good way to build a hybrid system where the access of the field containing the image proofs access, allowing the generation of a static but expiring link. This provides a good combination of public and private file system where a leaked link will stop working at some point but all requests in the meantime can be served by a CDN.

However Drupal does not currently allow providing cacheability information from a StreamWrapper which means that a link with an expiration date could get stuck in a page cache and be served to future users leading to a broken page.

flysystem β†’ could be another benefactor of this with the underlying library supporting "Temporary URLs" for many cloud providers, but the Drupal module not currently being able to support those types of URLs.

Drupal Core Obstacles

Drupal works mostly with Url objects to handle information about Urls. From that it generates a GeneratedUrl class which contains the Url that should be output as well as cacheability information. It's important that these are separate because the GeneratedUrl is essentially a string + cacheability information. That cacheability information depends on Url parameters (e.g. an absolute URL needs to change if the site.url changes).

However it should be possible to provide Urls with their own cacheability information too (for example they might be depending on a parameter that expires, such as a signed access token), this is currently not possible.

LinkGenerator::generate currently seems to assume that external URLs can have no cacheability metadata, but the above examples proof that to be false.

Proposed resolution

With those things in mind, three steps are needed to give StreamWrappers the ability to provide cacheability information:

Remaining tasks

Review or create issues.

User interface changes

API changes

Data model changes

Release notes snippet

🌱 Plan
Status

Active

Version

11.0 πŸ”₯

Component
File systemΒ  β†’

Last updated about 3 hours ago

Created by

πŸ‡³πŸ‡±Netherlands kingdutch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @kingdutch
  • πŸ‡³πŸ‡±Netherlands kingdutch

    Adding a reference to the Flysystem β†’ module which would also greatly benefit from this addition.

    Linking πŸ“Œ [meta] Modernize File/StreamWrapper API Active which considered using the Flysystem library as base for a modernised Drupal filesystem.

  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    Linking
    πŸ› [PP-1] S3 Access Denied Request has expired - getExternalUrl() cached beyond presign timeout Postponed and ✨ Add support for private uploads / presigned URLs Postponed as real-world issues where this has been encountered.

    🌱 [meta] Use FileUrlGenerator::generate() everywhere, then deprecate generateString() and generateAbsoluteString() Active suggests we should just use URL objects everywhere instead of strings, its a larger break though maybe its better if URL objects are extended to carry capability data themselves and we have streamWrappers return URL objects? Could possibly be done with a method_exists call to a new getExternalUrlObject() method inside the FileUrlGenerator making it BC compatible with old streamWrappers.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    We already have a generatedUrl class. But yes, we really should return Url objects so we don't have to switch back and forth

  • πŸ‡³πŸ‡±Netherlands kingdutch

    I wasn't aware of the GeneratedUrl class, that's basically my CacheableString proposal :D So that issue would indeed solve this one.

    If we implemented the __toString magic method on GeneratedUrl then we could walk the same backwards compatibility path as outlined in the IS (with the same caveats).

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    I think GeneratedUrl deliberately doesn't support __toString(). They way it works is that Url::toString() has an optional argument where you pass TRUE you get a GeneratedUrl object back.

    But as I said, we should instead find a way to transition to return Url objects, so we don't need to do the kind of hacks we currently need to in \Drupal\Core\File\FileUrlGenerator::generate() and can deprecate generateString() and generateAbsoluteString() in the long-term.

    Will require some optional interface shenanigans with a new method and then deprecate the old method and deprecate on stream wrappers not yet implementing that. It's unfortunately already called getExternalUrl(). I guess the good thing is that it's very, very rarely called directly. there are no calls to it in core outside of FileUrlGenerator() as far as I see and only 2 in the contrib projects that I have in this project. So we could do a double deprecation and then in D12 go back to the old name if we care enough, or we just live with something like getExternalUrlObject() for the future.

  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    I'll note that just switching to Url objects may not actually solve the lower issue of cacheable metadata on its own.

    I believe unless the scheme is 'base' (and maybe entity,internal, or route) Url::class will tag the path as External and locations in core will assume this means no cacheable metadata exists.

    For example in in LinkGenerator.php:

        // External URLs can not have cacheable metadata.
        if ($url->isExternal()) {
          $generated_link = new GeneratedLink();
          $attributes['href'] = $url->toString(FALSE); // <-- Request to generate the string without metadata
          return $this->doGenerate($generated_link, $attributes, $variables);
        }
    

    I wonder if the more correct solution should be to remove the assumption that External URL's do not have cacheable metadata and allow the UnroutedUrlAssembler to process outbound route processors to add the data. This would work without changing the streamWrapper getExternalUrl() or modifying the Url class to support cacheable metadata.

  • πŸ‡­πŸ‡ΊHungary GΓ‘bor Hojtsy Hungary

    Updating title, tags and version number based on recent announcement at https://www.drupal.org/about/core/blog/new-drupal-core-branching-scheme-... β†’

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    FWIW, I don't think this is actually a D11 issue, instead, this will need to be done in a backwards compatible way in a minor release.

  • πŸ‡­πŸ‡ΊHungary GΓ‘bor Hojtsy Hungary

    Yay! Keeping the branch then but removing the title prefix and tag :)

  • πŸ‡³πŸ‡±Netherlands kingdutch

    I'm trying to figure out how to rewrite the issue summary for the new direction proposed by Berdir and cmlara. However one thing I'm struggling with is that from what I can see, the metadata is not actually stored on the Url object.

    From what I can tell the caching metadata is instead provided by Url processors. So after moving to Url objects in the stream wrapper and removing the assumption that external URLs don't have metadata we would need to either:

    1. Apply existing path processors to external URLs too (which might confuse them and break sites)
    2. Create a parallel path processing system for external URLs
    3. Do something I'm missing

    I'm somewhat surprised that Url classes can't have metadata themselves, since in the example cases the cacheability info would already by known when the StreamWrapper creates the Url class (e.g. for time limited URLs) but it might be more difficult to figure that information out later in a path processor.

    What are your thoughts here?

  • πŸ‡¨πŸ‡¦Canada Charlie ChX Negyesi 🍁Canada

    The Url object stores information like "the route name is entity.node.canonical and route parameter node is 123". Now, when you generate the actual path from that, the result does need caching information on it, the path can change when the node changes. So what we need here is, to speak JavaScript is string.prototype.getCacheMetadata but since this is PHP we have a GeneratedUrl class which has a getGeneratedUrl() method to get the generated path string and then the three cache metadata methods. Not elegant but what can one do? The day may come when PHP becomes a Smalltalk derivative but it hasn't yet.

    Currently StreamWrapperInterface::getExternalUrl() returns a string, what Berdir suggests wisely is a StreamWrapperInterface::getExternalUrlObject() method which instead return GeneratedUrl objects which would allow collecting caching metadata. Right now in core only the FileUrlGenerator calls getExternalUrl() so the impact would be small. But, once you have a GeneratedUrl object which is an instance of BubbleabbleMetadata you can do real neat tricks with it like make a CacheableResponseInterface out of it and so forth.

    So the proposal is this: add getExternalUrlObject() to StreamWrapperInterface returning GeneratedUrl.

    Now comes the version dance... Since this would break existing stream wrappers, I think it would need to be 11.x. We could deprecate getExternalUrl() in 11.x and in 12.x rename getExternalUrlObject () to getExternalUrl() because by then there should be no calls to the old methods. Or we can just live with getExternalUrlObject(). We could even do some interface shenanigans where we introduce an ExternalUrlObjectCapableStreamWrapperInterface , deprecate, remove etc etc etc my head hurts :D

  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    I would agree, CHX has summed up the solution well. The best long term solution is having the streamWrapper return a GeneratedUrl object. This is where we are generating the URL so this is where it makes the most sense to catalog the related dependencies.

    I would suggest the method could be getGeneratedUrl() if we want something that actually describes what it does (I'm not fond of getExternalUrl it name misleads what it does) to avoid a version dance around renaming. No need to make this more complicated than it needs to be, lets make this a simple Deprecate a method while adding a replacement method routine (along with the associated new interface for new methods/interface deprecation routine)

    After that than we can solve the problems of getting that metadata to where it needs to go in a followup issue.

  • πŸ‡¨πŸ‡­Switzerland berdir Switzerland

    > Currently StreamWrapperInterface::getExternalUrl() returns a string, what Berdir suggests wisely is a StreamWrapperInterface::getExternalUrlObject() method which instead return GeneratedUrl objects which would allow collecting caching metadata.

    That is not what I suggested. GeneratedUrl doesn't help to solve 🌱 [meta] Use FileUrlGenerator::generate() everywhere, then deprecate generateString() and generateAbsoluteString() Active . I suggested a URL object, but wasn't aware that it doesn't support cacheability metadata. That is unfortunate. We either need a version of Url that does that or we need a second by-reference argument to collect cacheability metadata, like we do elsewhere.

  • πŸ‡³πŸ‡±Netherlands kingdutch

    Thanks for the additional insights.

    Should we create a separate issue to discuss adding cacheability metadata to the Url class? #3 already suggested it and #14 also indicates it would help.

    I have a week of holiday starting, after that I'd be happy to dive into this during Drupal Dev Days and see if I can work on a fix. However, there seem to be some other moving parts, like the references issues, that I'll have to familiarize myself with before then. Unless anyone beats me to it, I'll also use that time to provide an updated issue summary, since the current proposed solution is based on incorrect assumptions.

  • πŸ‡³πŸ‡±Netherlands kingdutch

    I created ✨ Allow links to external routes to gather metadata in LinkGenerator::generate Needs work to solve #7 which seemed like an easier step to start with.

  • πŸ‡³πŸ‡±Netherlands kingdutch

    I'm converting this into a meta issue since I think there's no single patch that can solve this issue, but instead we can fix this as various sub-issues that together solve the problem.

Production build 0.71.5 2024