- Issue created by @ericgsmith
- 🇳🇿New Zealand ericgsmith
Adding a patch for skipping the thumbnail as a starter.
- Status changed to Needs review
over 1 year ago 7:18pm 13 March 2023 - 🇳🇿New Zealand RoSk0 Wellington
Skipping a thumbnail is not an option image , if I understand this correctly, - it can lead to data leak as thumbnail, when not moved to the private storage.
Leaving NR for now to get feedback from maintainers.
- 🇳🇿New Zealand ericgsmith
The file field on the media is still altered, which triggers the thumbnail to be set / updated.
But yes, there is potential disclosure here - further testing I found the thumbnail of oembed media entities should not be excluded here. Possible other scenarios as well.
Another option is to either check the uri starts with media settings icon_base_uri config value - and skip if it does.
Or introduce configuration to the module to specify uris or Uri patterns that can be skipped.
- 🇳🇿New Zealand ericgsmith
Added a test to show the issue - its a bit rough as I couldn't find a lot of examples moving files around - I suspect there is a more elegant way to do all the setup.
Second patch has alternative to #2 using the configured generic icon path to skip the file rather than skipping thumbnails all together.
The last submitted patch, 7: file_access_fix-skip-thumbnail-field-3347701-7-test-only.patch, failed testing. View results →
- codesniffer_fixes.patch Interdiff of automated coding standards fixes only.- 🇩🇪Germany geek-merlin Freiburg, Germany
Thanks for the briliant issue summary, test case and fix. Made it easy for me to grok.
Let me think aloud about the general picture here first.Yes, this may be solved by a general mechanism to opt out of file moving, let's open that issue (and maybe later implement that on behalf of media module).
Also, the point is correct that files with very many references can cause performance issues is correct, let's open an issue for this.I do not completely grok the media thumbnail system, afaik it copies over thumbnails from modules and other sources to the public filesystem. Should it rely on the thumbnails staying where they are (= what is the contract after all, and who is violating it).
Or maybe our module should (in a new major version) change its behavior from managing all fields to only manage fields that are explicitly opted in?
Thoughts?
- 🇩🇪Germany geek-merlin Freiburg, Germany
Interesting: Media thumbnail access is part of current SA.
https://www.drupal.org/sa-core-2023-002 →
https://git.drupalcode.org/project/drupal/-/commit/1412919ccab30a83ca238...This fixes access in a formatter.
I wonder: How is media thumbnail being private / public determined in the first place? - 🇳🇿New Zealand ericgsmith
Thank you for the quick reply - and yes very interesting timing with the security update.
I have added 2 related issues as suggested.
The media thumbnails are a very interesting one to digest as they are very variable. I spent a while debugging this so will share how I read what is happening. There are probably a lot of additional things done in contrib that could effect this, but this is how I understand core.
Media thumbnails are the responsibility of the media source plugin. From the docs for the source interface:
* Their responsibilities are:
* ....
* - Providing thumbnails. Media sources that are responsible for remote
* media will generally fetch the image from a third-party API and make
* it available for the local usage. Media sources that represent local
* media (such as images) will usually use some locally provided image.
* Media sources should fall back to a pre-defined default thumbnail if
* everything else fails.As far as using only core for media - the thumbnails are a readonly field and so don't get exposed in the UI.
The answer to "How is media thumbnail being private / public determined in the first place?" depends on the media type.
When media is installed it sets icon_base_uri in config entity media.settings to public://media-icons/generic and moves a bunch of files there. From that point, a site builder could if they wanted to specify that this should be a private or other scheme / directory.
File, AudioFile and VideoFile specify a default icon file, then look for an icon in that folder (the icon base folder from config) to match their mime type when they create or update the thumbnail.
E.g. for File https://git.drupalcode.org/project/drupal/-/blob/10.1.x/core/modules/med...
AudioFile and VideoFile extend file, and just set a different default in their annotation: https://git.drupalcode.org/project/drupal/-/blob/10.1.x/core/modules/med...
Image is different - image will set the URI as the same URI as the source image field https://git.drupalcode.org/project/drupal/-/blob/10.1.x/core/modules/med...
This is good, as it means if the source is private, then the thumbnail will be too as its the same URI.
OEmbed is different again - as it fetches and saves the thumbnails, it exposes the thumbnail directory as part of the media source configuration (on the media type bundle) - https://git.drupalcode.org/project/drupal/-/blob/10.1.x/core/modules/med...
So that is up to the website to configure if they want the directory to be public or private.
There are many contrib modules doing things with media, such as pdf thumbnail generation for documents or custom source plugins who would define their own thumbnail logic. Who know what mysteries may lie out there in the contrib world!
I am probably biased towards a solution that fits my current issue, but my summary of those behaviors is:
- file/ audio file / video file feel like they should not move the icon, as once moved the original config is used for the next created entity, and that is now missing. Moving the file is corrupting existing configuration as there is no long a file there to use.
- image, is not hugely effected by this conversation as it syncs the uri to the source field
- oembed, I view this behavior similar to how this module works with regular fields - I would want the thumbnail image to moved to private file system of the host entity is not accessible anonymously. - 🇩🇪Germany geek-merlin Freiburg, Germany
Yes i always try to take time for good energy. And thanks for the issues!
Added implementation idea to 🐛 Files with very many references can cause performance issues Active ,
and re-purposed 📌 Add field instance setting to opt in to file moving Active so that it should fix this too.What u think?
- 🇳🇿New Zealand ericgsmith
Thanks.
Replying here as its mostly relevant still to the generic thumbnails - but I think per field instance would still be problematic for oembed source plugin and other plugins that mix generating thumbnails and using the fallback.
If the oembed thumbnail is opted in to moving files, there is a possibility that it will fall back to the generic icon, which would then break other media types not moving the generic icon.
It also could introduce complications where generated thumbnails can be queued - I missed this earlier, but if the thumbnail is generated and queued it will uses the generic icon file id when created, and then on cron a thumbnail is generated and saved. In this case, we have a situation where we want to still ignore the generic icon (as its in public://media-icons/generic) and then when the actual thumbnail is generated we want to apply the logic to that file.
A field instance approach could also lead to complications with generic icons being shared across multiple media types which have different settings.
I'm not confident field level settings can resolve this issue and I think a feature to opt out based on a file URI / pattern / prefix or some configurable way to target the actual files we care about is the only solution my brain can think of at the moment.
- 🇩🇪Germany geek-merlin Freiburg, Germany
My thought was that sitebuilders should actively choose to opt-in, being warned about the downsides like performance and not working in some circumstances. And yes, not sure if per-instance is the best choice.
As of media thumbnails: I thought that the media thumbnail logic relies on their files staying in place, so the move logic should simply not be enabled for that field. This means, if media thumbnail is on private filesystem, these files are served by php, which has its cost, but is better than a broken system.
What do you think and propose?
- 🇳🇿New Zealand ericgsmith
I thought that the media thumbnail logic relies on their files staying in place, so the move logic should simply not be enabled for that field
I think this is true only for the default / shared thumbnail icons. There are cases (outlined in #12 and #14) where I think it is not desirable to skip them. (Apologies for confusion, I don't know if there is a proper term or name to distinguish the generic / shared icons from the per other thumbnails).
I think the patch in #7 covers this scenario but is limited just to this edge case.
The other alternative if it needs to be more generic would be to add configuration for a blacklist / deny list of URIs to not move.
That way somebody could add
public://media-icons/generic/*
to that list, and then when the file is being checked the module can do an additional check to see if the file URI matches something in the deny list and leave it alone if it matches. Essentially doing what #7 does more flexible / configurable. - 🇳🇿New Zealand RoSk0 Wellington
I believe that, to properly resolve this issue, we need to introduce entity type based handlers:
generic
- which will handle file field operations in the same way as the module currently doesmedia
- extendinggeneric
to add additional set of handlers based on media source plugin type:default
- which would handle File, AudioFile and VideoFile and do nothing with thumbnailsimage
- which would handle Image and check if a file URI in thethumbnail
property matches default one configured in media.settings:icon_base_uri than it would skip action , because those are default thumbnails shared by many media itemsoembed
- which would handleOEmbed
source type. This one should not handle thumbnails at all, so might bere-usingdefault
here asOEmbed
source type exposes thumbnail directory configuration to a user. If userplaces thumbnail directory in the private file system we definitely don't want to move those files into public. Ifthe thumbnail directory is placed in public directory than there is no point to of moving thumbnails into private asthose resources , video from YouTube/Vimeo , are already available to public from respective websites.
And issue 📌 Add field instance setting to opt in to file moving Active should be closed.
I'm happy to start implementation if we have an agreement on the proposed resolution.
- @rosk0 opened merge request.
- 🇳🇿New Zealand RoSk0 Wellington
I went ahead and started implementing suggested in the #17 approach.
It ended up a little bit simpler - I don't think there is a need for two types of handlers, only one per entity type. There are three handler plugins defined :
File
- no-op handler that dropsFile
entities from the scopeGeneric
- handles all fieldable entity typesMedia
- for the media entities
Also I dropped deprecated
file_move()
and marked module as Drupal 10 compatible. Probably too ambitious , but hey, there are no released versions (which we need to address as well) and it would make it easier to test for compatibility.This is not a final revision. There are still lots of @todo's and more tests are needed probably.
Also adding a patch version of the merge request.
Bumping priority as it appears as such to me - this adds a proper behaviour, prevents data corruption and adds actual testing of the behaviour, which is quite important for access/security type of modules.
- 🇳🇿New Zealand RoSk0 Wellington
Have test passing locally on Drupal 10 and PHP 8.1, lets see what it will be with Drupal 10.1-dev and PHP 8.2 here https://www.drupal.org/pift-ci-job/2635602 → .
- Status changed to Postponed: needs info
over 1 year ago 8:51pm 9 April 2023 - 🇩🇪Germany geek-merlin Freiburg, Germany
Thanks to work on this and let the code speak.
FTR, what this code does is essentially this:
+final class Media extends ContentEntityHandlerPluginBase { + + protected function isApplicable(FieldableEntityInterface $entity, FieldItemListInterface $field): bool { + + /** @var \Drupal\media\MediaInterface $entity */ + $source = $entity->getSource(); + + switch ($source->getPluginId()) { + case 'file': + case 'audio_file': + case 'video_file': + case 'oembed': + if ($field->getName() === 'thumbnail') { + return FALSE; + } + + break; + + case 'image': + if ($field->getName() === 'thumbnail') { + $default_icons_base = \Drupal::config('media.settings')->get('icon_base_uri'); + $thumbnailUri = $field->first()->entity->getFileUri(); + // If thumbnail URI starts from default icons base URI + // skip it. + if (strpos($thumbnailUri, $default_icons_base) === 0) { + return FALSE; + } + else { + return TRUE; + } + } + + break; + } + + return parent::isApplicable($entity, $field); + }
In other words, for thumbnails the media icon_base_url is excluded from moving. Correct?
This does not affect the need for 📌 Add field instance setting to opt in to file moving Active , which we need for e.g. performance.
You say that simply opting out for the thumbnail field is NOT good enough for you, but you need this path exclusion, i.e. ALL thumbnails should be auto-moved, EXCEPT those in the configured icon_base_url directory.
IIRR, the problem was that media module assumed that files that it placed in that directory are not moved by someone else.
Yes, i get the idea now. But for maintenance reasons i want this as simple as possible.
I want this by the principle of path exclusion. Paths to exclude are collected by some collector. Are there any reasons not to pursue this path?
PS: Kudos for the tests!
- Status changed to Needs review
over 1 year ago 5:02am 11 April 2023 - 🇳🇿New Zealand RoSk0 Wellington
In other words, for thumbnails the media icon_base_url is excluded from moving. Correct?
Yes.
This does not affect the need for #3348333: Add field instance setting to opt in to file moving, which we need for e.g. performance.
Potentially. I was proposing to close it only because it was originally proposed as a fix for this issue. At least this is how I understood.
You say that simply opting out for the thumbnail field is NOT good enough for you, but you need this path exclusion, i.e. ALL thumbnails should be auto-moved, EXCEPT those in the configured icon_base_url directory.
Not really. And the problem is not our specific use case. Proposed implementation is generic. Thumbnails are source type specific and this is implemented. By default and in scope of the core, no thumbnails should be moved, except thumbnails for the media with the
image
source.IIRR, the problem was that media module assumed that files that it placed in that directory are not moved by someone else.
I don't think you remember correctly, or maybe the initial analysis resulted in wrong conclusions. Media module has no assumptions, right or wrong, but this module has wrong handling of specific field -
thumbnail
for media entity.Yes, i get the idea now. But for maintenance reasons i want this as simple as possible.
Let's do this by the principle of path exclusion. ...
I was thinking a lot about maintainability and simplicity, and it's represented in the proposed code. In my opinion, from maintainer point of view, for this code to be merged it should have all the
@todo
s addressed, which are primarily around correct dependency injection. It's not perfect off course. Naming can be improved and with good review from experienced community members I pretty sure there would be other things identified that could be done better. The most important thing for me is that it solves the problem and makes the modules usable by the community. It proves itself with tests.Commit this and release to community to consume. I'm pretty sure there is a need for this module from users who are waiting for at least some type of release. Many people would not want to include a module without any release in the code base. There was a good article about releasing the code early from community, but I can't find it right now. It's a good one and a good suggestion.
- 🇩🇪Germany geek-merlin Freiburg, Germany
This is difficult communication. Let me ask again: I made a reasoned proposal on how to proceed. Do you have any points on that?
- 🇳🇿New Zealand RoSk0 Wellington
It looks like over reacted a bit, and also misunderstood your proposal - sorry.
I want to see this issue fixed. Lets do it as you proposing.
- 🇩🇪Germany geek-merlin Freiburg, Germany
OK i see. No hard feelings on my side. I'll try puzzling together a skeleton for this.
- Status changed to Needs work
over 1 year ago 8:25pm 18 April 2023 - 🇩🇪Germany geek-merlin Freiburg, Germany
I made a skeleton for the architecture: A cache ID collector, PSR-16 style collector, an immutable checker with proper caching.
What u think? Is this clear enough as a way forward? Did i miss sth?final class ExcludedPathsChecker implements CacheableDependencyInterface { public function isInExcludedPath(string $filePath): bool { // @todo Implement. } } final class ExcludedPathsCollector { public function addExcludedPath(CacheableString $path): void { // CacheableString & CacheableBool from drupal/cacheable_types // @todo Implement. } public function freeze(): ExcludedPathsChecker { // @todo Implement. } } final class ExcludedPathsProviderCollector { public function getExcludedPathsChecker(): ExcludedPathsChecker { // @todo Make it a service ID collector for ExcludedPathsProviderInterface services. // @todo Leverage DCache API to cache in MemoryBackend and respect cacheability. } } interface ExcludedPathsProviderInterface { public function provideExcludedPaths(ExcludedPathsCollector $excludedPathsCollector): void; } final class MediaExcludedPathsProvider implements ExcludedPathsProviderInterface { // @todo Implement on behalf of Media module. }
- 🇳🇿New Zealand RoSk0 Wellington
Mostly, I don't see
CacheableString
definition in Drupal 9.4 , so it's unclear how caching part would work and the purpose of theExcludedPathsCollector::freeze()
method. - 🇩🇪Germany geek-merlin Freiburg, Germany
>and the purpose of the ExcludedPathsCollector::freeze() method.
This is the builder / immutable design pattern (google it). Once you get used to it, you can't live without, promised.>Mostly, I don't see CacheableString definition in Drupal 9.4 ,
Find CacheabltTypes at https://www.drupal.org/project/cacheable_types → .
Find DCache at https://www.drupal.org/project/dcache →>so it's unclear how caching part would work
Do not hesitate to bump me and ask to work on some part of this, I now understand how this makes sense, so let's get it in. - last update
about 1 year ago 2 pass - 🇳🇿New Zealand ericgsmith
Rebased the MR after the Drupal 10 compatibility changes we landed.
Noting that this is still using the old approach in from #19
We are still keen on getting the approach agreed to in #29 in place to resolve this issue - however we are not actively working on this at this point in time.