Add a reliable entity-usage system to core

Issue created by @larowlan
Comment about 2 years ago →
🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
Comment about 2 years ago →
🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
Comment about 2 years ago →
🇺🇸United States phenaproxima Massachusetts
+1 for this. It would help solve some tricky, long-standing problems with media.
Comment about 2 years ago →
🇪🇸Spain marcoscano Barcelona, Spain
Adding 📌 Track media usage and present it to the site builder (in the media library, media view, on media deletion confirmation, etc.) Active since it has some background discussion that might be helpful here.
Comment about 2 years ago →
🇪🇸Spain marcoscano Barcelona, Spain
After re-reading the related issue I see that back in the day I expressed that this was a "hard problem" to solve in core but didn't expand too much on that.

I still think it's non-trivial, especially if we want a solution that would both 1) register the data, and 2) present the data in a meaningful manner to end users. To me the trickiest aspects are:
- Depending on the content model, there may be "intermediate entities" (e.g. paragraphs, inline blocks, etc.) that don't have a standalone representation (in other words, they "don't mean anything on their own to end users"), but on a technical level they are just as important as any other entity. So we need to track them the same way because they are part of the chain, but possibly display them differently (for example, omitting them) in the UI. This causes complexity and overhead in the code that displays usages to users.
- In content models that have a very large relationship tree (nested entities), calculating usage on entity save is very expensive. In entity usage we introduced a mechanism to allow updating the usage table as a background process (using a @destructable service) but that introduces its own complexities and inaccuracies to handle in certain scenarios.

Having said that, I too face this need in almost every project I work on, so despite the challenges, I am +1 for trying to find the most reasonable way to have this type of functionality in core.
Comment about 2 years ago →
🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
At the risk of going into implementation details, I think we can resolve some of those issues with entity handlers.
I had a similar issue with filter format audit, where I have a default handler plus special cases for other entity types like paragraphs and inline blocks.
Comment about 2 years ago →
🇬🇧United Kingdom catch
I'm not familiar with entity_usage module at all, but a general +1 to ripping the current file_usage system out and replacing it with something completely different - IMO the file usage API as it currently stands is unfixable.

Added 🌱 Dealing with unexpected file deletion due to incorrect file usage Active to the issue summary which is the current meta for how broken it currently is.

> - Depending on the content model, there may be "intermediate entities" (eg paragraphs, inline blocks, etc) that don't have a standalone representation (in other words, they "don't mean anything on their own to end users"), but on a technical level they are just as important as any other entity.

This seems a similar issue to usages for entities that you don't have view access to. Doesn't seem insurmountable.

> In content models that have a very large relationship tree (nested entities), calculating usage on entity save is very expensive. In entity usage we introduced a mechanism to allow updating the usage table as a background process

This is definitely worth an implementation issue if we decide to add this to core. I'm sure we could figure something out with an overall 'system is catching up to itself' flag that could indicate data is being rebuilt, whether by a full rebuild or in a queue/destruct step, etc. We'd need some ability to disable it happening directly on save for migrations (i.e. you'd often want the initial migration to run as fast as possible, then rebuild the usage tables when all the entities and their revisions are in).
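
As a rough illustration of the deferral idea above, a tracker could queue saves while a "deferred" mode is on and expose a flag saying the data is still catching up. This is a Python sketch, not Drupal code, and every name in it (`UsageTracker`, `deferred`, `is_catching_up`, `process_pending`) is hypothetical.

```python
from collections import deque


class UsageTracker:
    """Toy usage tracker; all names here are hypothetical, not a Drupal API."""

    def __init__(self):
        self.usages = set()      # (source, target) pairs already recorded
        self.pending = deque()   # saves waiting to be processed
        self.deferred = False    # switch on during e.g. a migration

    def on_save(self, source, targets):
        if self.deferred:
            # Skip the expensive work on save; remember it for later.
            self.pending.append((source, tuple(targets)))
        else:
            self._record(source, targets)

    def _record(self, source, targets):
        # Replace whatever was recorded for this source before.
        self.usages = {u for u in self.usages if u[0] != source}
        self.usages.update((source, t) for t in targets)

    @property
    def is_catching_up(self):
        # The "system is catching up to itself" flag.
        return bool(self.pending)

    def process_pending(self, limit=None):
        """Drain the backlog, e.g. from a queue worker or at end of request."""
        done = 0
        while self.pending and (limit is None or done < limit):
            source, targets = self.pending.popleft()
            self._record(source, targets)
            done += 1
        return done
```

A UI could check `is_catching_up` before trusting a "not used anywhere" answer, which is the scenario where stale data would matter most.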
Comment about 2 years ago →
🇦🇺Australia kim.pepper 🏄‍♂️🇦🇺Sydney, Australia
Sounds like a great idea.
Comment about 2 years ago →
🇨🇭Switzerland berdir Switzerland
Generally +1, entity usage tracking does seem to fit into core, also as a replacement for file_usage. Being able to track revisions would vastly simplify the issues with the current limitations there.

And yes, composite entities like paragraphs and inline blocks are a challenge. No, it's not the same as inaccessible entities. You still want and need to see them, but in a way that's actually useful as an editor, which is how you view and edit them. So you want to see the node that contains the paragraph/inline block, with maybe some information on which element it is. This is especially an issue with revisions: if you stop using an inline block or paragraph in a new revision, the composite entity will remain the default revision and look relevant, but it's not. There is an issue/plan for a 3.x version of entity_usage that would explicitly track usage on its host entity.

Another challenging issue that entity_usage needs to deal with is string IDs on both the source and the target, which result in annoyingly large indexes and complicated queries (entity_usage tends to be one of the biggest tables in our projects).
Comment about 2 years ago →
🇦🇺Australia kim.pepper 🏄‍♂️🇦🇺Sydney, Australia
Comment about 2 years ago →
🇳🇱Netherlands Lendude Amsterdam
Just want to give a general +1 on this, it's something that is needed or being requested on most projects we do these days.
Comment over 1 year ago →
🇨🇭Switzerland berdir Switzerland
Replying partially to the comment from @catch in issues like 🐛 Deleting an entity with revisions and file/image field does not release file usage of non-default revisions, causing files to linger Postponed:
> This should really be postponed on (this issue), the file_usage system is broken beyond repair, there is no way to rebuild file usage, so as soon as it's off by one or more, it's like that forever.

I'm not entirely convinced that we should postpone those issues. True, most of them haven't moved in years. But having them fixed in some capacity would also result in test coverage that we can build on.

Also, there are a few things to consider in regards to the rebuild usage scenario and also using it for replacing file_usage:

* entity_usage is limited to entity-to-entity usages, file_usage is not. There are valid cases where files are used in non-entities, for example in core that's the theme logo. Rebuilding with entities is one thing: they are a known thing and we can loop over them all and eventually our data is rebuilt. But rebuilding those other things is going to be more complex if we want to keep support for that. This would need to be built into the plugin system and data model in some way.
* On sites that are large enough, with millions of entities, the rebuild feature becomes so slow that it becomes almost theoretical and unusable. At least, that is true of the current implementation in the entity_usage module; better options may be possible. Since it tracks both revisions targets *and* sources, it means looping over every revision of every entity type you have it enabled on, loading them and running every plugin through each revision. The module can do it either on the fly in a batch (but you have to restart if it fails for any reason) or through the queue.
* The current functionality is opt-in for both source and target entity type, which makes a lot of sense as there are plenty of references you don't want to track, for example orders/items/profiles on ecommerce sites are most likely not useful/needed. However, file_usage decides on whether or not files get deleted, so tracking for it has to be mandatory for all source entity types that might use it as well.

Again, I think entity_usage is awesome, we use it on all projects, but it's a complex problem space and entity_usage in its current form isn't really designed to replace file_usage.
Comment over 1 year ago →
🇦🇺Australia acbramley
Big +1 to get something like this into core.

Wrt. the entity_usage module, it seems there are currently 2 different rewrites in progress in the 8.x-3.x and 8.x-4.x branches, so it'd be good to figure out which of those is likely to be the solution going forward.
Comment over 1 year ago →
solideogloria
Comment over 1 year ago →
solideogloria
I think this is pretty important if it's required to fix file usage and files not getting deleted when a node is deleted if the files are used by a non-current node revision.
Comment over 1 year ago →
solideogloria
Comment over 1 year ago →
🇬🇧United Kingdom catch
> I'm not entirely convinced that we should postpone those issues. True, most of them haven't moved in years. But having them fixed in some capacity would also result in test coverage that we can build on.

Yeah the test coverage and knowing that we need to cover the case is I think fair enough, however we're never, ever going to be able to turn automatic file deletion based on file_usage on again, which makes the data in there essentially worthless, and it's going to be like that regardless of how many issues we fix due to the impossibility of an upgrade path. So as an exercise yes, but only at best indirectly leading to fixing the actual issue we want to fix.
Comment over 1 year ago →
solideogloria
> however we're never, ever going to be able to turn automatic file deletion based on file_usage on again, which makes the data in there essentially worthless

It would be possible to have automated deletion for files created after the fixes. For other files, though, you would need to manually review the files and determine if they can be deleted or not. It would be really awesome if we could have an update hook "rebuild" usage data.

In any case, that's all the more reason to fix it and have data correct going forward, even if all legacy files have incorrect data. That way, new websites will at least be creating correct data.
Comment over 1 year ago →
solideogloria
The problem is that the file_usage table only tracks the entity ID, not the revision ID for that entity. So if you remove the file for the current version of the entity, it just decrements the count column value, but there is no way to know from the file_usage table what revision is referencing it.

Ideally, we would remove the count column and use a separate row for each reference and revision. However, this could cost a lot of space if an entity has a lot of files F and a lot of revisions R. There would be F times R rows for that entity. Maybe there's a better solution?
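
To make the trade-off concrete, here is a small Python sketch (not Drupal code; the function names and the `(file, entity, revision)` tuple layout are invented for illustration). It stores one row per file and revision, which shows both the F times R growth and what the extra rows buy: the ability to say which revisions still reference a file.

```python
from collections import Counter


def usage_rows(entity_id, revisions):
    """Build one row per (file, entity, revision).

    `revisions` maps revision ID -> set of file IDs used by that revision.
    """
    return [
        (file_id, entity_id, revision_id)
        for revision_id, files in sorted(revisions.items())
        for file_id in sorted(files)
    ]


def revisions_using(rows, file_id):
    """Answer the question a bare count cannot: which revisions use the file?"""
    return [rev for f, _, rev in rows if f == file_id]


# An entity with F = 3 files kept across R = 4 revisions.
revisions = {rev: {"a.png", "b.png", "c.png"} for rev in range(1, 5)}
rows = usage_rows(42, revisions)

# Collapsing the rows into a count keeps "how many" but loses "which".
counts = Counter(f for f, _, _ in rows)
```

Here `len(rows)` is F times R, so the storage concern is real for entities with many files and many revisions, while `counts` is all a count column could ever tell you.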
Comment over 1 year ago →
solideogloria
Honestly, this seems like a "revisioning" problem. Currently, nodes save the entire node for each revision. Ideally, only the "diff" would be saved. The same could be done for file usage, noting which revision(s) added/removed a reference to the file. It's sort of a Git-like problem.
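
One way to picture the "diff" idea is to store, per file, the range of revisions that reference it instead of one row per revision. The Python sketch below is only an illustration under that assumption; `to_ranges` and the `(file, first_rev, last_rev)` layout are hypothetical, and it ignores the awkward case of deleting a revision in the middle of a range.

```python
def to_ranges(revisions):
    """Compress per-revision file sets into (file, first_rev, last_rev) ranges.

    `revisions` maps revision ID -> set of file IDs. A range is closed when a
    later revision stops referencing the file; `last_rev` is None while the
    newest revision still references it.
    """
    open_ranges = {}   # file -> first revision of its currently open range
    ranges = []
    prev_rev = None
    for rev in sorted(revisions):
        files = revisions[rev]
        # Close ranges for files this revision no longer references.
        for f in sorted(set(open_ranges) - files):
            ranges.append((f, open_ranges.pop(f), prev_rev))
        # Open ranges for files this revision newly references.
        for f in files - set(open_ranges):
            open_ranges[f] = rev
        prev_rev = rev
    ranges.extend((f, first, None) for f, first in sorted(open_ranges.items()))
    return ranges
```

With this encoding, a file that stays attached across many revisions costs one row rather than one row per revision; rows are only added when a reference is added or removed.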
Comment over 1 year ago →
🇬🇧United Kingdom catch
> It would be really awesome if we could have an update hook "rebuild" usage data.

That's what this issue is about - replacing the entire system with entity_usage that tracks each individual usage in a rebuildable way instead of a count.
Comment over 1 year ago →
solideogloria
I took a look at it, however it doesn't fix the issue of files not being marked temporary because it doesn't integrate at the level yet. It also doesn't have a nice report or anything to view all entity usage for each entity type, which I would want. I think that for now I might just have to use a workaround 🐛 File not marked temporary and usage not updated if only used in past revisions when node is deleted Active to delete entity usage for revisions.
Comment 11 months ago →
🇷🇴Romania claudiu.cristea Arad 🇷🇴
> And yes, composite entities like paragraphs and inline blocks are a challenge.

As I remember, there was a plan to solve this issue in the entity_usage module with the concept of Top/Middle/Bottom entity types, avoiding middle relations and storing only the source and final target in the table. But the idea seems to have been dead for a long time.
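
For illustration only, skipping "middle" entities amounts to walking up the reference chain until a non-middle entity is reached. The Python sketch below assumes entity keys of the form `type:id` and a `parents` mapping from each entity to the entities that reference it; `top_level_sources` and both parameters are hypothetical names, not part of entity_usage.

```python
def top_level_sources(target, parents, middle_types):
    """Walk from a target up through 'middle' entities to the hosts users see.

    `parents` maps an entity key like "paragraph:7" to the keys referencing
    it; `middle_types` lists entity types with no standalone representation.
    """
    found, seen, stack = set(), set(), [target]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent in seen:
                continue  # guards against cycles and repeated work
            seen.add(parent)
            if parent.split(":")[0] in middle_types:
                stack.append(parent)   # keep climbing through the chain
            else:
                found.add(parent)      # a host entity worth storing or showing
    return found
```

The same walk could run at write time (store only source and final target) or at display time (store every hop, hide the middle ones); which of the two the original plan intended is not something this sketch settles.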
Comment 5 months ago →
🇩🇪Germany geek-merlin Freiburg, Germany
Yes, I'd like that a lot too.
But no, the current scope is way too big.

What I see as reasonable:
- Take DynamicEntityReference (entityTypeId, entityId) out of the DER module, including views integration
- Add an extended DynamicEntityReferenceSource (entityTypeId, entityId, sourceFieldName) field type like the Paragraphs parent does, add views integration
- Add a source / target table, and a way to opt reference fields into usage tracking
- Do NOT support non-integer IDs in the first step (see below)
- Do NOT support non field usages in the first step (to be added later via special sourceFieldName values)
- Do NOT support displaying the usages (it can be done via views)
- Do NOT support transitive relations (e.g. node -> media -> file) in the first step (see below)
Comment 5 months ago →
🇫🇷France fgm Paris, France
And then beyond paragraphs, which are always embedded, you have entities which can be either standalone or embedded, like media.
Comment 4 months ago →
🇳🇿New Zealand quietone
The Ideas project is being deprecated. This issue is moved to the Drupal project. Check that the selected component is correct. Also, add the relevant tags, especially any 'needs manager review' tags.

Thanks
Comment 2 months ago →
🇷🇴Romania claudiu.cristea Arad 🇷🇴
I badly needed the Entity Usage module, but with the refactoring described in 📌 Refactor module architecture in a simpler, opinionated and more performant approach Needs review. That issue has been stuck for a long time, and I also didn't need the UI part. For this reason, I've created the Track Usages module which, I think, covers most of the aspects described in 📌 Refactor module architecture in a simpler, opinionated and more performant approach Needs review.

I needed such functionality to build the File Visibility module. Maybe some ideas could help here.
