Refactor module architecture in a simpler, opinionated and more performant approach

Created on 11 June 2019, almost 5 years ago
Updated 13 February 2024, 4 months ago

Problem/Motivation

Some issues such as

🐛 Entity usage list does not scale Needs review
Allow usage records to be registered in background Needs review
#3056026: Remove orphan handling of paragraphs when parent exists
#2985265: Cache viewable results when generating usage page
#3002332: Track composite (e.g. paragraph) entities directly and only as their host
#2971131: Improve label handling on usage page
#2949952: Make it easier to retrieve full relationship chain

are examples that show how our current approach is not scalable, and also hacky in some scenarios.

Storing all 1-to-1 relationships in DB is great, but not using the same rules to display them on the UI is challenging. On the other hand, arguably most users of this module are not really interested in that flexibility, and only interested in "in what node is this piece of media being used (regardless of all paragraphs/blocks in between them)".

This issue is to explore a new architecture of the module to simplify things, and hopefully allow better scaling for large sites.

Note: This will obviously be done in a new 3.x branch, since important API-level aspects would change.

Proposed resolution

  • We introduce the concepts of top/middle/bottom level for entities
  • Only Top-level entities are tracked as source (by default only nodes, sites could override that)
  • Middle-level entities are disregarded entirely, but traversed when looking for targets
  • Bottom-level entities are all entities configured to have the "Usage" local task (tab) on their page. These are the only "target entities" we store information for in DB
  • We no longer store information about the field name, relationship method (plugin id), or count in DB. All a DB row now tells us is: "a (source, top-level) entity of type: A, with id: B, on its revision C and language D points to a (target, bottom-level) entity of type E with id: F". This "reference" from source to target may be direct or go through any number of intermediate non-top / non-bottom entities.
  • All tracking calculation is deferred to a (usually) background process using DestructableInterface
  • We can now build the usage page on the UI using direct paged queries. The page will show usages grouped by source entity type (so we can easily join the entity table on the default revision) and only show records that point to the default revision
  • We no longer need to provide any specific views integration
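
The level-based traversal described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's code: the array-based entity graph, the `$topTypes` / `$bottomTypes` parameters, and `collectTargets()` are hypothetical, and a real implementation would load entities and inspect their fields. The sketch also assumes that traversal stops at any top-level or bottom-level entity it reaches.

```php
<?php

/**
 * Hypothetical sketch: collect bottom-level targets reachable from a source.
 *
 * $graph maps a "type:id" key to the list of "type:id" keys it references.
 */
function collectTargets(string $source, array $graph, array $topTypes, array $bottomTypes): array {
  $targets = [];
  $visited = [$source => TRUE];
  $queue = $graph[$source] ?? [];
  while ($queue) {
    $key = array_shift($queue);
    if (isset($visited[$key])) {
      continue;
    }
    $visited[$key] = TRUE;
    [$type] = explode(':', $key);
    if (in_array($type, $bottomTypes, TRUE)) {
      // Bottom-level: this is a target row; do not descend further.
      $targets[$key] = TRUE;
      continue;
    }
    if (in_array($type, $topTypes, TRUE)) {
      // Another top-level entity: it is tracked as its own source.
      continue;
    }
    // Middle-level: not recorded, but its references are followed.
    foreach ($graph[$key] ?? [] as $child) {
      $queue[] = $child;
    }
  }
  return array_keys($targets);
}

$graph = [
  'node:1' => ['paragraph:10', 'media:5', 'node:2'],
  'paragraph:10' => ['paragraph:11'],
  'paragraph:11' => ['media:7', 'media:5'],
  'node:2' => ['media:9'],
];
$targets = collectTargets('node:1', $graph, ['node'], ['media']);
sort($targets);
```

Here `node:1` ends up with two target rows (`media:5` and `media:7`); the paragraphs in between leave no trace, and `media:9` belongs to `node:2` only.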

Remaining tasks

Should be fixed here:

  1. We need to implement new instances of hook_entity_delete() and similar, to: a) Remove records from the DB when target (bottom) entities are deleted, b) Set the "needs regeneration" flag/warning when middle-level entities are deleted
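
As a rough sketch of the two delete cases in the item above (an assumption-laden illustration, not the planned implementation): `onEntityDelete()`, the in-memory `$rows` array, and the `target_type` / `target_id` column names are hypothetical; only the `entity_usage_needs_regeneration` flag name comes from this issue.

```php
<?php

/**
 * Hypothetical sketch of the delete handling described above.
 */
function onEntityDelete(string $type, int $id, array $topTypes, array $bottomTypes, array &$rows, array &$state): void {
  if (in_array($type, $bottomTypes, TRUE)) {
    // a) A target was deleted: drop every row that points to it.
    $rows = array_values(array_filter(
      $rows,
      fn(array $row) => !($row['target_type'] === $type && $row['target_id'] === $id)
    ));
    return;
  }
  if (!in_array($type, $topTypes, TRUE)) {
    // b) A middle-level entity was deleted: stored rows may now be stale.
    $state['entity_usage_needs_regeneration'] = TRUE;
  }
  // Deleting a top-level (source) entity is not covered by this sketch.
}

$rows = [
  ['source_type' => 'node', 'source_id' => 1, 'target_type' => 'media', 'target_id' => 5],
  ['source_type' => 'node', 'source_id' => 1, 'target_type' => 'media', 'target_id' => 7],
];
$state = [];
onEntityDelete('media', 5, ['node'], ['media'], $rows, $state);
onEntityDelete('paragraph', 10, ['node'], ['media'], $rows, $state);
```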

Can be fixed in follow-ups:

  1. Having dropped the views integration, we might consider a custom field handler that would either show "Not being used", or a link to "Check usages"
  2. When a field / field storage is deleted, and also when any middle-level entity is deleted, we may end up with stale info in the DB. There may be several ways to approach this, so it's probably best to discuss / test the pros and cons of each in a follow-up.
  3. When an entity is used only in past revisions of a top-level entity, those "hidden" usages will not show up in the usage list. We need a mechanism to let users discover the past usages through the UI, if needed.
  4. Currently to expose new entity types to be tracked as source, sites need to write custom code and override \Drupal\entity_usage\EntityUsageSourceLevel::TOP_LEVEL_TYPES. It would be better to expose this to be configurable on the UI.

User interface changes and modifications on default behavior

  1. The usage page (when visitors click on the "Usage" tab) now only displays rows for top-level source entities.
  2. The usage page no longer displays columns for "Field Name", or "Used in".
  3. The usage page will now group source entities by their type, with a common pager for all groups. For example, if the "Number of items per group" (defined in the settings form) is set to 10, and Nodes and Users are configured to be tracked as top-level entities, then the first page will display a group of 10 rows for node sources, and another group of 10 rows for user sources. The next page would fetch the next 10 rows for both groups, and so on.
  4. When the module is first installed, node and media entities (if they exist) will have the "Usage" tab enabled by default. This will mean they are automatically tracked as targets (bottom-level) by default. Note: This does not apply to existing sites.
  5. By default only node entities are tracked as source.
  6. In some situations (for example after a field has been deleted), a warning in the status report page will be displayed, informing users that usage re-generation is needed. In order to do so and remove the message, users need to go to the Batch Update form and trigger a batch update of usage statistics.
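
The shared pager from item 3 can be illustrated with a small sketch. It works on in-memory arrays purely for demonstration; `pageByGroup()` is a hypothetical helper, and the actual usage page would presumably run one paged DB query per source entity type instead.

```php
<?php

/**
 * Hypothetical sketch: apply one page number to every source-type group.
 */
function pageByGroup(array $sourcesByType, int $page, int $itemsPerGroup): array {
  $result = [];
  foreach ($sourcesByType as $type => $ids) {
    // Each group is sliced with the same offset and length.
    $slice = array_slice($ids, $page * $itemsPerGroup, $itemsPerGroup);
    if ($slice) {
      $result[$type] = $slice;
    }
  }
  return $result;
}

$sources = [
  'node' => range(1, 25),
  'user' => range(1, 12),
];
$first = pageByGroup($sources, 0, 10);
$third = pageByGroup($sources, 2, 10);
```

With 25 node sources and 12 user sources, the first page shows 10 rows per group, while the third page only has the remaining 5 node rows because the user group is already exhausted.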

API changes

No change is needed in tracking plugins, as long as they extend the \Drupal\entity_usage\EntityUsageTrackBase base class.

The changes below might affect custom or contrib code interacting with this module:

  1. The setting option usage_controller_items_per_page is now called usage_controller_items_per_group.
  2. The hook hook_entity_usage_block_tracking() no longer receives method, field_name, or count as parameters.
  3. The entity_usage DB table no longer has columns for method, field_name, or count
  4. The system now uses a state flag entity_usage_needs_regeneration to display a warning on the status report page when we detect stale data might exist
  5. The Drupal\entity_usage\EntityUpdateManager service now implements DestructableInterface, and during CRUD hooks we only register the operations that happened during the current request. All real usage tracking is deferred to the end of the request (normally in background), inside the \Drupal\entity_usage\EntityUpdateManager::destruct() method.
  6. The module no longer provides specific views integration. In other words, we no longer implement hook_views_data() or hook_views_data_alter().
  7. The methods: ::trackUpdateOnCreation(), ::trackUpdateOnEdition(), and ::trackUpdateOnDeletion() from EntityUpdateManager are now protected instead of public.
  8. The method \Drupal\entity_usage\EntityUsageInterface::registerUsage() is now only intended for _adding new records_ (instead of adding or updating existing ones). Its signature has changed, since it now receives fewer arguments.
  9. A new \Drupal\entity_usage\EntityUsageInterface::deleteUsage() method is created to allow deleting a specific record from the DB.
  10. The method \Drupal\entity_usage\EntityUsageInterface::deleteByField() is removed, since we no longer have field information in DB
  11. All events dispatched by this module have been adjusted, since we no longer pass information about field_name, method, etc. Also, a new event is created when a specific DB record is deleted.
  12. The \Drupal\entity_usage\EntityUsageInterface::listSources() return value now no longer includes information about the field name, method, or count for the retrieved records.
  13. The \Drupal\entity_usage\EntityUsageInterface::listTargets() return value is now a simple associative array where keys are target entity types, and values are indexed arrays of target entity IDs.
  14. The (already) deprecated methods \Drupal\entity_usage\EntityUsageInterface::listUsage() and \Drupal\entity_usage\EntityUsageInterface::listReferencedEntities() were removed.
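
The deferred tracking from item 5 follows a simple pattern: CRUD hooks only record what happened, and the expensive work runs in destruct(). The sketch below is a simplified illustration, not the actual EntityUpdateManager. It declares a local stand-in for \Drupal\Core\DestructableInterface so it can run on its own, and `DeferredTracker`, `register()`, and the `$processed` log are hypothetical names.

```php
<?php

// Local stand-in for \Drupal\Core\DestructableInterface (assumption: used
// here only so the sketch is self-contained).
interface DestructableInterface {
  public function destruct();
}

class DeferredTracker implements DestructableInterface {
  /** Operations seen during the request, keyed to avoid duplicates. */
  private array $pending = [];

  /** Log of the work done in destruct(), for demonstration only. */
  public array $processed = [];

  public function register(string $operation, string $entityKey): void {
    // Cheap bookkeeping only; no usage calculation happens here.
    $this->pending["$operation $entityKey"] = TRUE;
  }

  public function destruct() {
    // The real (expensive) tracking would run here, at the end of the request.
    foreach (array_keys($this->pending) as $item) {
      $this->processed[] = $item;
    }
    $this->pending = [];
  }
}

$tracker = new DeferredTracker();
$tracker->register('insert', 'node:1');
$tracker->register('update', 'node:1');
$tracker->register('update', 'node:1');
$before = $tracker->processed;
$tracker->destruct();
$after = $tracker->processed;
```
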
📌 Task
Status

Needs review

Version

2.0

Component

Code

Created by

🇪🇸Spain marcoscano Barcelona, Spain


Comments & Activities

  • 🇺🇸United States partdigital

    One suggested approach that we've been using on our project to handle entity usage:

    We created a service that accepts a top entity along with a specification. It would then traverse through the tree and only store the results that we needed based on that specification. As we traversed the tree we would also store the location of each item so that once the usage was captured we could easily traverse that set with methods like getParent(), getChild(), getSibling() etc.

    For example, our API looks like this:

    We define a specification. It's basically just an array, but it could be made into a plugin/config entity and given a name, so that you could define meaningful traversal specifications for your project. You can also simply generate a "default" specification by observing what fields and entity types there are on the site, though I've usually found it more useful to be explicit.

    $spec = [
      'page' => [
        'entity_type' => 'node',
        'bundle' => 'page',
        'fields' => [
          'field_entity_reference' => [],
        ],
      ],
      'article' => [
        'entity_type' => 'node',
        'bundle' => 'article',
        'fields' => [
          'field_entity_reference' => [],
        ],
      ],
    ];
    

    We then pass that specification into a method.

    $collection = $service->getReferencedEntitiesCollection($parentEntity, $spec);
    

    Now we can do things like this:

    // Get first level of children.
    $collection->getChildren();
    
    // Get all children recursively
    $collection->getAllChildren();
    
    // Get the immediate parent if the child is known.
    $collection->getParent($entity);
    
    // Get all the parents of a known child. 
    $collection->getAllParents($entity);
    
    // Get all siblings 
    $collection->getSiblings($parent, $entity);
    

    This is very fast because we store the entity id and its location in the set (basically an index). See the example below. The key is the location and the value is the entity id.

    [
      '0' => 6,
      '0:0' => 4,
      '0:1' => 3,
      '1' => 5,
      '1:0' => 3,
      '1:1' => 2,
      '1:1:0' => 1,
      '2' => 8,
      '3' => 9,
      '4' => 10,
    ];
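
To show why lookups on such an index are cheap, here is a minimal sketch of parent and child lookups done purely on the location keys. It is an assumption about how this could work, not the service described in this comment: `getParentId()` and `getChildIds()` are hypothetical helpers that take a location string instead of an entity.

```php
<?php

/**
 * Returns the entity id of the parent location, or NULL when the location
 * sits directly under the top entity.
 */
function getParentId(array $index, string $location): ?int {
  $pos = strrpos($location, ':');
  if ($pos === FALSE) {
    return NULL;
  }
  // The parent location is everything before the last ":" segment.
  return $index[substr($location, 0, $pos)] ?? NULL;
}

/**
 * Returns the entity ids stored exactly one level below a location.
 */
function getChildIds(array $index, string $location): array {
  $prefix = $location . ':';
  $children = [];
  foreach ($index as $key => $id) {
    // PHP turns keys like '1' into integers, so cast back to string.
    $key = (string) $key;
    if (strncmp($key, $prefix, strlen($prefix)) === 0
      && strpos($key, ':', strlen($prefix)) === FALSE) {
      $children[] = $id;
    }
  }
  return $children;
}

$index = [
  '0' => 6,
  '0:0' => 4,
  '0:1' => 3,
  '1' => 5,
  '1:0' => 3,
  '1:1' => 2,
  '1:1:0' => 1,
  '2' => 8,
];
$parent = getParentId($index, '1:1:0');
$children = getChildIds($index, '1');
```

In this example the parent of location `1:1:0` is entity 2, and the direct children of location `1` are entities 3 and 2; the grandchild at `1:1:0` is skipped.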
    

    To get this working with the broader entity usage, you could:

    • Cache/store each set for each top entity.
    • Create a relationship between the top entity and each child entity so that it's easy to find the set. So in the example above you'd have 10 records.

    The api might look like this:

    // Finds the top entity and all its sets with this child. Let's say it's just one collection. 
    $collection = $service->findCollection($childEntity);
    
    // This then gets its immediate parent (not the top parent)
    $collection->getParent($childEntity);
    
    

    Just food for thought as you're working on this :)

  • 🇦🇺Australia acbramley

    Is the plan to still go ahead with this 3.x branch? I see there's now a 4.x branch using entity_track. Surely we should consolidate efforts on a single new architecture?

    We use this module pretty heavily on one of our client projects and they've recently asked for features such as filtering the Usage list by current/previous revision, so I'm happy to help the efforts in order to unlock some of those more complex features.

  • 🇪🇸Spain marcoscano Barcelona, Spain

    Thanks all who have been providing feedback and ideas to this issue. Apologies for not replying earlier 🙏

    @acbramley thanks! I will take any help available :)

    Currently I would say that both the 3.x and 4.x branches are very much experimental and shouldn't be used on prod. Development on 3.x stalled at some point because I didn't feel good being the only one moving this idea forward (this being such a disruptive architectural change). Then at some point @seanb and @askibinski came up with the idea of splitting the API into a generic layer to "track things", and then making Entity Usage just a consumer of that API. That makes sense to me, but we didn't fully make the switch to this new 4.x branch, and development kind of stalled.

    Yes, I think at this point it makes sense to envision the refactoring mentioned here on top of the 4.x branch. In order for that to happen, I would say that a rough roadmap could be:

    ET = Entity Track
    EU = Entity Usage

    0- [NEEDS WORK] Fix tests in D10 / Switch to GitlabCI 📌 Fix tests in HEAD for D10 Active
    1- [ALMOST DONE ?] review the current code / update the branches with latest commits on EU (entity_usage) 2.x and ensure we have feature parity between ET 1.x + EU 4.x and EU 2.x
    - This was kind of OK as of Dec 2022 with #3324787: Update 4.x branch and #3324797: Update with entity_usage 2.x changes but we'd need to review latest bug-fixes since then.
    2- [NEEDS REVIEW] ensure that the test coverage of ET 1.x and EU 4.x combined is equivalent to what we have in EU 2.x
    - This probably happened as part of the above issues as well, but we'd need to double-check we are not losing test coverage in the switch
    3- [NEEDS WORK] ensure we have an upgrade path for existing users on EU 2.x #3326110: Create an upgrade path for EU 2.x -> ET 1.x + EU 4.x
    4- [DONE ?] have some real world experience / feedback of ET 1.x + EU 4.x
    - I know of one reasonably-sized project that is using ET+EU on prod for a couple years now, but it would be great to get more alpha testers out there if we can.

    After this, I believe we could tag a EU 4.0.0-beta1 and mark it as recommended branch instead of 2.x.

    Then, it would likely make sense to revisit the refactor from this issue and simplify everything in a 5.x branch probably?

    I am OK going forward with this plan and welcome everyone that is able/willing to participate.

    Thanks!
