Explore storing distinct trees for each revision via a TREE ID column

Created on 23 March 2017, about 8 years ago
Updated 3 January 2024, over 1 year ago

Problem/Motivation

Please check the attached screen recording which demonstrates the issue.

On the recording you can see that initially, I had the following tree structure represented by documentation nodes, linked together with a single value ERH field:

  • Parent
    • 1. page
      • 1.1 page
    • 2. page

After that, I created a new revision from 2. page and changed its parent from Parent to 1.1 page and renamed it to 2. page (Parent to 1.1 page):

  • Parent
    • 1. page
      • 1.1 page
        • 2. page (Parent to 1.1 page)

What I was expected to see after that on the UI is the same structure as the tree describes above, rather than what I saw is:

1. When I checked Parent page's children I still could see 2. page there but with its new title. (Always the latest entity (content) revision is loaded on the current UI.)
2. Moreover, I could also see 2. page with its new title among 1.1 page children.

This issue is also caused by how this module tries to support entity revisions in nested sets table. Currently, a nested set table's revision_id field stores the entity revision which created that partial tree. Partial tree, because when an item's parent changes only the affected subtree is rebuilt rather than the whole tree, that is why with the current architectural it would be almost impossible to load the correct child tree after 2. page's parent changed to 1. 1 page.

Proposed resolution

I think nested set tables should have an additional tree_id field as well. So a nested set table would have the following fields: id, revision_id, tree_id, left_pos, right_pos, depth. Tree_id would be a foreign key of a new table, called nested_set_{FIELD_NAME_}tree, which would store unique (auto incremental) tree ids. tree_id should be stored in entity hierarchy field tables as well, so an ERH field's table should have target_id, weight and tree_id fields beside the usual ones.

Let's got back to the initial example and check how proposed solution would solve the problem:

  • Parent
    • 1. page
      • 1.1. page
    • 2. page

Mark this structure with TREE-0 tree id. When the first Parent node is created without any parent reference selected it creates a new tree, called TREE-0. The tree id is generated in nested_set_{FIELD_NAME_}tree table and stored in Parent node's ERH field's tree_id value. When 1. page is added to Parent as a child the system can detect that this node is a child and therefore 1. page inherits its parent's (or we can see root node's) tree_id, so its ERH field's tree_id is also TREE-0 according to this example.

  • Parent
    • 1. page
      • 1.1. page
        • 2. page (Parent to 1.1 page)

This change basically creates a completely new tree, called TREE-1. Which means that all entities in the tree should have a new entity revision created and the ERH field's value on new revisions should be updated to the proper tree_id (TREE-1 in this case). This is important because by storing tree_ids in entities' ERH field revisions proper entity revisions can be identified and loaded when an entity's children are listed on the UI (compared with the current implementation, where revision_id is not able to provide the correct information). This is just a minor modification, for example NestedSetInterface::findDescendants() function also needs to add a new tree_id condition to load all entries from nested set table which belongs to the same tree (version).

Entity hierarchy also makes it possible to achieve this structure by moving Parent under 2. page

  • 1. page
    • 1.1. page
  • 2. page
    • Parent

What do we get? Basically two independent trees from TREE-0. TREE-1 = {1. page, 1.1 page} and TREE-2 = {2. page, Parent}. This is much complex situation than the first one.

Remaining tasks

  • Rebuilding a complete tree when a tree item gets a new revision or reverted to a previous revision (so anything which changes the tree structure) could be really time-consuming, because not just nested tree values need to be recalculated, but also new entity revisions need to be created. Should it happen in batch?

User interface changes

Probably.

API changes

Described in the Proposed resolution section.

Data model changes

Described in the Proposed resolution section.

✨ Feature request
Status

Needs work

Version

2.0

Component

Code (module)

Created by

πŸ‡­πŸ‡ΊHungary mxr576 Hungary

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    \Drupal\entity_hierarchy\Form\HierarchyChildrenForm::form calls \Drupal\entity_hierarchy\Storage\EntityTreeNodeMapper::loadAndAccessCheckEntitysForTreeNodes which calls \Drupal\entity_hierarchy\Storage\EntityTreeNodeMapper::loadEntitiesForTreeNodesWithoutAccessChecks which has this code

    $entities = $this->entityTypeManager->getStorage($entity_type_id)->loadMultiple(array_map(function (Node $node) {
          return $node->getId();
        }, $nodes));
        $loadedEntities = new \SplObjectStorage();
    foreach ($nodes as $node) {
          $nodeId = $node->getId();
          $entity = isset($entities[$nodeId]) ? $entities[$nodeId] : FALSE;
          if (!$entity || ($entity->getEntityType()->hasKey('revision') && $node->getRevisionId() != $entity->getRevisionId())) {
            // Bypass non default revisions and deleted items.
            continue;
          }
          $loadedEntities[$node] = $entity;
          if ($cache) {
            $cache->addCacheableDependency($entity);
          }
        }
    

    In this scenario $node is a nested set node from the tree and $entity is the Drupal entity.
    We're calling loadMultiple which only returns default revisions and we're then dropping tree nodes if the revision ID doesn't match the default one.

    This was changed in #2904307: Reorder children saves every revision needlessly β†’ in August 2017 which was after this was created

    I suspect the issue regarding non default revisions being shown in the children tab no longer exists.

    However, I'm still interested in the tree ID idea but on large sites the size of the tree and the number of rows to update can be problematic, so that would likely exacerbate the issue adding more records to the table.

    I'm going to move this to a feature request to explore that.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    I wonder if πŸ“Œ CTE Rewrite Active makes any of this easier?

Production build 0.71.5 2024