Calculate field and component dependencies on save and store them in an easy to retrieve format

Created on 27 June 2024, 7 months ago

Problem/Motivation

📌 Prevent modules from being uninstalled if they provide field types used in an Experience Builder field Fixed fixed an issue where if a field was used in an XB field (i.e. static prop) it prevents uninstall of the module providing the field type.

However, this requires manually writing JSON expressions (JSON_EXTRACT) which is not yet well supported by all core database drivers. See https://git.drupalcode.org/project/experience_builder/-/commit/8888c2eb8...

I think it would be more robust if when saving the entity, we calculate the field (and component, because there's likely to be a similar issue with those) dependencies, and store them in a more predictable structure (like a flat JSON array or something) outside the tree structure. This would be easier to inspect manually for humans trying to debug, and should be be easier to support and more performant across database types.

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

📌 Task
Status

Active

Component

Data model

Created by

🇬🇧United Kingdom catch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @catch
  • 🇬🇧United Kingdom catch
  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
  • 🇬🇧United Kingdom catch

    Note that the 'field union json' approach on JSON-based data storage proposal for component-based page building Active would probably solve this problem for field types (the field union type would be in a relational varchar column even if the values are in JSON, allowing us to check dependencies via that), but I don't think it would solve it for components, which whatever we do almost certainly need to be in a nested JSON tree structure.

  • Status changed to Postponed: needs info 6 months ago
  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    @catch

    1. Would you agree that this could be a fallback solution if 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Active ends up not working, i.e. an escape hatch?
    2. Would you agree that 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Active is preferable?
  • Status changed to Active 6 months ago
  • 🇬🇧United Kingdom catch

    @Wim Leers - short answer, maybe once it's been proved to scale to tens or hundreds of thousands of complex entities with dozens of revisions each on all supported databases? But if we don't build something that does that prior to a stable (or really, beta) release, then we are looking at having to update tens of hundreds of thousands of entities and their revisions to add the denormalization after the fact, which would not be good.

  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    Good point. 🤔

    Your thinking is: it's gonna be easier to drop these denormalizations in the future, if we end up not needing them. Is that fair?

  • 🇬🇧United Kingdom catch

    That's one reason, but also:

    Let's assume that mysql, sqlite and pgsql can all run the queries needed in 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Active . Even if they can, they each have different syntax for this. They also have different ways of handling indexes on JSON fields.

    The database API does not have an abstraction layer for all of this, there is Add "json" as core data type to Schema and Database API Needs work but I don't know if that can handle nested values where the parent is unknown per the title of the other issue. Even if it can do that in the database layer, that concept currently doesn't exist in the entity query API (which is what we should be using for query entities) so would have to be added there too somehow.

    So that's two levels of abstraction for JSON queries that currently does not exist either in core or contrib. Equally, XB should not be hardcoding JSON query syntax itself, whether mysql-only, or switching on pgsql or sqlite, and it's not a good use case for backend-overridable either - because these are entity queries not some custom storage layer.

    Additionally, XB is currently the only use-case we have for adding arbitrary nesting support for JSON fields to entity query, but once it's in there we can't take it out again easily. So if all of that has to happen before it can be proven that it will scale to a medium-sized Drupal site (let alone a massive one), then it turns out it can't, that's an incredible amount of work wasted - and then we have a new API in core that we either have to support because other people started using it, or deprecate and remove again.

    I also feel like doing all of this is a slippery slope towards providing views support for arbitrary nested JSON values and other things that have been discussed in the JSON storage issue but personally I would very much like to avoid.

  • Assigned to bhuvaneshwar
  • 🇬🇧United Kingdom catch
  • 🇮🇳India bhuvaneshwar

    @catch, I'm experiencing some permission issues with git access. However, based on my findings, I suggest the following approach:

    In the widget, we can define the class as:

    class ExpressionFieldWidget extends WidgetBase {

    /**
    * {@inheritdoc}
    */
    public function formElement(FieldItemListInterface $items, $delta, array $element, array &$form, FormStateInterface $form_state) {
    $element['value'] = [
    '#type' => 'textfield',
    '#title' => $this->t('Expression'),
    '#default_value' => isset($items[$delta]->value) ? $items[$delta]->value : NULL,
    ];

    return $element;
    }
    }

    Then, we can define the field storage:

    function experience_builder_install() {
    // Define the field storage for the expression field.
    \Drupal::service('entity_type.manager')->getStorage('field_storage_config')->create([
    'field_name' => 'field_expressions',
    'entity_type' => 'node',
    'type' => 'expression_field',
    'cardinality' => FieldStorageDefinitionInterface::CARDINALITY_UNLIMITED,
    ])->save();
    }

    For querying the new field, we can use:

    $query = \Drupal::database()->select('node__field_expressions', 'e')
    ->fields('e', ['entity_id'])
    ->condition('e.value', '%' . $field_expression . '%', 'LIKE');
    $result = $query->execute();

    By transitioning to a standard Field API field that accepts multiple values, we can avoid database-specific JSON manipulation issues and ensure the code is compatible across different database systems.

    Please let me know your thoughts. Once I get the access issue resolved, I will apply these changes if everything looks good to you.

  • 🇬🇧United Kingdom catch

    I don't really understand storing the field expressions or needing a widget - it ought to be possible to store just the field type itself that is in use, and this can be calculated and put into the field value in entity presave.

    There will also need to be a field and appropriate logic for storing the components that are in use too.

  • Issue was unassigned.
  • Assigned to effulgentsia
  • 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

    So, @catch, are you essentially proposing this?

       /**
        * {@inheritdoc}
        */
       public static function propertyDefinitions(FieldStorageDefinitionInterface $field_definition) {
         $properties['tree'] = DataDefinition::create('component_tree_structure')
           ->setLabel(new TranslatableMarkup('A component tree without props values.'))
           ->setRequired(TRUE);
     
         $properties['props'] = DataDefinition::create('component_props_values')
           ->setLabel(new TranslatableMarkup('Prop values for each component in the component tree'))
           ->setRequired(TRUE);
     
         $properties['hydrated'] = DataDefinition::create('component_tree_hydrated')
           ->setLabel(new TranslatableMarkup('The hydrated component tree: structure + props values combined — provides render tree for client side or render array for server side.'))
           ->setComputed(TRUE)
           ->setInternal(FALSE)
           ->setReadOnly(TRUE)
           ->setRequired(TRUE);
     
    +    $properties['dependencies'] = DataDefinition::create('component_tree_props_dependencies')
    +      ->setLabel(new TranslatableMarkup('All config and modules that the stored values of this XB field depend upon..'))
    +      ->setComputed(TRUE)
    +      ->setInternal(TRUE)
    +      ->setReadOnly(TRUE);
    +
         return $properties;
       }
    

    (in \Drupal\experience_builder\Plugin\Field\FieldType\ComponentTreeItem::propertyDefinitions())

  • 🇬🇧United Kingdom catch

    @Wim that will still require querying a flat JSON structure if I'm reading correctly. Better than a nested one but still has potential scaling issues. It could also be a separate unlimited cardinality field that stores a string (obviously this then means you need both fields to make XB work). But yes something like that.

  • 🇺🇸United States effulgentsia

    My initial reaction is whether the denormalization proposed by @catch would get out of sync and lead to all the problems we've had over the years with file_usage. But presumably we could add code somewhere that can rebuild it fairly easily, which I think would alleviate that?

  • 🇬🇧United Kingdom catch

    @effulgentisa file_usage is a count of usages of a file, in a custom database table, across a large number of entities. There is no way to rebuild it because the source of the usage is not recorded, that's one of the reasons we had so many problems with it.

    The idea here is when saving a single entity, to save the dependency information to an additional field (or field property) on the entity itself. There is no way it could get out of sync unless someone ran direct updates against the field tables.

    If you look at config dependencies:

    Taking core/profiles/standard/config/install/field.storage.node.field_tags.yml

    langcode: en
    status: true
    dependencies:
      module:
        - node
        - taxonomy
    id: node.field_tags
    field_name: field_tags
    entity_type: node
    type: entity_reference
    settings:
      target_type: taxonomy_term
    module: core
    locked: false
    cardinality: -1
    translatable: true
    indexes: {  }
    persist_with_no_fields: false
    custom_storage: false
    

    Probably you could determine that this config depends on taxonomy, because the target_type is taxonomy_term, and that it depends on node, because the entity_type is node, and these could be traced back to the modules that provide those entity types, instead of having a dependencies array. But the dependencies key is an inherent property of the config entity, and it's updated every time it's saved, give or take manual editing of files and config import.

    Note that if 📌 Refactor the XB field type to be multi-valued, to de-jsonify the tree, and to reference the field_union type of the prop values Active happens, then the component would already be stored as a string in a multi-valued field. And then depending on 🌱 [META] Configuration management: define needed config entity types Active and similar issues (there might be a more relevant one I can't find), assuming the dependencies of the component are defined in config, not dynamically in the content itself, then all of the dependency information could be deduced from the component.

    e.g. - try to delete a field, check config dependencies of components, check whether content uses the component.

    Then this issue would have been done implicitly, without any explicit denormalization or duplication of data - but those issues do not have defined solutions or agreement yet, and the data loss and database cross-compatibility issues are still unresolved, so I think this issue would be a small step forwards if those are not being prioritised yet.

Production build 0.71.5 2024