Calculate field and component dependencies on save and store them in an easy to retrieve format

Issue created by @catch

🇬🇧United Kingdom catch

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

🇬🇧United Kingdom catch

Note that the 'field union json' approach on ✨ JSON-based data storage proposal for component-based page building Active would probably solve this problem for field types (the field union type would be in a relational varchar column even if the values are in JSON, allowing us to check dependencies via that), but I don't think it would solve it for components, which whatever we do almost certainly need to be in a nested JSON tree structure.

Status changed to Postponed: needs info about 1 year ago

Comment about 1 year ago →

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

@catch

Would you agree that this could be a fallback solution if 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Needs work ends up not working, i.e. an escape hatch?
Would you agree that 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Needs work is preferable?

Status changed to Active about 1 year ago

Comment about 1 year ago →

🇬🇧United Kingdom catch

@Wim Leers - short answer, maybe once it's been proved to scale to tens or hundreds of thousands of complex entities with dozens of revisions each on all supported databases? But if we don't build something that does that prior to a stable (or really, beta) release, then we are looking at having to update tens of hundreds of thousands of entities and their revisions to add the denormalization after the fact, which would not be good.

Comment about 1 year ago →

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

Good point. 🤔

Your thinking is: it's gonna be easier to drop these denormalizations in the future, if we end up not needing them. Is that fair?

Comment about 1 year ago →

🇬🇧United Kingdom catch

That's one reason, but also:

Let's assume that mysql, sqlite and pgsql can all run the queries needed in 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Needs work . Even if they can, they each have different syntax for this. They also have different ways of handling indexes on JSON fields.

The database API does not have an abstraction layer for all of this, there is ✨ Add "json" as core data type Active but I don't know if that can handle nested values where the parent is unknown per the title of the other issue. Even if it can do that in the database layer, that concept currently doesn't exist in the entity query API (which is what we should be using for query entities) so would have to be added there too somehow.

So that's two levels of abstraction for JSON queries that currently does not exist either in core or contrib. Equally, XB should not be hardcoding JSON query syntax itself, whether mysql-only, or switching on pgsql or sqlite, and it's not a good use case for backend-overridable either - because these are entity queries not some custom storage layer.

Additionally, XB is currently the only use-case we have for adding arbitrary nesting support for JSON fields to entity query, but once it's in there we can't take it out again easily. So if all of that has to happen before it can be proven that it will scale to a medium-sized Drupal site (let alone a massive one), then it turns out it can't, that's an incredible amount of work wasted - and then we have a new API in core that we either have to support because other people started using it, or deprecate and remove again.

I also feel like doing all of this is a slippery slope towards providing views support for arbitrary nested JSON values and other things that have been discussed in the JSON storage issue but personally I would very much like to avoid.

Assigned to bhuvaneshwar

Comment about 1 year ago →

🇮🇳India bhuvaneshwar

Comment about 1 year ago →

🇬🇧United Kingdom catch

Comment about 1 year ago →

🇮🇳India bhuvaneshwar

@catch, I'm experiencing some permission issues with git access. However, based on my findings, I suggest the following approach:

In the widget, we can define the class as:

class ExpressionFieldWidget extends WidgetBase {

/**
* {@inheritdoc}
*/
public function formElement(FieldItemListInterface $items, $delta, array $element, array &$form, FormStateInterface $form_state) {
$element['value'] = [
'#type' => 'textfield',
'#title' => $this->t('Expression'),
'#default_value' => isset($items[$delta]->value) ? $items[$delta]->value : NULL,
];

return $element;
}
}

Then, we can define the field storage:

function experience_builder_install() {
// Define the field storage for the expression field.
\Drupal::service('entity_type.manager')->getStorage('field_storage_config')->create([
'field_name' => 'field_expressions',
'entity_type' => 'node',
'type' => 'expression_field',
'cardinality' => FieldStorageDefinitionInterface::CARDINALITY_UNLIMITED,
])->save();
}

For querying the new field, we can use:

$query = \Drupal::database()->select('node__field_expressions', 'e')
->fields('e', ['entity_id'])
->condition('e.value', '%' . $field_expression . '%', 'LIKE');
$result = $query->execute();

By transitioning to a standard Field API field that accepts multiple values, we can avoid database-specific JSON manipulation issues and ensure the code is compatible across different database systems.

Please let me know your thoughts. Once I get the access issue resolved, I will apply these changes if everything looks good to you.

Comment about 1 year ago →

🇬🇧United Kingdom catch

I don't really understand storing the field expressions or needing a widget - it ought to be possible to store just the field type itself that is in use, and this can be calculated and put into the field value in entity presave.

There will also need to be a field and appropriate logic for storing the components that are in use too.

Issue was unassigned.

Comment about 1 year ago →

🇮🇳India bhuvaneshwar

Comment 12 months ago →

🇺🇸United States bradjones1 Digital Nomad Life

Re: #8, see #3452756-18: Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases → for some notes on the JSON querying side of this.

Assigned to effulgentsia

Comment 7 months ago →

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

So, @catch, are you essentially proposing this?

   /**
    * {@inheritdoc}
    */
   public static function propertyDefinitions(FieldStorageDefinitionInterface $field_definition) {
     $properties['tree'] = DataDefinition::create('component_tree_structure')
       ->setLabel(new TranslatableMarkup('A component tree without props values.'))
       ->setRequired(TRUE);
 
     $properties['props'] = DataDefinition::create('component_props_values')
       ->setLabel(new TranslatableMarkup('Prop values for each component in the component tree'))
       ->setRequired(TRUE);
 
     $properties['hydrated'] = DataDefinition::create('component_tree_hydrated')
       ->setLabel(new TranslatableMarkup('The hydrated component tree: structure + props values combined — provides render tree for client side or render array for server side.'))
       ->setComputed(TRUE)
       ->setInternal(FALSE)
       ->setReadOnly(TRUE)
       ->setRequired(TRUE);
 
+    $properties['dependencies'] = DataDefinition::create('component_tree_props_dependencies')
+      ->setLabel(new TranslatableMarkup('All config and modules that the stored values of this XB field depend upon..'))
+      ->setComputed(TRUE)
+      ->setInternal(TRUE)
+      ->setReadOnly(TRUE);
+
     return $properties;
   }

(in \Drupal\experience_builder\Plugin\Field\FieldType\ComponentTreeItem::propertyDefinitions())

Comment 7 months ago →

🇬🇧United Kingdom catch

@Wim that will still require querying a flat JSON structure if I'm reading correctly. Better than a nested one but still has potential scaling issues. It could also be a separate unlimited cardinality field that stores a string (obviously this then means you need both fields to make XB work). But yes something like that.

Comment 7 months ago →

🇺🇸United States effulgentsia

My initial reaction is whether the denormalization proposed by @catch would get out of sync and lead to all the problems we've had over the years with file_usage. But presumably we could add code somewhere that can rebuild it fairly easily, which I think would alleviate that?

Comment 7 months ago →

🇬🇧United Kingdom catch

@effulgentisa file_usage is a count of usages of a file, in a custom database table, across a large number of entities. There is no way to rebuild it because the source of the usage is not recorded, that's one of the reasons we had so many problems with it.

The idea here is when saving a single entity, to save the dependency information to an additional field (or field property) on the entity itself. There is no way it could get out of sync unless someone ran direct updates against the field tables.

If you look at config dependencies:

Taking core/profiles/standard/config/install/field.storage.node.field_tags.yml

langcode: en
status: true
dependencies:
  module:
    - node
    - taxonomy
id: node.field_tags
field_name: field_tags
entity_type: node
type: entity_reference
settings:
  target_type: taxonomy_term
module: core
locked: false
cardinality: -1
translatable: true
indexes: {  }
persist_with_no_fields: false
custom_storage: false

Probably you could determine that this config depends on taxonomy, because the target_type is taxonomy_term, and that it depends on node, because the entity_type is node, and these could be traced back to the modules that provide those entity types, instead of having a dependencies array. But the dependencies key is an inherent property of the config entity, and it's updated every time it's saved, give or take manual editing of files and config import.

Note that if 📌 Refactor the XB field type to be multi-valued, to de-jsonify the tree, and to reference the field_union type of the prop values Active happens, then the component would already be stored as a string in a multi-valued field. And then depending on 🌱 [META] Configuration management: define needed config entity types Active and similar issues (there might be a more relevant one I can't find), assuming the dependencies of the component are defined in config, not dynamically in the content itself, then all of the dependency information could be deduced from the component.

e.g. - try to delete a field, check config dependencies of components, check whether content uses the component.

Then this issue would have been done implicitly, without any explicit denormalization or duplication of data - but those issues do not have defined solutions or agreement yet, and the data loss and database cross-compatibility issues are still unresolved, so I think this issue would be a small step forwards if those are not being prioritised yet.

Assigned to wim leers

Comment 4 months ago →

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

Let's do this as a stepping stone towards 📌 Refactor the XB field type to be multi-valued, to de-jsonify the tree, and to reference the field_union type of the prop values Active .

I'm still collecting my thoughts, but there's at least 3 use cases already identified for adding more field properties (i.e. expanding ComponentTreeItem::::propertyDefinitions()) and hence DB columns:

the dependencies implied by a component tree and its inputs
👉 (this issue)
multi-valued XB field with a unique exposed_slot property per delta, to target a specific exposed slot in a ContentTemplate config entity, i.e. for the per-content entity unique creative freedom while the overall component tree is guaranteed to be consistent by the ContentTemplate
👉 being worked on at ✨ Content templates, part 3b: store exposed slot subtrees on individual entities Active , as part of 🌱 [META] 7. Content type templates — aka "default layouts" — clarify the tree+props data model Active
(surfaced yesterday by @lauriii at DDD and discussed at length with @pdureau and others at DDD) the fact that some SDC props may need to be considered either translatable or not (see 📌 Allow Experience Builder fields to support Asymmetric and Symmetric translations Needs review ) — for example an image in some SDC might be considered translatable or not — details TBD
👉 issue to still follow

(Not linking any of those issues nor crediting those people, because that's what that follow-up will be for. I'm just writing it down as context, and as a reminder for myself when I do write that next week.)

I will implement basically what I wrote in #15, but @catch was concerned in #16 that would still require a flat JSON structure.
AFAICT he thought that because it'd store both config and module dependencies:

  config:
    - a
    - b
  module:
   - x

But that's easily transformed to:

config:a config:b module:x

… at which point it's a flat list that is trivially queryable :)

Given what @catch wrote in #18, I think that my thinking has arrived at the same point: we basically want for each row for an XB field revision a flat list of dependencies, that is functionally equivalent to what a config entity's dependencies stores. IOW: we want the equivalent of \Drupal\Core\Config\Entity\ConfigEntityInterface::calculateDependencies(), but for the XB field, to allow efficiently (and with 100% confidence — unlike file_usage) determining which XB fields (and even which revisions of XB fields) are impacted by a potential change in available modules/themes/config.

IOW:

so I think this issue would be a small step forwards if those are not being prioritised yet.

💯

P.S.: I'm no DB nor SQL expert, so perhaps it seems super naïve. That's fine! I just want to restart this conversation, @catch! 😄

Comment 4 months ago →

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

wim leers → changed the visibility of the branch 3457504-0-x to hidden.

Merge request !919Resolve #3457504 "Calc and store deps" → (Merged) created by wim leers

Comment 4 months ago →

🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺

Working PoC live, with plenty of @todos.

Thoughts, @catch? 😊

Comment 4 months ago →

🇬🇧United Kingdom catch

I will implement basically what I wrote in #15, but @catch was concerned in #16 that would still require a flat JSON structure.
AFAICT he thought that because it'd store both config and module dependencies:
  config:
    - a
    - b
  module:
   - x
But that's easily transformed to:

config:a config:b module:x
… at which point it's a flat list that is trivially queryable :)

hmm this wasn't actually what I was concerned about, my concern was about trying to store multiple values in one column. I did not think about imploding the array into a literal string, but pretty sure this also will have scalability issues. How it would compare to a JSON query I don't know.

The problem with the implode() is that MySQL LIKE queries can only use indexes from left to right until they hit a wildcard, so if you have an index on a string field, you can do a query like:

SELECT * FROM bar WHERE baz LIKE 'something%' and that is fine.

But if you need to get rows matching an arbitrary part of the string, like config:b in the example above the query is this:

SELECT * FROM bar WHERE baz LIKE '%config:b%'

And this does not use an index. Random blog post from google with more detail: https://planetscale.com/learn/courses/mysql-for-developers/indexes/index...

Calculate field and component dependencies on save and store them in an easy to retrieve format

Problem/Motivation

Steps to reproduce

Proposed resolution

Remaining tasks

User interface changes

API changes

Data model changes

Merge Requests

!986Calculate field and component dependencies on save and store them in an easy to retrieve formatMerged

!919Calculate field and component dependencies on save and store them in an easy to retrieve formatMerged

Comments & Activities

#1

#2

#2

!986Calculate field and component dependencies on save and store them in an easy to retrieve format
Merged

!919Calculate field and component dependencies on save and store them in an easy to retrieve format
Merged