Spike: Explore storing component inputs in separate columns (aka field union)

Created on 12 May 2025, 2 days ago

Problem/Motivation

At present we store component inputs in a JSON blob.

This means we cannot efficiently query components, because the data is just one large JSON blob. 📌 [PP-1] Evaluate storing XB field type's "deps_*" columns in separate table (Active) does at least allow us to identify which entities are using which plugins/components, but it doesn't provide any path for updates. For example, if a block changes its settings, we have to loop over every revision, search for components, and then update the whole blob.
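
To make the update cost concrete, here is a minimal sketch of what such an update path has to do today. It is plain Python rather than Drupal code, and the blob layout, the component IDs and the `rename_setting` helper are all hypothetical, chosen only for illustration.

```python
import json

# Hypothetical storage: revision ID -> JSON blob holding all component inputs.
revisions = {
    1: json.dumps([
        {"component": "block.hypothetical_menu", "inputs": {"level": 1}},
        {"component": "sdc.hypothetical_hero", "inputs": {"heading": "Hi"}},
    ]),
    2: json.dumps([
        {"component": "sdc.hypothetical_hero", "inputs": {"heading": "Hello"}},
    ]),
}


def rename_setting(revisions, component, old_key, new_key):
    """Rename one input key of one component across every stored blob."""
    rewritten = 0
    for revision_id, blob in revisions.items():
        tree = json.loads(blob)  # every blob has to be decoded ...
        changed = False
        for instance in tree:  # ... and searched for the component
            if instance["component"] == component and old_key in instance["inputs"]:
                instance["inputs"][new_key] = instance["inputs"].pop(old_key)
                changed = True
        if changed:
            # The whole blob is re-encoded and written back, not just one value.
            revisions[revision_id] = json.dumps(tree)
            rewritten += 1
    return rewritten


rewritten = rename_setting(revisions, "block.hypothetical_menu", "level", "depth")
```

Every revision is decoded even when it does not contain the component, which is the cost described above.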

Proposed resolution

Store the data normalized, in the same way that we currently do for Field API fields in core. But instead of one table per field (prop), we would have one table per component version (set of fields).
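
As a rough illustration of "one table per component version", the sketch below uses SQLite from Python. The table name `component__hero__v1`, its columns and the sample rows are assumptions made up for this example; they are not an existing XB or Field Union schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# One hypothetical table for version 1 of a "hero" component: one column per
# prop, shared by all entity types instead of one table per entity type.
db.execute("""
    CREATE TABLE component__hero__v1 (
        entity_type TEXT NOT NULL,
        entity_id INTEGER NOT NULL,
        revision_id INTEGER NOT NULL,
        delta INTEGER NOT NULL,
        heading TEXT,
        cta_url TEXT,
        PRIMARY KEY (entity_type, revision_id, delta)
    )
""")
db.executemany(
    "INSERT INTO component__hero__v1 VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("node", 1, 1, 0, "Welcome", "http://example.com/a"),
        ("node", 1, 2, 0, "Welcome back", "http://example.com/a"),
        ("hypothetical_page", 7, 9, 0, "About", "http://example.com/b"),
    ],
)

# An update path becomes a single statement instead of a loop over blobs.
cursor = db.execute(
    "UPDATE component__hero__v1 "
    "SET cta_url = REPLACE(cta_url, 'http://', 'https://')"
)
updated = cursor.rowcount
values = [
    row[0]
    for row in db.execute(
        "SELECT cta_url FROM component__hero__v1 ORDER BY revision_id"
    )
]
```

The point of the sketch is only that props become queryable columns; how revisions, translations and nesting would map onto such a table is exactly what the spike needs to find out.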

I had wondered whether we actually need two config entity types at all - i.e. could field union directly use a component config entity type instead of using its own, or could XB directly use field unions without an extra entity type in-between, but... no idea whether that would even be desirable even if it's possible.

- From #3477428-10: Refactor (or decide not to) the XB field type to be multi-valued, to de-jsonify the tree, and to reference the field_union type of the prop values →

I think we don't need a lot of the complexity of the Field Union module; it would instead be better to borrow the concept of field-type derivatives from Field Union, i.e. we can derive one field-type plugin per component and version. See 📌 Version component prop definitions for SDC and Code components (Active).

Spike outcomes, some of these may be split into separate child stories:

  1. Evaluate if this is even feasible
  2. Try to do it in a storage layer that supports one table per component (version), not one per component version per entity type (as is the case with fields in core)
  3. Explore if we can do it without requiring field definitions for each component (field derivative), because those would bloat the field map and lead to performance issues
  4. Explore the impact on the number of tables and joins this will entail. We can expect up to 50 different component types on a given site, possibly more. We will likely also need to store versions of components in separate tables if new props are added or data types change, so there might be as many as 100 tables. That assumes we can reuse the same table across multiple entity types; if we instead have one table per field per entity type, the number grows further.
  5. Explore decorating SqlContentEntityStorage and the storage handlers that extend from it to support loading of this data in a single query during standard entity load, even though we're not making use of standard fields here
  6. Explore what views integration would look like
  7. Explore nested field definitions for object and array shape data
  8. Explore making this something component source plugins control as it doesn't apply to all source plugins
  9. Explore what this would look like for e.g. Block settings that, whilst modeled using config schema (and therefore typed data), are arbitrary in shape and would traditionally be stored in a serialized column
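
The table estimate in item 4 can be written out as simple arithmetic. The figure of two versions per component is an assumption read into "50 component types, as many as 100 tables", and the count of three entity types is purely hypothetical.

```python
def table_count(components, versions_per_component, entity_types=1):
    """Estimate the number of tables.

    Use entity_types=1 when one table is shared across all entity types.
    """
    return components * versions_per_component * entity_types


# Tables shared across entity types: the "as many as 100 tables" case.
shared = table_count(components=50, versions_per_component=2)

# Assumed alternative: separate tables per entity type, with three
# hypothetical entity types using components.
per_entity_type = table_count(
    components=50, versions_per_component=2, entity_types=3
)
```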

User interface changes

📌 Task
Status

Active

Version

0.0

Component

Data model

Created by

🇦🇺 Australia larowlan 🇦🇺 .au GMT+10

Comments & Activities

  • Issue created by @larowlan
  • 🇦🇺 Australia larowlan 🇦🇺 .au GMT+10
  • 🇦🇺 Australia larowlan 🇦🇺 .au GMT+10
  • 🇧🇪 Belgium wim leers Ghent 🇧🇪🇪🇺

    This potential direction is why I prioritized 📌 [later phase] Support matching `{type: array, …}` prop shapes (Postponed) and got it finished, because I have a hard time seeing how this would work with multi-value fields, especially because of #3052670: Support multi-valued "field union"s →.

    So, to avoid us adopting this and potentially losing multi-value support, I made sure 📌 [later phase] Support matching `{type: array, …}` prop shapes (Postponed) was working, and that it proves that multi-value scalars (type: array, items: { type: integer }; see the sparkline test SDC) and multi-value object shapes (see the image-gallery test SDC) can work in the current architecture.

    (I'm not fundamentally opposed to this, just concerned we'd forget about that, and now we can't! 👍)

    Related: I tried to push #3467890 forward and assigned it to you at #3467890-13: [later phase] Support `{type: object, …}` prop shapes with single level that require *multiple* field types: use `field_union`? - OUT OF SCOPE: nested components/component reuse → for feedback, @larowlan 😄

  • 🇧🇪 Belgium wim leers Ghent 🇧🇪🇪🇺

    So for example if a block changes its settings, we have to loop over every revision and search for components and then update the whole blob.

    Indeed. And for that, we have 📌 [SPIKE] Prove that it's possible to apply block settings update paths to stored XB component trees (Active).

  • 🇧🇪 Belgium wim leers Ghent 🇧🇪🇪🇺

    we would have one table per component version (set of fields)

    Try to do it in a storage layer that supports one table per component (version), not one per component version per entity type (as is the case with fields in core)

    🤯 That could easily be hundreds of DB tables: a site can easily have 100 components (the issue summary assumes 50), and multiple versions of each. (Note that "version" here is a massively overloaded term: a new version can exist for very different reasons. See #3523841-6: Version component prop definitions for SDC and Code components →.)

    ⚠️ Concern: this would make 📌 [later phase] When the field type for a PropShape changes, the Content Creator must be able to upgrade (Postponed) much harder. What if a site with a million existing revisions decides to implement hook_storage_prop_shape_alter() to change the field type for a prop of an SDC that is present in all of them (to improve the authoring experience, or to switch from plain images to Media Library, or $REASON)?
    This architecture would require rows to be removed from one table and moved into another!

    Although I think it could be argued that that would be much clearer. It'd also allow dropping tables for older "component versions" that don't have any remaining rows anymore, and would also allow removing the corresponding entries in the Component config entity that 📌 Version component prop definitions for SDC and Code components (Active) would've added.
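
A minimal sketch of what "moving rows from one table into another" and then dropping the emptied table could look like, using SQLite from Python. The `component__card__v1`/`component__card__v2` tables, the `hypothetical_media` lookup table and the image-to-media mapping are all invented for illustration and do not reflect any real XB schema or upgrade path.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE component__card__v1 (revision_id INTEGER, delta INTEGER, image_uri TEXT);
    CREATE TABLE component__card__v2 (revision_id INTEGER, delta INTEGER, media_id INTEGER);
    CREATE TABLE hypothetical_media (media_id INTEGER PRIMARY KEY, uri TEXT);
    INSERT INTO component__card__v1 VALUES (1, 0, 'public://a.png'), (2, 0, 'public://b.png');
    INSERT INTO hypothetical_media VALUES (10, 'public://a.png'), (11, 'public://b.png');
""")

with db:  # one transaction: convert and copy, then delete what was copied
    db.execute("""
        INSERT INTO component__card__v2 (revision_id, delta, media_id)
        SELECT v1.revision_id, v1.delta, m.media_id
        FROM component__card__v1 AS v1
        JOIN hypothetical_media AS m ON m.uri = v1.image_uri
    """)
    db.execute("""
        DELETE FROM component__card__v1
        WHERE image_uri IN (SELECT uri FROM hypothetical_media)
    """)

# Once no rows remain for the old version, its table can be dropped.
remaining = db.execute("SELECT COUNT(*) FROM component__card__v1").fetchall()[0][0]
if remaining == 0:
    db.execute("DROP TABLE component__card__v1")

migrated = db.execute(
    "SELECT revision_id, media_id FROM component__card__v2 ORDER BY revision_id"
).fetchall()
```

On a site with a million revisions the same two statements would touch a million rows, so batching would presumably be needed; the sketch ignores that.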

    🤔 Not sure yet, but for sure interesting 😄

    Explore what views integration would look like

    I don't see yet how that'd be meaningful. Views lists things of the same type in a single list/grid/table/…. But here those same things (instances of the same component version) are spread across many entities and bear no relation to one another. Unless you're thinking about listing all the different component instances of a single entity? Or something else still? But listing the first or fifth or Nth instance of some component still is not meaningful?

    I struggle to follow your thinking here 😇

    Explore what this would look like for e.g. Block settings that, whilst modeled using config schema (and therefore typed data), are arbitrary in shape and would traditionally be stored in a serialized column

    I'm really curious about this part 🧐

  • 🇬🇧 United Kingdom catch

    @Wim: 📌 [PP-1] Consider not storing the ComponentTreeStructure data type as a JSON blob (Postponed) is (I think, still catching up on the latest issues a bit) a row-per-component with a single JSON column for the values in a single table, so it would be mutually exclusive with this issue.

    For me, having multiple tables, or multiple rows for a single delta, feels like it would be incredibly complex both from the point of view of having to adapt all SQL storage backends to support it, and also for views integration.

    However, row-per-component with a JSON column would simplify dependency checking, updates, and potentially things like revision compression, and might well be useful for 🌱 [META] Support alternative renderings of prop data added for the 'full' view mode such as for search indexing or newsletters (Active) too. Views integration feels like a very low priority because the data is arbitrary, as you say.

    I have on occasion added listing filters with CONTAINS on the body field or similar on sites that otherwise don't use the search module, when the dataset is small enough that it won't kill the database. There might be the odd case like that but don't think there will be many.

    I could see wanting to list entities that are using component x - that would be easy to do with row-per-component because it doesn't rely on the values. e.g. you could list all articles that have an image gallery in them, things like that.
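
A rough sketch of that listing under a row-per-component layout, using SQLite from Python. The table `hypothetical_component_rows`, its columns and the component IDs are assumptions for illustration only; the JSON `inputs` column is stored but never read by the query.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical row-per-component table: one row per component instance,
# with all prop values kept in a single JSON column.
db.execute("""
    CREATE TABLE hypothetical_component_rows (
        entity_type TEXT,
        entity_id INTEGER,
        revision_id INTEGER,
        delta INTEGER,
        component TEXT,
        inputs TEXT
    )
""")
db.executemany(
    "INSERT INTO hypothetical_component_rows VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("node", 1, 1, 0, "sdc.hero", json.dumps({"heading": "Welcome"})),
        ("node", 1, 1, 1, "sdc.image_gallery", json.dumps({"images": [3, 4]})),
        ("node", 2, 2, 0, "sdc.hero", json.dumps({"heading": "News"})),
        ("node", 3, 3, 0, "sdc.image_gallery", json.dumps({"images": [5]})),
    ],
)

# "List all articles that have an image gallery in them": the filter only
# touches the component column, never the JSON values.
with_gallery = [
    row[0]
    for row in db.execute(
        "SELECT DISTINCT entity_id FROM hypothetical_component_rows "
        "WHERE entity_type = 'node' AND component = ? ORDER BY entity_id",
        ("sdc.image_gallery",),
    )
]
```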

    A JSON column would make views integration (at least for the values if not other things like component) dependent on ✨ Add "json" as core data type (Active), but that feels like a reasonable limitation to me. No matter how complicated it might be, it is almost certainly going to be less complicated than views integration for the current JSON blob with everything in it, and it might even be less complicated than supporting a fully relational schema here.

    So for me personally, I would postpone this issue on 📌 [PP-1] Consider not storing the ComponentTreeStructure data type as a JSON blob (Postponed), and if that one works out, then this might not be very necessary to explore.
