- Issue created by @larowlan
- 🇺🇸United States effulgentsia
wim leers → credited effulgentsia →.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Actually, I think that @larowlan intended one hash per component instance, not one hash per explicit input per component instance. That probably makes more sense.
Prior art at @effulgentsia's & @longwave's ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active. The difference:
- #3469082 would do it for the entire component tree (i.e. N component instances' explicit inputs)
- this would do it for a single component instance (i.e. 1 component instance's explicit inputs)
It's probably still worth doing, and would actually tie in nicely with [later phase] When the field type for a PropShape changes, the Content Creator must be able to upgrade Postponed.
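The granularity difference above can be sketched with a content-addressed hash. This is a Python sketch with made-up component data; neither issue prescribes this exact scheme, and the `sdc.hero` component and its inputs are hypothetical:

```python
import hashlib
import json

def intern_key(value: dict) -> str:
    """Content-address a JSON-serializable value with a stable hash."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# One component tree with two instances of the same component.
hero_a = {"component": "sdc.hero", "inputs": {"title": "Hi", "cta": "Buy"}}
hero_b = {"component": "sdc.hero", "inputs": {"title": "Hi", "cta": "Buy"}}
tree = [hero_a, hero_b]

# #3469082-style: one hash for the entire component tree (all N instances).
tree_key = intern_key({"tree": tree})

# This issue: one hash per component instance's explicit inputs, so
# identical instances (and instances left unchanged between revisions)
# deduplicate to a single stored blob.
instance_keys = [intern_key(i["inputs"]) for i in tree]
assert instance_keys[0] == instance_keys[1]  # stored once, referenced twice
```

With per-instance hashing, a revision that touches one component only writes one new blob; per-tree hashing re-stores the whole tree whenever any instance changes.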
- 🇬🇧United Kingdom catch
While this is interesting to explore, I'm not sure the final implementation should be in XB itself - we have at least two other options:
1. Having this as an option for any longtext/json field in sql storage - the approach would be applicable to long body fields too (think issue summaries with 300 revisions where the text itself is only updated 5 times).
2. Other approaches for reducing revision table size, like purging - e.g. purge all non-default revisions prior to the previous default revision (somewhat implemented in workspaces or workspaces extra iirc). Or purge default revisions with a decay (keep the most recent ten, then purge every other revision, then purge every 9/10 revisions based on thresholds etc.). This could be done by putting the entity into a queue when it's saved with a new revision; the queue would then thin out the older revisions. There is probably already a core issue for this around but I can't find it immediately.
A big reason to do #2 is that it's not always only the size of the table on disk that's the problem: if there are millions or hundreds of thousands of rows, things like indexes on revision IDs can get huge too, memory requirements increase, writes can slow down, and allRevisions() queries get slower.
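The decay idea in option 2 can be sketched as a pure selection function. Tier sizes here are assumptions (the comment only gives "most recent ten" and the two ratios); a real implementation would make the thresholds configurable and must never delete the current default revision:

```python
def revisions_to_keep(revision_ids, keep_recent=10, middle_tier=100):
    """Decay-style pruning sketch: keep the newest `keep_recent` revisions,
    every 2nd revision of the next `middle_tier`, then every 10th of
    anything older. Returns the set of revision IDs to preserve."""
    newest_first = sorted(revision_ids, reverse=True)
    keep = set(newest_first[:keep_recent])
    middle = newest_first[keep_recent:keep_recent + middle_tier]
    keep.update(middle[::2])  # thin the middle tier to every other revision
    keep.update(newest_first[keep_recent + middle_tier:][::10])  # oldest: 1 in 10
    return keep

# A queue worker would load an entity's revision IDs when it is saved with
# a new revision and delete everything not returned by this function.
ids = list(range(1, 301))  # 300 revisions; 300 is the newest
keep = revisions_to_keep(ids)
assert max(ids) in keep      # newest revision is always kept
assert len(keep) < len(ids)  # older revisions are thinned out
```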
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
I think you're thinking of ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active?
Not opposed to #3469082, nor to #7.2. But I think especially #7.2 is something that could happen later, and most certainly should not be XB-specific.
The only reason for me to prefer @larowlan tackling it here instead of landing #3469082: time (well, and potential scope creep). #3469082 will not land before 11.2, which means it won't be in time for XB's 1.0-goal-by-DrupalCon-Vienna-in-October.
- 🇬🇧United Kingdom catch
> The only reason for me to prefer @larowlan tackling it here instead of landing #3469082: time (well, and potential scope creep). #3469082 will not land before 11.2, which means it won't be in time for XB's 1.0-goal-by-DrupalCon-Vienna-in-October.
Once a complex schema for data compression is in Experience Builder it will be incredibly hard to change. Once either revision pruning or field value compression is in core, it will be very easy to enable for sites (might be harder to enable field value compression for existing sites, but it could be added for any new xb field added to a site, and migration paths could be added later).
I don't see why revision compression would be a stable blocker, especially if it introduces complex technical debt that will be hard to refactor later.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Fair point. @effulgentsia, thoughts?
- 🇬🇧United Kingdom catch
Found the revision pruning issue, linking it here.
- 🇬🇧United Kingdom catch
> The immediate concern I have with not having that is: data storage of XB would then still change at a later time.
It shouldn't change.
If ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active follows the current proposed solution, the 'interning' would be a configurable (per field instance) internal detail of the SQL storage. It would be transparent to XB and the field definition etc.
It may not be straightforward for existing sites to change that on a field that already exists (would probably need a custom update path, maybe to a new field name), but this is not the same as XB itself having to change its data model and provide an upgrade path.
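What "transparent to XB and the field definition" could mean in practice, as a toy sketch: the field table holds only a hash, and the storage layer resolves it through a content-addressed side table. SQLite stands in for Drupal's database layer, and the table and function names are made up; #3469082 does not prescribe this exact shape:

```python
import hashlib
import json
import sqlite3

# In-memory stand-in for the proposed side table; Drupal's SQL storage
# would use a real, per-field table keyed by the hash.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE intern (hash TEXT PRIMARY KEY, value TEXT)")

def intern_write(value: dict) -> str:
    """Store the blob once; return the hash the field table would hold."""
    blob = json.dumps(value, sort_keys=True)
    h = hashlib.sha256(blob.encode("utf-8")).hexdigest()
    db.execute("INSERT OR IGNORE INTO intern VALUES (?, ?)", (h, blob))
    return h

def intern_read(h: str) -> dict:
    """Resolve a hash back to the value; field API callers never see it."""
    (blob,) = db.execute("SELECT value FROM intern WHERE hash = ?", (h,)).fetchone()
    return json.loads(blob)

ref1 = intern_write({"body": "big prop blob"})
ref2 = intern_write({"body": "big prop blob"})  # revision 2, value unchanged
assert ref1 == ref2  # two revision rows, one stored copy of the blob
assert intern_read(ref1) == {"body": "big prop blob"}
```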
If #2770417: Revision garbage collection and/or compression → happens, there is no change to the data model at all, just occasional pruning of older revisions. The two issues also aren't mutually exclusive, although if we do one of them, the other becomes lower priority.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
> It may not be straightforward for existing sites to change that on a field that already exists (would probably need a custom update path, maybe to a new field name), but this is not the same as XB itself having to change its data model and provide an upgrade path.
This is indeed what I was referring to in #10.
That being said: I agree, and I think #13 describes it eloquently.
That means this definitely is not a stable blocker anymore.
- 🇺🇸United States effulgentsia
I agree with #13 and #14. Also, I think we can do ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active in a way that's upgrade-path friendly. For example, adding the `interned` column but keeping the old column around until all old records have been updated.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺