- Issue created by @larowlan
- 🇺🇸United States effulgentsia
wim leers → credited effulgentsia →.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Actually, I think that @larowlan intended one hash per component instance, not one hash per explicit input per component instance. That probably makes more sense.
Prior art at @effulgentsia's & @longwave's ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active. The difference:
- #3469082 would do it for the entire component tree (i.e. N component instances' explicit inputs)
- this would do it for a single component instance (i.e. 1 component instance's explicit inputs)
It's probably still worth doing, and would actually tie in nicely with [later phase] When the field type for a PropShape changes, the Content Creator must be able to upgrade Postponed.
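The granularity difference above can be sketched with a content-addressed hash. This is a Python sketch with made-up component data; neither issue prescribes this exact scheme, and the `sdc.hero` component and its inputs are hypothetical:

```python
import hashlib
import json

def intern_key(value: dict) -> str:
    """Content-address a JSON-serializable value with a stable hash."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# One component tree with two instances of the same component.
hero_a = {"component": "sdc.hero", "inputs": {"title": "Hi", "cta": "Buy"}}
hero_b = {"component": "sdc.hero", "inputs": {"title": "Hi", "cta": "Buy"}}
tree = [hero_a, hero_b]

# #3469082-style: one hash for the entire component tree (all N instances).
tree_key = intern_key({"tree": tree})

# This issue: one hash per component instance's explicit inputs, so
# identical instances (and instances left unchanged between revisions)
# deduplicate to a single stored blob.
instance_keys = [intern_key(i["inputs"]) for i in tree]
assert instance_keys[0] == instance_keys[1]  # stored once, referenced twice
```

With per-instance hashing, a revision that touches one component only writes one new blob; per-tree hashing re-stores the whole tree whenever any instance changes.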
- 🇬🇧United Kingdom catch
While this is interesting to explore, I'm not sure the final implementation should be in XB itself - we have at least two other options:
1. Having this as an option for any longtext/json field in sql storage - the approach would be applicable to long body fields too (think issue summaries with 300 revisions where the text itself is only updated 5 times).
2. Other approaches for reducing revision table size, like purging - e.g. purge all non-default revisions prior to the previous default revision (somewhat implemented in workspaces or workspaces extra iirc). Or purge default revisions with a decay (keep the most recent ten, then purge every other revision, then purge every 9/10 revisions based on thresholds etc.). This could be done by putting the entity into a queue when it's saved with a new revision; the queue would then thin out the older revisions. There is probably already a core issue for this around but I can't find it immediately.
A big reason to do #2 is that it's not always only the size of the table on disk that's the problem: if there are millions or hundreds of thousands of rows, things like indexes on revision IDs can get huge too, memory requirements increase, writes can slow down, and allRevisions() queries get slower.
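The decay idea in option 2 can be sketched as a pure selection function. Tier sizes here are assumptions (the comment only gives "most recent ten" and the two ratios); a real implementation would make the thresholds configurable and must never delete the current default revision:

```python
def revisions_to_keep(revision_ids, keep_recent=10, middle_tier=100):
    """Decay-style pruning sketch: keep the newest `keep_recent` revisions,
    every 2nd revision of the next `middle_tier`, then every 10th of
    anything older. Returns the set of revision IDs to preserve."""
    newest_first = sorted(revision_ids, reverse=True)
    keep = set(newest_first[:keep_recent])
    middle = newest_first[keep_recent:keep_recent + middle_tier]
    keep.update(middle[::2])  # thin the middle tier to every other revision
    keep.update(newest_first[keep_recent + middle_tier:][::10])  # oldest: 1 in 10
    return keep

# A queue worker would load an entity's revision IDs when it is saved with
# a new revision and delete everything not returned by this function.
ids = list(range(1, 301))  # 300 revisions; 300 is the newest
keep = revisions_to_keep(ids)
assert max(ids) in keep      # newest revision is always kept
assert len(keep) < len(ids)  # older revisions are thinned out
```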
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
I think you're thinking of ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active?
Not opposed to #3469082, nor to #7.2. But I think especially #7.2 is something that could happen later, and most certainly should not be XB-specific.
The only reason for me to prefer @larowlan tackling it here instead of landing #3469082: time (well, and potential scope creep). #3469082 will not land before 11.2, which means it won't be in time for XB's 1.0-goal-by-DrupalCon-Vienna-in-October.
- 🇬🇧United Kingdom catch
> The only reason for me to prefer @larowlan tackling it here instead of landing #3469082: time (well, and potential scope creep). #3469082 will not land before 11.2, which means it won't be in time for XB's 1.0-goal-by-DrupalCon-Vienna-in-October.
Once a complex schema for data compression is in Experience Builder it will be incredibly hard to change. Once either revision pruning or field value compression is in core, it will be very easy to enable for sites (might be harder to enable field value compression for existing sites, but it could be added for any new xb field added to a site, and migration paths could be added later).
I don't see why revision compression would be a stable blocker, especially if it introduces complex technical debt that will be hard to refactor later.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Fair point. @effulgentsia, thoughts?
- 🇬🇧United Kingdom catch
Found the revision pruning issue, linking it here.
- 🇬🇧United Kingdom catch
> The immediate concern I have with not having that is: data storage of XB would then still change at a later time.
It shouldn't change.
If ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active follows the current proposed solution, the 'interning' would be a configurable (per field instance) internal detail of the SQL storage. It would be transparent to XB and the field definition etc.
It may not be straightforward for existing sites to change that on a field that already exists (would probably need a custom update path, maybe to a new field name), but this is not the same as XB itself having to change its data model and provide an upgrade path.
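What "transparent to XB and the field definition" could mean in practice, as a toy sketch: the field table holds only a hash, and the storage layer resolves it through a content-addressed side table. SQLite stands in for Drupal's database layer, and the table and function names are made up; #3469082 does not prescribe this exact shape:

```python
import hashlib
import json
import sqlite3

# In-memory stand-in for the proposed side table; Drupal's SQL storage
# would use a real, per-field table keyed by the hash.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE intern (hash TEXT PRIMARY KEY, value TEXT)")

def intern_write(value: dict) -> str:
    """Store the blob once; return the hash the field table would hold."""
    blob = json.dumps(value, sort_keys=True)
    h = hashlib.sha256(blob.encode("utf-8")).hexdigest()
    db.execute("INSERT OR IGNORE INTO intern VALUES (?, ?)", (h, blob))
    return h

def intern_read(h: str) -> dict:
    """Resolve a hash back to the value; field API callers never see it."""
    (blob,) = db.execute("SELECT value FROM intern WHERE hash = ?", (h,)).fetchone()
    return json.loads(blob)

ref1 = intern_write({"body": "big prop blob"})
ref2 = intern_write({"body": "big prop blob"})  # revision 2, value unchanged
assert ref1 == ref2  # two revision rows, one stored copy of the blob
assert intern_read(ref1) == {"body": "big prop blob"}
```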
If #2770417: Revision garbage collection and/or compression → happens, there is no change to the data model at all, just occasional pruning of older revisions. The two issues also aren't mutually exclusive, although if we do one of them, the other becomes lower priority.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
> It may not be straightforward for existing sites to change that on a field that already exists (would probably need a custom update path, maybe to a new field name), but this is not the same as XB itself having to change its data model and provide an upgrade path.
This is indeed what I was referring to in #10.
That being said: I agree, and I think #13 describes it eloquently.
That means this definitely is not a stable blocker anymore.
- 🇺🇸United States effulgentsia
I agree with #13 and #14. Also, I think we can do ✨ Add way to "intern" large field item values to reduce database size by 10x to 100x for sites with many entity revisions and/or languages Active in a way that's upgrade-path friendly. For example, adding the `interned` column but keeping the old column around until all old records have been updated.
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺