- @geek-merlin opened merge request.
- Status changed to Needs review
over 1 year ago 9:00pm 9 April 2023 - π©πͺGermany geek-merlin Freiburg, Germany
Took a first stab at this. Feedback appreciated.
- Status changed to Needs work
over 1 year ago 10:17pm 9 April 2023 The Needs Review Queue Bot β tested this issue. It fails the Drupal core commit checks. Therefore, this issue status is now "Needs work".
This does not mean that the patch needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide β to find step-by-step guides for working with issues.
- π©πͺGermany geek-merlin Freiburg, Germany
FTR: In the first commit i explored delegating load and save to FieldStorageDefinition, which in turn delegates to static FieldItem class methods.
As of the latter (delegate to FieldItem), there are ups and dows with this, and as FieldItem already is tightly coupled with storage, e.g. via ::schema, the better DX tipped the balance.
In later commits, i delegated save to a non-static FieldItem method. Why? Because we can and i wanted to explore it. All in all, the first approach with static methods on load and save looks cleaner to me. Also it gives FieldStorageDefinition full control, to maybe do sth different like annotation based mapping definition or whatever. So will work the code in that direction again. - π©πͺGermany geek-merlin Freiburg, Germany
This is an interesting failure: How can PathAliases return an array for
$entity->$fieldName
?? Die, __get, die! - π©πͺGermany geek-merlin Freiburg, Germany
OK, this triggers other edgy cases:
Drupal\Core\Entity\EntityStorageException: Field langcode / content_translation_source is unknown.
- Status changed to Needs review
over 1 year ago 9:41pm 10 April 2023 - π©πͺGermany geek-merlin Freiburg, Germany
Yup, here's the api. Now we need an example implementation and tests.
- Status changed to Needs work
over 1 year ago 1:20am 12 April 2023 - πΊπΈUnited States smustgrave
In addition could you update the issue summary please. Or state why the tag is no longer needed/current IS is correct.
- π©πͺGermany geek-merlin Freiburg, Germany
Yup, i agree on needs-IS-update. Did that. Also removed the citation from #2414835: SqlContentEntityStorage does not unserialize base field data β added by @xjm in #12 which basically stated that unserialization does not happen on load. Judging from the code, i guess this was changed long ago. Which makes this a task rather than a bug. Tentatively keeping this major as it's a key blocker for the whole JSON in DB columns work like JSON Map Field.
- last update
over 1 year ago 29,302 pass - last update
over 1 year ago 29,360 pass - last update
over 1 year ago 29,358 pass, 2 fail - last update
over 1 year ago 29,365 pass, 2 fail - last update
over 1 year ago 29,365 pass, 2 fail - last update
over 1 year ago 29,365 pass, 1 fail - last update
over 1 year ago 29,364 pass, 3 fail - last update
over 1 year ago 29,365 pass, 1 fail 41:38 32:56 Running- last update
over 1 year ago 29,365 pass, 1 fail - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Build Successful - last update
over 1 year ago 29,365 pass, 2 fail - last update
over 1 year ago 29,365 pass, 2 fail - last update
over 1 year ago 29,365 pass, 1 fail - last update
over 1 year ago 29,367 pass - last update
over 1 year ago 29,367 pass - Status changed to Needs review
over 1 year ago 11:17pm 30 April 2023 - π©πͺGermany geek-merlin Freiburg, Germany
Banzai!
The MR
- adds an API like requested in the IS
- implements the API on behalf of MapField
- Extends MapField tests to cover thisCurrently working on a JsonMapField that depends on this API.
- Status changed to Needs work
over 1 year ago 6:43pm 6 June 2023 - πΊπΈUnited States smustgrave
Was hoping someone more knowledgable would review this one but don't want to leave it hanging.
Test coverage does seem to be good.
Only glaring thing is if there could be a change record for the interface?
Will keep an eye out for that too.
- First commit to issue fork.
- last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 29,455 pass - Status changed to Needs review
over 1 year ago 3:39am 28 July 2023 - πΊπΈUnited States bradjones1 Digital Nomad Life
Took a first stab at a change record. Also updating the title to reflect the fact this change is broader than just serialization.
I believe this should again be ready for maintainer/committer review.
- πΊπΈUnited States bradjones1 Digital Nomad Life
I think π MapItem is broken for configurable fields Needs work is heavily affected by this change, and might need an update/rethink after this lands.
Installed this patch along with β¨ Add "json" as core data type to Schema and Database API Needs work and took a stab at a JSON-backed field - see https://git.drupalcode.org/-/snippets/124 for a working example.
This is mostly a reimplementation of the
MapItem
code but usingjson_encode()
andjson_decode()
instead. It works, however I'm curious if there's undesirable overhead in always doing ajson_decode()
on stringified JSON input to denormalize the values into theMap::$values
array.I don't think this is the right issue to add something like
JsonMapItem
, however the linked snippet at least proves out we're heading in a good direction on this change enabling things like JSON data storage.If anyone has thoughts on making
setValue()
less janky in either or both of these classes, I'm all ears.Before I go adding an issue to implement something like
JsonMapItem
in core... is that something we want in core? Or should this be implemented by more opinionated modules/services/etc. that bring along more business logic with them? - Status changed to RTBC
over 1 year ago 4:23pm 22 August 2023 - πΊπΈUnited States smustgrave
IS seems good and all threads are resolved.
Going to put in the committers bucket to see their thoughts.
- last update
over 1 year ago 29,470 pass - last update
over 1 year ago 29,470 pass - last update
over 1 year ago 29,470 pass - last update
over 1 year ago 29,470 pass - last update
over 1 year ago 29,471 pass - last update
over 1 year ago 29,471 pass - Status changed to Needs work
over 1 year ago 5:49am 2 September 2023 - π³πΏNew Zealand quietone
I'm triaging RTBC issues β . I read the IS and the comments and skimmed the MR.
The diff no longer applies, tagging for a reroll. And the MR should be against 11.x. Thanks.
I read the Change record and wonder if code examples are needed for how to implement this.
- First commit to issue fork.
- Merge request !4814Issue #2232427: Allow field types to control how properties are mapped to and from storage β (Open) created by rpayanm
- last update
over 1 year ago 30,165 pass - Status changed to Needs review
over 1 year ago 4:40pm 18 September 2023 - Status changed to RTBC
over 1 year ago 7:55pm 18 September 2023 - πΊπΈUnited States smustgrave
Reroll seems good so restoring previous status.
- last update
over 1 year ago 30,169 pass - last update
about 1 year ago 30,169 pass - last update
about 1 year ago 30,206 pass - last update
about 1 year ago 30,364 pass - last update
about 1 year ago 30,358 pass, 2 fail - last update
about 1 year ago 30,361 pass - last update
about 1 year ago 30,372 pass - last update
about 1 year ago 30,380 pass - last update
about 1 year ago 30,378 pass 17:17 14:58 Running- last update
about 1 year ago 30,393 pass - last update
about 1 year ago 30,398 pass - last update
about 1 year ago 30,398 pass - last update
about 1 year ago 30,414 pass - last update
about 1 year ago 30,418 pass - last update
about 1 year ago 30,427 pass - last update
about 1 year ago 30,427 pass - last update
about 1 year ago 30,438 pass - last update
about 1 year ago 30,439 pass - last update
about 1 year ago 30,465 pass - last update
about 1 year ago 30,481 pass - πΊπΈUnited States tim.plunkett Philadelphia
"Bumping" is unhelpful, now the issue is further down the RTBC queue
- last update
about 1 year ago 30,484 pass - last update
about 1 year ago 30,487 pass - last update
about 1 year ago 30,489 pass - Status changed to Needs work
about 1 year ago 6:01am 7 November 2023 - π¦πΊAustralia larowlan π¦πΊπ.au GMT+10
Left some comments on the MR - I'll ping @Berdir and @catch in slack for a subsystem maintainer review
- π¨πSwitzerland berdir Switzerland
This is a very old issue, but comment #1 still summarizes my concerns very well.
I'm not convinced that moving the serialize() calls out of the storage makes the situation better. In my opinion, that doesn't make moving to json easier:
* We have no API to change a field type, so i don't like introducing a new JsonMapItem field type as a solution.
* with json storage, there is also the idea to have all (configurable) fields in a single json data structure. With that, this suddenly becomes counter-productive and results in having a double json encoding.
* It also seems counter-productive for completely different possible backends, like the test key value implementation in core, or something like mongodb.In regards to the contrib blocker tags here, I don't really see that. Both TMGMT and Paragraphs use serialized data in a base field, and Metatags in a configurable field. They do the mapping and serialization outside of storage and it works. both TMGMT and metatags also migrated from a serialized string to json. It's not always pretty, but it's not blocking. MapItem is a special case, it's architecture is both too heavy and too rigid to be a sensible solution for any of these modules (partially because it's not fully working to be fair), I'm a bit unsure what we should do with that.
To be clear, my concerns are about the *serialize* part, not the part that's actually in the issue title, which is mapping. We could just leave out the serialize() call and then treat the properties like we already do and support, for example with linkItem attributes?
The second part that I don't like is that we essentially introduce the same API twice, once through the field item class and once through the definition class, and we use both each, but the definition classes just use the trait as a passthrough to the field item class again. Going through the definition classes means we risk breaking this feature for field types when they are used for definitions that are neither config nor base fields (which have limited support but are technically possible). Why do we even need this? The implementation of FieldStorageConfig::getFieldItemClass() is imho not internal/secret, I'm certain we have other cases that directly get the field item class from the field type, that's basic plugin definition stuff.
- πΊπΈUnited States bradjones1 Digital Nomad Life
bradjones1 β changed the visibility of the branch 2232427-field-storage-mapping to hidden.
- πΊπΈUnited States bradjones1 Digital Nomad Life
Trying to move this along in the context of β¨ Add "json" as core data type to Schema and Database API Needs work .
I rebased the 11.x MR and it is passing tests, which is a nice fresh baseline.
Having read #58 a few times and I think I understand most of the main points, though I'm not as familiar with some of the contrib examples provided.
At the risk of reading too far into others' arguments without the benefit of a synchronous exchange, I think that we might be prematurely blocking ourselves from a big leap forward in some of the assumptions around JSON storage vis-a-vis the field API. While it's true there are issues like #3277081: [Plan] Convert field storage to use JSON fields β proposing a more flexible storage solution for fields, that's solidly in D12 territory at this point.
Re: these specific concerns:
We have no API to change a field type, so i don't like introducing a new JsonMapItem field type as a solution.
I don't think this need be an either-or type situation. In theory, we could provide a field-type-change API to convert backing storage from serialized PHP arrays to JSON data columns. Also, the use cases for a
JsonMapItem
type field are different enough from the current MapItem that there is room for both in the ecosystem. While the universe of use cases for the JSON map field probably entirely encapsulates the existing map field, many map fields likely don't need the other benefits of being in JSON storage. (Mainly, being able to query for document members in SQL.)with json storage, there is also the idea to have all (configurable) fields in a single json data structure. With that, this suddenly becomes counter-productive and results in having a double json encoding.
I'll be honest, I don't entirely follow this point, although I'm sure that's just my lack of context. I think this gets to the point I made above about prematurely limiting ourselves re: a major change in field storage that is still only in an idea stage.
It also seems counter-productive for completely different possible backends, like the test key value implementation in core, or something like mongodb.
This is probably the strongest concern of the three, however I still think it's a bit theoretical and I'm struggling to find a case where this is a blocker. The test K/V entity storage is rather feature-incomplete and is a bit of a theoretical example. MongoDB is a bit more of an interesting case. Drupal certainly is written with an SQL backend in mind, and it will be a significant lift (but not impossible) to allow entities be stored in MongoDB. What that would look like, I'm not sure, however I don't think it's a reason to block anything here.
To the other concern:
To be clear, my concerns are about the *serialize* part, not the part that's actually in the issue title, which is mapping. We could just leave out the serialize() call and then treat the properties like we already do and support, for example with linkItem attributes?
I think I understand this in theory but am struggling to see it in practice. By my read the issue title "Allow field types to control how properties are mapped to and from storage" is inextricably linked to getting the storage out of the serialization business entirely, and allowing fields to determine how and what gets sent to the storage backend's columns. So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:
return [ 'uri' => $properties['uri'], 'title' => $properties['title'], 'options' => serialize($properties['options']), ];
That's a rather trivial example, however one can imagine more interesting cases where field properties are retrieved and set in a much less storage-bound manner and the field then does load and save logic to form them into a sane mix of columns, which don't have to track with the field's user-facing properties at all. If some of them are serialized, great, if not, great.
I'm taking a fresh look at the MR and making a few stylistic edits, however I'm skeptical that this can be any less than it is now. Looking at it more by way of commenting, I'd actually love to see us entirely deprecate and remove all default serialization and require fields to handle it themselves.
- π¨πSwitzerland berdir Switzerland
I think I understand this in theory but am struggling to see it in practice. By my read the issue title "Allow field types to control how properties are mapped to and from storage" is inextricably linked to getting the storage out of the serialization business entirely, and allowing fields to determine how and what gets sent to the storage backend's columns. So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:
I disagree with this. I see the use case that properties and columns don't map 1:1, I don't see the need to remove serialization from the storage and put it into the field item classes. IMHO, serialization belongs in the storage.
So if LinkItem were to be refactored to use this new interface, its ::mapColumnsOnSave() would look something like:
And what would we gain from that? I don't see this as an improvement. Alternative storage implementations are rarely used, but they do exist and we explicitly support that as a feature.
My proposal is that we define that each column must either be a scalar value or an array (with only scalars and arrays as content, which is where it gets interesting*), and in case of an array, the storage is responsible for serialization which is then an implementation detail and must not be relied on, which means that such properties/columns can't be queried, as it might be a php serialize string or json.
* one reason why switching to json in the database is hard is that there are cases where people currently store objects in there, for example language for link attributes.
with json storage, there is also the idea to have all (configurable) fields in a single json data structure. With that, this suddenly becomes counter-productive and results in having a double json encoding.
I'll be honest, I don't entirely follow this point, although I'm sure that's just my lack of context. I think this gets to the point I made above about prematurely limiting ourselves re: a major change in field storage that is still only in an idea stage.
The idea is that we have a storage implementation that stores all configurable fields in a single "fields" column in the database as json (which now all database backends allow to have queries on, with indexes), no more node__field_... tables. And then such properties wouldn't need to be serialized again as the whole thing is already a json structure.
- πΊπΈUnited States bradjones1 Digital Nomad Life
Thanks for the feedback and clarification.
in case of an array, the storage is responsible for serialization which is then an implementation detail and must not be relied on, which means that such properties/columns can't be queried, as it might be a php serialize string or json.
I think I see where the disconnect in approaches and philosophy is coming from, now. From where I sit, the whole point of adding support for JSON data structures is to empower queries that do depend on the data being stored as JSON per se - not just as an implementation detail. It would be unfortunate if we ship first-class support for JSON data types in the database and then say, if you use it with field API, we make no representations that you can actually depend on this being stored as JSON in the db.
To help address your concerns over the serialization piece, and also help with the JSON data storage piece, I've updated the MR to focus on mapping. The serialization bits in the SQL content entity storage have been abstracted into a pair of new methods which will also help to auto-magically encode and decode JSON.
In the course of updating the MR I think I discovered a bit of a hole in the test coverage in the original MR, though it probably needs more to ensure all this works in all three cases of 1) base tables, 2) shared tables, 3) dedicated tables.
- Status changed to Needs review
12 months ago 11:02pm 2 January 2024 - πΊπΈUnited States bradjones1 Digital Nomad Life
The new
::fromStoredValue()
method where we are consolidating unserialization andjson_decode()
could unlock us usingallowed_classes
for PHP class unserialization, too, I think. See #3046696-26: Move from serialized columns to JSON encoded data wherever possible, or use allowed_classes β . - Status changed to Needs work
12 months ago 11:08pm 3 January 2024 - π¨πSwitzerland berdir Switzerland
Need to have a closer look at the changes to see if that's closer to what I had imagined.
The last paragraph of #58 hasn't been adressed yet, we still have two interfaces and StorageMapperDelegatorTrait to channel the calls in some cases through the field definition objects, I'm pretty certain we don't need that.
- πΊπΈUnited States bradjones1 Digital Nomad Life
Yeah, I noticed that as well while refactoring out the serialization bits but reached my cognitive-load limit on refactoring. I think we can simplify this, or at least make it clearer why it ended up the way it is.
- πΊπΈUnited States bradjones1 Digital Nomad Life
...we still have two interfaces and StorageMapperDelegatorTrait to channel the calls in some cases through the field definition objects, I'm pretty certain we don't need that.
Taking a fresh look at this after some sleep and a re-read of the code and I think the answer here is that these are similar, but distinct, interfaces. In this case, the one always delegates to the other, however that could potentially not be the case.
StorageMapperInterface
is for field storage definitions to determine how properties are mapped to db columns. In theory, at least, this would allow for custom storage implementation that is completely transparent to the field item. This was introduced at #36 with some reasoning explained there.In turn,
FieldItemStorageMapperInterface
signals to the storage that it is capable of handling this decision-making itself, and that the storage definition can delegate these method calls.I agree we could be more explicit about explaining this in the docblocks. I kinda like this solution since it does help with potential future flexibility around storage, and might even address your concerns around how database drivers might handle these situations e.g. in the case of MongoDB as first-class database entity storage.
if this is a no-go for you, then we rip it out and do away with the delegation piece easy enough. This is just a judgment call. I kinda like how it is now and can see the vision of why it was done, but I'm also motivated to get it passing muster with the maintainer (you) and the core committers to unblock other work. So IOW, I don't care enough to go back and forth on it much, and see both sides of this decision.
- π©πͺGermany geek-merlin Freiburg, Germany
@bradjones1, @berdir: Thanks for moving this further forward! I see the concerns and decision-points, esp. that of #62/63. +1 for the thorough discussion of that.
- πΊπΈUnited States bradjones1 Digital Nomad Life
@geek-merlin, appreciate your kind words. Do you have anything to add here, since you were the one to first introduce these interfaces? Did I summarize your motivations correctly?
- π©πͺGermany geek-merlin Freiburg, Germany
@bradjones:
Yes, what i want to say is that i feel that all the energy i spent to make the first iteration, i feel is in good hands with both of you.I don't quite remember the details, but the two-interface thing was to have the first POC in the least invasive way that seemed possible. I did not feel too comfortable with it from the beginning, so feel free to replace with sth better.
I can follow @berdirs pints well, and having the storage say "hey, you have to decide between a scalar and a json value" feels like a even better solution. (Yes, it should be tied to JSON to allow JSON queries later.)
That said, i wouldn*t feel too comfortable with another version of the NoSQL wars a decade ago, where new features had to deny that our primary target is SQL.Happy you are taking further the energy that i put in, even if for the foreseeable future i won't be able to spend a lot more. That said, feel free to PM me if a can add sth.
- πΊπΈUnited States bradjones1 Digital Nomad Life
Thanks. That's helpful. I kinda like it how it sits now, though as mentioned, not wed to it and I'd rather get something in than go back and forth for much longer. I think the ball is in @Berdir's court and we can try to get this RTBC "soon."
- πΊπΈUnited States bradjones1 Digital Nomad Life
Pinged @Berdir in Slack to loop around on this. Hoping the follow-up comments address some of the questions/concerns in #65 about what appear to be duplicate interfaces but are not. Still open to changing the approach, but I think there are merits to keeping this as it is and would like specific guidance on what needs changed to get this to RTBC if we do need to change tack.
Also adding OPTASY which are sponsoring my time now.
- Status changed to Needs review
8 months ago 12:17am 29 April 2024 - πΊπΈUnited States bradjones1 Digital Nomad Life
Addressed MR comments. Rebased and back to green. Needs fresh review.
- Status changed to Needs work
8 months ago 9:07pm 30 April 2024 The Needs Review Queue Bot β tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide β to find step-by-step guides for working with issues.
- Status changed to Needs review
8 months ago 4:33am 4 May 2024 - Status changed to Needs work
7 months ago 7:12am 22 May 2024 - π§πͺBelgium wim leers Ghent π§πͺπͺπΊ
AFAICT this will block β¨ JSON-based data storage proposal for component-based page building Active , which is how I ended up reviewing this :)
Lots of comment nits plus a few actual questions. I think the change record should also be updated to be more explicit about what we expect field type maintainers to do: what are the updated best practices? (It captures that in a brief sentence or two, I'd like to see a concrete and , because that'll make it MUCH more likely that the ecosystem will adopt the new best practices: more explicitness.)
- πΊπΈUnited States bradjones1 Digital Nomad Life
So latest test reveals that if we restrict unserialize() in SqlContentEntityStorage, layout builder demonstrates they are depending on this functionality - but they're the only ones doing so.
I imagine there could also be contrib field types that are depending on this unrestricted unserialize(). That should probably be deprecated officially... but as it stands I think this means that layout builder needs to implement this change at the same time?
Moving LB internals to use JSON or this new interface would unblock at least two LB issues postponed on the unserialize() aspect.
- πΊπΈUnited States bradjones1 Digital Nomad Life
Updating layout builder to use the new storage mapping was trivially easy. There will need to be a migration of existing data, though, and I honestly don't have personal interest in writing migration tests right now. Hopefully someone with more skill and specific knowledge of that module can help out over at π Do not store sections as PHP objects Active . The BC layer introduced in the commit above basically means that these sections will be updated as they are touched and saved, but we will need a migration to capture them all if we indeed deprecate and remove support for PHP object serialization in the database.
Speaking of, I have stubbed out a CR for deprecating PHP object serialization β , linked to this issue. This is in response to MR feedback asking why we couldn't add ['allowed_classes' => FALSE] to the unserialize call. There's a reason why it's not there currently - it would preclude behavior like seen in layout builder. In fact, there's even a test (also now deprecated) for this behavior. If we need to keep it, that would be fine and simplify things here... but it's probably the right thing to do.
- πΊπΈUnited States bradjones1 Digital Nomad Life
Well that was a fun afternoon of chasing my tail on a workspaces integration test only to find it was due to an out of sync static cache between the test execution environment and the test web server. Oy vey. Makes me yearn for #3324241-8: Provide DIC-friendly, concurrency-safe alternative(s) to `drupal_static()` β or similar fixes that will allow us to avoid this kind of WTF in the future. Not to mention better support for async runtimes. End rant.
There seem to be an awful lot of "random" test failures lately in functional tests, but I can't recreate any of them locally. I think this is ready for fresh review.
- πΊπΈUnited States smustgrave
Cleaned up some of the threads for ya, will leave in review for more familiar eyes of the issue.
- π³π±Netherlands daffie
Back to needs work for my remarks on the PR.
- π©πͺGermany geek-merlin Freiburg, Germany
As of @wimleers' comment regarding security of unserializing:
SA-CORE-2019-003 β is about user-entered form data.
This unserializes app-controlled data from the DB, the same as in \Drupal\Core\Cache\DatabaseBackend::prepareItem.
(Adding a comment to ALL such places makes a lot of sense though.) - π©πͺGermany geek-merlin Freiburg, Germany
Updated reply to @wimleers' comment regarding security of unserializing:
We already have that unrestricted unserialize in \Drupal\Core\Entity\Sql\SqlContentEntityStorage::mapFromStorageRecords today.
So if we need security hardening here, it's a separate issue.That said, after some thinking i share the concern about API-injected evil classes into field data. Thinking about where to open that issue.