- Issue created by @Anybody
I am a little confused as to the purpose of this issue because it seems to duplicate its related issues. How is it different?
- 🇩🇪Germany Anybody Porta Westfalica
Hi @cilefen, thanks a lot for your feedback. The point is, that all the related issues were very specific to the different database engines and I was missing a general issue for the schema addition. Perhaps I may have misunderstood any of the related issues?
Perhaps we should make this the META issue for the different database engine specific issues and put this one as parent? Would that solve your point?
Oh, I see: the other related meta issue is about field queries. Do I understand correctly?
- 🇩🇪Germany Anybody Porta Westfalica
Yes, at least that is how I'd also understand that. Different topics, even though related.
- 🇺🇸United States RustedBucket
Any movement on this? It is a bit frustrating one can't have a module create a json field in a new table. I get the 'work-around' but that nullifies the core power of the field type. I might be wrong but it seems like something the schema api could support without impacting much of anything else.
- 🇨🇦Canada gapple
It could cause confusion or unexpected behaviour if Drupal's
json:normal
field type acts differently between database engines, so the expected behaviour for databases should be defined.- MySQL supports a single
JSON
type, that normalizes data before storing it (remove duplicate values, collapse whitespace, order keys)
- PostgreSQL supports a raw (json
) and normalized (jsonb
) data type (and recommends thejsonb
/normalized type be used by default).
- SQLite stores JSON as text (for backwards compatibility). Itsjson()
function normalizes whitespace, but does not currently remove duplicate keys or order keys (the documentation notes that future versions may start removing duplicate keys).json_extract
does not specify how it handles a JSON string with duplicate keys.----
SQLite could possibly cause some challenges, but I think normalization should be expected.
I'm not sure if it would be a benefit to also provide a
json_raw
type for Postgres'json
type, and other DBs could just store as text. - 🇫🇷France andypost
Is there issue to add support for sqlite? Somehow I can't find
- 🇫🇷France andypost
Filed patch for sqlite to new issue 📌 Support JSON fields in Sqlite schema Closed: duplicate
- 🇩🇪Germany Anybody Porta Westfalica
Thank you @andypost for pushing this important piece forward!
Now that we have:
MySQL: ✨ Support JSON fields in MySQL schema Closed: duplicate
Sqlite: 📌 Support JSON fields in Sqlite schema Closed: duplicate
PostgreSQL: ✨ PostgreSQL: Support JSON data type Closed: duplicate
RTBC'd, how to proceed?In https://www.drupal.org/project/drupal/issues/2472709#comment-13945836 ✨ PostgreSQL: Support JSON data type Closed: duplicate the great @alespott wrote:
fwiw https://www.drupal.org/project/json_field → exists and implements a field type for Postgres's jsonb format as well as the json format.
I think for something to get core inclusion we need to implement it for all core drivers - ie. #3109340: [policy] Decide whether to require json support for all database drivers for Drupal 10 → needs to get agreed first and then we need to decide how this is implemented. Going to discuss with other framework managers.
So I'll move the "Needs framework manager review" to the META issue (from ✨ PostgreSQL: Support JSON data type Closed: duplicate ) to check further steps in general.
Should contrib modules for other database engines be pinged? Like https://www.drupal.org/project/sqlsrv →
- 🇳🇱Netherlands daffie
As the maintainer of the Database API, I would like to do all in a single patch. In this patch I would like to see the following:
- for all 3 databases that the JSON field can be created.
- that we cab use the DBAL to filter on a key value in JSON field. This with condition command. Like\Drupal::database()->select('some_table')->fields()->condition('some_field', 'some_value', '-->')
. The operator-->
is new and should be documented.
- An index should be added, so that the query above does not result in a full table scan.I want this in a single patch/MR to make sure that it all works together.
- 🇫🇷France andypost
There's only 2 operators
->
and->>
for
- sqlite https://sqlite.org/json1.html#jptr
- mysql https://dev.mysql.com/doc/refman/8.0/en/json-function-reference.htmlAnd much more for pgsql https://www.postgresql.org/docs/15/functions-json.html
So not sure we need to support all of them
- Merge request !4433Issue #3343634: [META] Add "json" as core data type for Schema API → (Open) created by gapple
- last update
over 1 year ago 29,836 pass - 🇨🇦Canada gapple
Created a MR combining the latest patches from the child issues.
I changed the postgres type to
jsonb
based on comments on that issue. - last update
over 1 year ago 29,836 pass - last update
over 1 year ago Fetch Error - last update
over 1 year ago 29,836 pass - 🇺🇸United States bradjones1 Digital Nomad Life
No longer meta, closing children as duplicate per #14 and the updated MR.
- last update
over 1 year ago 29,866 pass, 3 fail - 🇺🇸United States effulgentsia
diff --git a/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificSchemaTestBase.php b/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificSchemaTestBase.php index 96933c2333494818f10903697a38d7413413a36b..64af4617dc18e2a495bc5144b2b4bec84648e2cb 100644 --- a/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificSchemaTestBase.php +++ b/core/tests/Drupal/KernelTests/Core/Database/DriverSpecificSchemaTestBase.php @@ -1354,4 +1354,42 @@ public function testChangeSerialFieldLength(): void { $this->assertEquals($id + 1, $id_two); } + /** + * Tests json schema type. + */ + public function testJsonSchema(): void {
Unlikely that this function as written in this MR works on PostgreSQL, so I queued a test run for that.
- 🇺🇸United States bradjones1 Digital Nomad Life
There's a failure with Postgres in
Drupal\Tests\views\Functional\Plugin\DisplayFeedTranslationTest
which I didn't look into (yet) - it's comparing two very large values in a views test so... will take some additional investigation (maybe it's flaky or known elsewhere.)As for the test predicted to fail in #20, the test runner confirmed it. After reading the documentation on Postgres Json query syntax, there's a subtle difference between how the ->> operator works in Postgres vs. MySQL.
Per the PSQL docs:
Extracts JSON object field with the given key, as text.
And for MySQL:
...given a JSON column value column and a path expression path (a string literal)...
I briefly went down a rabbit hole on why this might be and if it violates (not that it would matter...) the ISO SQL standard and... well, it's not public (why you gotta pay?!) and again, not gonna change.
So basically it boils down to, we'll need to enable special-casing for queries involving JSON path in PostgreSQL where MySQL only needs ->> in both SELECT and WHERE clauses. That is,
JSONB_PATH_QUERY()
andJSON_VALUE()
, respectively.Interestingly, SQLite acknowledges this difference between MySQL and PSQL (I couldn't find much on it anywhere, surprisingly) and it automatically handles disambiguating based on the value to the right side of the
->>
looking like a jsonpath or not. So it looks like we'll only need to special-case this for PSQL. - Status changed to Needs work
over 1 year ago 7:00am 30 July 2023 - last update
over 1 year ago 29,911 pass - Status changed to Needs review
over 1 year ago 6:28am 31 July 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
So I refactored this a bit, in a way that I hope will document the various differences in operation between MySQL, SQLite and PostgreSQL for future developers (this stuff is complicated) as well as perhaps help us forge a path forward on a standardized DX for querying JSON values in these various database types.
The issue described in #21 led me to initially working on PGSQL-specific syntax, however I started to wonder why we had different _result_ truth tables for each db type. It seemed to me that we would want to demonstrate how you get the _same_ data out of the database as you put in, however the SQLite and MySQL tests for instance had 1 and 'true' (the string) returning for the boolean value true, which seemed not quite right to me. At the moment we're basically just testing that we know how to integrate with each database and set the column type correctly... but if you couldn't turn back around and set the value you received from the db back into the same key (symmetry) that doesn't seem right to me.
In a way, it's kind of silly to run these tests at all, because we're more or less just testing the database drivers, which is not in scope of Drupal's codebase. But it does help us figure out how we are going to attack JSON support in core, particularly when it comes to querying.
There's a big discussion on 🌱 [Meta, Plan] Pitch-Burgh: JSON field storage & JSON field schema support Active about if we can require SQLite >= 3.38.0, because that's the version that added the
->
and->>
operators. I think my tinkering on this issue today renders that issue moot.Those operators are shorthand, kind of, for "JSON result" and "SQL result" formatted, respectively. Except the devil is in the details, particularly when it comes to things like boolean results as I mentioned before. Also, SQLite basically does some very unintuitive conditional result formatting even if you use the
->
operator, which _should_ give you JSON formatted results but has various special-cases (such as 1/0 for booleans.) Basically the only way you can 100% force SQLite to reliably, always give you a JSON-formatted result is to use theJSON_EXTRACT()
function and pass a second path (it can be a dummy, like in the tests I use $._) and then you will get an array of valid JSON objects, and we can always use the first and throw away the second.PGSQL's operators are similarly wonky, as the original test failure demonstrated. You can't pass a "true" jsonpath to the
->>
operator, only a key name. If you want to use jsonpath you must use functions as noted above.All of this led me to write these tests in such a way that we always want to get a JSON value back from the database, and then parse it to get the expected value in a predictable shape (that is, the exact same types as you input.)
The conclusion here is that the
->
and->>
operators in SQLite and PGSQL have unintuitive operation that is inconsistent with the MySQL equivalents.So if the common denominator between all three supported databases is jsonpath, and the already-required versions of all databases (including SQLite) support it through longhand functions, we're on solid footing. The database modules will simply be required to translate jsonpath expressions into SQL.
We will need to do work over in ✨ [META] Add support for JSON field queries in entity queries Active on what entity queries look like (that's not what this issue is about) but I think the work here underscores that even if we do add
->
and/or->>
to the list of allowed operators toQueryInterface::condition()
, it probably wouldn't directly become an->
or->>
in the SQL, except perhaps for MySQL. We would likely do some smoothing in the various database modules to support a predictable subset of functionality that can be implemented predictably across them all. - last update
over 1 year ago 29,911 pass - last update
over 1 year ago 29,911 pass - Status changed to Needs work
over 1 year ago 7:47am 31 July 2023 - 🇳🇱Netherlands daffie
What I am still missing is:
- that we cab use the dynamic to filter on a key value in JSON field. This with condition command. Like\Drupal::database()->select('some_table')->fields()->condition('some_field', 'some_value', '->')
. The operator->
is new and should be documented.
- that we cab use the dynamic to filter on a key value in JSON field. This with condition command. Like\Drupal::database()->select('some_table')->fields()->condition('some_field', 'some_value', '->>')
. The operator->>
is new and should be documented.
- An index should be added, so that the query above does not result in a full table scan. - Status changed to Needs review
over 1 year ago 8:58am 31 July 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
@daffie thanks for the review.
If I'm not mistaken the scope for this issue is to add the data type and demonstrate that the data type mapping for the database modules/drivers results in a usable column of the appropriate type.
You mentioned three points in #14, of which I think we've accomplished the first (demonstrate the column is correctly typed). I think we could add index creation (your third point) in this issue as well and that's in the same vein.
As for adding condition support for entity queries with json data, I think that's 1) a much bigger lift and 2) legitimately a separate issue. I think we can keep the scope of this issue to the data type in the schema API and leave the entity query integration to ✨ [META] Add support for JSON field queries in entity queries Active . Put another way, adding the data type is a feature in itself and if we can commit this soon-ish, provides a firm basis for implementing things like entity queries and #2232427-48: Allow field types to control how properties are mapped to and from storage → .
The end goal of course is to build upon this foundation with a full integration with the entity API but I also respectfully disagree that they are inextricably linked so as to block this issue from getting closed out first.
Also per my last paragraph in #23 I think there's work to be done on the semantics of entity query conditions, but people can start experimenting with and building upon the basic data type support even without entity queries being settled.
- Status changed to Needs work
over 1 year ago 11:15am 31 July 2023 - 🇳🇱Netherlands daffie
Adding support for
\Drupal::database()->select('some_table')->fields()->condition('some_field', 'some_value', '->')
and\Drupal::database()->select('some_table')->fields()->condition('some_field', 'some_value', '->>')
is for the use in dynamic queries. This is not for "entity queries". Just regular database queries. What you are now doing will result in that we shall need database specific expressions for each database. That will result in a lot of database specific overrides and/or in contrib code that will only works with MySQL. When using thecondition()
method we add a wrapper that hiddes the database specific stuff. A much better developer experience.Adding indexes is important, because without an index the query will do a full table scan. Which is very slow.
We can do this issue without doing the
condition()
wrapper and without the index stuff. It is acceptable for me.
I am pretty sure that those 2 parts will need to be added, before all the other stuff can be added. For me fixing all the database layer stuff in a single issue is the best solution to make sure that it all works together. I know that it is all complicated.
Getting the database layer stuff in is the foundation for all the other functionality that is build on top of it.We shall need a change record for the added field type. Contrib database drivers need to implement it too. BC break?
- 🇺🇸United States effulgentsia
For now I opened ✨ Add support for JSON field queries in database API Active to discuss and start experimenting with the changes to the database API that would be needed. I don't currently have an opinion on whether or not that issue needs to block this one.
- 🇩🇪Germany Anybody Porta Westfalica
Thanks for opening ✨ Add support for JSON field queries in database API Active . I agree with #25 that issue shouldn't block this one. The basic data type is the important first step.
- 🇺🇸United States bradjones1 Digital Nomad Life
Thanks all.
When using the
condition()
method we add a wrapper that hiddes the database specific stuff. A much better developer experience.Agree totally, and the just-opened ✨ Add support for JSON field queries in database API Active has a good starting IS for us to start hashing out the syntax here. I think it's great that the proposal there is different in syntax from what I had originally thought of in my head, noodling on this, so we will at least have a few different options to discuss and choose from on the specific syntax.
That all said, I think incrementalism here is our friend. I am already able to implement this patch on a project site and that enables me to at least start storing data in native json field types (granted, with a custom field item plugin, but very very easy to do) and that's a good step in the right direction even if we haven't yet ironed out the querying side. Addressing the generic query syntax is a further incremental step along the way to entity queries. Building blocks that are more digestable, smaller chunks.
As regards indexes, I think it's slightly more nuanced than either not performant ("full table scan") or explicitly indexed. E.g. from the MySQL docs:
The binary format is structured to enable the server to look up subobjects or nested values directly by key or array index without reading all values before or after them in the document.
Not denying at all the importance of indexes (and some anecdotal evidence exists for comparison on some search types) but just pointing out that at least one engine does some optimization beyond scanning all records. That being said I think demonstrating in this changeset the method for adding an index (e.g. MySQL generated column or Postgres JSONB indexing) is definitely a good thing.
One question for those who are more expert in SQLite... it doesn't appear its indexing functionality supports JSON data in a similar way to MySQL or PGSQL? I don't think this is at all a blocker for us but also don't think we need to go into the weeds in core to like, create DB triggers to emulate some sort of indexing, as that linked forum post describes. We might just put a note in the test coverage or the interface to explain that indexes aren't create-able for JSON data in SQLite, or that it's your responsibility to manually implement them with a work-around based on the actual structure of your data.
- last update
over 1 year ago 30,145 pass, 2 fail - 🇺🇸United States bradjones1 Digital Nomad Life
OK, so I've addressed some of the code style pieces re: code re-use, though this is balanced against the fact that syntax for JSON operations (and particularly index creation) means that we can't really abstract this away completely.
That brings me to some subsystem maintainer type questions:
I added a
mysql_as
option on the field spec, and added a note on the DB Schema API docs as well regarding driver-namespaced options like this.TL;DR, MySQL uses a special syntax for generated columns. This has the nice added benefit of supporting generated columns in general, not just for JSON. However, this doesn't seem to be very much in demand until now since I couldn't find any issues requesting or mentioning this. But anyway, I don't think we can get around this special syntax since you have to proactively decide what JSON path query to target with your index... so this is expert-level stuff. At least we are providing an example here. Because the generated column operates more or less like any other when it comes to being part of an index, the existing index syntax works here no problem. I believe MySQL JSON data indexing is supported and is ready for review.
Postgres is a different beast. Their GIN indexing engine explicitly supports indexing on JSON columns, but we need to hook into the
CREATE INDEX
syntax more directly than with MySQL. Turns out that GIN/GIST indexing is underway over at #2988018-95: Performance issues with path alias generated queries on PostgreSQL → , however that path currently prefers GIST and we need GIN for JSON support... also it seems Postgres recommends GIN anyway on balance. I'll comment on that issue about adding GIN/GiST indexing in general.The TL;DR on Postgres indexing is that you don't specify an index on a particular path like with MySQL... you just tell GIN which JSON column you want to be able to contribute to indexing... however queries that use the index are limited to:
key-exists operators
?
,?|
and?&
, the containment operator@>
, and the jsonpath match operators@?
and@@
or optionally
The non-default GIN operator class
jsonb_path_ops
does not support the key-exists operators, but it does support@>
,@?
and@@
These could be useful operators for some specific cases, however I don't think we're going to support them in a first implementation, preferring to instead implement very specific equality comparison and probably some basic numeric comparision, always by json path (because that's the lowest-common-denominator between drivers.)
It would be awesome to do something like the
@>
"containment" operator across all the drivers, however I can only find this functionality in Postgres and MySQL, it doesn't appear SQLite supports searching for a contained value within another.For that reason, I'm wondering if we can basically +1 the idea of supporting GIN index creation, but because these indexes do not contribute to satisfying queries with
->
,->>
orjsonb_path_query()
, we will postpone this on a decision around supporting "containment" queries in the future.If we're ok with that, then I think this is ready for review.
- Status changed to Needs review
over 1 year ago 5:22am 15 September 2023 - last update
over 1 year ago 30,157 pass - last update
over 1 year ago 30,157 pass - last update
over 1 year ago 30,136 pass, 2 fail - last update
over 1 year ago 30,157 pass - 🇳🇱Netherlands daffie
Unfortunately is the testbot for MariaDB failing for the current MR with the following error:
PDOException: SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '>'$.key') VIRTUAL,
PRIMARY KEY ("id"),
INDEX "jsonpath" ("generated_field_for_' at line 4-
These could be useful operators for some specific cases, however I don't think we're going to support them in a first implementation, preferring to instead implement very specific equality comparison and probably some basic numeric comparision, always by json path (because that's the lowest-common-denominator between drivers.)
Fine by me. Just as long as the database specific stuff is hidden behind a wrapper. That wrapper will be the method
condition()
. Something like:$results = $connection->select('test_json') ->fields('test_json') ->condition('testfield', 'some_path', 'some_value', 'some_operator') ->execute() ->fetchAll();
My questions:
1. Which values for the operator will you add and what do they do?
2. Is that enough functionality to do everything you/we want to do with entity queries and other uses of the JSON fields? -
If for PostgreSQL the GIN index is best, then please use that. Maybe add a generic JSON field index to the Schema API. For PostgreSQL that will result in an GIN index and for MySQL/MariaDB in another special type index. Lets hide those implementation specifics from regular Drupal developers. That is the beauty of the Schema API, for regular Drupal developers all databases work the same.
Is there an SQLite index for JSON fields?
For testing the use of an index you can do something like:
$query = $connection->select('test_json'); $query->addExpression('json_extract([test_field], :path, "$._")', NULL, [ ':path' => $path, ]); $sql = $query->toString(); $result = $connection->query('explain ' . $sql)->fetchAll(); // The result will contain an explanation of how the database will run the query and which indexes are used.
-
For the
mysql_as
:
- The testbot is failing for MariaDB for this.
- I am not very happy with it. If it is necessary to add the partGENERATED ALWAYS AS (test_field->'$.key') VIRTUAL
, can we do this without adding the Schema API? We already know the field name for the field.
-
- last update
over 1 year ago 30,171 pass - 🇺🇸United States bradjones1 Digital Nomad Life
@daffie thanks for the review. To address your bullets in order:
-
You've raised the potential syntax for the entity query side before, and I think that's spun off at
✨
Add support for JSON field queries in database API
Active
.
As far as indexes go, here's a summary of index support for the various drivers:
MySQL, MariaDB and SQLite support indexing JSON column data via generated columns. In the case of MariaDB and SQLite, however, you must query the generated column directly, which means 1) creating said generated column in a way that makes sense for your data structure and 2) knowing to/being able to filter on it per se. MySQL's optimizer is, however, smart enough to look at the query you're doing on the JSON column and consult the indexes on the table to find a candidate index that covers your query. "The MySQL optimizer also looks for compatible indexes on virtual columns that match JSON expressions." This is a bit of an afterthought statement in the docs but is very meaningful in practice - it just works. I tested this on and the same cannot be said of MariaDB. So for these databases, only MySQL is really efficient in allowing you to create indexes that are optimized by querying the JSON column directly, even though it involves a generated column. The catch is, you need to generate the column. See below.
Re: Postgres, it has the strongest index system in GIN, and I think what we can do here is automatically create a single-column index that covers any JSON data column. The catch with GIN is that it doesn't optimize queries based on JSON_EXTRACT() statements, like MySQL, but you can use supported operands that give you queries like
select * from a where j @? '$.d ? (@ > 20)'
, e.g. for a greater-than value query on thed
key. - I think we can do auto-generation of indices for Postgres sites, and as described above, there's no real way to give you the current Drupal abstracted query experience with generated columns for SQLite or MariaDB.
- I've fixed the MySQL test to also work on MariaDB, but as explained above, it won't really result in an index-optimized query when querying the JSON column directly. I don't think there's a real way around this at the moment without adding a lot of complexity to the DB API. I also don't love requiring a new key a la mysql_as, but as I've explained, you need to specify some sort of JSON path to populate the generated column - unlike Postgres, you can't just tell MySQL to generate a hash table for the whole column. So I think either we say, look, if you want zero-config indexes for JSON data, use Postgres, or make it clear that you need to do some explicit optimization for MySQL.
I like the idea of an auto-generated index because it gives you performance by default and in this example, j is the json column and I think we'll always basically be querying from the root of the json column's document structure. You can target a GIN index on a sub-key, but that would require the kind of explicit configuration that MySQL needs.
-
You've raised the potential syntax for the entity query side before, and I think that's spun off at
✨
Add support for JSON field queries in database API
Active
.
- last update
over 1 year ago 30,171 pass - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 30,344 pass, 12 fail - last update
over 1 year ago 30,367 pass - last update
over 1 year ago Build Successful - last update
over 1 year ago 30,344 pass, 12 fail - last update
over 1 year ago 30,344 pass, 12 fail - last update
over 1 year ago Build Successful - last update
over 1 year ago 30,344 pass, 12 fail - last update
over 1 year ago 30,363 pass, 2 fail - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,365 pass, 2 fail - last update
over 1 year ago Build Successful - last update
over 1 year ago 30,365 pass, 2 fail - Status changed to Needs work
over 1 year ago 2:10pm 26 September 2023 - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - Status changed to Active
over 1 year ago 3:31pm 26 September 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
Ah, PHPStan didn't catch the fact that
Postgres
is alone among the drivers in passing NULL as the$args
parameter, instead of an array. I updated the phpdoc for this parameter to hopefully hint IDEs better on that in the future. This was addressed in a null-coalesce in execute() but since we are now iterating on it, I added an explicit set-when-NULL at the top of the method. I wanted to update the param typehints accordingly but that is a very complex BC dance.@catch did help with pointing me to the proper way to add typehints, so we could perhaps do that here, but this MR is getting a bit large as it is.
- last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,224 pass, 71 fail - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,345 pass, 2 fail - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,367 pass - last update
over 1 year ago 30,367 pass - Status changed to Needs review
over 1 year ago 11:37pm 26 September 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
OK, I think this is ready for review and should address all feedback received thus far.
A few notes to reviewers:
"Why are we doing all this new stuff around binding parameters with explicit type?" - Basically, see https://www.bennadel.com/blog/3189-type-coercion-will-bypass-index-selec... and https://www.bennadel.com/blog/3937-exploring-type-coercion-and-value-com... which TL;DR means index selection (particularly the auto-optimization using an index on a generated column in MySQL) is sensitive to the type of the parameter where most SQL operations are safely type-coerced. I discovered this while writing the test for the MySQL index selection, so tests++. This also does add some marginal new flexibility for people to do interesting things in queries and is added in a BC-compatible way. I would strongly urge us not to back out of this change, particularly as it might come in handy as we proceed with the query-builder side of this.
What's up with the different index support across databases? Re: @daffie in Slack: "The most used database with Drupal is MariaDB. Using JSON storage without the possibility to add an index on a JSON field with MariaDB, makes the whole JSON storage thing not very useful to me."
Respectfully, I think this is a bit reductionist, and JSON data is still plenty powerful even with the differences between database drivers. In short, just because a database might not provide a straightforward path to automatic or low-effort indexing for JSON path queries, that doesn't mean the functionality is useless. Database indexing is also just one of many factors in performance. E.g., Drupal might cache the results of an expensive query, or said queries can be run during non-page-blocking async requests (e.g., JSON:API) or, whatever. It's also worth noting that a lot of Drupal sites in the wild are likely under-indexed, because Drupal doesn't necessarily know your data structure and thus can't magically create indexes for you.
I think this is mostly in response to my notes in #33 around the out-of-the-box indexing performance of various databases. Postgres is basically free, broad-coverage indexing with GIN, MySQL enables you to index JSON data paths that are auto-optimized for matching queries. MariaDB allows a similar generated column generation, which can also be indexed, however it does not auto-optimize queries on the "source" column. SQLite similarly allows generated columns, yet it does no auto-optimization.
This does lead me to a question around broader support for generated columns. In the MariaDB example, if a site owner really wants to get index performance benefits from a JSON data query, he could write code that creates and then later makes queries against the generated column. That's out of scope right now for anything in core, but it's a straightforward performance improvement.
Right now I have added support for creating a generated column with the
mysql_as
member of the column schema, however should we change that to just,as
, and implement it across all database drivers? I don't want to clutter this MR too much, but the syntax is basically the same for all supported drivers and it's a kind of "why not" item. This could also potentially address @daffie's note in #32. I might just go ahead and make that change in the coming days if nobody feels strongly about it.Where's the query builder and entity API implementation? We decided early on that adding JSON data support in smaller-scoped issues would help keep things moving. It is also entirely possible to utilize this issue's functionality now, without entity API integration, e.g. for custom field types. As you can see from the above items, this already has creeped out from being "just" JSON data stuff, by way of updating related code for compatibility, so I'm trying to keep it tight.
- 🇳🇿New Zealand quietone
I am wondering what documentation needs to be updated along with this. The IS contains a link for Drupal 7 documentation, where is the similar documentation for Drupal 8+, I was not able to find it?
And related to that is that there is no entry for the JsonAPI on Drupal APIs → . Is there an issue to create that? There is a guide for the Jsonapi module and it looks much of that information should be moved to a new Json API guide.
- 🇳🇱Netherlands daffie
Where's the query builder and entity API implementation? We decided early on that adding JSON data support in smaller-scoped issues would help keep things moving.
I get why you want to do the JSON field storage & JSON field schema support in smaller scoped issues. Normally I would agree with you on this. Only the JSON storage, query and index support from the different by Drupal core supported databases makes this difficult. We unfortunately need to carefully check that the functionality we would like to support in JSON field entity queries is supported by the databases.
Drupal's dynamic database queries is used to hide implementation details for database specific stuff. In the case of JSON it is the way JSON fields are queried. Each database has its own way that it requires the field to be queried. Therefor Drupal dynamic query needs to add database independent way to query JSON fields. To me the best way is something like this:
$results = $connection->select('some_table') ->fields('some_table', ['some_field']) ->condition('some_field', 'some_value', 'some_operator') ->execute() ->fetchAll();
The question for me is about the
condition()
method. What I would like to know is:
1. Which operators are going to be supported?
2. What functionality do those operators support?
3. Are there any limitations on JSON field queries?When it is clear what functionality we can support, I will go to the subsystem maintainers of the entity system and ask them if this is enough to support JSON field entity queries in Drupal core? When that is not the case, it should all go in a contrib module.
I am not very happy with the fact that MariaDB does not support indexes on JSON fields. Very disapointing!
I wish that we could do a lot more with JSON storage. Even more than what you are now working on @brad.jones1. Only we are limited by what the by Drupal core supported databases support.
@brad.jones1: What about waiting for Drupal 11 and have the ability to use the JSON operators
->
and->>
? - 🇺🇸United States bradjones1 Digital Nomad Life
@daffie thanks for taking a look. Addressing your concerns and questions more-or-less as you raised them:
I get why you want to do the JSON field storage & JSON field schema support in smaller scoped issues. Normally I would agree with you on this. Only the JSON storage, query and index support from the different by Drupal core supported databases makes this difficult.
The main reason it's important to split up the various components of this initiative is exactly the reason you're skeptical. There are useful reasons to implement JSON data storage as implemented in this issue alone - e.g., custom field types that can leverage this data. They could even query a generated field to obtain index performance optimizations! Each step of the process adds functionality independent of the others, and they are additive.
We unfortunately need to carefully check that the functionality we would like to support in JSON field entity queries is supported by the databases. Drupal's dynamic database queries is used to hide implementation details for database specific stuff.
We are in agreement - and the brilliance of this abstraction is exactly what will make this work. As I explored in #23, the database drivers have wildly differing and inconsistent applications of common syntax (e.g., ->> doesn't work the same everywhere, or is even supported in SQLite) but that doesn't matter. The lowest-common denominator is that each driver has JSON-specific functions and/or operators that we will implement in each driver to get the desired functionality. I've learned a lot about the different implementations of JSON data in the various supported db types in the course of working on this. It's hard stuff, but we can for sure implement MVP-level implementation.
The question for me is about the
condition()
method. What I would like to know is:
1. Which operators are going to be supported?I've explored this a bit in the feature-specific issue #3378275-2: Add support for JSON field queries in database API → . I believe the current universe of supported operators will overlap what we can do with JSON values. There are functions in all supported drivers to filter on a JSON Path value, and so they should be able to compare like any other value "straight" from a column. There are considerations around type safety, but I have addressed that in this issue by giving the database API the opt-in ability to bind query parameters with expliict typing. See the first point of #37.
2. What functionality do those operators support?
As noted in the previous item, I believe any value you get from a JSON path can be compared using existing SQL syntax. I think you might be asking about special database-specific operators such as these? Each database driver will use its specific syntax under the hood, e.g. for GIN indexes on Postgres we'll need to follow these guidelines: "The default GIN operator class for jsonb supports queries with the key-exists operators ?, ?| and ?&, the containment operator @>, and the jsonpath match operators @? and @@" I don't think we'll be exposing any of these driver-specific operators for direct use, however.
3. Are there any limitations on JSON field queries?
The question of indexes is an obvious one, however I am less pessimistic on this point than you. My take on this boils down to 1) let developers weigh these trade-offs for themselves, e.g. running expensive queries asynchronously or caching their results using existing means and 2) there are work-arounds such as using generated columns, however this isn't automagical.
A second point worth raising here is not so much a limitation in JSON queries themselves but in our existing query builder syntax. Right now we query on an entity API field's [primary] property, or at a lower level, a specific column. With JSON path we need to know the column to query, but also the JSON path. I raised this at by proposing one idea:
we will need to address how to merge these concepts with our DBAL's chaining feature. E.g.,
->condition('json_field_name.$.json.path.to.subelement', 'value', 'operator')
. This is kinda like how you can doentity_reference_field.entity.that_entities_field.property
.Since jsonpath is IETF standards-track which is used by the supported databases, we can safely use it in our abstraction layer and parse as appropriate into db-specific statements.
When it is clear what functionality we can support, I will go to the subsystem maintainers of the entity system and ask them if this is enough to support JSON field entity queries in Drupal core? When that is not the case, it should all go in a contrib module.
I believe there's sufficient information above to have this conversation and, I believe, make a defensible case for putting this into core. It's also worth keeping in mind our "competition" here. It doesn't go unnoticed by me, anyway, that people love talking about how Drupal is past its prime. I think our core development gates keep quality high, yet leaving this kind of thing on the cutting-room floor would reinforce the justified negative take on Drupal that it's not modern enough. For instance, Laravel has had good support for JSON data queries for five years. I am not asking us to lower our standards, however it seems like this might be a case of the Nirvana fallacy ("Perfect is the enemy of the good.") Core development is also not democratic, but it's worth noting that we're doing this because the proposal won a contest that demonstrated at least some broad support for it. If it ends up in contrib, that's fine I guess, though I don't see anything here that wouldn't be broadly beneficial as an API to build cool things upon, with core quality.
What about waiting for Drupal 11 and have the ability to use the JSON operators -> and ->>?
This is the one point at which I will strongly redirect as I think there's an important misunderstanding. As i explored in #3369988-12: [Meta, Plan] Pitch-Burgh: JSON field storage & JSON field schema support → , there is no consistent application of the
->
and->>
operators across drivers, and as noted above, they aren't compatible with GIN indexing on Postgres anyway. Laravel uses->
as an intuitive chaining delimiter in front of its jsonpath abstraction.As I work on this more I am becoming a little more in favor of a new alternative to
::condition()
that would be specifically for JSON data. It would give us the ability to not get too cute with the field parameter (e.g., for the syntax I spitball above). We could one-up Laravel's support by supporting more than just equivalence or containment (with specific syntax abstracted by the database drivers.)So to your specific question about if we would have to wait for D11 to support these operators directly - no, and we wouldn't anyway. (Especially not in a way to mislead anyone into thinking they work in any of the conflicting ways they are used directly in the databases.)
In conclusion, this stuff is hard and complicated. But we have actually done a ton of the foundational work in this issue, and uncovered some important areas for discussion and improvement (e.g., indexes.) And one advantage of being a little late to this party is that we can probably lift a lot of the db-specific code from Illuminate/Laravel, and it's MIT. Let's go!
- Status changed to Needs work
over 1 year ago 8:17pm 1 October 2023 - 🇮🇹Italy mondrake 🇮🇹
Added some comments inline in the MR.
Apart from those, I am worried about the
mysql_as
key for fields. That breaks the abstraction because it forces consuming code to define it, therefore forcing it to know about mysql specifics. Also, it means alternative contrib/custom drivers will not have that chance. - 🇺🇸United States bradjones1 Digital Nomad Life
Thanks!
Apart from those, I am worried about the mysql_as key for fields. That breaks the abstraction because it forces consuming code to define it, therefore forcing it to know about mysql specifics. Also, it means alternative contrib/custom drivers will not have that chance.
As I mentioned in #37 this can just become as to support generated columns in all databases. It wouldn't necessarily be related to JSON data support in all of them, but it will let us not do driver-specific namespacing and the base feature is supported across all of them. So I share your assessment and this can be made consistent across all drivers.
- last update
over 1 year ago 30,378 pass - 🇺🇸United States bradjones1 Digital Nomad Life
I extended the implementation of generated columns to all drivers, so we are no longer awkwardly namespacing this to only mysql, and it might be useful new functionality for power-users, anyway.
Incorporated @quietone's docs suggestions. As far as entries in the guide:
And related to that is that there is no entry for the JsonAPI on Drupal APIs. Is there an issue to create that? There is a guide for the Jsonapi module and it looks much of that information should be moved to a new Json API guide.
I think there's some confusion as to the different technologies at play here. While this is JSON data, these changes have little to nothing to do with JSON:API. This is a new data type for the database API. Where should that go, then? (This isn't really typed data API, I don't think.) I did look at the existing database API docs and they mostly point to the auto-generated API docs. I did however notice that I needed to add 'json' to the list in the docblock of supported generic types.
- last update
over 1 year ago 30,378 pass - 🇺🇸United States bradjones1 Digital Nomad Life
Adding MySQLi issue as related, since it touches some of the same code.
37:49 36:54 Running- last update
over 1 year ago 30,386 pass - last update
over 1 year ago 30,384 pass, 2 fail - last update
over 1 year ago 30,386 pass - Status changed to Needs review
over 1 year ago 7:32pm 4 October 2023 - last update
over 1 year ago 30,386 pass - Status changed to Needs work
over 1 year ago 9:59pm 4 October 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
I got some good feedback in a Slack DM today on how we might be able to make some of the indexing/generated column handling more automated for MySQL, MariaDB and SQLite. Basically, we can have the hot-path jsonpath expressions put into the schema config, and create generated columns as necessary for indexing. In the case of SQLite and MariaDB, we can then consult the schema config for the path used in the DBAL query and, if there's a match, query the generated column instead. And, like with what we're doing with Postgres, it can automatically ensure those generated columns are covered by an index. It also allows us to back out of using the
as
key for this, which means you won't get unnecessary generated stored columns on Postgres if you're just trying to target cross-database compatibility.I'm tempted to keep the
as
key and possibility for explicit generated column code in, though, since it could still prove useful for cases not covered here. But I don't feel strongly about that.We'll see how heavy a lift this is but i think the plan is pretty straightforward.
- Open on Drupal.org →Environment: PHP 8.2 & MySQL 8last update
over 1 year ago Not currently mergeable. - last update
over 1 year ago 30,384 pass - 🇺🇸United States bradjones1 Digital Nomad Life
I made very good progress today on the plan outlined in #46, though it required me to veer more deeply into the condition builder, and particularly create a new
::jsonCondition()
method. I want to polish these changes a bit more before I push them up for review. Two main items on this though:"Do we really need a new method as an alternative to
::condition()
?" I think so. The new method signature adds a$jsonpath
argument, which is distinct from the field being filtered. In addition, this method allows us to provide override-able protected methods to the various database drivers for fine-tuning their handling of these queries, and even doing things like swapping out the column to be queried, e.g. in the case of MariaDB, so we can make use of automagical indexing.To accomplish the field-swap to the generated column in supported scenarios, there's also new code to 1) create these columns when necessary, 2) create indexes for them, and 3) use the generated column instead of the requested one in scenarios when it will provide a performance boost. To do this, however, the
Condition
classes need to receive the $table being targeted for the condition in its constructor. This is going to require the awkward deprecation do-si-do for adding to existing signatures. In addition I think we'll need to do some creative stuff around adding the new::jsonCondition()
methods toConditionInterface
and the various drivers. It can be done, just a lot of extra deprecation boilerplate.Anyway, that's what I've been working on - it def. seems like this issue will come to include the work that was intended for ✨ Add support for JSON field queries in database API Active but that's fine.
- last update
over 1 year ago Custom Commands Failed - 🇺🇸United States bradjones1 Digital Nomad Life
OK, pushed up a lot of changes that include automatic creation and use of generated columns for MySQL/MariaDB and SQLite. This also includes the first implementation of a ::jsonCondition() method. I fully understand this includes some changes to function signatures that will need a deprecation dance, but I want to get this up and looked at before going through all that since it's all boilerplate anyway. So please don't get hung up on that just this second - I know it needs a deprecation but there are bigger items to hash out that are not just busy work.
I hope this puts to bed most all the concern about index performance, and I appreciate the help I've received thus far from everyone. I think this issue is now a duplicate of ✨ Add support for JSON field queries in database API Active since I had to wade into the condition abstraction methods. The good news is, I've learned a ton about the internals of this API in doing so and I think we are actually really close to some huge breakthroughs.
At this stage it would be good to know if there are any strong objections to 1) a new ::jsonCondition() method, and 2) the way we've handled the auto-creation of generated columns.
I am also updating the IS but understand that there is still follow-up work to be done in these areas:
- Clean up any auto-created generated columns on changes to columns (e.g., adding new or deleting jsonpath hotpaths or deleting columns entirely)
- Deprecation dance for new::condition()
method andConditionInterface
/Condition
- last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 30,390 pass - last update
over 1 year ago 30,390 pass - last update
over 1 year ago 30,390 pass - last update
over 1 year ago 30,390 pass - Status changed to Active
over 1 year ago 5:51am 8 October 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
Moving this to "active" as well, it's more active than NW.
Also we will need to add a conditional on the auto-creation and use of generated columns in SQLite since support for them was only added in version 3.31 yet D10 supports down to 3.26. I don't think this is a huge deal, since the lower version still has JSON support overall... you just won't get the automated indexing magic.
Running against the other 3 db drivers not part of the auto-testing to make sure I didn't regress on these latest changes, at least.
- last update
over 1 year ago 30,368 pass, 2 fail - last update
over 1 year ago 30,390 pass - last update
over 1 year ago 30,368 pass, 2 fail - last update
over 1 year ago 30,390 pass - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago Custom Commands Failed - last update
over 1 year ago 30,390 pass - last update
over 1 year ago 30,390 pass - last update
over 1 year ago 30,390 pass - 🇫🇷France andypost
FYI the latest release of sqlite 3.43.x
Performance enhancements to JSON processing results in a 2x performance improvement for some kinds of processing on large JSON strings.
- Status changed to Needs review
over 1 year ago 8:12pm 11 October 2023 - 🇬🇧United Kingdom catch
I haven't reviewed the MR yet, but just quickly on the MariaDB lack of index support - I am not too worried about this because:
1. We are not migrating everyone's entity and field tables to json any time soon, so no-one is forced to use this yet.
2. By the time this is in widespread use, there might be a MariaDB release with better index support.
For me it would become a concern if we start running JSON queries in core (and then if there's a MariaDB release with better index support, we might want to raise requirements to that release etc.), not so much before that. If a contrib module starts using it heavily, they might want to add a note about which database drivers support indexing etc.
- Status changed to Active
over 1 year ago 11:13pm 11 October 2023 - 🇺🇸United States bradjones1 Digital Nomad Life
@catch thanks for dropping in and providing your thoughts on that. I understand you were only able to briefly catch up, however I _think_ I've largely addressed the performance concerns raised initially by @daffie by adding in an automagical work-around for the indexing question, so long as the column is defined with jsonpath "hotpaths" that correspond with jsonpath expressions that are reasonably expected to be queried and would benefit from an index. The actual nitty-gritty of doing so is behind the abstraction layer. So I think your analysis is good and I agree with you, _and_ we can do something to improve performance in this area regardless.
I am going to move this back to Active since this isn't NR in the sense that it is ready for review ahead of being RTBC, mostly because since we have veered into adding ::jsonCondition() that needs to be completed before this is likely to quality for core commit.
Keep the feedback coming and thanks also @andypost for the pointer to the latest SQLite changes. This is an exciting area to work in and I think Drupal will have the best framework support for JSON data across our competitive landscape, hands down.
- 🇬🇧United Kingdom catch
Discussed this a fair bit with @alexpott and @daffie (and possibly others, sprint room conversations tend to blend into one long one on multiple subjects) at DrupalCon Lille.
@daffie was concerned about MariaDB currently not supporting JSON indexes in general.
@catch (me) is only concerned about that if we used JSON for entity and field storage.
@alexpott was concerned about it for entity and field queries, and additionally if we used JSON for config storage (which would allow us to drop some key/value hacks that enable config lookups).
We came up with the following proposal:
1. Regardless of indexing support, we would like to introduce JSON support to replace serialized PHP where possible - an obvious example is the {users_data} table.
2. I (@catch) think that unindexed config queries might not be too bad, since there is an approximate maximum number of rows to scan of around 2-3,000 which might be negligible in the cases we are looking up uuids or block list, most of which are behind cache set/get, or not in the critical path. We could try to implement it for that, and see how it looks. @alexpott and @daffie were also I think OK with this plan - i..e try and see.
3. We all feel strongly that any replacement for entity/field support should first be implemented as a contrib module, and would have to be demonstrated as working well on a real site with a large data set before inclusion in core. It also needs to be compared against existing contrib solutions like MongoDB. However #1 and #2 are enough of a use case to add support, without rushing into #3.
- 🇺🇸United States bradjones1 Digital Nomad Life
Thanks, @catch. That all sounds reasonable.
I do want to be very explicit though and say, this issue is only about adding the API that would enable any of the above practical examples to be implemented. I am more with you on the question of the impact of totally-automatic, native-supported indexing - and keep in mind this MR already contains a work-around for known hotpath jsonpaths to be indexed using generated columns. And yes, paired with additional caching layers, this becomes less of an issue as well.
My proposal/contract with the Drupal Association on this functionality includes a proof-of-concept implementation for a single service, and I also proposed the same as you - user data service.
At its core, this is an API that enables all sorts of cool innovation, both in core and outside. For instance, this issue combined with the other "half" of my Pitch-burgh project, OOTB schema generation, means you could have a JSON-backed field exposed over JSON:API, for instance, with a defined schema, heck, even validation against said schema. That could enable all sorts of neat decoupled/HATEOAS craziness.
But yeah, TL;DR, that's all a great outline of what _could_ follow from this issue, but this isn't about replacing field storage, or config storage, or anything like that, yet.
What I don't see in your comment is any real concern for the implementation as it's laid out thus far. Obviously there's follow-up to be done here, as I outlined in the IS - e.g., making sure we provide an update method for existing schemas and cleaning up generated columns on table changes. The entity query API integration will leverage the new ::jsonCondition() method in the database API, with driver-specific abstractions for various operators.
Thanks, all.
- Status changed to Postponed
about 1 year ago 10:19am 30 October 2023 - 🇳🇱Netherlands daffie
In 🐛 [PP-1] Performance issues with path alias generated queries on PostgreSQL Postponed there was some discussion about how to add GIN/GIST index support in core. I have created a separate issue for it ( 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC ). I am postponing this issue on getting the other is done.
- 🇺🇸United States effulgentsia
I added this as a target issue that we'll allow in post-alpha (but before beta) in 🌱 [meta] Release preparation steps for 10.2.0 Active if it's ready in time.
- 🇺🇸United States bradjones1 Digital Nomad Life
MR over at 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC for GIN and GIST indexes.
- 🇺🇸United States bradjones1 Digital Nomad Life
Rebased and also included the code from the GIN/GIST issue, will continue to incorporate any changes from there. I'd say this is now un-postponed but I'd like to get feedback on that code before removing the flag and label.
- 🇺🇸United States bradjones1 Digital Nomad Life
OK, been working on ensuring all the various helper methods are addressed, as well as their test coverage. I believe I've addressed the requests in @daffie's last review, except for:
make the method Drupal\Core\Database\Schema::dropIndex() work with JSON (or virtual) field indexes or add a new method for removing an index to a JSON (or virtual) field.
When it comes to
::addField()
,::dropField()
, we address the creation and cleanup of any generated indexes and their dependent auto-generated columns. (Boy, did this get complicated.) For::addIndex()
, we support the creation of explicit indexes.As for
::dropIndex()
, this should only be called for indexes which are not managed by core (that is, they were not automatically-generated.) I don't believe we should do any cleanup of any dependent auto-generated fields if you go rogue and explicitly use the API to drop an auto-generated index. For completeness I have added some add-drop index operations to the base method for testing JSON data types, but in this case it's there only as a smoke test and doesn't correspond with any new code."Complicated" conditions queries. Queries with multiple conditions, were one or more are JSON conditions. For instance, when we do a entity query we need to do a JSON condition and a condition on the field "deleted" to be FALSE.
Makes sense and since I believe I'm now mostly done with the database API pieces this will be more obviously tested with the entity API integration.
I am going to try and see if I can do the MVP implementation of entity query integration without having to wade too deeply into 📌 Allow field types to control how properties are mapped to and from storage Needs work . Or, perhaps now that we have a concrete use-case for entity and field API updating its serialization, this might unstick that issue.
So to summarize, next step is circling back on the new
::jsonCondition()
method, and implementing driver-specific syntax for a base set of operators.Still targeting end of the month to get this done. ("Done" in this case being that I've achieved the objectives laid out in the grant, not necessarily that this is committed - especially with holidays that's impractical.)
- 🇺🇸United States bradjones1 Digital Nomad Life
This is kinda PP'd on 📌 Allow field types to control how properties are mapped to and from storage Needs work in so far as we need the field API to support alternatives to serialization. I'm going to take the same approach as with 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC and track the MRs in this MR because otherwise it's a bit chicken-egg.
- 🇺🇸United States bradjones1 Digital Nomad Life
Added an example alternative implementation of the user data service to demonstrate how this might be put into practice for a non-trivial use case. Not saying it's perfect or production-ready, but I think it's a nice reference implementation of how this could be done. (Esp. in the case of services that may need to be overrideable per driver, given the differences in JSON function syntax among drivers.)
Interestingly, there is no existing test coverage for the user data service. I had wanted to basically re-use the test cases against a swapped service, but yeah, no. See 📌 UserData needs testing Needs work .
- 🇺🇸United States bradjones1 Digital Nomad Life
Rebased. I didn't catch the comments on the MR earlier because they don't trigger a "new" activity flag on d.o. (can't wait for the GitLab switch!) but I will try to respond and engage on those soon.
- 🇺🇸United States bradjones1 Digital Nomad Life
Rebase now failing, need to investigate. https://git.drupalcode.org/issue/drupal-3343634/-/pipelines/118675/test_...
- 🇺🇸United States bradjones1 Digital Nomad Life
OK, this is optimistically back to NR.
Currently there are some test fails however they seem to be related to the test runner instead of this MR. Asked in Slack what's up.
Don't love doing it, but I think this is now PP-3... BUT they are all very very close to being done.
For anyone reviewing this, consider the PP'd issues:
- 📌 [11.x] [policy] Set the SQLite minimum requirement to at least 3.45 Needs review is a policy question about SQLite minimum requirements in 11.0. We have gone around and around on SQLite requirements adjacent to this issue. The biggest question still outstanding is whether to require an SQLite version that supports JSONB, so we can go "straight" to using that instead of plain-ol JSON. I don't have a strong feeling about this because SQLite is rarely used in production. I am of the belief we should reach for the higher version as a minimum and warn site owners that if they want to run SQLite in production with JSON Data storage, they need to plan accordingly. (It's a very straightforward change to update this MR in that case.) I don't want to hang this up any longer than we already have, though, so I don't care what gets decided so long as we decide on something.
- 📌 Allow field types to control how properties are mapped to and from storage Needs work is necessary for JSON data integration with the field API. It is in NR and is pending framework manager review. I have pinged twice in Slack for review.
- 📌 Adding GIN and GIST indexes to PostgreSQL databases RTBC is necessary for index creation in Postgres. It is in NR.
The two code issues are incorporated into this MR (because they would otherwise have a merge conflict if we patched them in) so keep that in mind. It would be amazing to get this in this week during DrupalCon if we can. OPTASY is sponsoring my dev time right now on core so I have time to commit to getting this over the finish line.
- 🇺🇸United States bradjones1 Digital Nomad Life
Rebased with a fix to testing on 11.x HEAD.
Also this is only PP-2 as the SQLite policy was finalized, that issue was only NW due to a CR update.
This will require a small change to use JSONB out of the box for SQLite which I'll add after confirming we're back to green at baseline.
- Status changed to Needs review
8 months ago 1:05am 7 May 2024 - 🇺🇸United States bradjones1 Digital Nomad Life
I'm going to mark this NR as both of the "postponed" related issues are also NR and incorporated into this MR (for the moment) as they are interconnected.
- Status changed to Needs work
8 months ago 7:37am 22 May 2024 - 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
Being entirely new to this space and trying to review:
- WOW — epic work! 😮👏
- I found the issue summary fairly clear at a high level 👍, although kinda sparse 🫣. A new
json:normal
database Schema API data type makes sense. 👍 - … but what does not make sense (to me, yet) is that the issue summary talks about how this is not for the field level, yet this issue's MR is introducing two new (test-only) field types: see the
JsonBackedPropertiesItem
andMappedPropertiesItem
classes. That seems a contradiction at first sight? I suspect the point is to show how to better achieve the equivalent of https://www.drupal.org/project/json_field → with these new core APIs? If so: great! But could you then also explain/show why this is better? Is it better because less code is needed? Fewer hacks? Deeper integration? - Related: the issue summary uses the term "data type" many times, but approaching this issue with an Entity/Field/Typed Data API mindset, that reads like a new Typed Data
DataType
plugin, for a newjson
data type. Can you please disambiguate it in the issue summary and explicitly mention how it's different? - I find the current change records to be too sparse to help me understand the before vs after.
- Could you please add a concrete use case to
https://www.drupal.org/node/3389683 →
? And perhaps you could even add such an index on
extracted_value
forMappedPropertiesItem
? That'd make it much clearer! 😄 - Could you please add concrete examples to
https://www.drupal.org/node/3389682 →
, especially for the new
::jsonCondition()
? (Which is also not mentioned in the issue summary?) 🫣
- 🇧🇪Belgium wim leers Ghent 🇧🇪🇪🇺
It only took a few more weeks for Experience Builder to run into things where having this in Drupal core would make a huge difference: 📌 Ensure querying JSON nested values when parent keys are unknown is possible in all supported databases Active .
Tweaking the issue metadata to raise the visibility of this issue/MR 🤞
- 🇬🇧United Kingdom catch
but what does not make sense (to me, yet) is that the issue summary talks about how this is not for the field level, yet this issue's MR is introducing two new (test-only) field types: see the JsonBackedPropertiesItem and MappedPropertiesItem classes. That seems a contradiction at first sight?
Pretty sure the difference is this:Field types can have multiple columns, e.g. you could make a field type that has a boolean and varchar column.
So 'JSON at the schema level not field level' means support for defining a JSON column in a field type. i.e. you could have a field with a boolean and JSON column, and this is handled automatically from information in the field type definition.
json_field module on the other hand provides a field type that stores in JSON - the JSON support is limited to that field type specifically. I'm sure it makes it easier to do the same for another custom field type but it's a different thing.
- 🇺🇸United States bradjones1 Digital Nomad Life
but what does not make sense (to me, yet) is that the issue summary talks about how this is not for the field level, yet this issue's MR is introducing two new (test-only) field types: see the JsonBackedPropertiesItem and MappedPropertiesItem classes. That seems a contradiction at first sight?
Re: #71 - the big difference between this feature-set and what you have in
json_field
is querying. As you can see from their "native" field types, it's rather trivial to tell your database to create a JSON(B) column and stick some JSON in it. But, you can't query nested values. As Wim's linked issue reveals... that's kinda the point of storing data as JSON, otherwise it's just another string. And as this issue's long history illustrates, that's not trivial. But it does unlock a lot of potential to be able to straddle the line between relational database models and JSON documents where you can query arbitrarily-nested values directly. (Yes, I know it may be less performant without an index, but those are also easy to make and querying with a table-scan isn't the worst sin in the world for most sites.)This issue doesn't ship with a field type out of the box because, IMO, that's rather boring - most end-users aren't eager to edit JSON on the front end. It's anticipated that the majority of use-cases for this early on would be things like experience builder or layout builder or your-custom-field that contains a JSON document you get from wherever, e.g. an API. This is expert-mode stuff that enables novice stuff through a specific implementation. That's why the map item is a test fixture.
- 🇨🇦Canada gapple
My interest in this issue is for Reporting API → module, being able to store Content Security Policy violation reports in their original format and then being able to filter them.
- 🇩🇪Germany c-logemann Frankfurt/M, Germany
Firs of all I like to thank @bradjones1 for all the work you've already done on this issue. As soon as I can I like to help with testing in our MariaDB setup.
I have a similar plan like @gapple for the "signal" module with unknown mixed structured data which also like to preserve at it comes from external data sources. With unknown I want to say that this should not be limited to a specific external system as long as it can "report" with json. I will check Reporting API module and maybe integrate.
Additionally I think about some modules where I like to enable some kind of config options content which can be managed by group members. Beside to avoid giving normal users access to Field API it would be a nightmare of multiple field groups with a complete overhead of field schemas to manage this flexibility. Just using a json base field type could just store this user-config as json and it would be searchable when this feature is rolled out.
- Status changed to Postponed
3 months ago 12:19am 29 October 2024 - 🇺🇸United States bradjones1 Digital Nomad Life
This is so close but a lot of this code is also in 📌 Allow field types to control how properties are mapped to and from storage Needs work which is getting focused review, so let's get that sorted out.
- 🇩🇪Germany c-logemann Frankfurt/M, Germany
@bradjones1 Thanks again for all the work you doing on this issue. But this is also a big change which isn't easy to understand fast. So I have a question about compatibility between Mysql and Mariadb:
Following their documentation Mariadb has implemented a method to convert "json" tables on import. Is there a similar support on Mysql?
Or al little bit more complex, Is the current approach is supporting situations like this:
Start a project with SQLlite with project browser etc. (see starhot initiative). Install a module with json field. Export database to a Dev-System with MariaDB and then bring it to live with a Mysql Setup etc. - 🇺🇸United States bradjones1 Digital Nomad Life
Re: #76 thanks for the kind words.
I'm not really sure how to address the questions about data migration. AFAIK there is no statement of policy or procedure for an official path to migrate Drupal sites from one database type to another. Given there is a lot of compatibility (but not perfect...) between SQL syntax, it's likely that basic migrations from one db type to another will largely work, but this is an area where support and syntax does differ.
I have asked the question in the course of developing this feature - do we need to support moving from one DB to another? And the answer has so far been mostly silence. I don't think this is a common workflow. That said, your example of in-browser Starshot demo being exported from, say, SQLite to MySQL or MariaDB is a concrete example. I wonder if @mglaman has thoughts on that.
All this being said, I don't think any of this is a particular reason to (further...) delay getting this into core. Supporting something that has never been explicitly supported or inventing the equivalent of a commit gate over this would not be a good move. (Not saying you're suggesting this.)
- 🇩🇪Germany c-logemann Frankfurt/M, Germany
@bradjones1 Thanks for clarification.
The SQLite > MySQL migration ist just getting more important because of Drupal CMS. But this was always a question of migration.
Regarding MYSQL and MariaDB and the problem with growing differences are not in the focus of the Drupal community I think. After the fork there was no migration needed the dumps where just interoperable for a very long time. In my personal experience it's since MySQL 8 it became harder to just use both variants parallel.
On the database requirements page → "we" are still naming them together as "MySQL, MariaDB, or Percona Server (Recommended)". So I can argue the other way around like: We are currently not telling the users that they are (meanwhile) different kind of databases that needed migration.Because I belief the differences of MariaDB and MySQL will grow in future I don't think we should put energy in supporting a direct interoperable usage of both. So migration is something we need in future between MySQL and MariaDB. I don't want to block anything here but it's clear that the JSON support is one of the things which needs at least documentation about how to handle DB migration. And maybe it would be good to think about this change if there is something we can do make migration between databases easier. Maybe we can tackle two problems at once if we provide an list of Json fields which can be used to show a warning on status page and can be used to export via drush as json list which can be used by commandline database tools.
So in my opinion the migration itself is something that should be managed outside the core with tools like drush. As long as the code is working the same way after migrating the database, everything is fine. - 🇬🇧United Kingdom catch
It's my understanding that the Drupal CMS demo will use core's database export format (which is PHP), the one we use for testing fixtures at the moment - because going from sqlite to MySQL is just not possible at the moment.
I have asked the question in the course of developing this feature - do we need to support moving from one DB to another? And the answer has so far been mostly silence.
I haven't seen that question asked - is there an issue for it?
MariaDB and MySQL not being interoperable is interesting. My guess is that quite a few people develop sites with mariadb or mysql locally mismatching what's used on production because it currently works, so it would definitely need to be documented at least that you can no longer do this.
- 🇫🇷France andypost
Reminds me about https://www.drupal.org/project/dbtng_migrator →