- Merge request !2378Issue #2875033: Optimize joins and table selection in SQL entity query implementation โ (Open) created by rlmumford
- ๐บ๐ธUnited States smustgrave
Seems there are still some open questions to answer before review.
#39 and #41 should be answered (added to remaining tasks)
- ๐ท๐บRussia Chi
Faced this issue with "single-table" entity type. Entity query joined base table to itself which caused bad performance. Patch #16 works well on Drupal 10.0.
- ๐ท๐บRussia Chi
Patch #16 works well on Drupal 10.0.
Actually it does not. EFQ with entity references produces wrong SQL join. See comment #30.
- ๐ณ๐ฑNetherlands spadxiii
We have been using the mr in #37 for a while, but there are some issues with it: when using multiple conditions on the same column in an entity-query, the same table is joined several times.
I've fixed this with another if-statement in the patch that checks if the table is already joined (with the same type).
@spadxiii Please make the change to the merge request, rather than submitting a patch.
- ๐ณ๐ฑNetherlands spadxiii
I seem to have attached the wrong patch. Here's the correct one, that works.
- ๐ณ๐ฑNetherlands spadxiii
@solideogloria the mr is quite old and needs to be rebased :\
and when I push, I get an error that: remote: You are not allowed to push code to this project.
So I cannot update the mr. You have to click the "Get Push Access" button at the top of this page.
- First commit to issue fork.
- ๐ฎ๐ณIndia arunkumark Coimbatore
arunkumark โ changed the visibility of the branch 2875033-optimize-joins-and to hidden.
- ๐ฎ๐ณIndia arunkumark Coimbatore
arunkumark โ changed the visibility of the branch 2875033-optimize-joins-and to hidden.
- ๐ฎ๐ณIndia mrinalini9 New Delhi
Hi,
I have tried to create MR for the changes mentioned in patch #48 but was unable to do so because the MR points to branch 9.5.x instead of 11.x. Also, I have tried to create a new branch from 11.x but getting the below error:
Thanks & Regards,
Mrinalini @mrinalini9 This should be helpful for you: https://www.drupal.org/docs/develop/git/using-gitlab-to-contribute-to-dr... โ
- First commit to issue fork.
- ๐ซ๐ทFrance Nixou Toulon
Thanks for this !
Attach is the patch from #48 (2875033-46.patch) rerolled for Drupal 10.3.x and 10.4.x
- ๐บ๐ธUnited States pwolanin
patch #61 is failing for my colleague when filtering with jsonapi on the value of a referenced entity referenced by the main entity.
It's writing the WHERE clause such that it's filtering the main entity to the node ID of the referenced entity.
example:
- main entity house.
- house references a city node
- city references a state node
if I filter houses in jsonapi by state, the SQL where clause is filtering the house node ID by the state node ID.
- ๐จ๐ฆCanada Charlie ChX Negyesi ๐Canada
This issue and ๐ \Drupal\Core\Entity\Query\Sql\Tables causes extremely poor performance when using MariaDB and filtering on multiple relationships in JSON:API Active IMO needs to be consolidated.
- ๐บ๐ฆUkraine HitchShock Ukraine
Hi all.
First of all, I want to thank everyone who is working on this task. It solves the performance problem for big data entities in certain cases.But I also found a possible way to make it better for some scenarios.
We can remove `$type === 'INNER'` from the condition.
If the table is the same, then it doesn't matter which type of join is used. Anyway, the same table will be used.
Removing this condition can be useful for big data queries with base fields of the entity, which are stored in the same table if the data table is the same as the base table for an entity.For example,
- we have a `custom_entity`
- we send a simple entity query to get IDs sorted by uuid
- the default query will beSELECT base_table.id AS id, base_table.id AS base_table_id, custom_entity.uuid AS uuid FROM custom_entity base_table LEFT JOIN custom_entity custom_entity ON custom_entity.id = base_table.id ORDER BY custom_entity.uuid ASC LIMIT 20 OFFSET 0
What is the problem? If custom_entity is a big data entity, then we are trying to join big data to big data, which will take much more time than without 'join'. This impacts performance a lot
If we remove `$type === 'INNER'` part of the condition, the issue will be solved, because the query will be generated like
SELECT base_table.id AS id, base_table.id AS base_table_id, base_table.uuid AS uuid FROM custom_entity base_table ORDER BY base_table.uuid ASC LIMIT 20 OFFSET 0
I added a hidden patch with this fix.
P.S. Please let me know if my opinion is correct or if it has obvious flaws in the context of Drupal core
- First commit to issue fork.
- ๐ฌ๐งUnited Kingdom Driskell
I've attempted to work on this a little and produced the following based on 11.x:
https://git.drupalcode.org/issue/drupal-2875033/-/commit/41ac87795f50465...(Up front disclosure - I tested this as a patch on 10.4.x and this is just a cherry pick for analysis)
Here is an outline:
* For testing, a clone of a Query will now properly clone the conditions and update the attached query on the Conditions. Happy to take this out of this issue into another if it's desired. Essentially at the moment __toString on a Query is going to work with Conditions that still refer to the original Query. It causes problems in debugging mostly.
* getTables on Query was creating a new tables instance, not returning the existing property value. This meant tables added via separate Condition instances (such as nested in an AND or an OR) were not deduplicated - they were effectively creating new joins for every branch. This change keeps getTables for BC but it should be now unused, and it introduces getSqlTables that returns the same object after creating it so all conditions share the same instance, preventing attaching the same table twice
* Specifying different langcode in conditions for a shared table field previously could cause issues as it would reuse the same table join even though it would refer to a different langcode - now it joins separately for each langcode
* When following a reference field, the tables were never shared at all by addNextBaseTable, and this introduces a nextBaseTables property to track and reuse these
* When there is a data table, we don't just forcefully set simple_query to FALSE. Warrants some eyes and testing but it feels this is unnecessary as the table joining code that attaches the data table already sets it to FALSE at the point the table might get attached. In this change it actually only sets simple_query to FALSE if the join of a data table is not going to use a langcode. If a langcode is used, then the data table will provide a single row, simplifying the query.
* Added ability to specify langcode as "__default__" to make it join the data table on default_langcode=1 with simple_query remaining. For me this looks to massively improve query performance and allow us to add indexes in several places on the data table for massive gain. The API here might warrant discussion.A note worth mentioning is that if a table joins as INNER JOIN and the next request is for a LEFT JOIN it will not change it. It will remain as INNER. This is how it was previously when deduplicating of table join was working. This seems fine to me as if it's INNER there's a restriction so loosening it serves no purpose. For the inverse, where it is initially added as LEFT or LEFT OUTER, it also won't change it to INNER if a request comes in. It seems fine to me this as I think from what I can see every join that uses INNER will have a WHERE addition anyway, so the fact it's LEFT doesn't make much difference. It could be "nice" to clean it up to INNER but then you're modifying already added tables in the query and it feels the API change stretches too far.
Generally it looks like the improvements from this experiment are:
* Tables are no longer duplicated in all cases (from what I can see)
* Where multiple shared data table fields are used in conditions, indexes on the data table can now get properly used as they use the same table join, and the optimiser can kick in
* Grouping is now completely avoided for basic queries that don't involve the shared data table
* Sorting by a shared data field that is untranslated can also be massively sped up with a quick index addition and specifying the langcode "__default__" as this ensures it keeps simple query and works on the default langcode, which inherently will have the untranslated value for that field. For me personally this is what I was trying to get to to resolve some slow queries.Would be great to get some feedback while I do some further testing and look at adding some tests etc.