Optimize joins and table selection in SQL entity query implementation

Created on 2 May 2017, over 7 years ago
Updated 17 September 2024, about 2 months ago

Problem/Motivation

Quite a long time ago, before Drupal 8.0, we made the decision to make Views more intelligent about which entity table to select from by default. Before, we always started of the base table and then almost always joined the data table if it exists, because that's where all the data is (except the UUID).

Unfortunately, we never updated entity query accordingly and it is actually far worse there:

* It always picks the base table first, that was the same as views
* It always adds a join as soon as you start to add conditions. *even* if that happens to be the UUID, then it just joins itself or an entity type without a data table.
* It always joins other entity types through the base table and then adds yet another join even if they have no data table. Because reasons.

Proposed resolution

This is an attempt at making things more sane and possibly considerably more performant.

Just like views, it starts of at the data table if possible, I'm also avoiding the duplicated join by initializing the entity tables (really, joins) mapping table. And I'm trying to improve actual joining as well by skipping the initial join as well.

I've only tested it with a simple example queries, didn't even run the tests yet, I just wanted to start this before I forget about this again (Had the idea of doing this probably a few times already).

My test script:


echo "\nQuery 1:\n";
$ids = \Drupal::entityQuery('node')
  ->execute();
print_r($ids);

echo "\nQuery 2:\n";
$ids = \Drupal::entityQuery('node')
  ->condition('title', 'First node')
  ->execute();
print_r($ids);

echo "\nQuery 3:\n";
$ids = \Drupal::entityQuery('node')
  ->condition('uuid', '20478baa-64e4-4b01-bf68-5ea34e3db78b')
  ->execute();
print_r($ids);

echo "\nQuery 4:\n";
$ids = \Drupal::entityQuery('node')
  ->condition('uid.entity.uid', 1)
  ->execute();
print_r($ids);

(conditions only work on my specific nodes of course). I also added a 'echo $this->sqlQuery . "\n";' so I can see the generated SQL query.

HEAD:

Query 1:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
Array
(
    [1] => 1
    [2] => 2
    [3] => 3
    [4] => 4
)

Query 2:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
INNER JOIN {node_field_data} node_field_data ON node_field_data.nid = base_table.nid
WHERE node_field_data.title LIKE :db_condition_placeholder_0 ESCAPE '\\'
Array
(
    [3] => 3
)

Query 3:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
INNER JOIN {node} node ON node.nid = base_table.nid
WHERE node.uuid LIKE :db_condition_placeholder_0 ESCAPE '\\'
Array
(
    [1] => 1
)

Query 4:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node} base_table
INNER JOIN {node_field_data} node_field_data ON node_field_data.nid = base_table.nid
LEFT OUTER JOIN {users} users ON users.uid = node_field_data.uid
INNER JOIN {users_field_data} users_field_data ON users_field_data.uid = users.uid
WHERE users_field_data.uid = :db_condition_placeholder_0
Array
(
    [4] => 4
)

With my patch.

Query 1:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node_field_data} base_table
Array
(
    [1] => 1
    [2] => 2
    [3] => 3
    [4] => 4
)

Query 2:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node_field_data} base_table
WHERE base_table.title LIKE :db_condition_placeholder_0 ESCAPE '\\'
Array
(
    [3] => 3
)

Query 3:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node_field_data} base_table
INNER JOIN {node} node ON node.nid = base_table.nid
WHERE node.uuid LIKE :db_condition_placeholder_0 ESCAPE '\\'
Array
(
    [1] => 1
)

Query 4:
SELECT base_table.vid AS vid, base_table.nid AS nid
FROM 
{node_field_data} base_table
INNER JOIN {users_field_data} users_field_data ON users_field_data.uid = base_table.uid
WHERE users_field_data.uid = :db_condition_placeholder_0
Array
(
    [4] => 4
)

Comparison:
1. Very similar except that my select is on the node_field_data table.
2. No join necessary with my version.
3. This actually requires a join in the new implementation, and this query is currently unfortunately very common (loadByUuid()). But the current implementation joins itself which is likely not much better?
4. A single join instead of *3*

Remaining tasks

* Figure out things that dont' work yet, pretty sure that translations will be one such case as I'm possibly skipping the language condition now in at least some cases on the base table.
* Is this an API change or not?
* Answer #39
* Answer #41

User interface changes

API changes

Generated queries change, which might break query alters?

Data model changes

📌 Task
Status

Needs work

Version

11.0 🔥

Component
Entity 

Last updated about 23 hours ago

Created by

🇨🇭Switzerland berdir Switzerland

Live updates comments and jobs are added and updated live.
  • Performance

    It affects performance. It is often combined with the Needs profiling tag.

Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

Production build 0.71.5 2024