EntityQuery returns incorrect results for notExists condition where one translation matches but others do not

Created on 27 December 2017, about 7 years ago
Updated 8 February 2023, almost 2 years ago

REPRODUCTION STEPS:

1. Create and publish a content entity.
2. Add a translation for the node and publish it also
3. Using adminer or mysql CLI examing the node_field_data for the node.

select * from node_field_data where nid=17621

For example,

nid	vid	type	langcode	title	uid	status	created	changed	promote	sticky	revision_translation_affected	default_langcode	moderation_state	content_translation_source	content_translation_outdated	rh_action	rh_redirect	rh_redirect_response	published_at	publish_on	unpublish_on
17621	1177975	article	de	Cloud, AI and integration for today’s digital economy	1	1	1514324112	1514324319	0	0	1	0	published	en-us	0	bundle_default	NULL	301	1492179338	NULL	NULL
17621	1177975	article	en-us	Cloud, AI and integration for today’s digital economy	0	1	1492179338	1514324128	0	0	NULL	1	published	und	0	bundle_default	NULL	301	1492179338	NULL	NULL

This looks good and is what is expected: A published version of the source language and a published version of the translation.

4. Add a new optional text field to the content type and mark it as "users can translate this field"
5. Edit the published translation, fill in a value for the field and publish the translation

The expectation is that now there is a published source language document without a value for the new field and a published translated language document with a value for the new field. Examining the tables manually, we can confirm this is the case.

SELECT * FROM `node__field_article_new` LIMIT 50

 Modify	bundle	deleted	entity_id	revision_id	langcode	delta	field_article_new_value
 edit	article	0	17621	1177976	de	0	GERMAN NEW

So far so good.

Now, we want to write an entity query that will return to us all entities (including translations) that do *NOT* have a value for field_article_new. So we try something like this.

use Drupal\Core\Database\Database;

$query = \Drupal::entityQuery('node')
  ->condition('type', 'article')
  ->notExists('field_article_new')
  ->execute();

echo "REVID     ENTITYID\n";
foreach ($query as $revid => $nid) {
  echo "$revid      $nid\n";
}

echo count($query) . " results found\n";

EXPECTED BEHAVIOR:

We would expect that the published source language doc will match since it does not have a value yet for this new field and that the published translated language doc will not match since it now has a value for this field.

ACTUAL RESULT:

The published source language doc is not returned in the query results.

PRELIMINARY FINDINGS:

Examining the query that entityQuery produces behind the scenes for us shows the following for the above entityQuery:

SELECT base_table.vid AS vid, base_table.nid AS nid
FROM
{node} base_table
INNER JOIN {node_field_data} node_field_data ON node_field_data.nid = base_table.nid
LEFT JOIN {node__field_article_new} node__field_article_new ON node__field_article_new.entity_id = base_table.nid
WHERE (node_field_data.type = :db_condition_placeholder_0) AND (node__field_article_new.field_article_new_value IS NULL)

With respect to the entity under test, simplifying the query for informational/demonstration purposes (select * instead of just nid/vid and hardcode our article bundle), and pop it in adminer, we notice that our node 17621 is not in the results at all.

SELECT *
FROM
node base_table
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
LEFT JOIN node__field_article_new node__field_article_new ON node__field_article_new.entity_id = base_table.nid
WHERE (node_field_data.type = 'article') AND (node__field_article_new.field_article_new_value IS NULL)

Now if we take off the IS NULL condition and instead add in node_field_data.nid = 17621 so that we are just looking at our node of interest, we see the following.

nid	vid	type	uuid	langcode	nid	vid	type	langcode	title	uid	status	created	changed	promote	sticky	revision_translation_affected	default_langcode	moderation_state	content_translation_source	content_translation_outdated	rh_action	rh_redirect	rh_redirect_response	published_at	publish_on	unpublish_on	bundle	deleted	entity_id	revision_id	langcode	delta	field_article_new_value
17621	1177976	article	5972bfd5-4210-4d5a-b527-13c78017fcea	en-us	17621	1177976	article	de	Cloud, AI and integration for today’s digital economy	1	1	1514324112	1514324319	0	0	1	0	published	en-us	0	bundle_default	NULL	301	1492179338	NULL	NULL	article	0	17621	1177976	de	0	GERMAN NEW
17621	1177976	article	5972bfd5-4210-4d5a-b527-13c78017fcea	en-us	17621	1177976	article	en-us	Cloud, AI and integration for today’s digital economy	0	1	1492179338	1514324128	0	0	NULL	1	published	und	0	bundle_default	NULL	301	1492179338	NULL	NULL	article	0	17621	1177976	de	0	GERMAN NEW

Now the problem is evident. Because of the LEFT JOIN condition in the query, it *looks* like the published EN article has a value for field_article_new_value and so it is not returned.

POSSIBLE RESOLUTION:

IMO, the problem here is the join condition does not also join on langcode. If we modify the entityQuery sql by adding the langcode in the LEFT JOIN, it fixes the problem:

SELECT *
FROM
node base_table
INNER JOIN node_field_data node_field_data ON node_field_data.nid = base_table.nid
LEFT JOIN node__field_article_new node__field_article_new ON node__field_article_new.entity_id = base_table.nid AND node__field_article_new.langcode = node_field_data.langcode
WHERE (node_field_data.type = 'article') AND (node__field_article_new.field_article_new_value IS NULL)

Now, the published EN row correctly shows NULL as its value for field_article_new_value. If we then go back to the original query entityQuery() uses and just add the join condition, our node 17621 shows up in the results.

Of course even if entityQuery did this and the entity/revision was returned, we still would not know which translation(s) for that revision matched and which did not since langcode is not reflected in the result has. Not sure how that would be indicated. Maybe by making the results keyed by vid and then langcode and then nid - although this would be a breaking change.

Perhaps entityQuery is not the right tool to use for this job? The goal being "Give me all published 'foo' entities (including translations) that do not yet have a value for the new field 'foo_new'. Is there a different way to go about this? We try not to write sql queries directly against the tables, favoring the API instead, but perhaps in this case it is our only route?

🐛 Bug report
Status

Active

Version

9.5

Component
Entity 

Last updated about 8 hours ago

Created by

🇺🇸United States sdewitt

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

Production build 0.71.5 2024