- πΊπ¦Ukraine vlad.dancer Kyiv
I have the same problem (Drupal 9.5.11). Thank you @Grayside for identifying it!
For the moment I removed sorting on paged query to avoid duplicates.
It is possible to build a seemingly straightforward Entity Query which, if used to "page" through all items, will randomly exclude some nodes, and randomly duplicate others.
Pulling together the elements of this query from the different layers of a class hierarchy, it looks a bit like this:
$query = $this->entityTypeManager->getStorage('node')->getQuery()
->addTag('modulename_pluginname')
->condition('type', 'article')
->condition('status', 1)
->sort('changed', 'DESC')
->range($offset, $limit);
if (!empty($title)) {
$query->condition('title', $this->request->get('titleContains'), 'CONTAINS');
}
This generates queries such as:
SELECT base_table.vid AS vid, base_table.nid AS nid, max(node_field_data_2.changed) AS expression
FROM @node base_table
INNER JOIN @node_field_data node_field_data ON node_field_data.nid = base_table.nid
LEFT JOIN @node_field_data node_field_data_2 ON node_field_data_2.nid = base_table.nid
WHERE
(node_field_data.type = :db_condition_placeholder_0)
AND (node_field_data.status = :db_condition_placeholder_1)
GROUP BY base_table.vid, base_table.nid
ORDER BY expression DESC
LIMIT 100
OFFSET 0
Note that for purposes of my query, revisions are irrelevant.
This query powers a customized REST plugin to generate lists of nodes. A script crawling through the entire "collection" by incrementing the offset and checking the data found this output based on a title string search to focus on a more easily measured subset of the data:
INFO :: Searching for 'X1234'
Node 21403 "X1234: Title A": 1 instances.
----===> Position 64 on Page 1
Node 20677 "X1234: Title B": 2 instances.
----===> Position 24 on Page 9
----===> Position 10 on Page 10
Node 20490 "X1234: Title C": 2 instances.
----===> Position 33 on Page 11
----===> Position 8 on Page 12
Node 19680 "X1234: Title D": 1 instances.
----===> Position 33 on Page 20
Node 19554 "X1234: Title E": 2 instances.
----===> Position 90 on Page 20
----===> Position 53 on Page 21
Node 19283 "X1234: Title F": 1 instances.
----===> Position 86 on Page 23
Node 18919 "X1234: Title G": 2 instances.
----===> Position 88 on Page 27
----===> Position 31 on Page 28
Node 18397 "X1234: Title H": 1 instances.
----===> Position 5 on Page 33
Node 17989 "X1234: Title I": 1 instances.
----===> Position 54 on Page 37
--------------------
MISSING ITEMS
=> Node 18483 "X1234: Title J": NOT FOUND.
=> Node 19191 "X1234: Title K": NOT FOUND.
Note that missing items were identified by separately querying the database for all nodes with the matching "X1234" string, which would be expected in this query as published items.
When I remove the ordering condition by simply deleting the sort, it changes the query to:
SELECT base_table.vid AS vid, base_table.nid AS ned
FROM @node base_table
INNER JOIN @node_field_data node_field_data ON node_field_data.nid = base_table.nid
WHERE
(node_field_data.type = 'cruise_offer')
AND (node_field_data.status = '1')
GROUP BY base_table.vid, base_table.nid
LIMIT 100
OFFSET 0
After making that change, I tested my crawling script against the resulting resource 3 times, and had another developer cross-check me 3 more times on their machine. (In fact, the developer that suggested removing ->sort() might help). While this problem is erratic, I had not seen it vanish completely before, let alone 6 times in a row on two environments.
As a data integrity problem driven by straightforward developer usage if the API, this seems like it should be a major. Leaving that for the experts to decide. While this is filed in 8.2, since I could not find anything meaningful in the issue queue about this it is reasonable to assume it remains a problem and should be tested against later minor releases.
Correct how the sort is applied in Entity Queries such that data consistency is restored.
Create a test to demonstrate this bug.
Identify a path forward.
N/A
The query result of adding a sort might change.
N/A
Active
11.0 π₯
Enhances developer experience.
Not all content is available!
It's likely this issue predates Contrib.social: some issue and comment data are missing.
I have the same problem (Drupal 9.5.11). Thank you @Grayside for identifying it!
For the moment I removed sorting on paged query to avoid duplicates.