Not using counts in grouped queries results in fundamentally incorrect results

Created on 9 April 2025, 3 months ago

Problem/Motivation

When a view is grouped and filtered on a field on a relation, sbef is fundamentally incapable of correctly determining which filter values are in use, or how many rows each filter value will produce.

Steps to reproduce

Take for example node type A with an unlimited reference to node type B. This reference allows multiple B for each A. B has a field F. (Let's assume it's a single selection)

If you make a view of node type A, with a relation on node type B, then add an exposed filter on field F and enable "Show only used items" sbef will incorrectly hide fields that are present in the result set.

Currently, sbef takes the result from views, loads an entity from each result row, and checks the field on that to determine which fields are in use. This works fine for this example result set:

nodeA  | nodeB  | field1
------ | ------ | ------
nodeA1 | nodeB1 | fieldValue1
nodeA1 | nodeB2 | fieldValue2

However when grouped, the result of the views generated query is like this:

nodeA  | nodeB  | field1
------ | ------ | ------
nodeA1 | nodeB1 | fieldValue1

This works fine for views, since views can still filter field1 = fieldValue2 and will always get nodeA1, but since sbef naively takes nodeB1 and assumes that's the only result, it never discovers that fieldValue2 exists in the result set.

Proposed resolution

In sbef you currently check if a filter value exists, and then only check the count of each value afterwards. I presume this is for performance, but this is backwards. You should check the count first, then you know if it exists (because the count is 0)

This produces the correct results even when grouped, and can be done very quickly since you don't have to load relations for every row. Here's a pseudo SQL example of a views generated query:

SELECT nodeA.nid AS nid, nodeA.vid AS revision
FROM nodeA
JOIN nodeB
WHERE nodeA.status = 1
AND nodeB.field1 IN(fieldValue1, fieldValue2)
AND nodeB.field3 IN(field3Value1)
GROUP BY nodeA.nid, nodeA.vid

In order to get the accurate count of an exposed filter, throw away the filter, group by it, and COUNT(DISTINCT) the fields you grouped by in the original query:

SELECT nodeB.field1 AS field1, COUNT(DISTINCT nodeA.nid, nodeA.vid)
FROM nodeA
JOIN nodeB
WHERE nodeA.status = 1
AND nodeB.field3 IN(field3Value1)
GROUP BY nodeB.field1

SELECT nodeB.field3 AS field3, COUNT(DISTINCT nodeA.nid, nodeA.vid)
FROM nodeA
JOIN nodeB
WHERE nodeA.status = 1
AND nodeB.field1 IN(fieldValue1, fieldValue2)
GROUP BY nodeB.field3

Performance

I think running a (fairly quick) query for each filter is going to be significantly more performant than loading related nodes from every row of the result set, so this should be more performant out of the gate.

There are a few ways to improve performance further but they are significantly more complicated than using a different provider eg. elasticsearch.

User interface changes

"Show option items count" and "Filter items based on filtered result set" would no longer need to be hidden behind the "Show only used items" option

🐛 Bug report

Status

Active

Version

3.0

Component

Code

Created by

🇳🇱Netherlands J V

Live updates comments and jobs are added and updated live.

Comments & Activities

Issue created by @J V

contrib.social Blog FAQ Discussions

Production build 0.71.5 2024