Views relationships with multi-valued entity reference fields invalidate Distinct query option

Created on 19 August 2018, over 6 years ago
Updated 11 September 2024, 2 months ago

When creating a view with a relationship to an entity reference field (to join the referred-to entity in the query), duplicates will be produced if the entity reference field is multi-valued (i.e. cardinality > 1). Users assume that checking the box at Advanced -> Query settings -> Distinct will solve this problem, but it doesn't.

This is because from a database perspective, the records actually aren't duplicates. Each different entity reference in the multi-valued field will produce a different row as the target IDs are different, even though this isn't necessarily reflected in the view itself if not all fields are included. Said another way, the view results may look the same on the front-end, even though they're not on the back-end.

It should be noted that this affects all entity types, whether it be taxonomy terms, nodes, paragraphs, custom entities, etc. There are several duplicates of this issue in various other queues that should be marked as such, with all efforts directed here.

Proposed solution

If the Distinct box is checked in the view configuration, after the query has executed, remove results whose entity IDs are already in the list.

Remaining tasks

  1. Add support for pagers (as per #12).
  2. Add support for aggregation (as per the D7 version of Views Distinct โ†’ ).
  3. Add tests.

Original report

If you have a content type with a multivalue taxonomy term field and add a relationship based on that field in a view (in my case I want to use the term name as an argument) the resulting views result has duplicates if one or more of the nodes has multiple terms in that field.

The DISTINCT keyword does not help, since the tid is added as a field in the SELECT part of the query.

I have the following query:

SELECT DISTINCT node_field_data.sticky AS node_field_data_sticky, node_field_data.created AS node_field_data_created, node_field_data.nid AS nid, taxonomy_term_field_data_node_field_data.tid AS taxonomy_term_field_data_node_field_data_tid
FROM 
{node_field_data} node_field_data
LEFT JOIN {taxonomy_index} taxonomy_index ON node_field_data.nid = taxonomy_index.nid
LEFT JOIN {taxonomy_term_field_data} taxonomy_term_field_data_node_field_data ON taxonomy_index.tid = taxonomy_term_field_data_node_field_data.tid
WHERE (node_field_data.promote = '1') AND (node_field_data.status = '1')
ORDER BY node_field_data_sticky DESC, node_field_data_created DESC

.. which produces duplicates, even though the DISTINCT keyword is present, because the taxonomy_term_field_data_node_field_data.tid makes the duplicate nodes in the results 'distinct'.

Ideally the query should just exclude the field from the SELECT part like this:

SELECT DISTINCT node_field_data.sticky AS node_field_data_sticky, node_field_data.created AS node_field_data_created, node_field_data.nid AS nid
FROM 
{node_field_data} node_field_data
LEFT JOIN {taxonomy_index} taxonomy_index ON node_field_data.nid = taxonomy_index.nid
LEFT JOIN {taxonomy_term_field_data} taxonomy_term_field_data_node_field_data ON taxonomy_index.tid = taxonomy_term_field_data_node_field_data.tid
WHERE (node_field_data.promote = '1') AND (node_field_data.status = '1')
ORDER BY node_field_data_sticky DESC, node_field_data_created DESC

which works an nodes with multiple terms in the field are now only appearing once.

I have couldn't find a way to actually omit the field in the SELECT part of the query, and I'm also not sure, if it could create other issues if removed. But the tid in the select the DISTINCT keyword is useless..

Steps to reproduce.
1. add multivalue taxonomy term field to node type, and add a node with multiple terms selected
2. add a view which uses that field to add a relationship to the term (needed to use the term name as argument)
3. make the view results DISTINCT
4. optionally add the argument (contextual filter) that uses the relationship to filter the views result on the term name.
5. if no argument is given, the node with multiple terms in the field will appear multiple times

๐Ÿ› Bug report
Status

Needs work

Version

11.0 ๐Ÿ”ฅ

Component
Viewsย  โ†’

Last updated about 2 hours ago

Created by

๐Ÿ‡ฉ๐Ÿ‡ฐDenmark prinds

Live updates comments and jobs are added and updated live.
  • Needs tests

    The change is currently missing an automated test that fails when run with the original code, and succeeds when the bug has been fixed.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States komlenic

    +1 for #99. This resolved the issue in my use case on Drupal 9.5.2.

  • Status changed to RTBC almost 2 years ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States aharown07

    I ended up going back to #99 because I had a couple of views where I couldn't use the target ID method in #109 (or maybe more accurately, had a couple of views where I couldn't figure out how to use it). In any case there are tradeoffs: with #109, you have to set this up for each view. With #99, you solve the problem with all of them.
    It turned out I have a pretty large number of views with entity reference relationships causing duplicate rows.
    I also no longer see any errors with that patch in place.

    Seems like this qualifies for RTBC at this point?

  • Status changed to Needs work almost 2 years ago
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States danflanagan8 St. Louis, US

    This is definitely not RTBC. It still has the "Needs Tests" tag. I added tests back in #85. I don't believe the tests were ever reviewed and they definitely are not included in the patch in #99.

    I think it would be wise to see if the patch in #99 makes the fail test in #85 pass.

    Back to NW.

  • ๐Ÿ‡ธ๐Ÿ‡ชSweden cruze72

    Has anyone got this to work with Drupal 10? I have duplicates when using relationships on a view for commerce orders:
    Order Item
    User
    (User Profile)

    Here is the sql query:

    SELECT DISTINCT "commerce_order"."order_number" AS "commerce_order_order_number", "commerce_order_item_commerce_order__order_items"."title" AS "commerce_order_item_commerce_order__order_items_title", "profile_users_field_data__profile__address"."address_given_name" AS "profile_users_field_data__profile__address_address_given_nam", "commerce_order"."order_id" AS "order_id", MIN(commerce_order.order_id) AS "order_id_1", MIN(commerce_order_item_commerce_order__order_items.order_item_id) AS "commerce_order_item_commerce_order__order_items_order_item_i", MIN(users_field_data_commerce_order.uid) AS "users_field_data_commerce_order_uid", MIN(profile_users_field_data.profile_id) AS "profile_users_field_data_profile_id"
    FROM
    {commerce_order} "commerce_order"
    LEFT JOIN {commerce_order__order_items} "commerce_order__order_items" ON commerce_order.order_id = commerce_order__order_items.entity_id AND commerce_order__order_items.deleted = '0'
    LEFT JOIN {commerce_order_item} "commerce_order_item_commerce_order__order_items" ON commerce_order__order_items.order_items_target_id = commerce_order_item_commerce_order__order_items.order_item_id
    LEFT JOIN {users_field_data} "users_field_data_commerce_order" ON commerce_order.uid = users_field_data_commerce_order.uid
    LEFT JOIN {profile} "profile_users_field_data" ON users_field_data_commerce_order.uid = profile_users_field_data.uid
    LEFT JOIN {profile__address} "profile_users_field_data__profile__address" ON profile_users_field_data.profile_id = profile_users_field_data__profile__address.entity_id AND profile_users_field_data__profile__address.deleted = '0'
    GROUP BY "commerce_order"."order_id", "commerce_order_order_number", "commerce_order_item_commerce_order__order_items_title", "profile_users_field_data__profile__address_address_given_nam"
    LIMIT 100 OFFSET 0
    

    The output prints every created profile from all orders on the site against every single order. ie 20 orders, 20 profiles thus 400 entries.
    The duplicates are removed as soon as the (User)Profile relationship is removed.

  • ๐Ÿ‡ฎ๐Ÿ‡ณIndia prem suthar Ahemdabad- Gujrat , Jodhpur - Rajsthan

    Re-roll the #99 Patch For 10.1

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States bburg Washington D.C.

    Using patch in #99 with the Distinct and new option enabled and I am still seeing duplicates. I have an "Event" node with a multi-value set of "Session" paragraphs, which have Smart date range fields. I only want to display a single version of a node with the next upcoming session. I was able to engineer a query alter that inserts a subquery to handle the filter. However, I still need the relationship to handle the sorting for me, which brought me to this issue.

  • #3 working perfectly for me instead of applying patch, thanks...

  • Open in Jenkins โ†’ Open on Drupal.org โ†’
    Environment: PHP 8.1 & MariaDB 10.3.22
    last update over 1 year ago
    Custom Commands Failed
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States joshuasosa

    Also just noting that using #3 worked well for my specific use case. But the patch making automatic base fields optional seems promising.

  • ๐Ÿ‡ท๐Ÿ‡ดRomania vasike Ramnicu Valcea

    it seems there are some issues with DISTINCT ...

    for example, it's not taken if there are other elements ... like sorting or some specific fields

    the part !empty($this->fields) of this condition if (empty($this->noDistinct) && $this->distinct && !empty($this->fields)) { ...it "stops" the DISTINCT

    So 2993688-85-FAIL.patch from #85 ... i don't think it will work, even we "fix the SQL".

    About the latest patch

    i would say it could help, but not sure it's the "universal panacea" for multiple value fields relationship.

    However i would change some things like naming and "loop break"

    for example:
    Name: exclude_relationship_base_fields instead of disable_automatic_base_fields
    and
    Code:

          foreach ($entity_information as $info) {
            if (!$exclude_relationship_base_fields || $info['relationship_id'] === 'none') {
              $entity_type = \Drupal::entityTypeManager()->getDefinition($info['entity_type']);
              $base_field = !$info['revision'] ? $entity_type->getKey('id') : $entity_type->getKey('revision');
              $this->addField($info['alias'], $base_field, '', $params);
            }
          }
    

    or something ... to be more clear what we're trying to do ...

    But what if we want to add fields with the relationships ... how this could affect the results ...

    And so on ...

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium flyke

    I have a view of commerce_product entity types with a relation on commerce_product_variation.
    When I add and apply the patch from #99 and check 'Distinct' and 'Disable automatic base field for relationships', the view no longer works and I get this error:

    The website encountered an unexpected error. Please try again later.
    TypeError: Drupal\views\Plugin\views\field\RenderedEntity::getEntityTranslation(): Argument #1 ($entity) must be of type Drupal\Core\Entity\EntityInterface, null given, called in /var/www/html/web/core/modules/views/src/Plugin/views/field/RenderedEntity.php on line 132 in Drupal\views\Plugin\views\field\RenderedEntity->getEntityTranslation() (line 69 of core/modules/views/src/Entity/Render/EntityTranslationRenderTrait.php).
  • ๐Ÿ‡ธ๐Ÿ‡ฐSlovakia coaston

    Any progress?

  • last update 11 months ago
    25,775 pass, 1,803 fail
  • ๐Ÿ‡ฌ๐Ÿ‡ทGreece pappis Greece

    Same problem here. Any proposed solution or patch that will remain when drupal gets updated?

  • The patch (2993688-117) โ†’ with "Distinct" and "Disable automatic base field for relationships" both checked works well for me to deduplicate items, but it breaks the view's autocomplete filters.

  • ๐Ÿ‡ท๐Ÿ‡ดRomania vasike Ramnicu Valcea

    i'm still doubtful about this issue
    i believe whenever relationships are involved ... "AGGREGATIONS" should be used - DISTINCT is not a way for Views generic SQL solution - imho.

    related with my previous comment i noticed something that could improve things about DISTINCT .. in general ... so linked the issue, maybe someone is interested and could help there.

  • Couldn't there be a "client-side distinct" (performed in PHP) that only affects the results shown in the table? That would be different than DISTINCT in the SQL if the query itself contains extra fields behind the scenes.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States azinck

    @solideogloria Then you get into a situation where you have an inconsistent # of results per page (because some are culled by your PHP code) and other complexities that come out of that.

  • Then it could use a query like

    SELECT DISTINCT col1, col2
    FROM (
      SELECT col1, col2, col3, col4
      FROM table
      ...
    )
    
  • ๐Ÿ‡ฉ๐Ÿ‡ชGermany c-logemann Frankfurt/M, Germany

    I just successfully tested patch #117 with core 10.2 and it fixed a view with entity references when using a filter on multiple value field.

  • last update 7 months ago
    29,718 pass, 1 fail
  • ๐Ÿ‡บ๐Ÿ‡ธUnited States aharown07

    #117 working for me on 10.3.2.

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States Danny Englander San Diego

    I ran into the same issue with a reference of multiple taxonomy terms. #117 working for me with 10.2.5. Pager works fine as well as exposed filters. Thank you!

  • Hi, it doesnโ€˜t work for me on 10.1. (I canโ€˜t update core because of an domain access issue).

    My views-page has contextual filter: "Has tax term ID with depth" and "Has tax term ID with depth modifier".
    Then I add a relationship "Taxonomy term on node". Just then I have the field "Tax-Image" available in the fields section.
    Adding this getting duplicates!
    Query-settings donโ€˜t work. Actually hiding the image from display if any of the two options are activated.

    I follow this thread since my first attempt with D8. Now Iโ€˜m still experiencing this issue on D10. Itโ€˜s crucial to get this fixed on my taxonomy driven job-board site migrated from D7.

    regards

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium flyke

    Typically when I try out a workaround for this issue, I create a custom 'Remove duplicate results' filter like this:

    mymodule/mymodule.module:

    function mymodule_views_data() {
      $data['views']['remove_duplicates'] = [
        'title' => t('Remove duplicate results'),
        'filter' => [
          'title' => t('Remove duplicate results'),
          'field' => 'id',
          'id' => 'remove_duplicates',
          'click sortable' => FALSE,
        ],
      ];
    
      return $data;
    }

    mymodule/src/Plugin/views/filter/RemoveDuplicatesFilter.php:

    <?php
    
    namespace Drupal\mymodule\Plugin\views\filter;
    
    use Drupal\Core\Form\FormStateInterface;
    use Drupal\views\Plugin\views\filter\StringFilter;
    
    /**
     * Remove duplicates by adding id field and group by for that field.
     *
     * @ingroup views_filter_handlers
     *
     * @ViewsFilter("remove_duplicates")
     */
    class RemoveDuplicatesFilter extends StringFilter {
    
      /**
       * {@inheritdoc}
       */
      protected function defineOptions() {
        $options = parent::defineOptions();
    
        return $options;
      }
    
      /**
       * {@inheritdoc}
       */
      public function buildOptionsForm(&$form, FormStateInterface $form_state) {
        parent::buildOptionsForm($form, $form_state);
      }
    
      /**
       * {@inheritdoc}
       */
      public function query() {
    
        // Set distinct to TRUE;
        $this->query->distinct = TRUE;
        $this->query->options['distinct'] = TRUE;
    
        /**
         * @var string contains the entity type the view uses.
         */
        $entity_type = $this->view->getBaseEntityType()->id();
        switch ($entity_type) {
          case 'node':
            $entity_id_label = 'nid';
            break;
    
          case 'media':
            $entity_id_label = 'id';
            break;
    
          case 'commerce_product':
            $entity_id_label = 'product_id';
            break;
    
          default:
            break;
        }
    
        // Add identifier field and group by it.
        if (!empty($entity_type) && !empty($entity_id_label)) {
          $this->query->addField($entity_type, $entity_id_label, '', ['function' => 'groupby']);
          $this->query->addGroupBy($entity_id_label);
        }
      }
    }
    

    You need to clear cache first after adding this code, and then add this filter to your view.
    As you can see in the code, this only works for views for node, media and commerce_product entities, but it served its function on several projects here.

  • Hi, thanks @flyke, I tried it out immediately. Iโ€˜m getting the filter in the filter criteria. It asks me to choose an operator.
    Unfortunately none of my selections work to eliminate the duplicates.
    Any further advice?

    regards

    ... having high hope for this :)

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium flyke

    Hi Vincent_Jo, just adding it should be enough, no need to set anything (leave default and just save). Maybe someone can improve my code to clear the settings for that filter.

  • Hi flyke, thanks for the response ... none of the operators worked. Maybe itโ€˜s on the custom image field in this vocabulary? ... oh, just realized, when adding a custom description field it showed up in all four languages, so I get four duplicated items with each description field in another language. Even though I have a translation language filter (Content language selected for page) active. So this could be the issue here in my case.

    well ....

  • ๐Ÿ‡ง๐Ÿ‡ชBelgium flyke

    Vincent_Jo: I listed the entity types that my workaround code works for, taxonomy terms / vocabularies is none of them. Working entity types for my demo code are 'node', 'media', 'commerce_product'. Of course you can add more types in the swich statement inside the query() function if you add the correct db field that contains the id. For everyone else: my apologies for polluting this issue queue. I do think that this custom filter can help others (only as a workaround) who come here looking for a solution until this issue gets resolved.

  • @flyke, thank you very much anyway. Even it didnโ€˜t help in my (strange) case, it surely will help others :)

    best regards

  • ๐Ÿ‡บ๐Ÿ‡ธUnited States jsimonis

    Any update on this? I've tried the patch. It said it patched correctly, but I am not seeing any change on my page. As you can see, I get one item for the image, another for the body.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland mrupsidown

    #117 applies for me on 10.2.2 but I see no change whatsoever to my view results. I have tried with "Distinct" and "Disable automatic base field for relationships" or any combination of the two. I still have the exact same duplicates showing.

    My view is setup to show 18 items. I see one node 8 times, another node 3 times, 2 other nodes 2 times. In other words, only 3 nodes on my page are not duplicates! This is driving me nuts. The only thing I was able to achieve is by using hook_views_pre_render which allows me to remove duplicates but this will break the number of items displayed in the view. I see 7 items instead of 18, which corresponds to every non-duplicated item in my original view result.

  • ๐Ÿ‡จ๐Ÿ‡ญSwitzerland mrupsidown

    I finally got it working by altering the query this way :

    /**
     * Implements hook_query_alter().
     */
    function my_module_query_alter(Drupal\Core\Database\Query\AlterableInterface $query) {
      if ($query->hasTag('my_view_tag')) {
        // Add the nid field with a groupby function.
        $query->addField('node_field_data', 'nid', 'node_field_data_nid', ['function' => 'groupby']);
        // Group results by node ID.
        $query->groupBy('node_field_data.nid');
      }
    }
    

    I have added a specific tag in the View settings so I can identify the views I am after. Not sure whether this is the (temporary) way to go but it seems to work for my use case. I see no more duplicates and the number of results per page is correct.

  • ๐Ÿ‡จ๐Ÿ‡ฆCanada bensti

    @mrupsidown With this query_alter, i get a this error.

    this is maybe related to https://drupal.stackexchange.com/questions/316086/how-do-i-disable-only-... ?

    SQLSTATE[42000]: Syntax error or access violation: 1055 Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'databaseXXX.node_field_data.langcode' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by:

Production build 0.71.5 2024