Allow excluding unpublished references from index

Created on 20 February 2020, almost 5 years ago
Updated 10 February 2023, almost 2 years ago

Hello!

Unpublished comments are included into the rendered content and indexed. If I search for words included only in unpublished comments I get search results.

I use only Content datasource and Comments are included as a related entity in the content fields section of the index.

Feature request
Status

Active

Version

1.0

Component

General code

Created by

🇷🇴Romania trudog

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    It should be a Search API processsor like the Content Access

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • @eduardo-morales-alberti opened merge request.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Issue was unassigned.
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks for starting work on this!
    When the code is ready for review (optimally already including automated tests), please remember to post a patch so the test bot can test it. (Unfortunately, it doesn’t work for issue forks in this project, see #3190024: Problem with test dependencies when testing issue forks .)

  • Status changed to Needs work over 1 year ago
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Ok @drunken monkey, work in progress by the moment

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    First version ready I will upload the patch

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Process unpublished filters from facets "extra_data" correctly.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Add doc comment to method getUnpublishedEntitiesFromValues and define an undefined variable.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Do not extract fields on the process result, only returns the fields that were previously set or extracted.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Not necessary to set the field on the result item because the field class maintain the reference.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    After a lot of testing and development, seems that changing the result on the post-result process is not a valid solution because the field values are not available, and the way to get them alters the result.

    Using the result item, the get fields method, will return null if they were not extracted yet, and if you force the extraction it will alter the result, so I will try to develop it for the indexing stage instead of the post-result process stage.

    https://git.drupalcode.org/project/search_api/-/blob/8.x-1.x/src/Item/It...

      /**
       * Returns the item's fields.
       *
       * @param bool $extract
       *   (optional) If FALSE, only returns the fields that were previously set or
       *   extracted. Defaults to extracting the fields from the original object if
       *   necessary.
       *
       * @return \Drupal\search_api\Item\FieldInterface[]
       *   An array with the fields of this item, keyed by field identifier.
       */
      public function getFields($extract = TRUE);
    
    
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    I will try to cover it on the indexing instead of the postprocess:

    • Create a Search API field processor - Ok
    • Reindex item when the taxonomy status change
    • Use a cron process to do it
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    536 pass, 7 fail
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Added new patch.
    Steps to test:

    • Have an index with content with referenced entities
    • Enable the processor with label "Exclude field references entities unpublished from results."

    • Configure the processor, you should choose the fields and the content (can be aggregated fields)

    • As a recommendation, add the field status from the referenced entity on the index, for example field_category:entity:status, this will reindex the content when the reference entity changes its status.

    • Show the referenced field on an indexed view
    • Reindex all items
    • Check on the view that the unpublished entities are not present
    • Check it with different entity types, languages and status
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Status changed to Needs work over 1 year ago
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    I am not sure why the test failed, any clue @drunken monkey?

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Status changed to Needs review over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    544 pass
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Excluded also the not translated entities.

  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Track changes in references track all translations which is not efficient, the performance can be improved if it tracks only the updated language translation or all translations if the updated entity is the original translation.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    544 pass
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Track only the changed translation if is not the default translation to index faster the stale entities.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Get the original translation on the TrackerHelper.php on the method trackReferencedEntityUpdate.

        // Original entity, if available.
        $original = $deleted ? NULL : ($entity->original->getTranslation($entity->language()->getId()) ?? NULL);
    

    Because on the ContentEntity::getAffectedItemsForEntityChange will try to compare the original entity with the current entity and can be a translation.

          // We trigger re-indexing if either it is a removed entity or the
          // entity has changed its field value (in case it's an update).
          if (!$original_entity || !$items->equals($original_entity->get($relation_info['field_name']))) {
    
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    389 pass, 98 fail
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update over 1 year ago
    Composer require-dev failure
  • Status changed to Needs work over 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 8.1 & pgsql-10.12
    last update over 1 year ago
    544 pass
  • Status changed to Needs review over 1 year ago
  • 🇪🇸Spain eduardo morales alberti Spain, 🇪🇺

    Ready to review

  • 🇫🇷France xavier.masson Haute-Normandie

    I've tried the patch but doesn't fully work as expected as the patch only work with node entity type instead all the entity types fieldables. I'll try to find time to add support on all the entities types.

  • Status changed to Needs work about 1 year ago
  • 🇦🇹Austria drunken monkey Vienna, Austria
  • 🇩🇪Germany mkalkbrenner 🇩🇪

    It is great to see some movement here.

    But beside the fact that the current patch only supports nodes, there're more issues.
    The processor seems to fail if the field isn't just a referenced entity, but a field of that entity.
    And the processor could not handle deeply nested references like entity 1 references entity 2 which references entity3.

    I think both issues could be solved if the processor "simply" checks the status of all entities along the property path.

  • Status changed to Needs review about 1 year ago
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update about 1 year ago
    Patch Failed to Apply
  • 🇩🇪Germany mkalkbrenner 🇩🇪

    Here's a proof of concept. Using a processor that acts on the entire property path isn't easy.
    I created a patch that checks the published state for all entities across a property path as soon as the EntityStatus processor is activated.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update about 1 year ago
    Patch Failed to Apply
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update about 1 year ago
    541 pass, 2 fail
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks a lot, Markus, that does already look promising!
    Attached is a re-roll that applies to the latest dev version of the module.

    The main problem with this approach is, of course, that it’s super unclean – we add code specific to a single processor to a framework class, even to a service, and now have to pass the index in various places where it shouldn’t be needed. However, passing anything but the index would be even worse, of course, since then the logic of when we want unpublished entities excluded would have to be determined in multiple places in the code – and this place in the code is definitely the best suited, by far, to implement this change, you’re abolustely correct there.
    The probably cleanest way of implementing this would be to add a general method for processors to influence the field extraction process, but that’s of course way overkill just for solving this one problem. So, not really sure how to proceed. I’ll have to think about the least awful way forward here, I guess.

    Anyways, your patch is already very helpful, thanks again!
    If others interested in this issue could give it a try, that would be great. Even better would be if someone could already write some tests, since those will be useful even in case we go with an entirely different approach (which I don’t think likely, but still).

  • The last submitted patch, 42: 3114792-42--allow_excluding_unpublished_references.patch, failed testing. View results
    - codesniffer_fixes.patch Interdiff of automated coding standards fixes only.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & sqlite-3.27
    last update about 1 year ago
    545 pass
  • 🇦🇹Austria drunken monkey Vienna, Austria

    Fixed test fail and code style.

  • 🇩🇪Germany mkalkbrenner 🇩🇪

    Even if this isn't a beautiful architecture, I would like to see that code to be committed. We could comment it and declare it as intermediate solution to solve this issue.

    We use this patch in production. It solved several issues in our searches. For example nothing was highlighted because the snippet belongs to some unpublished content. Or a facet item that leads to no results because the raw value is only used by not yet published content.

  • Status changed to Needs work about 1 year ago
  • 🇦🇹Austria drunken monkey Vienna, Austria

    OK, unless I can think of a better solution I will commit the current one, at least as a temporary workaround.
    However, this still needs tests either way, so NW for that.

  • 🇷🇺Russia qzmenko Novosibirsk

    We need to check if $entityAdapter is not null. It can be null if target entity not exists.

    Right now I recieve error during reindex:
    Error: Call to a member function getValue() on null in function Drupal\search_api\Utility\FieldsHelper->extractFields() (line 136 / /app/docroot/modules/contrib/search_api/src/Utility/FieldsHelper.php).

    Sorry, I don't have possibility to create an interdiff right now.
    Patch attached.

  • Ok so in the patch, that I attach, I got quite a few suspect things, so I will start with some disclaimers.

    Disclaimer 1: Please do not use this patch in production, unless you are sure it will not cause you any of the problems I will mention below. I have on purpose hidden the patches from display.

    Disclaimer 2: Some existing tests will fail, I have not taken the time to maintain them yet, since I would like some feedback first.

    With those out of the way, here is what I did.

    @qzmenko I based my patch on #44 and not on #47 because it is not clear to my when/why you would have a non existent entity if a reference to it exists.
    I am sure there are valid cases, I just don't have any in mind atm, and I am not sure how we should handle those cases. In #47 you simply let the FieldHelper do as it would before, which means index the fields normally, which makes no sense to me if the referenced entities, for said field, do not exist.

    @mkalkbrenner With #44 which is basically #41, The described functionality worked fine for nested field variations:entity:whatever:whatevercode but not for direct fields e.g. as simple taxonomy field field_brand.
    In my understanding, the first time extractFields is called, the ComplexDataInterface $item is the actual indexed entity. We don't care about that since it will be filtered out later, or should have already been filtered by the actual processor (Not sure atm at what point this happens, but does not rly matter). Then for that entity, the selected fields to index are split between nested and direct fields. For nested fields we have a recursive call, which then filters those fields based on status.
    For direct fields we don't do that, instead extractField is called. I extracted your logic in a helper function and used it in extractField as well... and here I had some thoughts.

    @drunken monkey I refrained from injecting the $index variable to the extractField function, since it already has FieldInterface $field as an argument, and we should be able to extract the index from there. I had to pass the index to the extractFieldValues function though, which does the actual work. Now here I got some issues.
    This funciton is used by stuff I could not test atm in my code, and I think would have weird side effects.

    E.g. It is used in AddHierarchy processor. I don't know what will happen if you stop indexing something that has published children. How do we handle the children? It would make sense to stop displaying them if their ancestor is not displayed, but maybe some would argue that it's ok to keep showing them without the ancestor. I don't know how something like that is already handled in the processor, and do not atm have a working environment with matching data to test this, so I made the index variable nullable in extractFieldValues and my helper function, which leads extractFieldValues calls to gracefully keep doing what they were doing until now.
    Maybe I am overthinking it, maybe not, idk.

    Same for the SearchApiFieldTrait::898 in `extractPropertyValues`... Did not handle that, since I am not sure If we want to.
    We could either decide for them here, or handle those cases in their separate follow ups, I am not so familiar with the entire codebase yet.

    Lastly I completely ignored existing tests for the extractFieldValues, until we decide if this is a right approach. Give me some feedback, and I could get to them after.

    TLDR:
    - Stuff on #47 were not clear to me, we should address this point, but I left it our for now.
    - Direct fields were not handled by the proposed logic only nested... Think I cover this in the attached patch now.
    - Did not cover every call of the extractFieldValues function, to avoid possible side-effects.

  • 🇦🇹Austria drunken monkey Vienna, Austria

    Thanks a lot for your work on this!
    Since we switched this module to use issue forks a few months ago, I created a new MR with the last two patches, plus a little clean-up by myself.

    In general, I do like your approach, so please feel free to work further on it. Once we have tests, we can hopefully also verify that #44 really didn’t cover some cases which your patch does.

    Some more remarks:

    • While I like the separate shouldExtractReference() method, I don’t think it has to be on the interface. It can just be protected, in my opinion. (Also causes less trouble with backwards-compatibility.)
    • Very good point about $field already containing a reference to an index. I think we should also use that approach to avoid the explicit $index parameter for extractFields(), if possible. Now that I think about it, it’s really impossible to add it as anything but the last parameter, so we’d have needed to change that anyways. (Might cause issues with some of the callers, though – let’s see.)

    Anyways, still in desperate need of test coverage.

  • 🇦🇷Argentina drupal-son

    I have this exact need.
    I'm using JSONAPI on an index indexing nodes with field realation to paragraphs.
    Some paragraphs could be unpublished and I would like to JSONAPI avoid representing these unpublished node on the result.

    FYI - As the consumer user don't have permissions for reading unpublished content, then JSONAPI only returns the entity type and the UUID without any other extra information, but at on the `meta` -> `omitted`section JSONAPI adds references of entites returned that don't have enough permissions on the `data` section.

    I'm trying to skip search api to index the unpublished related content, then JSONAPI not returning unpublished entities and no mention about them on the `omitted` section.

    Is the latest patch file tested or ready to be used on prod or merged into https://www.drupal.org/project/search_api/releases/8.x-1.x-dev ?

  • 🇦🇷Argentina drupal-son

    @eduardo-morales-alberti @qzmenko @drunken-monkey @mkalkbrenner @marios-anagnostopoulos

    I tested the patches #44 and #48 on the search_api 8.x-1.34 (not 1.35 due to https://www.drupal.org/project/search_api/issues/3454939 🐛 Declaration of Drupal\search_api_solr\Plugin\search_api\backend\SearchApiSolrBackend::__sleep() must be compatible Active ) running on Drupal 10.2.5 using Acquia Search API Solr Server.
    I have a search api index with multiple entity reference fields (pointing to content types and taxonomies).
    BTW, is Paragraph entity type supported?
    I have the option "Track changes in referenced entities" enabled.
    I still can't see the "Exclude field references entities unpublished from results." processor on my index.

    Does it matter what Search API server is used?
    What I am missing?

  • 🇳🇿New Zealand jonathan_hunt

    The patch in #48 works for me on Search API 8.x-1.35 but since my use case is to avoid indexing unpublished media entities referenced from a node, I needed to alter shouldExtractReference() to check for EntityAdapter, not just EntityReferenceItemInterface, e.g.:

      public function shouldExtractReference(TypedDataInterface $reference_item, ?IndexInterface $index = NULL): bool {
        if ($index?->isValidProcessor('entity_status')) {
          if ($reference_item instanceof EntityAdapter) {
            $referenced_entity = $reference_item->getEntity();
          }
          elseif ($reference_item instanceof EntityReferenceItemInterface) {
            $referenced_entity = $reference_item->get('entity')
              ->getTarget()
              ->getValue();
          }
          if ($referenced_entity instanceof EntityPublishedInterface) {
            return $referenced_entity->isPublished();
          }
          elseif ($referenced_entity instanceof UserInterface) {
            return $referenced_entity->isActive();
          }
        }
        return TRUE;
      }
    
  • 🇦🇹Austria drunken monkey Vienna, Austria

    @jonathan_hunt: Thanks for the feedback, did not know that this could be a direct EntityAdapter instance. But should be no harm in just checking for that, too – thanks.

    Still needs test coverage, please.

Production build 0.71.5 2024