Unable to delete outdated records from the pipelines

Created on 4 December 2023, 7 months ago
Updated 26 January 2024, 5 months ago

Problem/Motivation

Before 📌 Update the index logic and not to reindex whole dataset Fixed , for ES destination, we used to delete the index and create it again. We are missing that option now.

Proposed resolution

Add a new option to the invalid_values field to recreate the destination storage.

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Fixed

Version

1.0

Component

Code

Created by

🇵🇰Pakistan jibran Sydney, Australia

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @jibran
  • 🇵🇰Pakistan jibran Sydney, Australia
  • 🇵🇰Pakistan jibran Sydney, Australia

    I think we can just get the existing count from the destination plugin and if the new count is lower than the existing count then we can remove the outdated one after indexing.

    Classifying it as a major bug and re- titling it as this is a regression.

  • 🇵🇰Pakistan jibran Sydney, Australia

    How about this for a start? I have added a few observations and questions.

    1. +++ b/data_pipelines.install
      @@ -319,3 +319,18 @@ function data_pipelines_update_8015(): void {
      +function data_pipelines_update_8016(): void {
      

      Added the hook as the allowed values are stored in the storage settings.

    2. +++ b/modules/data_pipelines_elasticsearch/src/Plugin/DatasetDestination/ElasticSearchDestination.php
      @@ -301,6 +301,28 @@ class ElasticSearchDestination extends DatasetDestinationPluginBase implements C
      +    $index_id = $this->configuration['prefix'] . $dataset->getMachineName();
      

      Because of the machine name I had to pass the whole object. Should I just pass the machine name?

    3. +++ b/modules/data_pipelines_elasticsearch/src/Plugin/DatasetDestination/ElasticSearchDestination.php
      @@ -301,6 +301,28 @@ class ElasticSearchDestination extends DatasetDestinationPluginBase implements C
      +        $result = $this->getClient()->count($index_params_base);
      

      The sample result is something like this:

      [
          "count" => 90,
          "_shards" => [
            "total" => 5,
            "successful" => 5,
            "skipped" => 0,
            "failed" => 0,
          ],
        ]
      
    4. +++ b/modules/data_pipelines_elasticsearch/src/Plugin/DatasetDestination/ElasticSearchDestination.php
      @@ -301,6 +301,28 @@ class ElasticSearchDestination extends DatasetDestinationPluginBase implements C
      +      $this->logger->error("Failed gto get a current count due to @message", [
      

      Needs to fix a typo.

    5. +++ b/src/Destination/DatasetDestinationPluginBase.php
      @@ -82,6 +82,13 @@ abstract class DatasetDestinationPluginBase extends PluginBase implements Datase
      +  public function getCurrentCount(DatasetInterface $dataset): int {
      

      Only override the method for ES and state destinations. We don't need one for NULL but do we need one for file? We have not overridden processCleanup for the file so I thought we don't need to override the count as well.

    6. +++ b/src/Destination/DatasetDestinationPluginBase.php
      @@ -82,6 +82,13 @@ abstract class DatasetDestinationPluginBase extends PluginBase implements Datase
      +    return 0;
      

      Should this be -1, NULL, or 0? I think 0 is the correct representation here.

    7. +++ b/src/Entity/Dataset.php
      @@ -179,7 +179,7 @@ class Dataset extends ContentEntityBase implements DatasetInterface {
      -        DatasetInterface::INVALID_REMOVE => new TranslatableMarkup('Remove records that are now invalid'),
      +        DatasetInterface::INVALID_REMOVE => new TranslatableMarkup('Remove records that are now invalid and outdated'),
      

      This change makes the intentions of the operation clearer.

    8. +++ b/src/Entity/Dataset.php
      @@ -335,6 +338,11 @@ class Dataset extends ContentEntityBase implements DatasetInterface {
      +        if ($count < $original_size[$destination_id] && $this->getInvalidValuesHandling() === self::INVALID_REMOVE) {
      

      Do we need to add the invalid value check? Should we not be doing this in either situation?

    9. +++ b/src/Entity/Dataset.php
      @@ -335,6 +338,11 @@ class Dataset extends ContentEntityBase implements DatasetInterface {
      +          $queue->createItem(new ProcessingOperation(ProcessingOperationEnum::PROCESS_CLEANUP, $destination_id, range($count, $original_size[$destination_id])));
      

      Should it be $count+1 or just $count is fine? Needs a test to verify.

  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 8
    last update 7 months ago
    69 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.x + Environment: PHP 8.1 & MySQL 8
    last update 7 months ago
    70 pass
  • 🇵🇰Pakistan jibran Sydney, Australia

    Added some tests to fix the logic but still to add failing tests.

  • Status changed to Needs review 6 months ago
  • 🇵🇰Pakistan jibran Sydney, Australia

    This is ready for a review.

  • Status changed to RTBC 6 months ago
  • 🇦🇺Australia larowlan 🇦🇺🏝.au GMT+10
  • 🇵🇰Pakistan jibran Sydney, Australia

    Marking it RTBC as per the PR comments from Lee.

    • jibran committed e848db21 on 1.x
      Issue #3405873 by jibran, tinny, larowlan, mortim07, nterbogt: Unable to...
  • Status changed to Fixed 6 months ago
  • 🇵🇰Pakistan jibran Sydney, Australia

    Merged the MR to 1.x

  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.69.0 2024