Expire functionality broken

Created on 16 January 2021, over 4 years ago
Updated 19 May 2025, 16 days ago

Problem/Motivation

Since Feeds 8.x-3.0-alpha7 the expire feature was accidentally broken. This happened when doing some refactoring in #2811429: Switch to feed owner during manual import. .

Specifically, FeedsImportHandler used to have the following code:

/**
   * Finishes importing, or starts unfinished stages.
   *
   * @param \Drupal\feeds\FeedInterface $feed
   *   The feed.
   */
  public function batchPostProcess(FeedInterface $feed) {
    if ($feed->progressParsing() !== StateInterface::BATCH_COMPLETE) {
      $this->startBatchParse($feed);
    }
    elseif ($feed->progressFetching() !== StateInterface::BATCH_COMPLETE) {
      $this->startBatchFetch($feed);
    }
    elseif ($feed->progressCleaning() !== StateInterface::BATCH_COMPLETE) {
      $this->startBatchClean($feed);
    }
    else {
      $feed->finishImport();
      $feed->startBatchExpire();
    }
  }

See one of the last line, where it says $feed->startBatchExpire();. This line got vanished. Something like it should have been added to FeedsExecutable::finish() (or FeedsBatchExecutable::finish(), since the expire feature didn't even work on cron yet).

Steps to reproduce

  1. Create a feed type.
  2. On the feed type processor settings, set a value for "Expire content items". For example, set it to "after 1 hour".
  3. Create a feed and import data.
  4. Wait until the expire time is over (for example: 1 hour).
  5. Import an updated feed in which some items are missing.

Expected result: the items that were imported an hour ago and that were not updated, should have been deleted.
Actual result: the expire batch does not happen.

Proposed resolution

It would be cool if the expire feature works both on cron and in the UI. But if that's a lot of work, we can for now probably also restore the functionality by triggering the expiration in FeedsBatchExecutable::finish().

We do need a functional test to ensure that the expire batch gets triggered.

Remaining tasks

  1. Add test coverage for the expire feature using the UI. See steps to reproduce.
  2. Investigate how much work it is to also support the expire feature on cron runs.
  3. If the expire feature is made for cron in this issue, add test coverage for this case.
  4. Restore the expire feature (for UI and perhaps also for cron).

User interface changes

None.

API changes

Probably none.

Data model changes

None.

🐛 Bug report
Status

Needs work

Version

3.0

Component

Code

Created by

🇳🇱Netherlands megachriz

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇪🇨Ecuador jwilson3

    Follow. Not only is cron affected, but also drush feed-im is affected.

  • First commit to issue fork.
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    703 pass
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    703 pass
  • Pipeline finished with Canceled
    over 1 year ago
    #24716
  • Pipeline finished with Failed
    over 1 year ago
    #24718
  • Open in Jenkins → Open on Drupal.org →
    Core: 9.5.5 + Environment: PHP 7.4 & MySQL 5.7
    last update over 1 year ago
    716 pass
  • Pipeline finished with Failed
    over 1 year ago
    #81631
  • First commit to issue fork.
  • 🇬🇧United Kingdom AndyF

    Rebased against ea64d3b8118d288d77e84f03e8cb1f70f45046f0.

  • 🇳🇱Netherlands megachriz

    Thanks for working on this!

    I think that the Expire feature should be able to work without requiring an import. So it should get triggered on cron runs by itself. Do you think that makes sense? Is that doable without loading all feed types on each cron run?

  • 🇬🇧United Kingdom AndyF

    At the mo I've just rebased existing work and am currently taking a look through.

    I've noticed what I think is a problem with the current approach: it locks the feed in the first item and unlocks it in the final item, but as they're cron queue worker items, they might not all get processed in a single run. So I think with sufficiently large workloads this would lead to feeds being locked between cron runs.

    So it should get triggered on cron runs by itself. Do you think that makes sense?

    Yup!

    Is that doable without loading all feed types on each cron run?

    It's been many years since I last looked at the codebase and I'm just getting acquainted with things so bear with me. IIUC we need to load the feed to get expired IDs, so the only way I can think of to avoid loading all the feeds on every run would be to add a new base field to track the last time it had a cron clear executed on it.

    So a rough outline might be

    1. Add a new timestamp Feed base field eg. cleared and set it to 0 for existing feeds.
    2. On cron
      1. we execute a query to find up to entity_update_batch_size feeds that haven't been cleared recently (as indicated by cleared). For each of those feeds, if they have any expired IDs, we add the IDs to a per-feed expiration queue;
      2. the feed expiration cron queue worker clears any items in its queue.
    3. Add a new processor setting for how frequently to check for expired items?

    Question: in an approach like this, do you think it's ok to only get a lock when grabbing the expired IDs? Ie actually deleting the items associated with those IDs will happen by processing a queue and the feed won't be locked by the queue workers?

    Thanks!

  • 🇬🇧United Kingdom AndyF
Production build 0.71.5 2024