Large feeds do not import entirely

Created on 18 October 2024

Problem/Motivation

The feed import processes and completes, but does not import the entire feed. The problem only occurs on large feeds whose import spans multiple cron runs. In testing, the issue was reproduced only on load-balanced environments (Pantheon). The following error message is produced:
The feed "XX" encountered an error when processing: Drupal\feeds\Exception\FileNotFoundException: File /tmp/feeds_http_fetcherXxx does not exist. in Drupal\feeds\Result\FetcherResult->checkFile() (line 58 of /code/web/modules/contrib/feeds/src/Result/FetcherResult.php).
Downloading the temporary file from the feed log (or via SFTP) shows that it contains the entire feed.

Steps to reproduce

  1. Use a Drupal site on a load-balanced environment.
  2. Start a background import of a feed with over 1000 entries.
  3. Wait for cron to run several times so that the import completes.
  4. Check the feed log at /feed/XX/log to see how many entries were imported.
  5. The feed log shows fewer than 1000 entries imported, usually a number such as 300, 700 or 900.
  6. DBLog at /admin/reports/dblog?type%5B%5D=feeds shows the error above.
  7. The temporary file listed in the feed log can be downloaded and is viewable through SFTP.

Proposed resolution

I am not sure whether this is a permissions issue or whether Feeds truly thinks the file is no longer at the specified location. The file is there and can be seen through SFTP. I am willing to test when needed.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Active

Component

Code

Created by

🇺🇸 United States andreayaya


Comments & Activities

  • Issue created by @andreayaya
  • 🇺🇸 United States andreayaya
  • 🇳🇱 Netherlands megachriz

    I thought I had fixed this issue; see 📌 Missing temporary files in load balanced environments (Fixed). /tmp/feeds_http_fetcherXxx is also the old location where the temporary file used to be stored. The file to import should now be stored at private://feeds/in_progress/[feed id]. If it is still /tmp/feeds_http_fetcherXxx, the most likely cause is that you updated from an older version of Feeds while an import task was still on the queue. If so, unlock the feed in question and try starting an import again. If you cannot unlock the feed because it is not locked, first make sure that all Feeds tasks in the queue table are removed. Then you should be able to start fresh with imports, and temporary files will be stored at a location that hopefully doesn't cause issues on Pantheon.

    Could it be that you recently updated from 8.x-3.0-beta3? The issue was fixed in 8.x-3.0-beta4, but only for new imports, not for unfinished ones that started before the update to 8.x-3.0-beta4 or later.
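
    To make the distinction between the old and new file locations concrete, here is a minimal Python sketch. classify_source_path and the category names are hypothetical (not part of Feeds or feeds_fetcher_post); the path prefixes are taken from this thread. It assumes, as the linked issue suggests, that /tmp is local to each web node, so a file stored there during one cron run may be missing when a later run is handled by another node.

```python
# Hypothetical diagnostic helper; not part of Feeds or feeds_fetcher_post.
# Prefixes come from this issue thread. The "legacy-local" label assumes
# /tmp is local to each web node in a load-balanced environment.

LEGACY_TMP_PREFIX = "/tmp/feeds_http_fetcher"        # old location (before 8.x-3.0-beta4)
IN_PROGRESS_PREFIX = "private://feeds/in_progress/"  # location for files being imported
LOG_PREFIX = "private://feeds/log/"                  # source path shown in the feed log


def classify_source_path(path: str) -> str:
    """Return 'legacy-local', 'in-progress', 'log-copy' or 'unknown' for a path."""
    if path.startswith(LEGACY_TMP_PREFIX):
        # A later cron run on a different node may not find this file,
        # which would match the FileNotFoundException in the report.
        return "legacy-local"
    if path.startswith(IN_PROGRESS_PREFIX):
        return "in-progress"
    if path.startswith(LOG_PREFIX):
        return "log-copy"
    return "unknown"


if __name__ == "__main__":
    # Example paths; 42 is a hypothetical feed id.
    for p in (
        "/tmp/feeds_http_fetcherXxx",
        "private://feeds/in_progress/42",
        "private://feeds/log/XXXXX/source/feeds_http_fetcherXxx",
    ):
        print(p, "->", classify_source_path(p))
```

    Under these assumptions, a path in the DBLog error classified as "legacy-local" points to a fetcher (or an old queued task) still writing to /tmp, even when the feed log shows a private:// path.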

  • 🇺🇸 United States andreayaya

    This issue had been fixed and did not reappear until now; I have been using this module for quite some time. The feed in question was set to inactive for a few months, then set to active on 9/20/24. The Feeds upgrade from 3.0.0-beta4 to 3.0.0-rc2 was performed on 10/16/24.

    I am looking at the file location in the active feed log at /feed/XX/log, which shows the source path private://feeds/log/XXXXX/source/feeds_http_fetcherXxx, but the DBLog error shows /tmp/feeds_http_fetcherXxx.

    I just added a new feed at admin/content/feeds for this same feed. The source path for the active feed was private://feeds/log/XXXXX/source/feeds_http_fetcherXxx, and the import resulted in only 400 of 1042 entries. The same error was logged as before. This particular feed uses the feeds_fetcher_post module. Let me know if there is anything else I should check.

  • 🇳🇱 Netherlands megachriz

    This particular feed does use feeds_fetcher_post module

    Ah, then that must be it. I see that feeds_fetcher_post still uses the previous directory style, which can cause issues on Pantheon.

    That code can be seen here:
    https://git.drupalcode.org/project/feeds_fetcher_post/-/blob/2.x/src/Fee...

    If feeds_fetcher_post overrode less of HttpFetcher than it does now, this issue would also be fixed when that fetcher is used.

  • 🇺🇸 United States andreayaya

    Ah, thank you! I have created an issue for the feeds_fetcher_post module: https://www.drupal.org/project/feeds_fetcher_post/issues/3481810 (🐛 Large feeds do not import entirely due to missing temp file, Active)

  • 🇳🇱 Netherlands megachriz

    Okay, then I'm closing this one!
