- Issue created by @aron novak
- 🇭🇺Hungary aron novak Hungary, Budapest
We need to note that not all feeds were affected: only 4 of ~20 were. However, those 4 generated ~50 GB of disk usage.
- Status changed to Postponed: needs info
over 1 year ago 6:37pm 20 July 2023 - 🇳🇱Netherlands megachriz
Hm, after an import, the file from which content was imported should get cleaned up. See the following:
FeedsExecutable::finish():

    $fetcher_result->cleanUp();
    $feed->finishImport();

HttpFetcherResult::cleanUp():

    /**
     * {@inheritdoc}
     */
    public function cleanUp() {
      if ($this->filePath) {
        $this->fileSystem->unlink($this->filePath);
      }
    }
Feed::finishImport() calls Feed::unlock(), and in that method there is:

    \Drupal::service('feeds.file_system.in_progress')->removeFiles((string) $this->id());
FeedsFileSystemBase::removeFiles():

    /**
     * {@inheritdoc}
     */
    public function removeFiles(string $subdirectory): bool {
      $directory = $this->getFeedsDirectory() . '/' . $subdirectory;
      if (!is_dir($directory)) {
        // Directory does not exist. Abort.
        return FALSE;
      }
      $result = $this->fileSystem->deleteRecursive($directory);
      return $result ? TRUE : FALSE;
    }
Perhaps there is a case I overlooked where the file cleanup doesn't happen? I need more specific steps to reproduce. If possible, a test that demonstrates the issue would be nice.
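A rough idea of what such a test could look like is sketched below. This is only a sketch under assumptions, not the test that was eventually written: it presumes the Feeds kernel test helpers (FeedsKernelTestBase, createFeedType(), createFeed()), that startCronImport() followed by a cron run exercises the queue-based import path, and that the feeds.file_system.in_progress service exposes getFeedsDirectory().

```php
<?php

namespace Drupal\Tests\feeds\Kernel;

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Psr7\Response;

/**
 * Sketch: fetched files must not pile up when the source keeps returning 404.
 */
class InProgressCleanupSketchTest extends FeedsKernelTestBase {

  /**
   * Tests that a 404 source does not leave files behind on cron imports.
   */
  public function testNoLeftoverFilesOn404() {
    // Let every HTTP request return a 404 by swapping in a mocked client.
    $mock = new MockHandler(array_fill(0, 10, new Response(404)));
    $this->container->set('http_client', new Client([
      'handler' => HandlerStack::create($mock),
    ]));

    // A feed type that uses the HTTP fetcher; other settings left at defaults.
    $feed_type = $this->createFeedType([
      'fetcher' => 'http',
    ]);
    $feed = $this->createFeed($feed_type->id(), [
      'source' => 'http://www.example.com/feed.rss',
    ]);

    // Schedule a cron import and run cron a few times.
    $feed->startCronImport();
    for ($i = 0; $i < 3; $i++) {
      $this->container->get('cron')->run();
    }

    // No fetched files may be left behind in the in_progress directory.
    $directory = $this->container->get('feeds.file_system.in_progress')
      ->getFeedsDirectory();
    $leftovers = is_dir($directory)
      ? $this->container->get('file_system')->scanDirectory($directory, '/.*/')
      : [];
    $this->assertEmpty($leftovers, 'No fetched files were left behind.');
  }

}
```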
- 🇭🇺Hungary aron novak Hungary, Budapest
I will try to come up with a test. What I concluded is that the problematic feeds have URLs that return a 404 error, not HTTP 200.
- 🇳🇱Netherlands megachriz
Feed URLs that return a 404: that is useful information! With that I'm able to reproduce the issue. I see that the issue exists only on cron runs, not when importing through the UI.
I also see that every time I run cron, one more downloaded file gets added. This makes me think that this issue is closely related to 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review), also because it is about a RuntimeException. That issue is currently somewhat stuck on the question: when should an import be aborted and when should it be retried at a later time?
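For context, the mechanism behind the leftover files can be illustrated with plain Guzzle, outside of Feeds (an illustration only; the exact options the Feeds HTTP fetcher passes are not shown here). Guzzle streams the response into the 'sink' file even when the server answers with a 404, then throws a RequestException, so unless the caller removes that file it stays on disk:

```php
<?php

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

// The sink file is created up front; Guzzle streams the response body into
// it regardless of the status code.
$sink = tempnam(sys_get_temp_dir(), 'feeds_http_fetcher');
$client = new Client();

try {
  // Hypothetical URL that answers with a 404. With the default
  // http_errors = TRUE, Guzzle turns that into an exception.
  $client->get('https://www.example.com/this-path-returns-404.rss', [
    'sink' => $sink,
  ]);
}
catch (RequestException $e) {
  // The request failed, but the temporary file is still on disk. One such
  // file per failed cron fetch is how the in_progress directory grows.
  var_dump(file_exists($sink)); // bool(true)
}
```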
- Status changed to Active
over 1 year ago 12:49pm 24 July 2023 - Merge request !126: WIP: in_progress filesystem grows indefinitely - test (Merged), created by aron novak
- last update
over 1 year ago 695 pass, 1 fail - 🇳🇱Netherlands megachriz
A possible fix for this issue has been added to 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review). The test from here has been moved over there as well.
@Aron Novak
@bkosborne
Do you want to review the changes here? Perhaps you also have an opinion on the risk of imports getting aborted on temporary failures.
Personally, I think that a never-ending import with files piling up is more serious than an import getting aborted when there's a timeout or a similar issue. - last update
11 months ago 715 pass - Status changed to Needs review
11 months ago 11:47am 16 December 2023 - 🇳🇱Netherlands megachriz
Because I think that 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review) requires some more thought, I've come up with a simpler fix for this issue. And that is: when a RequestException occurs in HttpFetcher::get(), delete the temporarily created file immediately. This does not stop Feeds from trying to fetch over and over again, but it does keep the file system from growing indefinitely.
That Feeds tries to fetch over and over again should be addressed in 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review). The thing there that requires more thought is in which situations Feeds should retry and in which it should abort the import.
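In outline, the proposed change corresponds to something like the sketch below. This is not the committed code; feeds_sketch_fetch_to_sink() is a made-up standalone helper standing in for the relevant part of HttpFetcher::get(), and its signature is assumed:

```php
<?php

use Drupal\Core\File\FileSystemInterface;
use GuzzleHttp\ClientInterface;
use GuzzleHttp\Exception\RequestException;

/**
 * Downloads $url into $sink and cleans up the sink when the request fails.
 *
 * Sketch only: if the HTTP request throws, the (partially) written sink file
 * is removed before the exception is rethrown, so nothing lingers in the
 * in_progress directory.
 */
function feeds_sketch_fetch_to_sink(ClientInterface $client, FileSystemInterface $file_system, string $url, string $sink): string {
  try {
    $client->request('GET', $url, ['sink' => $sink]);
  }
  catch (RequestException $e) {
    // The download failed (404, timeout, ...). Delete the temporary file
    // immediately instead of relying on a cleanup step that never runs.
    if (file_exists($sink)) {
      $file_system->unlink($sink);
    }
    throw $e;
  }
  return $sink;
}
```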
Leaving the proposed fix here for review for a few weeks. I plan to commit it sometime in January, should there be no feedback.
- last update
11 months ago 715 pass - last update
7 months ago 718 pass - last update
6 months ago 718 pass -
MegaChriz committed 9ad1192c on 8.x-3.x, authored by Aron Novak:
Issue #3372368 by MegaChriz, Aron Novak: Fixed in_progress filesystem...
- Status changed to Fixed
6 months ago 2:08pm 19 May 2024 - 🇳🇱Netherlands megachriz
Okay, I merged this and retested the scenario. When a resource that Feeds tries to fetch returns a 404, Feeds will keep trying to fetch it on every cron run. But with the changes from this issue, it no longer stores a file on the file system.
It has been a while for me, but I think that the other issue (Feeds endlessly trying to fetch a resource) should be handled in 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review).
And with this change in, we can plan a new Feeds release: the last Feeds release with Drupal 9 support.
- 🇳🇱Netherlands megachriz
I see that the reported issue can technically still happen in the case of a 304, but it seems to happen only when fetching happens 'stand-alone', separate from the import process. Fixing that will be done in 🐛 Delete Empty Files Created If Empty/Error Exception is Returned (Needs review).
Automatically closed - issue fixed for 2 weeks with no activity.