- Status changed to Needs work (almost 2 years ago, 8:29pm 18 January 2023)
- Status changed to Needs review (almost 2 years ago, 7:58am 24 January 2023)
- 🇳🇱Netherlands megachriz
Setting this back to "Needs review" because I think that the issue reported here needs to be fixed in another issue, possibly 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review).
- 🇺🇸United States smustgrave
We are using Redis for our queue, so maybe the bad values are in there. Clearing the queue table doesn't do anything for us.
- 🇳🇱Netherlands megachriz
@smustgrave
Hm, that's unfortunate. Since 8.x-3.0-beta3, Feeds automatically cleans up tasks for a specific feed from the queue table whenever that feed finishes an import or when the feed gets unlocked. But if that queue is external, Feeds cannot clean up tasks from that queue: the queue API has no method to clear a specific task from it, only a method to clear a whole queue. The Feeds queues are currently defined per feed type. Perhaps each feed should get its own queue? That could potentially lead to a large number of queues; I'm not sure if that's a problem.
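For illustration, this is the whole surface the core Queue API offers for cleanup; a minimal sketch, assuming a Feeds queue name of the form feeds_feed_refresh:my_feed_type (hypothetical):

```php
// Sketch only: \Drupal\Core\Queue\QueueInterface has no method to delete
// items whose data matches a given feed.
$queue = \Drupal::queue('feeds_feed_refresh:my_feed_type');

// The only bulk operation available. This removes ALL items, including
// tasks that belong to other feeds of the same feed type.
$queue->deleteQueue();
```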
- 🇺🇸United States smustgrave
So I have to dig more, but if the Feeds module is querying the queue table directly, that seems wrong to me. If it used the queue service, then clearing a queue would clear it from whichever backend is in use.
We've experienced many times that Drupal's default queue fails and causes performance issues.
- 🇳🇱Netherlands megachriz
@smustgrave
We should probably discuss the queue thing further in a separate issue, but see the methods hasQueueTasks() and clearQueueTasks() in FeedImportHandler. hasQueueTasks() is used by FeedsLockBackend to determine if an import still appears to be active. clearQueueTasks() is called when finishing an import or when unlocking a feed. When unlocking a feed, it is important that any active imports for that feed get killed, and that is now done by removing specific tasks from the queue table.

The code:
```php
/**
 * Checks if there are still tasks on the feeds queue.
 *
 * @param \Drupal\feeds\FeedInterface $feed
 *   The feed to look for in the queue.
 *
 * @return bool
 *   True if there are still tasks on the queue. False otherwise.
 */
public function hasQueueTasks(FeedInterface $feed): bool {
  // First check if the queue table exists.
  if (!$this->database->schema()->tableExists('queue')) {
    // No queue table exists, so no tasks can exist on it.
    return FALSE;
  }

  $result = $this->database->select('queue')
    ->fields('queue', [])
    ->condition('data', 'a:3:{i:0;i:' . $feed->id() . ';%', 'LIKE')
    ->countQuery()
    ->execute()
    ->fetchField();

  return $result > 0;
}

/**
 * Removes all queue tasks for the given feed.
 *
 * @param \Drupal\feeds\FeedInterface $feed
 *   The feed for which to remove queue tasks.
 */
public function clearQueueTasks(FeedInterface $feed): void {
  if (!$this->database->schema()->tableExists('queue')) {
    return;
  }
  $this->database->delete('queue')
    ->condition('data', 'a:3:{i:0;i:' . $feed->id() . ';%', 'LIKE')
    ->execute();
}
```
I didn't see another way to clean up stale queue tasks. The only other option is to remove the whole queue, but that would kill imports of other feeds of the same feed type as well. Queues are currently defined per feed type, not per feed.
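For reference, the only backend-agnostic alternative I can think of would be to claim every item, inspect its data and delete or release it. A hedged sketch, assuming the queue name and that the feed ID is the first element of the task data (as the LIKE condition above implies):

```php
// Sketch only: drain the queue once, deleting tasks for one feed and
// releasing everything else. $feed_id is the ID of the feed to clear.
$queue = \Drupal::queue('feeds_feed_refresh:my_feed_type');
$keep = [];
while ($item = $queue->claimItem()) {
  if (is_array($item->data) && $item->data[0] == $feed_id) {
    // Task belongs to the feed being cleared: remove it.
    $queue->deleteItem($item);
  }
  else {
    // Task belongs to another feed: release it again afterwards.
    $keep[] = $item;
  }
}
foreach ($keep as $item) {
  $queue->releaseItem($item);
}
```

This trades a table query for lease churn on every item, so it is sketched here for discussion, not proposed as-is.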
- rdivince
Tested the MR and also experienced the RuntimeException error with /tmp/feeds_http_fetcher not existing in our load-balanced environment.
Still testing, but I wonder if it has something to do with our site also using s3fs for a remote file system, which is quite useful for load-balanced environments. I noticed the tempnam() method in the FeedsFileSystemBase class was using fileSystem->realpath() to get the path. The Drupal API docs say regarding realpath():
"Resolves the absolute filepath of a local URI or filepath. The use of this method is discouraged, because it does not work for remote URIs. Except in rare cases, URIs should not be manually resolved. Only use this function if you know that the stream wrapper in the URI uses the local file system, and you need to pass an absolute path to a function that is incompatible with stream URIs.
https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21File%21Fi...
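A quick way to reproduce the quoted limitation outside of Feeds (a sketch; the s3:// scheme assumes the s3fs stream wrapper is active):

```php
$file_system = \Drupal::service('file_system');

// Local stream wrapper: resolves to an absolute path on disk.
var_dump($file_system->realpath('public://feeds/in_progress'));

// Remote stream wrapper: realpath() returns FALSE, so anything that
// passes the result to tempnam() or other C-library functions breaks.
var_dump($file_system->realpath('s3://feeds/in_progress'));
```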
- 🇳🇱Netherlands megachriz
@rdivince
Thanks for helping with testing! I've no experience with s3fs. Can you check what happens with that filesystem if you try to write a file the way FeedsFileSystemBase::tempnam() does? So test that separately, outside of the Feeds context?

"also experienced the RuntimeException error with /tmp/feeds_http_fetcher not existing in our load-balanced environment"
That's probably because there are still queue items left on the queue. So in order to check if the solution here works for future imports, try to empty these Feeds queues first. One way of doing that is by installing the Queue UI module. It can also be done with the drush command drush queue:delete, but then you need to know the exact queue name, which you can see with drush queue:list.
- 🇳🇱Netherlands megachriz
@rdivince
I tried to find a way to test the patch with an S3 filesystem. It looks like I would need an account for Amazon's Simple Storage Service (S3) for that. I tried to create an account there, but got stuck at the step asking for credit card details; in the Netherlands it is not self-evident to have one. So I'm tempted to commit this without the S3 check.
- 🇺🇸United States smustgrave
FYI, after I cleared my Redis queues and applied this patch across all our environments, we are no longer seeing the problem.
We are using Azure with a shared file mount. Our prod and stage scale up to multiple instances as well.
- rdivince
@MegaChriz thanks for the info! I was able to clear the queue with the drush commands provided, but it did not fix the import errors. I tested file writing with tempnam() outside of the Feeds context but did not have any luck. However, I did get the MR to work as follows:
In FeedsFileSystemBase::tempnam(), instead of writing a temporary file with tempnam(), I wrote a file with \Drupal::service('file.repository')->writeData(). This method should support local and remote stream wrappers. Then I got additional error information: the getAsync() call in HttpFetcher::get() appears to open a file in "w+" mode by default, which S3 (and perhaps other remote stream wrappers) does not seem to allow: https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/s3-stream-wra...

"You can upload larger files by streaming data using fopen() and a "w", "x", or "a" stream access mode. The Amazon S3 stream wrapper does not support simultaneous read and write streams (e.g. "r+", "w+", etc). This is because the HTTP protocol doesn't allow simultaneous reading and writing."
First opening the stream through GuzzleHttp\Psr7\Utils::tryFopen() with "w" mode and passing that file stream as the sink to getAsync() worked perfectly. Not sure if other use cases require "w+".

While these tweaks to the MR successfully resolved my errors with s3fs, I know mine is just one of many use cases and you'll know those better than me. So I've attached my tweaks as a patch and would like to hear your thoughts.
- Status changed to RTBC (over 1 year ago, 2:42pm 22 March 2023)
- 🇺🇦Ukraine pingwin4eg Zaporizhia 🇺🇦
The fix from the merge request works - the default private directory (private://feeds/in_progress/#) is used instead of the temporary one.
For those who still see the error with the tmp file, check the other fetcher classes as well. For example, our client's project also has the feeds_http_auth_fetcher module with its own fetcher, plus a custom one, and existing feed entities use all three. So I had to change them too, similar to the HTTP fetcher.
- MegaChriz committed 4062e01b on 8.x-3.x:
Issue #2912130 by Fant0m, MegaChriz, artematem, n4r3n, merauluka, bburg...
- Status changed to Fixed (over 1 year ago, 2:59pm 30 March 2023)
- 🇳🇱Netherlands megachriz
@rdivince
I had been giving Feeds a low priority for the past month, so I guess that for your patch I'll need to dive deeper into how remote file systems work. Because of that, I'm merging the changes of this issue now and moving your patch to a follow-up issue. Perhaps we can find a way to add test coverage for remote stream wrappers too?
- 🇳🇱Netherlands megachriz
@rdivince
I've opened ✨ Add support for remote file systems for http downloaded data (Active).

Automatically closed - issue fixed for 2 weeks with no activity.
- Status changed to Fixed (over 1 year ago, 4:12pm 9 May 2023)
- 🇺🇸United States bkosborne New Jersey, USA
Glad to see this landed! Thanks to everyone who contributed.
@MegaChriz any idea when you might cut a new beta that includes this?
- 🇳🇱Netherlands megachriz
@bkosborne
I hope within a month from now. There is a follow-up issue that I would like to include in the next release as well: 🐛 FetcherResult::checkFile() throws \RuntimeException, which is never caught (Needs review). While the fix for this issue only aims to prevent imports from ending abruptly for new imports, that other issue should help clean up existing imports that are (still) stuck in a loop as a result of the issue reported here.
- 🇩🇪Germany c-logemann Frankfurt/M, Germany
I understand this new feature for fixing load-balancing issues. But these new tmp files inside the private folder can also break some backup strategies. So where is the "configurable" part?
"Put http downloaded files in a configurable directory and added service to interact with this directory - to fix issue with missing temporary files in load balanced environments."
- 🇪🇨Ecuador jwilson3
Use the in_progress_dir config element in the Feeds module's feeds.settings.yml config file. The default location is now private://feeds/in_progress, assuming you've configured a private files directory; otherwise, it is public://feeds/in_progress.

Review the commit from above for additional context:
https://git.drupalcode.org/project/feeds/-/commit/4062e01bb8471254d9daa5...
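If you want to keep this out of exported configuration, the setting can presumably also be overridden per environment (an untested sketch; the in_progress_dir key comes from feeds.settings.yml):

```php
// In settings.php: point the Feeds in-progress directory elsewhere.
$config['feeds.settings']['in_progress_dir'] = 'private://feeds/in_progress';
```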
- 🇩🇪Germany c-logemann Frankfurt/M, Germany
@jwilson3 Thanks for the description. So it's a good candidate to set in settings.php.

In many situations I think configuration via settings.php is enough, especially for very specialized functionality. And this feature is such a special thing: it solves load-balancing problems but can raise problems for non-advanced admins. So maybe it would be fine to make this feature and a setting for it more visible. But it's just an idea.
For our customers we will just exclude this folder via backup scripts. We already do this with other temporary folders, like image styles, rendered CSS, etc.