- Issue created by @keithkhl
- 🇺🇸United States cmlara
I have the below URL structure:
- Supposed: https://cdn.example.com/bucket_name/s3fs-public/styles/thumbnail/public/...
- Current: https://my_drupal_url.com/s3/files/styles/thumbnail/public/.....

This is normal if s3fs does not see the file in its s3fs_file table when it generates the page. Normally s3fs purges the page cache when its file controller is called, allowing future page views to render the S3 bucket path.
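As a minimal sketch of how to check that, run something like the following via 'drush php:script' (the table name s3fs_file comes from the above; the URI is a hypothetical example, so substitute one of your own derivative paths):

<?php

// Minimal sketch: check whether a derivative is present in the s3fs
// metadata table. The table name 's3fs_file' is taken from the discussion
// above; the URI below is a hypothetical example.
$uri = 's3://styles/thumbnail/public/example.jpg';
$found = \Drupal::database()
  ->select('s3fs_file', 'f')
  ->fields('f', ['uri'])
  ->condition('uri', $uri)
  ->execute()
  ->fetchField();
print $found ? "Known to s3fs: $found\n" : "Not in the s3fs_file table.\n";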
After the bulk update for existing images, I cannot see any thumbnail images at all.
Can you clarify what you mean by 'bulk updating'?
- 🇰🇷South Korea keithkhl
Bulk updating - In the 'Actions' tab, there is an option called 'Copy Local Files to S3'.
I set it to 'Always' and clicked 'Copy local public files to S3'. It uploaded all my existing files.
Based on your comment, the cause is that the thumbnail images were not uploaded to my S3.
Indeed, many of them are missing, but it is not 0%: about 20~30% of the thumbnail images are still available in the S3 bucket. I still cannot understand why most thumbnail images failed to upload.

Perhaps it is due to misconfiguration? At least, out of 15,000 images, about 10,000 were uploaded. Most failures come from the thumbnail image folders under styles/image_type/ (where image_type is thumbnail). Since most thumbnails were not uploaded, and all URLs use mydrupalurl.com/s3/files/styles.... instead of my S3 bucket name, all thumbnails in Contents -> Media are currently blank.
Given that other images are uploaded properly and all the images I use for ad links have proper S3 links, I doubt misconfiguration is the issue.
To avoid any URL issue, I have also enabled 'Enable CNAME' and added cdn.example.com (identical to 'Use a Custom Host').
This might not be related to this case, but I mention it in case it helps you.
I had trouble integrating my MinIO S3 with Nextcloud when I used the aforementioned CDN URL. I run MinIO as a multi-server setup; each server has an internal IP as its endpoint, like 192.168.1.1:9000 and 192.168.1.2:9000, with the CDN name as the public endpoint at the proxy server. I usually don't have to use the internal IPs, but Nextcloud failed to connect with cdn.example.com; I was only able to get Nextcloud to use MinIO S3 with the internal IP. I guess it is because Nextcloud does not use the URL to find the S3 instance, but the API? I was lost after reading some articles, and I don't use Nextcloud anymore.
- 🇺🇸United States cmlara
Are there any errors in the Drupal logs?
Any 'Failed to upload' messages in the batch display?
Are you seeing the message that the batch successfully completed, or are you experiencing any page timeouts?

You mention debugging; did you look into S3fsFileMigrationBatch::class() as part of that debugging? If so, do the missing paths show up as part of S3fsFileMigrationBatch::dirScan()?
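As a quick sketch of that check (assuming S3fsFileMigrationBatch can be instantiated directly, that dirScan() returns the scanned paths, and that the directory below matches your site's public files path), run via 'drush php:script':

<?php

use Drupal\s3fs\Batch\S3fsFileMigrationBatch;

// Hedged sketch: list the paths the migration batch would pick up.
// The directory argument is an assumption; adjust to your files path.
$batch = new S3fsFileMigrationBatch();
$paths = $batch->dirScan(DRUPAL_ROOT . '/sites/default/files/styles');
foreach ($paths as $path) {
  print $path . PHP_EOL;
}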
- 🇰🇷South Korea keithkhl
@cmlara, thank you for the update.
I have not seen any error message displayed on the screen. Perhaps it is because I have not enabled the 'verbose' option in settings.php.
I will check again with another test site later on.

When it comes to debugging: since it did not work on the website, I tried the command-line option with drush, but it also failed in exactly the same way. I did not see any error log during the image migration. The only thing I saw during the command-line operation was the completion percentage growing from 0 to 100%.
The command-line option did not honor the Domain module's domain source, so all images got uploaded to the main site's S3, but the number of uploaded image files matches between the command-line and web UI cases.
I did have a timeout once, but that was 1 out of 10 cases, and it happened only when I ran it through the website.
I have not checked ::class() and ::dirScan(), as I was not aware of such options. I will also check them on the test site with 'verbose' on.
For now, I am more concerned about the incompatibility with Nextcloud and MinIO. Can you please help me understand how S3FS establishes the connection between Drupal and S3? For Nextcloud, it uploaded files when it was connected as an external drive but not as the main drive, and the two rely on different ways of integrating S3. For S3FS, since most files are uploaded properly except the thumbnails, I suspect a configuration mismatch with the integration method.
- 🇺🇸United States cmlara
For now, I am more concerned about the incompatibility with Nextcloud and MinIO.
That really is out of our scope to assist with, you will need to check with those vendors to understand the underlying issues.
Can you please help me understand how S3FS establishes the connection between Drupal and S3?
We provide the details to the AWS PHP SDK. It connects using standard networking at the address defined for the custom host; the value of this will depend upon your specific deployment.
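Roughly, the connection setup looks like the following. This is an illustrative sketch only: the exact options s3fs passes are internal to the module, and the endpoint, region, and credentials here are hypothetical placeholders.

<?php

use Aws\S3\S3Client;

// Illustrative sketch of pointing the AWS PHP SDK at a custom host such
// as MinIO. Endpoint, region, and credentials are placeholder values.
$client = new S3Client([
  'version' => 'latest',
  'region' => 'us-east-1',
  'endpoint' => 'https://cdn.example.com',
  // Path-style addressing keeps the bucket in the URL path rather than
  // the hostname; many MinIO deployments require it.
  'use_path_style_endpoint' => TRUE,
  'credentials' => [
    'key' => 'YOUR_ACCESS_KEY',
    'secret' => 'YOUR_SECRET_KEY',
  ],
]);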
I would suggest checking your storage side first; if you are not 100% sure your storage is working correctly, it may not be an s3fs issue. You may wish to test with a bucket provided by another service.
- 🇰🇷South Korea keithkhl
@cmlara
It seems like an AWS SDK / MinIO issue, or to some extent due to my misconfiguration.
- https://meta.discourse.org/t/s3-cdn-url-w-bucket-name-minio/338667

The above link is my experience integrating MinIO with Discourse, where I also mention probably the same issue with Nextcloud.
Your comment that the thumbnail image URL falls back to the generic /s3/files/ path when S3FS cannot find the relevant information helped me see what's going on.
For Drupal, it is basically the same as for Nextcloud and Discourse: I have to use the internal IP/port for the API and the CDN with the full bucket name as the matching URL.
For now, I will compromise with the workaround, but when the AWS SDK is updated or MinIO fixes the issue, I might have to reconfigure the whole setup.
Anyway, thank you for the support, @cmlara.
For anyone with a MinIO multi-server setup, I hope my struggle helps to figure out the issue.
- 🇺🇸United States cmlara
The Discourse link explains a few things.
Some of the issues you describe sound like NAT hairpinning problems, which you should avoid by using the internal IPs or hostnames.
“Besides, I prefer not to use internal IP as the ‘S3 endpoint’”

I suggest you sit down with your network engineer and have a conversation about a solution that fits your needs.

To my recollection, path-based buckets do not officially support CNAMEs. It may work due to the simplistic way we generate most URLs; however, it is not API supported. I don't guarantee it long term and strongly suggest you switch to DNS-style buckets, as the API will eventually drop support for path-based buckets. IIRC our 4.x branch is dropping support for path-based buckets and will only support domain names for the CNAME field. 4.x is by no means imminent for release, just a note for the long-term future.
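To illustrate the two addressing styles (bucket and host names below are hypothetical examples, not your configuration):

<?php

// Path-style (legacy; the S3 API is phasing this out):
$path_style = 'https://cdn.example.com/bucket_name/s3fs-public/styles/thumbnail/public/example.jpg';
// Virtual-hosted / DNS-style (the form a CNAME is expected to front):
$dns_style = 'https://bucket_name.cdn.example.com/s3fs-public/styles/thumbnail/public/example.jpg';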
“For now, I will compromise with the workaround,”

Are you saying that when using internal IPs, all files are correctly copied?
- 🇰🇷South Korea keithkhl
Well... I have changed the MinIO setting to virtual-host style so that it honors Amazon S3's default URL structure.
I thought it should work, so I activated S3FS again on my test site, a copy of the live website.
Then, once I activated S3FS, all media files turned blank with the aforementioned URL structure, like mydomain.com/s3/files/....

Hoping that a simpler configuration might rewrite the URLs, I tried another batch run of 'Copy Local Files to S3'.
Unfortunately, it does not work. I tried both the GUI and the command line.

As before, most image files are uploaded to the desired bucket in S3, but images under the /styles/ folder are only partly uploaded and their URLs are broken.
I feel like there is a chunk of cache left in the DB that kicks in when I activate S3FS again.
Can you please help me purge that cache and restart the bulk image upload?
At least when I re-activate S3FS, the images in /contents/media should still be visible.

If it is not about the existing DB cache, I am really puzzled. With the virtual-host setup, it really is the same as Amazon S3, except that the endpoint URL is not Amazon's native one.
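One possible way to rebuild that cache, as a hedged sketch (assuming the module's 's3fs' service exposes the refreshCache() method its Actions form uses; verify against your installed s3fs version), would be a 'drush php:script' like:

<?php

// Hedged sketch: rebuild the s3fs metadata cache from the bucket contents.
// Assumes the 's3fs' service provides refreshCache(), mirroring the
// module's Actions form.
$config = \Drupal::config('s3fs.settings')->get();
\Drupal::service('s3fs')->refreshCache($config);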
FYI, with the virtual host, all the other open-source applications I am working with create zero issues. Apart from the custom endpoint, Nextcloud, Discourse, Moodle, and WordPress work smoothly.
- 🇺🇸United States cmlara
Apply the following patch to your site and run a copy operation with all files.
Check /tmp/s3fs_scan_dir_output.txt for the path of a file that already exists on disk but was not uploaded to your S3 bucket, then check /tmp/s3fs_upload_list.txt, where you should see two entries for that path.
Advise your results.
diff --git a/src/Batch/S3fsFileMigrationBatch.php b/src/Batch/S3fsFileMigrationBatch.php
index 5b4080e..b9c1516 100644
--- a/src/Batch/S3fsFileMigrationBatch.php
+++ b/src/Batch/S3fsFileMigrationBatch.php
@@ -109,6 +109,7 @@ public function dirScan($dir) {
       }
       else {
         $output[] = $path;
+        file_put_contents('/tmp/s3fs_scan_dir_output.txt', $path . PHP_EOL, FILE_APPEND);
       }
     }
   }
@@ -147,6 +148,7 @@ public static function copyOperation(array $config, array $file_paths, $total, $
       $context['results']['errors'] = [];
     }
     foreach ($file_paths as $path) {
+      file_put_contents('/tmp/s3fs_upload_list.txt', "Processing: $path" . PHP_EOL, FILE_APPEND);
       $relative_path = substr_replace($path, '', 0, strlen($source_folder) + 1);
       $key_path = $target_folder . $relative_path;
       $uri = $scheme . '://' . $relative_path;
@@ -215,6 +217,7 @@ public static function copyOperation(array $config, array $file_paths, $total, $
       \Drupal::moduleHandler()->alter('s3fs_upload_params', $uploadParams);

       try {
+        file_put_contents('/tmp/s3fs_upload_list.txt', "Uploading: $path" . PHP_EOL, FILE_APPEND);
         $s3->putObject($uploadParams);
       }
       catch (\Exception $e) {