issue with S3fsFileService move method during migrations

Created on 1 March 2023, almost 2 years ago
Updated 31 July 2023, over 1 year ago

This patch is for a migration from a D7 to D9 site, where the D7 site has hundreds of thousands of S3 files, using D7's S3FS. For the migration, I left the S3 files in their bucket and am just migrating the references to them. Of these S3 files, many of them are images. After the migration, on D9, the images render in full size but thumbnails were not generating. What happened is the S3FS move method during the migration was trying to create directories out of the filenames instead of the directory names, resulting in prepare destination fails.

This patch makes sure that S3FS's move method creates the directory as needed to resolve the thumbnail issue.

I recommend you do not apply this patch unless you are experiencing the issue I describe, or something quite close to it. If you are one of these people, and this patch works for you, please comment here to let us know.

πŸ“Œ Task
Status

Postponed: needs info

Version

3.0

Component

Code

Created by

πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

  • Issue created by @Shiraz Dindar
  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC

    Here's a simple patch for the issue I described above.

    (to the module maintainer, I started a fork, but ended up using an old-school patch, as it seems that's what's mostly happening on this project. If you'd prefer me to make a fork, just let me know!)

  • Status changed to Postponed: needs info almost 2 years ago
  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    I'm good with either patches or MR's, feel free to use whichever you prefer (I do appreciate you took the time to look at the rest of the issues.)

    This doesn't seem right to me on a quick glance. The prepareDestination() method is copied from Drupal Core and to my knowledge it expects a filename not a directory (the filename is used to determine if it needs to rename a file) and inside prepareDestination() the prepareDirectory() method is called.

    Is it possible what you are experiencing is a side effect of your report in πŸ› do not query s3 for directory info when cache is disabled Fixed ?

    If this is an issue I would expect copy() to be broken as well.

  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC

    Hello maintainer!

    So the circumstances leading to my situation were exceptional enough that I didn't want to write them out as replication steps in this task. Specifically, this is for a migration from a D7 to D9 site, where the D7 site has hundreds of thousands of S3 files, using D7's S3FS. For the migration, I left the S3 files in their bucket and am just migrating the references to them. Of these S3 files, many of them are images. After the migration, on D9, the images render in full size but thumbnails were not generating. When I went tracing the issue (including looking closely at a number of issues on this project page) in the end I had to xdebug my way through it, and I came across the issue mentioned in this task. Between this issue, and the other one I posted, I was then able to get the imported S3 files to render thumbnails (ie. only with the two patches I've posted).

    This situation is exceptional enough that I do not necessarily expect you to merge this patch in. However, I wanted to share it here because it certainly seems possible that others might run into this, and maybe this patch would solve their issue.

    To be clear, in my xdebugging, I *definitely* came across cases where S3fsFileService's move function is being sent *both* directories and file URIs for $destination from other places within the S3FS module. This would happen within the same request in fact while it was trying to generate a thumbnail. But the filenames were definitely breaking things, and my patch definitely solved my problem.

    So ya, there's some context, and the reason why I shared the patch publicly

  • πŸ‡ΊπŸ‡ΈUnited States cmlara

    I took another look at the code and I stand corrected, prepareDestination() does accept both directories and filenames.

     if ($this->prepareDirectory($destination)) {
          // The destination is already a directory, so append the source basename.
          $destination = $this->streamWrapperManager->normalizeUri($destination . '/' . $this->basename($source));
        }
        else {
          // Perhaps $destination is a dir/file?
          $dirname = $this->dirname($destination);
          if (!$this->prepareDirectory($dirname)) {
            $this->logger->error("The specified file '%original_source' could not be copied because the destination directory '%destination_directory' is not properly configured. This may be caused by a problem with file or directory permissions.", [
              '%original_source' => $original_source,
              '%destination_directory' => $dirname,
            ]);
            throw new DirectoryNotReadyException("The specified file '$original_source' could not be copied because the destination directory '$dirname' is not properly configured. This may be caused by a problem with file or directory permissions.");
          }
        } 

    I'm still inclined to think that this might be related to πŸ› do not query s3 for directory info when cache is disabled Fixed since if the directory isn't found to exist or able to be created that would cause issues, especially with the following in prepareDirectory().

        if (!is_dir($directory)) {
          if (!($options & static::CREATE_DIRECTORY)) {
            return FALSE;
          }
    

    since it is called with only the option to change permissions, not to create the directory.

    Could be something else at play, however forcing $destination to a dirname implies a lower level issue in the protected method. If I were to guess I would suspect it is something to do with prepareDirectory() being called twice for a directory path, or stripping to a lower path which is responding different, perhaps because an aws object exists.

    print_r(dirname('public://path/to/dir/'));
    // Returns: public://path/to
    
  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC

    I think you may be right with your sense of why the issue was occurring, and I certainly don't want to waste your time too much with this. My patch (combined with the other) definitely did fix my issue, and this is after spending an inordinate amount of time xdebugging in there, so I know I did at least something right for my own use case, and again just thought it might be common enough to share this patch here. I appreciate your thoroughness.

  • πŸ‡©πŸ‡ͺGermany markaspot

    I can confirm that this fix worked for me as well. I encountered a similar issue while using S3FS to retrieve a remote image via code and the system_retrieve_file function, and then storing it in a subdirectory. However, I was unable to reproduce the bug when using the Drupal UI in Drupal 10.

  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC

    There was a bug in my patch that would cause S3FS uploads to fail when using media library widget and when the filename path had two or more directory components.

    I have updated the patch accordingly. It should *not* be committed, as my use case is quite unique. I will update the ticket title accordingly, as I do still feel this might benefit others.

  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC
  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC
  • πŸ‡¨πŸ‡¦Canada Shiraz Dindar Sooke, BC

    Whoops, patch in #8 was named incorrectly. Here is the same patch, correctly named.

Production build 0.71.5 2024