Error responses are stored when using the Download migration process

Created on 20 February 2020, almost 5 years ago
Updated 8 October 2024, about 2 months ago

Following on from the work done in the parent issue addressing error handling.

The Download process doesn't currently handle any clean up in terms of deleting the destination file when the error occurs.

For example, downloading a http://example.com/example.jpg file which 404s will throw an exception, but will keep the destination example.jpg file on disk and have it contain the raw HTML for the 404 response, which might not be expected behaviour.

This is especially hard to work with when using the file_exists: 'use existing' option.

After discussion with Framwork managers, we should delete the file on a 400 response, and there is no need to preserve current behavior, as saving a 404 response to a jpg file is a bug.

Additional notes. There's currently some differing opinions on how to proceed here. I currently see three options.

  1. Do nothing, potentially add a log message or warning if we detect a non-200 response while downloading a file.
    • Benefits: it's easy.
    • Downside: you may have files in your filesystem named my_file.jpg that contain the body of a 404 or 403 response.
  2. Delete the file on non-200 responses.
    • Benefits: No error messages stored in files with an image/document extension after the migration completes.
    • Downside: downloading and then deleting a file from the filesystem. The file will have to exist in the public file system, for at least a short while. Could we accidentally delete a file we don't want to? could we potentially overwrite a good file with a bad one and then delete it? What are the edge cases to handle?
  3. Download files to the temporary file system, verify integrity, then move to the final destination.
    • Benefits: safest option, could eliminate files with bad responses from ever being in the filesystem, could also do integrity checks against the file to prevent malicious scripts or XSS attacks from being migrated into the filesystem. Currently migrated files are only as safe as your file source.
    • Downsides: potential slowdown of file migrations (which already can take a long time for large filesystems), particularly if the temp directory and the public directory are on different filesystems, which is often likely to be the case.
πŸ› Bug report
Status

Needs work

Version

11.0 πŸ”₯

Component

migration system

Created by

Live updates comments and jobs are added and updated live.
  • Needs framework manager review

    It is used to alert the framework manager core committer(s) that an issue significantly impacts (or has the potential to impact) multiple subsystems or represents a significant change or addition in architecture or public APIs, and their signoff is needed (see the governance policy draft for more information). If an issue significantly impacts only one subsystem, use Needs subsystem maintainer review instead, and make sure the issue component is set to the correct subsystem.

Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

Production build 0.71.5 2024