Node form with two file fields can cause mystery file deletions.

Created on 17 September 2019, about 5 years ago
Updated 1 August 2024, 4 months ago

I am posting this issue as a critical core bug because I believe it results in data loss. This issue is very closely related to #2869855: Race condition in file_save_upload causes data loss, and that issue may be far enough along that this one is simply closed. Since the circumstances around these bugs are different, though, I thought it would make sense to post separately.

Problem/Motivation

We have been dealing with mystery "deleted documents" on our site for a few months now, and have traced the issue back to what we believe is the problem. The cron setting to delete temporary files is ACTUALLY deleting our production files that are in use on active revisions. We believe, and have done some experimentation to prove a scenario that fits this hypothesis, that at some point during the file upload process (node updates), there are 2 file entity entries being created for a single node revision, 1 permanent, and 1 temporary, both with the exact same file URI. Cron comes around to clean up files, finds the temporary entry, and deletes the file at that file location, and you now have an orphaned permanent file entry that does not point to a valid file on the server.
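To make the hypothesis above concrete, here is a minimal Python model of it. It is not Drupal code: `FileRecord`, `cron_cleanup`, `orphaned`, and the example URI are hypothetical names made up for illustration, and the cleanup is simplified to delete every temporary record regardless of age.

```python
from dataclasses import dataclass

TEMPORARY, PERMANENT = 0, 1  # stand-ins for the status flag on a file record


@dataclass
class FileRecord:
    fid: int
    uri: str
    status: int


def cron_cleanup(records, disk):
    """Drop every temporary record and delete the file its URI points to."""
    kept = []
    for rec in records:
        if rec.status == TEMPORARY:
            # The file is removed by URI, even if another record uses it too.
            disk.discard(rec.uri)
        else:
            kept.append(rec)
    return kept


def orphaned(records, disk):
    """Permanent records whose file no longer exists on disk."""
    return [r for r in records if r.status == PERMANENT and r.uri not in disk]


# Two records for one upload: same URI, different fid, different status.
disk = {"public://docs/report.pdf"}
records = [
    FileRecord(fid=101, uri="public://docs/report.pdf", status=PERMANENT),
    FileRecord(fid=102, uri="public://docs/report.pdf", status=TEMPORARY),
]

records = cron_cleanup(records, disk)
print(orphaned(records, disk))
```

In this model the cleanup only ever looks at the temporary record, yet the permanent record is the one left pointing at a missing file.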

The scenario where we believe this happens...
We have a node with 2 file fields, one for a document (PDF/Word/etc), and one for an image. Consider a node that has both fields filled in.

  1. A content manager goes into the node edit screen, and needs to add a new PDF and remove the old image, a very common practice for our content team.
  2. The CM will click the "Remove" button for the existing PDF, and is then presented with a new file upload field. The user browses to the new file and selects it to be uploaded, which triggers an AJAX request to the server.
  3. The user then clicks the "Remove" button for the existing image. This initiates another AJAX request to the server. If the previous file upload response has not yet been received by the page, the payload in the "remove image" request will ALSO include the file upload field from the previous step.

Step 3 above can cause Drupal to create a 2nd temporary file with the exact same URI as the one from the request in step 2, but with a different fid. When you actually SUBMIT the node edit, only 1 of those fids is referenced on the form, and only that file gets converted from temporary to permanent, but there is a temporary file left behind.
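The overlap in steps 2 and 3 can be sketched as a small deterministic simulation. This is a toy model of the hypothesis, not Drupal's upload handler: the `Server` class, its two phases, the rename pattern, and the choice of which fid the form ends up referencing are all assumptions made for illustration.

```python
import itertools
import os


class Server:
    """Toy stand-in for the upload handler; not Drupal code."""

    def __init__(self):
        self.disk = set()      # URIs that exist on disk
        self.records = {}      # fid -> {"uri": ..., "status": ...}
        self._next_fid = itertools.count(1)

    def choose_destination(self, filename):
        """Phase 1: pick a URI, renaming if that name is already on disk."""
        stem, ext = os.path.splitext(filename)
        uri, n = f"public://{filename}", 0
        while uri in self.disk:
            uri = f"public://{stem}_{n}{ext}"
            n += 1
        return uri

    def save_upload(self, uri):
        """Phase 2: write the file and register a temporary record."""
        self.disk.add(uri)
        fid = next(self._next_fid)
        self.records[fid] = {"uri": uri, "status": "temporary"}
        return fid


def run(overlapping):
    server = Server()
    if overlapping:
        # Both requests choose a destination before either has written.
        dest_a = server.choose_destination("new.pdf")
        dest_b = server.choose_destination("new.pdf")
        fid_a, fid_b = server.save_upload(dest_a), server.save_upload(dest_b)
    else:
        # Contrast case: the first request finishes before the second starts.
        fid_a = server.save_upload(server.choose_destination("new.pdf"))
        fid_b = server.save_upload(server.choose_destination("new.pdf"))
    # Assumption: the submitted form references only the later fid.
    server.records[fid_b]["status"] = "permanent"
    return server.records[fid_a], server.records[fid_b]


leftover, kept = run(overlapping=True)
print(leftover["uri"] == kept["uri"])   # True: temporary and permanent share a URI

leftover, kept = run(overlapping=False)
print(leftover["uri"] == kept["uri"])   # False: the second upload was renamed
```

The contrast case shows why the problem would be intermittent under this model: the shared URI only appears when the second request is handled before the first has finished writing.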

We are estimating this happens to our content managers maybe 1 out of 40 documents or so uploaded. So it's not every time, but obviously the deletion of any production data is a very serious problem.

We have replicated this problem by adding a delay in a hook_file_validate implementation to simulate a slower file upload. Our production code has no such delay. We have guessed that network latency, files coming from shared drives, or various other scenarios could be the culprit, but we don't have concrete evidence as to WHAT is causing the delay. Regardless, it seems there should be code put in place to prevent this possibility from happening.

The related issue seems to take a similar approach, using a test with a file unlinked in code to show it's possible. There seem to be some front-end issues in that case, documented in comment #59 over there, that may not be considered while fixing that problem. Part of the reason for opening this new issue is to put this front-end path to the same error up for consideration.

Proposed resolution

Would disabling multiple AJAX file field operations from running at the same time be an option?
Should the back-end storage of files be fixed? (which may be done in the linked issue)
Other?
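Whichever option is chosen, it may help to first measure how often the back-end ends up in the state described above. The sketch below looks for temporary records that share a URI with a permanent record. It runs against an in-memory SQLite table, and it assumes a table named `file_managed` with `fid`, `uri`, and `status` columns (0 for temporary, 1 for permanent); the sample rows are invented, and the query would need adapting before being run against a real site database.

```python
import sqlite3

# In-memory table standing in for the site's file records (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_managed (fid INTEGER PRIMARY KEY, uri TEXT, status INTEGER)"
)
conn.executemany(
    "INSERT INTO file_managed (fid, uri, status) VALUES (?, ?, ?)",
    [
        (101, "public://docs/report.pdf", 1),  # permanent, in use
        (102, "public://docs/report.pdf", 0),  # temporary, same URI
        (103, "public://images/photo.png", 1), # permanent, no duplicate
        (104, "public://docs/draft.pdf", 0),   # temporary, no duplicate
    ],
)

# Temporary records whose URI is also used by a permanent record.
AT_RISK = """
SELECT t.fid AS temporary_fid, p.fid AS permanent_fid, t.uri
FROM file_managed AS t
JOIN file_managed AS p ON p.uri = t.uri AND p.fid <> t.fid
WHERE t.status = 0 AND p.status = 1
ORDER BY t.fid
"""

rows = conn.execute(AT_RISK).fetchall()
for temporary_fid, permanent_fid, uri in rows:
    print(temporary_fid, permanent_fid, uri)
```

Any row returned is a permanent file that a temporary-file cleanup could remove from disk.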

Remaining tasks

N/A

User interface changes

N/A

API changes

N/A

Data model changes

N/A

🐛 Bug report
Status

Closed: cannot reproduce

Version

11.0 🔥

Component
File system

Last updated about 11 hours ago

Created by

🇺🇸 United States dpagini


Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • 🇦🇺 Australia acbramley

    This came up as a BSI triage issue again today. The issue mentioned in the issue summary has now been committed. I would like to see if this is reproducible on 11.x with a set of steps starting from a fresh Drupal installation.

    Are you able to provide that @dpagini?

  • Status changed to Closed: cannot reproduce 4 months ago
  • 🇦🇺 Australia acbramley

    Thanks for reporting this issue. We rely on issue reports like this one to resolve bugs and improve Drupal core.

    As part of the Bug Smash Initiative, we are triaging issues that are marked "Postponed (maintainer needs more info)". This issue was marked "Postponed (maintainer needs more info)" more than 1 year ago.

    Since we need more information to move forward with this issue, I am closing it.

    Please feel free to reopen with more information.
