I am posting this issue as a critical core bug because I believe it results in data loss. This issue is very closely related to #2869855: Race condition in file_save_upload causes data loss, and that issue may be far enough along that this one is simply closed, but since the circumstances around these bugs are different, I thought it would make sense to post separately.
Problem/Motivation
We have been dealing with mystery "deleted documents" on our site for a few months now, and have traced the issue back to what we believe is the problem. The cron setting to delete temporary files is ACTUALLY deleting our production files that are in use on active revisions. We believe, and have done some experimentation to prove a scenario that fits this hypothesis, that at some point during the file upload process (node updates), there are 2 file entity entries being created for a single node revision, 1 permanent, and 1 temporary, both with the exact same file URI. Cron comes around to clean up files, finds the temporary entry, and deletes the file at that file location, and you now have an orphaned permanent file entry that does not point to a valid file on the server.
The scenario where we believe this happens...
We have a node with 2 file fields, one for a document (PDF/Word/etc), and one for an image. Consider a node that has both fields filled in.
1. A content manager goes into the node edit screen, and needs to add a new PDF and remove the old image, a very common practice for our content team.
2. The CM will click the "Remove" button for the existing PDF, and is then presented with a new file upload field. The user browses to the new file and selects it to be uploaded, which triggers an AJAX request to the server.
3. The user then clicks the "Remove" button for the existing image. This initiates another AJAX request to the server. If the previous file upload response has not yet been received by the page, the payload in the "remove image" request will ALSO include the file upload field from the previous step.
Step 3 above can cause Drupal to create a 2nd temporary file with the exact same URI as the one created by the request in step 2, but with a different fid. When you actually SUBMIT the node edit, only 1 of those fids is referenced on the form, and only that file gets converted from temporary to permanent, so a temporary file entry pointing at the same URI is left behind.
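To make the failure mode concrete, here is a minimal Python sketch of the sequence above. It is a toy model, not Drupal code: `FileRecord`, `cron_cleanup`, the fids, the example URI, and the in-memory `disk` set are all hypothetical stand-ins for the file entity entries, the temporary-file cron cleanup, and the files directory. It also ignores the age threshold the real cleanup applies before deleting temporary files.

```python
from dataclasses import dataclass

TEMPORARY, PERMANENT = 0, 1


@dataclass
class FileRecord:
    fid: int
    uri: str
    status: int


def cron_cleanup(records, disk):
    """Toy cleanup: drop each temporary record and delete the file at its URI."""
    kept = []
    for rec in records:
        if rec.status == TEMPORARY:
            # Removes the physical file even if another record shares the URI.
            disk.discard(rec.uri)
        else:
            kept.append(rec)
    return kept


# Steps 2 and 3: two overlapping AJAX requests each register the same upload,
# producing two temporary records with one URI (hypothetical values).
uri = "public://docs/report.pdf"
disk = {uri}
records = [FileRecord(101, uri, TEMPORARY), FileRecord(102, uri, TEMPORARY)]

# Node save: only the fid referenced by the form becomes permanent.
records[1].status = PERMANENT

# Cron runs later and removes the leftover temporary record and its file.
records = cron_cleanup(records, disk)

# Permanent records whose file no longer exists on "disk".
orphaned = [r for r in records if r.status == PERMANENT and r.uri not in disk]
print(orphaned)
```

In this model the permanent record survives the cleanup but its file does not, which matches the orphaned permanent file entries we see in production.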
We are estimating this happens to our content managers maybe 1 out of 40 documents or so uploaded. So it's not every time, but obviously the deletion of any production data is a very serious problem.
We have replicated this problem by adding a delay in a hook_file_validate implementation to simulate a slower file upload. Our production code has no such delay. We have guessed that network latency, files coming from shared drives, or various other scenarios could be the culprit, but we don't have concrete evidence as to WHAT is causing the delay. Regardless, it seems there should be code put in place to prevent this from happening.
The related issue seems to take a similar approach as well, using a test with a file unlinked in code to show it's possible. There seem to be some front-end issues in that case, documented in comment #59 over there, that may not be considered while fixing that problem. Part of the reason for opening this new issue is to document this front-end path to the same error so that it is taken into account.
Proposed resolution
Would disabling multiple AJAX file field operations from running at the same time be an option?
Should the back-end storage of files be fixed? (which may be done in the linked issue)
Other?
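As a rough illustration of what a back-end safeguard could look like, here is a Python sketch in which the cleanup deletes the physical file only when no remaining record still points at the same URI. This is a toy model under assumed names (`FileRecord`, `safe_cron_cleanup`, the in-memory `disk` set), not Drupal code, and it is not necessarily the approach taken in the linked issue.

```python
from dataclasses import dataclass

TEMPORARY, PERMANENT = 0, 1


@dataclass
class FileRecord:
    fid: int
    uri: str
    status: int


def safe_cron_cleanup(records, disk):
    """Toy cleanup that never deletes a file still used by a kept record."""
    kept = [r for r in records if r.status != TEMPORARY]
    in_use = {r.uri for r in kept}
    for rec in records:
        if rec.status == TEMPORARY and rec.uri not in in_use:
            disk.discard(rec.uri)
    return kept


# Same situation as in the report: one permanent and one temporary record
# share a URI, plus an unrelated stale temporary upload (hypothetical values).
shared = "public://docs/report.pdf"
stale = "public://docs/abandoned.pdf"
disk = {shared, stale}
records = [
    FileRecord(101, shared, TEMPORARY),
    FileRecord(102, shared, PERMANENT),
    FileRecord(103, stale, TEMPORARY),
]

records = safe_cron_cleanup(records, disk)
print(records, disk)
```

The duplicate temporary record is still removed, but the shared file stays on disk because the permanent record references it. This only treats the symptom; it does not stop the duplicate record from being created in the first place.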
Remaining tasks
N/A
User interface changes
N/A
API changes
N/A
Data model changes
N/A