[META] Modernise file upload logic

Created on 1 July 2021, over 3 years ago
Updated 13 May 2024, 7 months ago

Problem/Motivation

We currently duplicate the file field upload logic across REST, JSON API and the contrib GraphQL module.

This should perhaps be in the meta

We have 3 types of upload:

  1. 'Unmanged' file upload: e.g. config import tarball form upload
  2. File entity upload (not a field on another entity). Don't think we actually do this anywhere πŸ€”
  3. File entity upload (a field on an entity). The default

We have 4 kinds of validation:

  1. Symfony UploadedFile validation
  2. Form API 'File Upload' validation
  3. File field configured validation
  4. File Entity validation

We have two modes:

  1. Form upload: we can rely on the Symfony request object to get the UploadedFile objects, check for php limit errors etc
  2. Direct API upload: we manually parse the 'content-disposition' header and handle the upload stream ourselves

Important Note: the Symfony UploadedFile objects are created from the $_FILES superglobal. The $_FILES superglobal is only populated when the Content-Type header in the request uses application/x-www-form-urlencoded or multipart/form-data. Since REST and JSON-API only support application/octet-stream so we cannot use the Symfony UploadedFile objects for these.

See: https://www.php.net/manual/en/reserved.variables.files.php

Places in core doing file upload:

  • file_managed_file_save_upload(): form callback for File element.
  • file_save_upload(): more API level called from multiple places incl. contrib. Uses UploadedFiles.
  • \Drupal\file\Plugin\rest\resource\FileUploadResource::post() does direct file uploads via content-disposition and php:://input
  • \Drupal\jsonapi\Controller\FileUpload::handleFileUploadForExistingResource() ::handleFileUploadForNewResource almost identical code to REST FileUploadResource. Does direct file uploads via content-disposition and php:://input
  • \Drupal\ckeditor5\Controller\CKEditor5ImageController::upload() Uses UploadedFiles.

This meta issue provides the high-level overview of the tasks:

Steps to reproduce

Proposed resolution

Core Managed File Upload Proposed Resolution

Here's a proposed plan for replacing:

  • file_managed_file_save_upload
  • _file_save_upload_from_form
  • file_save_upload

With:

FileFormHelper:

  • Replaces file_managed_file_save_upload and _file_save_upload_from_form
  • provides callback for Element\ManagedFile
  • handles all form interactions
  • getting info from form elements
  • setting form errors
  • setting status messages

FileFormUploader

  • Replaces file_save_upload
  • is passed an array of UploadedFiles rather than having to lookup from Request again
  • handles caching of uploads
  • Loops through uploads with multiple files
  • handles UploadedFile::isValid checks
  • Calls FileUploadHandler::handleFileUpload() for each file
  • catches exceptions for each file upload
  • Adds status messages for each file upload
  • Returns an array of fid=>file to FileFormHelper

REST and JSON API Proposed Resolution

  • Introduce a DirectFileFieldUploadHandler that does most of the work
  • Introduce a ContentDispositionFilenameParser for abstracting out the work to grab the filename from the request headers
  • Checking access still seems complex and we need to drill down on whether there are differences between REST and JSON API. Core file upload doesn't do that at this layer. Maybe we can move it somewhere further up?
  • Introduce a new InputStreamWriter to abstract out writing php://input to a temp file. This should return an instance of UploadedFileInterface
  • We still need to parse the destination directory which could contain tokens. Not sure where core file field uploads do this?
  • We need to keep existing locks. Can these be moved into the
  • We can re-use the existing FileUploadHandler

REST

Proposed:

JSON API file upload sequence

Proposed:

Remaining tasks

User interface changes

API changes

Data model changes

Release notes snippet

Core file_managed_file_upload sequence:

Proposed:

GraphQL upload sequence

CKEditor image upload sequence

πŸ“Œ Task
Status

Active

Version

11.0 πŸ”₯

Component
File systemΒ  β†’

Last updated 3 days ago

Created by

πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Added a sequence diagram to show the current state.

    Mermaid definition is:

    sequenceDiagram
        autonumber
        participant mf as Element\ManagedFile
        participant mfsu as file_managed_file_save_upload
        participant Request
        participant suff as _file_save_upload_from_form
        participant fsu as file_save_upload
        participant fuh as FileUploadHandler
        mf->>+mfsu: valueCallback()
        mfsu->>+Request: get files to be uploaded for form element
        Request-->>-mfsu: files from request
        mfsu->>+suff: with form element and state
        suff-->>suff: get upload validators from form
        suff->>+fsu: with upload validators
        fsu->>+Request: get files to be uploaded for form element (again)
        Request-->>-fsu: files from request
        loop for each file
        fsu->>+fuh: upload file with validators
        fuh-->>fuh: Call UploadedFile::isValid()
        fuh-->>fuh: Sanitize filename
        fuh-->>fuh: Call FileValidator::validate() with validators
        fuh-->fuh: Move uploaded file
        fuh-->fuh: Call $file->validate()
        fuh-->>fuh: Save file
        fuh-->>-fsu: upload result
        end
        fsu-->>-suff: array of fid=>file
        suff-->>-mfsu: array of fid=>file
        mfsu-->>-mf: array of fid=>file
    
    
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Here's a sequence diagram of the REST file upload

    Mermaid definition:

    sequenceDiagram
        autonumber
        participant fur as FileUploadResource
        participant Request
        participant efm as EntityFieldManager
        participant pi as php://input
        participant fv as FileValidator
        participant fs as FileSystem
        fur->>+Request: getHeaders()
        Request-->>-fur: content disposition headers
        fur-->>fur: get filename
        fur->>+efm: getFieldDefinitions()
        efm-->>-fur: field definition
        fur-->>fur: get destination
        fur-->>fur: get upload validators from definition
        fur-->>fur: sanitize filename
        fur->>+pi: stream php://input to temp file
        pi->>-fur: file
        fur-->>fur: lock file for save
        fur-->>fur: create file entity
        fur->>+fv: validate()
        fv-->>-fur: violations
        fur-->>fur: set filename
        fur->>+fs: move file to destination
        fur-->>fur: validate file entity
        fur-->>fur: save file entity
        fur-->>fur: release lock
    
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Here's a sequence diagram of the somewhat more complex json api file upload.

    Mermaid definition:

    sequenceDiagram
        autonumber
        participant fuc as FileUpload controller
        participant efm as EntityFieldManager
        participant tffu as TemporaryFileFieldUploader
        participant acm as AccessControlManager
        participant Request
        participant pi as php://input
        participant Lock
        participant mtg as MimeTypeGuesser
        participant fv as FileValidator
        participant fs as FileSystem
        fuc->>+efm: get field definitions
        efm-->>-fuc: field definition
        fuc->>+tffu: check access
        tffu->>+acm: check access
        acm-->>tffu: access result
        tffu-->>-fuc: access result
        fuc->>+tffu: parse content disposition headers
        tffu->>+Request: get headers
        Request-->>-tffu: headers
        tffu-->>-fuc: filename
        fuc->>+tffu: handle file upload for field (with definition)
        tffu-->>tffu: get destination
        tffu->>fs: prepare destination
        tffu-->>tffu: get validators
        tffu-->>tffu: santize filename
        tffu->>+pi: stream php://input to temp file
        pi->>-tffu: file
        tffu->>Lock: acquire lock for save
        tffu->>+mtg: Guess mime type
        mtg-->>-tffu: Mime type
        tffu-->>tffu: Create file entity
        tffu->>+fv: validate
        fv-->>-tffu: violations
        tffu-->>tffu: set filename
        tffu->>fs: move file
        tffu-->>tffu: validate entity
        tffu-->>tffu: Save file
        tffu->>Lock: release lock
        tffu-->>-fuc: File
    
    
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Tweaking the sequence diagrams to include more details

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Updating core file_managed_file_upload sequence

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Adding the graphql upload sequence.

    Interestingly, GraphQL uploads raise some questions:

    1. Use Symfony UploadedFile: this indicates they rely on $_POST varables being populated, therefore the Content-Type header in the request uses application/x-www-form-urlencoded or multipart/form-data. Would be good to verify this?
    2. It does extra validation for the extension, and dispatches the sanitize filename event twice. Would good to understand why?
    3. Although it uses Symfony UploadedFile, it doesn't call UploadedFile::move() which in turn calls PHP move_uploaded_file() which is recommended. Again not sure why?
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Another question on direct file uploads compared with Symfony UploadedFile / $_POST file uploads:

    Are we checking for the php errors? e.g. UPLOAD_ERR_INI_SIZE, UPLOAD_ERR_FORM_SIZE, UPLOAD_ERR_PARTIAL, UPLOAD_ERR_NO_FILE, UPLOAD_ERR_EXTENSION etc?

    Looks like we are just adding a FileSizeLimit / max_filesize validator. Does that cover the cases above?

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Other questions:

    • Should we support multiple file uploads?
    • Why do we stream upload php://input when Symfony Request::getContent() does this for us?
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Catching up on @Wim's blog post (now unavailable) https://web.archive.org/web/20230522044139/http://wimleers.com/blog/api-...

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    After reading Wim's blog post, I think I answered a few of my own questions.

    Are we checking for the php errors? e.g. UPLOAD_ERR_INI_SIZE, UPLOAD_ERR_FORM_SIZE, UPLOAD_ERR_PARTIAL, UPLOAD_ERR_NO_FILE, UPLOAD_ERR_EXTENSION etc?

    Looks like we are just adding a FileSizeLimit / max_filesize validator. Does that cover the cases above?

    Why do we stream upload php://input when Symfony Request::getContent() does this for us?

    We are intentionally wanting to avoid PHP upload size limits by using php://input. Request::getContent() looks like it just just returns fopen('php://input', 'r') for binary files anyway.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Here's a proposed plan for replacing:

    • file_managed_file_save_upload
    • _file_save_upload_from_form
    • file_save_upload

    With:

    FileFormHelper:

    • Replaces file_managed_file_save_upload and _file_save_upload_from_form
    • provides callback for Element\ManagedFile
    • handles all form interactions
    • getting info from form elements
    • setting form errors
    • setting status messages

    FileFormUploader

    • Replaces file_save_upload
    • is passed an array of UploadedFiles rather than having to lookup from Request again
    • handles caching of uploads
    • Loops through uploads with multiple files
    • handles UploadedFile::isValid checks
    • Calls FileUploadHandler::handleFileUpload() for each file
    • catches exceptions for each file upload
    • Adds status messages for each file upload
    • Returns an array of fid=>file to FileFormHelper
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Here's the proposed REST and JSON API sequence diagrams. Updated the proposed section in the IS.

  • πŸ‡«πŸ‡·France andypost

    would be great to unify progress tracking as well

  • πŸ‡ΊπŸ‡¦Ukraine Matroskeen πŸ‡ΊπŸ‡¦ Ukraine, Lutsk

    I'm not diving into the details, but I hope the contributors will keep in mind the support of resumable file uploads. Example: https://www.drupal.org/project/tus β†’ .

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    @Matroskeen is there anything in the proposed architecture that could be modified to make it easier to support TUS in contrib?

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10
  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    Adding the sequence for \Drupal\ckeditor5\Controller\CKEditor5ImageController::upload().

    As pointed out in πŸ› [PP-1] CKEditor file upload sets file URI prior to validation, causing validators to be unable to find the file. Postponed the order of validating the file and then moving it is the wrong way around.

  • πŸ‡¦πŸ‡ΊAustralia larowlan πŸ‡¦πŸ‡ΊπŸ.au GMT+10

    Checking access still seems complex and we need to drill down on whether there are differences between REST and JSON API. Core file upload doesn't do that at this layer. Maybe we can move it somewhere further up?

    Can we open a separate issue to try and move that to the routing layer via an access callback.
    We can postpone this on that.

    We still need to parse the destination directory which could contain tokens. Not sure where core file field uploads do this?

    This happens in \Drupal\file\Plugin\Field\FieldType\FileItem::getUploadLocation. Perhaps we can reuse that if we have a file item field (file reference) in scope?

    We need to keep existing locks. Can these be moved into the

    This sentence is missing a conclusion πŸ˜€

    It's not stated in the issue summary, but I assume the plan for CKEditor is the same for JSONAPI/Rest? If so, can we update the issue summary there too.

    I think the proposed resolutions make sense. Cast everything to an UploadedFileInterface value object seems like a good way to approach homogenizing these APIs.

    I think we should explore moving the access to routing, just to confirm it can be done.

    I think it would also be worth exploring a param converter at the routing level to see if we can upcast the raw POST data into UploadedFiles too, making things even more consistent.

    Amazing work on the diagrams too.

  • πŸ‡ΊπŸ‡¦Ukraine Matroskeen πŸ‡ΊπŸ‡¦ Ukraine, Lutsk

    Re #29, Based on some work I have done in Tus issues queue ( #3313657: Use file uploader service from Drupal core β†’ ), I can't find anything specific. I see I added some todos, that are already planned to be addressed in Drupal core.

    Speaking of specific issues, I do remember that it was somehow difficult to overcome the file size validation - by default, it's looking into environment limitations, but it should not be the case when small chunks of the file are uploaded. Anyway, it seems to be out of the issue scope.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    It's not stated in the issue summary, but I assume the plan for CKEditor is the same for JSONAPI/Rest? If so, can we update the issue summary there too.

    CKEditor5 is more like graphql in that it uses an UploadedFile object, rather than direct uploaded files via php://input. (Still thinking of the correct name for this!)

    However, CKEditor5 should switch to FileUploadHandler.

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    We could do these tasks first before trying to swap out the whole file uploading for REST and JSON API at the same time:

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia

    \Drupal\ckeditor5\Controller\CKEditor5ImageController uses UploadedImage from the request, so it just needs to make use of FileUploadHandler. πŸ“Œ Make CKEditor5ImageController reuse FileUploadHandler Active

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
  • πŸ‡§πŸ‡ͺBelgium wim leers Ghent πŸ‡§πŸ‡ͺπŸ‡ͺπŸ‡Ί

    πŸ“Œ [PP-4] Unify file upload logic of REST and JSON:API Postponed is done! 🀯

    I believe that means we can close this?!

  • πŸ‡«πŸ‡·France andypost

    I bet there's a few issues left mostly about upload progress (apcu, vs native)

  • πŸ‡¦πŸ‡ΊAustralia kim.pepper πŸ„β€β™‚οΈπŸ‡¦πŸ‡ΊSydney, Australia
Production build 0.71.5 2024