Downloading submission csv and .zip file archive incomplete.

Created on 23 June 2023, over 2 years ago

Problem/Motivation

Our site has a webform with hundreds of submissions, each with an attached image file. We tried to download all submissions as a CSV together with all attached image files in a compressed .zip archive, but the results were inconsistent. Only the first batch of files is downloaded with the CSV. If I check the /tmp directory, there is another .zip file (with the same name as the one I downloaded) that contains the second batch of submission files but no CSV.

Steps to reproduce

This happens on both the production site and my dev copy, which runs Drupal 9.5.9 on an Ubuntu 20.04 VM.
To reproduce, create a webform with several hundred submissions. Each submission has an image file field and an uploaded image file. These image files range from a few hundred KB to 3 or 4 MB. The entire downloaded archive is 500 MB (when it works).

Proposed resolution

I am not familiar with the code, but from a high-level review of WebformResultsExportController.php it appears that a BatchProcess instantiates a submission_exporter, writes submission records to the CSV file, and adds the associated submission files to the zip file. There can be multiple BatchProcesses. A BatchFinish is called when the entire batch finishes; it instantiates a submission_exporter to write the completed CSV file to the zip archive.

The problem: Each instance of the submission_exporter class instantiates a ZipArchive object and opens the archive file. However, the BatchProcess never closes the archive to finalize the write before the archive is opened again by the next BatchProcess or by BatchFinish. Closing the file after each batch is necessary to save the changes before the next BatchProcess opens the file. In my case, closing the archive after each BatchProcess solved the above problem. The archive is complete with each download, and no leftover .zip files remain in the /tmp directory.
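
The close-after-each-batch approach described above can be sketched in isolation as follows. This is only a minimal illustration, not the Webform module's code: the helper name `appendBatchToArchive()` and the entry names are hypothetical, and it uses `ZipArchive::addFromString()` instead of real uploaded files so that it is self-contained.

```php
<?php
// Hypothetical helper (not part of Webform): open the archive, add one
// batch of entries, and close it so the batch is written to disk before
// the next batch opens the same file.
function appendBatchToArchive(string $archivePath, array $entries): int
{
    $zip = new ZipArchive();
    // CREATE alone creates the archive if it is missing and otherwise opens
    // the existing one; OVERWRITE is deliberately not used here.
    if ($zip->open($archivePath, ZipArchive::CREATE) !== true) {
        throw new RuntimeException("Cannot open archive: $archivePath");
    }
    foreach ($entries as $name => $contents) {
        $zip->addFromString($name, $contents);
    }
    // close() is the step that writes the pending entries to the file.
    if (!$zip->close()) {
        throw new RuntimeException("Cannot write archive: $archivePath");
    }

    // Reopen the finished archive to report how many entries it now holds.
    $check = new ZipArchive();
    if ($check->open($archivePath) !== true) {
        throw new RuntimeException("Cannot reopen archive: $archivePath");
    }
    $count = $check->numFiles;
    $check->close();
    return $count;
}

$archivePath = sys_get_temp_dir() . '/batch_close_demo_' . uniqid() . '.zip';

// Two "batches" followed by the CSV, mirroring BatchProcess and BatchFinish.
$first = appendBatchToArchive($archivePath, ['a.txt' => 'first batch']);
$second = appendBatchToArchive($archivePath, [
    'b.txt' => 'second batch',
    'submissions.csv' => "sid\n1\n2\n",
]);
echo "Entries after batch 1: $first, after batch 2: $second\n";

unlink($archivePath);
```

Because every call ends with `close()`, each later call finds the archive on disk and appends to it rather than starting a new one.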

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Active

Version

6.1

Component

Code

Created by

🇨🇦Canada drupalthings


Merge Requests

Comments & Activities


  • 🇺🇸United States jrockowitz Brooklyn, NY

    Are you using a load balancer?

  • Status changed to Postponed: needs info over 2 years ago
  • Status changed to Active over 2 years ago
  • 🇨🇦Canada drupalthings

    No, only a single Windows 10 dev machine running an Ubuntu 20.04 VM. Drupal and MySQL are installed on the Ubuntu VM, with the public and private filesystems located on the Ubuntu file system. No load balancing.

  • How can we set up the precise situation to reproduce this?

  • 🇨🇦Canada drupalthings

    Good point. If you accept the premise that the archive needs to be closed before it is opened again, you could add print statements that write to the log file each time the archive is created or opened and each time a file is added. You would need a webform with at least one file upload field that accepts image files (probably any large file would do). You could get away with a smaller batch size if you changed the minimum batch size to less than 100 (I think it might be hardcoded as such). It's also interesting to note this comment on the ZipArchive::close() documentation page at https://www.php.net/manual/en/ziparchive.close.php#93322:

    Pay attention, that ZipArchive::addFile() only opens file descriptors and does not compress it. And only ZipArchive::close() compress file and it takes quite a lot of time. Be careful with timeouts.

    This little program creates some temporary 2 MB files and adds them to an archive in two batches of 10 files each. On my system at least, only the last 10 get added if I comment out the close. If I close the archive after each batch, all 20 get added:

    <?php
    class BatchArchive
    {
        private $zip;
        private $archiveName;
    
        public function __construct($archiveName)
        {
            $this->archiveName = $archiveName;
            $this->zip = new ZipArchive();
    
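            // Open an existing archive as-is (0 rather than NULL, since the flags argument is an int); otherwise create it.
            $flags = file_exists($this->archiveName) ? 0 : (ZipArchive::CREATE | ZipArchive::OVERWRITE);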
            echo "Opening archive {$this->archiveName} with flags: {$flags}\n";
            if ($this->zip->open($this->archiveName, $flags) !== true) {
                die("Failed to create or open archive: $this->archiveName");
            }
        }
    
        public function addBatchToArchive($files)
        {
            foreach ($files as $file) {
                $localName = basename($file);
                $this->zip->addFile($file, $localName);
                echo "Added file to archive: $localName\n";
            }
        }
    
        public function closeArchive()
        {
            //$this->zip->close();
            //echo "Archive closed: $this->archiveName\n";
        }
    }
    
    $numberOfFiles = 20;
    $filesPerBatch = $numberOfFiles / 2;
    
    // Create temporary files
    $tempFiles = [];
    for ($i = 1; $i <= $numberOfFiles; $i++) {
        $tempFileName = tempnam(sys_get_temp_dir(), 'tempfile' . $i);
        $fileSize = 2 * 1024 * 1024; // 2MB
        $randomData = openssl_random_pseudo_bytes($fileSize);
        file_put_contents($tempFileName, $randomData);
        $tempFiles[] = $tempFileName;
        echo "Created temporary file: $tempFileName\n";
    }
    
    // Divide files into batches
    $batches = array_chunk($tempFiles, $filesPerBatch);
    
    // Create batch objects and add batches to archive
    $archiveName = 'archive.zip';
    if (file_exists($archiveName)) {
        unlink($archiveName);
        echo "Deleted existing archive: $archiveName\n";
    }
    
    foreach ($batches as $batch) {
        echo "Adding batch to archive\n";
        $batchArchive = new BatchArchive($archiveName);
        $batchArchive->addBatchToArchive($batch);
        $batchArchive->closeArchive();
    }
    ?>

    And the output with the close archive commented out is:

    dpostle-tech@drupal-develop:~/projects/temp$ php test9.php
    Created temporary file: /tmp/tempfile1jeztvO
    Created temporary file: /tmp/tempfile2OfQlwM
    Created temporary file: /tmp/tempfile3raKBLO
    Created temporary file: /tmp/tempfile42DgOhM
    Created temporary file: /tmp/tempfile5phKybP
    Created temporary file: /tmp/tempfile6qlSfsP
    Created temporary file: /tmp/tempfile7mzDsHM
    Created temporary file: /tmp/tempfile8jTrHKN
    Created temporary file: /tmp/tempfile9P5qLGM
    Created temporary file: /tmp/tempfile10H0hH2O
    Created temporary file: /tmp/tempfile11VKTCdP
    Created temporary file: /tmp/tempfile12CnPzUL
    Created temporary file: /tmp/tempfile13Uz0dxN
    Created temporary file: /tmp/tempfile14BEo6SM
    Created temporary file: /tmp/tempfile15wyjHFM
    Created temporary file: /tmp/tempfile16poB9cP
    Created temporary file: /tmp/tempfile17epq1yM
    Created temporary file: /tmp/tempfile180irQ0O
    Created temporary file: /tmp/tempfile19yr5BnO
    Created temporary file: /tmp/tempfile20SK5v3M
    Deleted existing archive: archive.zip
    Adding batch to archive
    Opening archive archive.zip with flags: 9
    Added file to archive: tempfile1jeztvO
    Added file to archive: tempfile2OfQlwM
    Added file to archive: tempfile3raKBLO
    Added file to archive: tempfile42DgOhM
    Added file to archive: tempfile5phKybP
    Added file to archive: tempfile6qlSfsP
    Added file to archive: tempfile7mzDsHM
    Added file to archive: tempfile8jTrHKN
    Added file to archive: tempfile9P5qLGM
    Added file to archive: tempfile10H0hH2O
    Adding batch to archive
    Opening archive archive.zip with flags: 9
    Added file to archive: tempfile11VKTCdP
    Added file to archive: tempfile12CnPzUL
    Added file to archive: tempfile13Uz0dxN
    Added file to archive: tempfile14BEo6SM
    Added file to archive: tempfile15wyjHFM
    Added file to archive: tempfile16poB9cP
    Added file to archive: tempfile17epq1yM
    Added file to archive: tempfile180irQ0O
    Added file to archive: tempfile19yr5BnO
    Added file to archive: tempfile20SK5v3M
    
    dpostle-tech@drupal-develop:~/projects/temp$ unzip -l archive.zip
    Archive:  archive.zip
      Length      Date    Time    Name
    ---------  ---------- -----   ----
      2097152  2023-06-27 13:58   tempfile11VKTCdP
      2097152  2023-06-27 13:58   tempfile12CnPzUL
      2097152  2023-06-27 13:58   tempfile13Uz0dxN
      2097152  2023-06-27 13:58   tempfile14BEo6SM
      2097152  2023-06-27 13:58   tempfile15wyjHFM
      2097152  2023-06-27 13:58   tempfile16poB9cP
      2097152  2023-06-27 13:58   tempfile17epq1yM
      2097152  2023-06-27 13:58   tempfile180irQ0O
      2097152  2023-06-27 13:58   tempfile19yr5BnO
      2097152  2023-06-27 13:58   tempfile20SK5v3M
    ---------                     -------
     20971520                     10 files
    dpostle-tech@drupal-develop:~/projects/temp$

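    The key point here is that even though the first batch was added to the archive, the archive file did not yet exist on disk when the second batch opened it (note "flags: 9", i.e. ZipArchive::CREATE | ZipArchive::OVERWRITE, on both opens), so it was created again when adding the second batch.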
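
    The deferred write behind this result can be checked directly. The following is a minimal sketch, assuming the usual libzip-backed ZipArchive extension; the file name is made up for the demo, and `addFromString()` stands in for the 2 MB temp files.

    ```php
    <?php
    $archivePath = sys_get_temp_dir() . '/deferred_write_demo_' . uniqid() . '.zip';

    // Same flags as the test program above: CREATE (1) | OVERWRITE (8).
    $flags = ZipArchive::CREATE | ZipArchive::OVERWRITE;
    echo "flags: $flags\n";

    $zip = new ZipArchive();
    if ($zip->open($archivePath, $flags) !== true) {
        throw new RuntimeException("Cannot open archive: $archivePath");
    }
    $zip->addFromString('batch1.txt', 'first batch');

    // The entry is only queued at this point; nothing has been written yet,
    // so a second open() now would not find an existing archive.
    clearstatcache();
    $existsBeforeClose = file_exists($archivePath);

    // close() performs the actual write.
    $zip->close();
    clearstatcache();
    $existsAfterClose = file_exists($archivePath);

    echo 'Exists before close: ' . var_export($existsBeforeClose, true) . "\n";
    echo 'Exists after close: ' . var_export($existsAfterClose, true) . "\n";

    unlink($archivePath);
    ```

    If the archive only appears after `close()`, a second exporter instance that opens the same path before the first one has closed will start from an empty archive.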

  • 🇨🇦Canada drupalthings

    As for a precise situation for reproducing, I was able to reproduce using a simple test form with one field for uploading files.

    1. I used the test functionality of the form to create 12 test submissions.
    2. I then created a temporary 2 MB file and copied it into the private/default/webforms/test_downloads submission directories, replacing each uploaded file.
    3. I changed "Batch Export Size" to 10 under Configuration->Advanced webform settings (the min size is not limited to 100, I misread that).
    4. I made sure the tmp directory was clear of any partial archives.
    5. I used drush to clear the cache and downloaded the results with "Download uploaded files" checked.

    There appears to be a timing aspect related to the size of the uploaded files -- I couldn't get it to happen with the tiny uploaded files created by the test form tab without replacing the uploaded files with the larger 2MB versions.

  • 🇨🇦Canada hargurpreet Kitchener

    I have also experienced the same issue while exporting both submissions and uploaded files. To fix it, I have created this patch, which works fine for me. Thanks!

  • 🇺🇸United States jrockowitz Brooklyn, NY
  • 🇺🇸United States fizcs3 Omaha, Nebraska; USA

    We also had a very odd issue downloading a zip archive, where it would skip submissions. It didn't happen for all webforms, and I really can't characterize it any better than to say it involved choosing:
    * Export Format: PDF documents
    * Download uploaded files: checked/yes

    All I can say is that I applied the small patch in #8 and it fixed it.
    I can confirm the patch applies on Drupal 10.2.6 with Webform 6.2.2.
    Thank you @hargurpreet

  • Status changed to Needs review over 1 year ago
  • 🇺🇸United States jrockowitz Brooklyn, NY

    I am unsure if the patch is getting to the root cause. It seems that the archive is being closed and needs to be reopened when zipping large files.

    @see https://stackoverflow.com/questions/16121885/php-zip-archive-memory-ram-...

    I would be more comfortable with a patch that checks if the archive is closed and then reopens it. The current patch is reopening the archive with every file.
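
    One possible reading of this suggestion, sketched outside the module: track whether the archive is open, reopen it only when it has been closed, and close it every N entries so pending writes are flushed. `ReopeningArchive` and its `$flushEvery` parameter are hypothetical names for illustration, not Webform's exporter API, and `addFromString()` stands in for real files.

    ```php
    <?php
    // Hypothetical wrapper: reopens the archive only when it is closed.
    final class ReopeningArchive
    {
        private ?ZipArchive $zip = null;
        private bool $isOpen = false;
        private int $pending = 0;
        private string $path;
        private int $flushEvery;

        public function __construct(string $path, int $flushEvery = 50)
        {
            $this->path = $path;
            $this->flushEvery = $flushEvery;
        }

        public function add(string $name, string $contents): void
        {
            $this->ensureOpen();
            $this->zip->addFromString($name, $contents);
            // Close periodically so queued entries are written to disk.
            if (++$this->pending >= $this->flushEvery) {
                $this->flush();
            }
        }

        public function flush(): void
        {
            if (!$this->isOpen) {
                return;
            }
            if (!$this->zip->close()) {
                throw new RuntimeException("Cannot write archive: {$this->path}");
            }
            $this->isOpen = false;
            $this->pending = 0;
        }

        private function ensureOpen(): void
        {
            if ($this->isOpen) {
                return; // Already open: no reopen per file.
            }
            $this->zip = new ZipArchive();
            if ($this->zip->open($this->path, ZipArchive::CREATE) !== true) {
                throw new RuntimeException("Cannot open archive: {$this->path}");
            }
            $this->isOpen = true;
        }
    }

    $path = sys_get_temp_dir() . '/reopen_demo_' . uniqid() . '.zip';
    $archive = new ReopeningArchive($path, 2);
    foreach (range(1, 5) as $i) {
        $archive->add("file$i.txt", "contents $i");
    }
    $archive->flush(); // Write the final, partially filled group.

    $check = new ZipArchive();
    $check->open($path);
    $count = $check->numFiles;
    $check->close();
    echo "Entries in archive: $count\n";
    unlink($path);
    ```

    The caller still has to call `flush()` at the end of each batch; otherwise the last partial group stays queued, which is the same failure mode described in the issue summary.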

  • 🇧🇷Brazil gfbarbosa

    The patch in #8 works for me, as does using the "filename" query param provided after the download redirect, which downloads the complete file correctly.

    I also agree with @jrockowitz in #11

  • First commit to issue fork.
  • Pipeline finished with Failed
    2 months ago
    Total: 1833s
    #575692
  • 🇺🇸United States damienmckenna NH, USA

    How about one of these approaches? (created as patches as it's easier to review them individually before going through the effort to create a MR)

    Note: I haven't tested them yet, I need to have a meeting for a bit but will follow up afterwards.

  • 🇺🇸United States damienmckenna NH, USA

    FYI, there's a related issue where the number of files is different to the number of records: "Provide indication if file count is different to expected" (status: Active).

  • 🇨🇦Canada Liam Morland Ontario, CA 🇨🇦

    Please re-roll for 6.3.x.

  • 🇨🇦Canada Liam Morland Ontario, CA 🇨🇦
  • 🇮🇳India divyansh.gupta Jaipur

    Working on the reroll

  • 🇮🇳India divyansh.gupta Jaipur

    divyansh.gupta changed the visibility of the branch 3369136-batch-download to hidden.

  • 🇮🇳India divyansh.gupta Jaipur

    Created a new MR targeting 6.3.x.
    Please review MR 719

  • Pipeline finished with Failed
    about 2 months ago
    Total: 623s
    #585000
  • Pipeline finished with Failed
    about 2 months ago
    Total: 271s
    #593624
  • Pipeline finished with Success
    about 2 months ago
    #593647
  • Pipeline finished with Success
    about 2 months ago
    Total: 214s
    #593650