Webform submission import: failed to import a lot of records executed in multiple batches

Created on 16 January 2024, over 1 year ago

Problem/Motivation

I cannot import a large number of records when the import is executed in multiple batches.

If the number of records to be imported exceeds 100, the drush wfi command splits the task into several batches. When handling the second or any subsequent batch, the process should skip over the records that have already been imported. However, WebformSubmissionExportImportImporter skips by line rather than by CSV record, so any record that spans more than one line (for example, a quoted value containing a line break) throws the offset off.

while ($index < $offset && !feof($handle)) {
  // => We should use fgetcsv() here.
  fgets($handle);
  $index++;
}
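
To illustrate why skipping by line miscounts, here is a minimal sketch that is not taken from the module. The sample data and variable names are made up for illustration. It counts the same in-memory CSV once with fgets() and once with fgetcsv(); the delimiter, enclosure and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
// Hypothetical sample: a header plus two records. The first record holds a
// quoted value with an embedded line break.
$csv = "sid,notes\n1,\"first line\nsecond line\"\n2,plain\n";

$handle = fopen('php://memory', 'w+');
fwrite($handle, $csv);

// Count physical lines, the way the current skip loop advances.
rewind($handle);
$lines = 0;
while (fgets($handle) !== FALSE) {
  $lines++;
}

// Count CSV records, which is what the batch offset actually refers to.
rewind($handle);
$records = 0;
while (fgetcsv($handle, 0, ',', '"', '\\') !== FALSE) {
  $records++;
}
fclose($handle);

printf("%d lines, %d records\n", $lines, $records);
// Prints: 4 lines, 3 records
```

With this data, a line-based skip of 3 would stop after the first data record instead of after the second.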

The problem and its cause are similar to this issue:
https://www.drupal.org/project/webform/issues/3386895 🐛 Upload count is incorrect in getTotal method of WebformSubmissionExportImportImporter (Fixed)

Steps to reproduce

  1. Install Webform and Webform Submission Export/Import
  2. Create a webform
  3. Insert ~200 records into the form
  4. Export the submission data with drush wfx --exporter=webform_submission_export_import
  5. Import the exported files

Proposed resolution

Skip processed records by CSV record, as was done in https://www.drupal.org/project/webform/issues/3386895 🐛 Upload count is incorrect in getTotal method of WebformSubmissionExportImportImporter (Fixed).
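
A record-based skip could look roughly like the sketch below. The helper name skip_csv_records() and the sample data are hypothetical, and this is not the patch attached to this issue. Handling of blank lines is deliberately left out here, since that is a separate question.

```php
<?php
/**
 * Hypothetical helper: advance $handle past $offset CSV records.
 *
 * Returns the number of records actually skipped, which can be lower than
 * $offset when the file ends early.
 */
function skip_csv_records($handle, int $offset): int {
  $index = 0;
  while ($index < $offset) {
    // fgetcsv() consumes one whole record, even if it spans several lines.
    $row = fgetcsv($handle, 0, ',', '"', '\\');
    if ($row === FALSE) {
      break;
    }
    $index++;
  }
  return $index;
}

// Usage with made-up data: skip the header and the multi-line first record.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "sid,notes\n1,\"first line\nsecond line\"\n2,plain\n3,last\n");
rewind($handle);

$skipped = skip_csv_records($handle, 2);
$next = fgetcsv($handle, 0, ',', '"', '\\');
fclose($handle);
// $next now holds the record for sid 2.
```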

Remaining tasks

Write tests

User interface changes

N/A

API changes

N/A

Data model changes

N/A

🐛 Bug report
Status

Active

Version

6.2

Component

Code

Created by

🇯🇵 Japan kensuke-imamura


Comments & Activities

  • Issue created by @kensuke-imamura
  • Nice! I came across the same issue. I used a similar patch and it solved my problem. The only difference is when checking for an empty line, I copied the logic from L422. I'm not sure if there are situations where this will actually make a difference, but I'll upload it anyway.

  • Corrected the patch format (missing the first line) in the previous 2 patches.

  • Status changed to Needs review over 1 year ago
  • Core: 10.1.4 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    Patch Failed to Apply
  • Core: 10.1.4 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    536 pass
  • Core: 10.1.4 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    Patch Failed to Apply
  • 🇺🇸 United States jrockowitz Brooklyn, NY

    Does the CSV that is being imported have empty lines?

    Can you upload an example CSV that is having this issue?

  • Status changed to Postponed: needs info 6 months ago
  • 🇺🇸 United States jrockowitz Brooklyn, NY
  • 🇧🇪 Belgium kubrick

    Patch #3 solved an issue I was having with big CSVs.

  • 🇺🇸 United States jwineichen

    Patch #3 also worked for me. I was importing a CSV with 1900 lines. Without the patch, the completion message said it created a little under 300 records and then updated 1600 or something, even though I added sid and uuid columns so they should have all been recognized as unique. With the patch, the upload completed successfully and I've got 1900 submissions now.

  • Status changed to Needs review 1 day ago
  • 🇫🇷 France pacproduct

    I think this issue is not specific to big CSVs but to big CSVs that contain entries with line breaks, because fgets treats those line breaks as the start of a new record when it should not.

    I was facing the following errors in my case because of multiline webform elements:

    ...
    >  [warning] Line n°101: 28 values expected and only 3 found.
    >  [warning] Line n°102: 28 values expected and only 5 found.
    ...
    

    Using fgetcsv as suggested seems to be the way to go, and I can confirm that the patch in #3 does solve the issue, as it counts lines in a similar way to \Drupal\webform_submission_export_import\WebformSubmissionExportImportImporter::getTotal. The latter uses if (!empty($line) && !is_null(array_pop($line))) { instead of if (!empty($line) && $line !== ['']) {, though, and I'm not sure which approach is best.

    Thanks! :)
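
To make the difference between the two conditions quoted above easier to see, here is a small sketch that wraps each one in a function and applies both to a few rows. The function names and sample rows are hypothetical. It relies on PHP's documented behaviour that fgetcsv() returns an array containing a single NULL for a blank line, and it does not try to decide which check the importer should use.

```php
<?php
// Hypothetical wrapper around the condition quoted from getTotal().
function is_record_get_total_style(array $line): bool {
  return !empty($line) && !is_null(array_pop($line));
}

// Hypothetical wrapper around the condition quoted from the patch in #3.
function is_record_patch_style(array $line): bool {
  return !empty($line) && $line !== [''];
}

// Read a made-up CSV whose second line is blank.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "1,a\n\n2,b\n");
rewind($handle);
$rows = [];
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== FALSE) {
  $rows[] = $row;
}
fclose($handle);

// $rows[1] is the blank line, which fgetcsv() reports as [NULL].
$blank_get_total = is_record_get_total_style($rows[1]);
$blank_patch = is_record_patch_style($rows[1]);

// A row holding one empty string, e.g. a line that contains only "".
$empty_string_get_total = is_record_get_total_style(['']);
$empty_string_patch = is_record_patch_style(['']);

var_dump($blank_get_total, $blank_patch, $empty_string_get_total, $empty_string_patch);
```

Under these assumptions the two checks disagree on both edge cases: the getTotal-style check rejects the blank line and accepts [''], while the patch-style check does the opposite. Ordinary rows such as ['1', 'a'] pass both.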
