Webform submission import: fails to import many records when the import runs in multiple batches

Created on 16 January 2024, over 1 year ago

Problem/Motivation

I cannot import a large number of records when the import is executed in multiple batches.

If the number of records to be imported exceeds 100, the drush wfi command splits the task into several batches. When handling the second or subsequent batches, the process should skip over the records that have already been imported. However, WebformSubmissionExportImportImporter skips by line rather than by CSV record:

while ($index < $offset && !feof($handle)) {
  // BUG: fgets() reads a single physical line, so a record whose field
  // contains a line break is only partially skipped.
  // => We should use fgetcsv() here.
  fgets($handle);
  $index++;
}
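
To illustrate the failure mode (this is not code from the module, just a hypothetical example): a quoted field that contains a line break is consumed in two pieces by fgets(), while fgetcsv() reads the whole record.

// Hypothetical data: one record whose second field spans two lines.
$handle = fopen('php://memory', 'rw+');
fwrite($handle, "sid,notes\n1,\"line one\nline two\"\n2,ok\n");
rewind($handle);

// Skipping one "record" with fgets() stops at the embedded newline,
// leaving the pointer in the middle of record 1.
fgets($handle);       // Header.
fgets($handle);       // Reads only: 1,"line one
echo fgets($handle);  // Prints: line two" -- not the record for sid 2.

rewind($handle);

// fgetcsv() consumes the complete record, including the multiline field.
fgetcsv($handle);           // Header.
fgetcsv($handle);           // Whole record for sid 1.
print_r(fgetcsv($handle));  // Array ( [0] => 2 [1] => ok )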

The problem and its cause are similar to this issue:
https://www.drupal.org/project/webform/issues/3386895 šŸ› Upload count is incorrect in getTotal method of WebformSubmissionExportImportImporter Fixed

Steps to reproduce

  1. Install Webform and Webform Submission Export/Import
  2. Create a webform
  3. Insert ~200 records into the form
  4. Export the submission data with drush wfx --exporter=webform_submission_export_import
  5. Import the exported files

Proposed resolution

Skip already-processed records by CSV record (i.e. use fgetcsv() instead of fgets()), as in https://www.drupal.org/project/webform/issues/3386895 šŸ› Upload count is incorrect in getTotal method of WebformSubmissionExportImportImporter Fixed; see the sketch below.
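
A minimal sketch of what the skip loop could look like, assuming the same $handle, $offset, and $index variables as the existing code and reusing the blank-line check from getTotal() (untested, for illustration only):

while ($index < $offset && !feof($handle)) {
  // Read one complete CSV record, including fields with embedded newlines.
  $line = fgetcsv($handle);
  // A blank line comes back as [NULL]; count only real records, mirroring
  // the condition used in getTotal().
  if (!empty($line) && !is_null(array_pop($line))) {
    $index++;
  }
}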

Remaining tasks

Write tests

User interface changes

N/A

API changes

N/A

Data model changes

N/A

šŸ› Bug report
Status

Active

Version

6.2

Component

Code

Created by

šŸ‡ÆšŸ‡µJapan kensuke-imamura


Merge Requests

Comments & Activities

  • Issue created by @kensuke-imamura
  • Nice! I came across the same issue. I used a similar patch and it solved my problem. The only difference is that, when checking for an empty line, I copied the logic from L422. I'm not sure if there are situations where this will actually make a difference, but I'll upload it anyway.

  • Corrected the patch format (missing the first line) in the previous 2 patches.

  • Status changed to Needs review over 1 year ago
  • Core: 10.1.4 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    Patch Failed to Apply
  • Core: 10.1.4 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    536 pass
  • Core: 10.1.4 + Environment: PHP 8.2 & MySQL 8
    last update over 1 year ago
    Patch Failed to Apply
  • šŸ‡ŗšŸ‡øUnited States jrockowitz Brooklyn, NY

    Does the CSV that is being imported have empty lines?

    Can you upload an example CSV that is having this issue?

  • Status changed to Postponed: needs info 7 months ago
  • šŸ‡ŗšŸ‡øUnited States jrockowitz Brooklyn, NY
  • šŸ‡§šŸ‡ŖBelgium kubrick

    Patch #3 solved an issue I was having with big CSVs.

  • šŸ‡ŗšŸ‡øUnited States jwineichen

    Patch #3 also worked for me. I was importing a CSV with 1900 lines. Without the patch, the completion message said it created a little under 300 records and then updated roughly 1600, even though I added sid and uuid columns, so they should all have been recognized as unique. With the patch, the upload completed successfully and I now have 1900 submissions.

  • Status changed to Needs review 26 days ago
  • šŸ‡«šŸ‡·France pacproduct

    I think this issue is not specific to big CSVs, but to big CSVs that contain entries with line breaks, as those breaks get interpreted as new lines by fgets() when they should not be.

    In my case, I was seeing the following warnings because of multiline webform elements:

    ...
    >  [warning] Line n°101: 28 values expected and only 3 found.
    >  [warning] Line n°102: 28 values expected and only 5 found.
    ...
    

    Using fgetcsv() as suggested seems to be the way to go, and I can confirm that the patch in #3 does solve the issue. It counts lines in a similar way to \Drupal\webform_submission_export_import\WebformSubmissionExportImportImporter::getTotal, although the latter uses if (!empty($line) && !is_null(array_pop($line))) { instead of if (!empty($line) && $line !== ['']) {. I'm not sure which approach is best.

    Thanks! :)

  • šŸ‡ØšŸ‡¦Canada Liam Morland Ontario, CA šŸ‡ØšŸ‡¦
  • First commit to issue fork.
  • Pipeline finished with Success
    16 days ago
    Total: 930s
    #523197
  • šŸ‡ÆšŸ‡µJapan kensuke-imamura

    Patch #3 and the merge request do not seem to correctly recognize empty lines.
    According to https://www.php.net/manual/en/function.fgetcsv.php, fgetcsv() returns an array containing a single NULL ([NULL]) for a blank line.
    The code strictly compares the result with [''], so that comparison never matches and blank lines are never detected:

    $fp = fopen('php://memory', 'rw+');
    fwrite($fp, <<<CSV
    a,b,c
    
    1,2,3
    CSV);
    rewind($fp);
    
    while (($line = fgetcsv($fp))) {
      if (!empty($line) && $line !== ['']) {
        // This is always output.
        echo "not empty\n";
      } else {
        echo "empty\n";
      }
    }
    

    The first patch I submitted does not have this issue. The empty line detection method in that code is aligned with the existing condition in modules/webform_submission_export_import/src/WebformSubmissionExportImportImporter.php.
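
    For reference, a hypothetical variation of the snippet above (reusing its $fp stream), with the getTotal()-style check swapped in, classifies the blank line correctly:

    rewind($fp);

    while (($line = fgetcsv($fp))) {
      // A blank line is returned as [NULL], so array_pop() yields NULL here.
      if (!empty($line) && !is_null(array_pop($line))) {
        echo "not empty\n";
      } else {
        echo "empty\n";
      }
    }
    // Prints: not empty, empty, not empty.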

  • Pipeline finished with Success
    8 days ago
    Total: 530s
    #529350