Large site: migrate import is very slow

Created on 19 September 2024, 4 months ago

Problem/Motivation

I have a site with about 40,000 pages and 20,000 files. At 74 % the import slows to about 1 node per minute (the progress display reads "x of 80 tasks done"). At that rate, I estimate the full import will take about 28 days.
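For reference, the 28-day figure can be checked with a quick calculation. This is only a sketch: it assumes the rate stays at a constant 1 node per minute, and the function name `days_to_import` is made up for illustration. The 5,000 figure is the approximate number of nodes already imported when the slowdown starts.

```python
def days_to_import(nodes: int, nodes_per_minute: float) -> float:
    """Return the number of days needed to import `nodes` at a constant rate."""
    minutes = nodes / nodes_per_minute
    return minutes / (60 * 24)  # 1,440 minutes per day


# Assumption: every node imports at the slow rate of 1 node per minute.
total_days = days_to_import(40_000, 1.0)

# Assumption: the first ~5,000 nodes are already in, so only the rest is slow.
remaining_days = days_to_import(40_000 - 5_000, 1.0)

print(f"all 40,000 nodes: {total_days:.1f} days")
print(f"remaining 35,000 nodes: {remaining_days:.1f} days")
```

So the estimate is roughly 28 days if every node imports at the slow rate, or about 24 days counting only the nodes still to be imported.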

Steps to reproduce

Set up an empty D10 (or D11) site and database, run Migrate UI, and wait. At 70 % (on D10 with the Book module enabled) or at 74 % (without the Book module), the import slows to a crawl. That is at about 5,000 imported nodes.

Proposed resolution

I have the Drupal logs, if that helps. There is nothing in the apache2 error log.

I absolutely love this module, and will dedicate a laptop to this import for 28 days if needed, but could it be made faster?

The error messages in the localhost Drupal log say:
"Deprecated function: mb_convert_encoding(): Passing null to parameter #1 ($string) of type array|string is deprecated in Drupal\Component\Transliteration\PhpTransliteration->transliterate() (line 142 of /var/www/drupal/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php) "

"Deprecated function: str_contains(): Passing null to parameter #1 ($haystack) of type string is deprecated in Drupal\Component\Transliteration\PhpTransliteration->transliterate() (line 135 of /var/www/drupal/core/lib/Drupal/Component/Transliteration/PhpTransliteration.php) "

"Warning: Undefined array key "text_processing" in Drupal\text\Plugin\migrate\field\d7\TextField->getFieldType() (line 75 of /var/www/drupal/core/modules/text/src/Plugin/migrate/field/d7/TextField.php) "

🐛 Bug report
Status

Active

Version

2.0

Component

Miscellaneous

Created by

🇫🇮Finland hbaa


Comments & Activities

  • Issue created by @hbaa
  • 🇪🇸Spain Carlos Romero

    Hi hbaa,

    First of all, thank you for your interest in the migrate wizard module.

    The module simply generates the yml files for the migrations, which are then run by the Drupal (core) Migrate module.

    With that many nodes, the migration will inevitably take a long time.

    Tips to make it take less time:

    Use a machine with more resources. I often develop and test migrations in the local environment of the projects I work on, but when I launch them in the pre-production environments they run faster, because those machines have more resources.

    File migration with migrate wizard has some peculiarities:
    Files are copied one by one, on the fly, while the nodes are migrated. You do not need to migrate the files or media first. If you do, the existing entities are reused where appropriate, so files are not downloaded twice and entities pointing to the same file are not duplicated.
    In any case, to speed up this part, I recommend placing the files in the destination folder beforehand. The migration then skips the download and simply creates the file or media entities where appropriate.
    It works this way because in some cases you may not want to migrate every file that exists in the source: when nodes are migrated, only the files used in the migrated content are copied.
    For example, if you have 20,000 files in the source but only want those used by one content type, do not migrate the files. Migrate only that content type, and only its files will be copied.
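The file handling described above can be sketched roughly as follows. This is not the module's code: the function `copy_referenced_files` and the file names are hypothetical, and the sketch only illustrates the idea that files already present in the destination are reused, while only the files referenced by migrated content are copied.

```python
import shutil
import tempfile
from pathlib import Path


def copy_referenced_files(referenced, source_dir, dest_dir):
    """Copy only the referenced files, skipping any already in dest_dir.

    Returns two lists: the files that were copied and the files that were reused.
    """
    copied, reused = [], []
    for rel in referenced:
        target = Path(dest_dir) / rel
        if target.exists():
            # Already placed in the destination (or copied earlier): no transfer.
            reused.append(rel)
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(source_dir) / rel, target)
        copied.append(rel)
    return copied, reused


# Demo with temporary folders standing in for the source and destination sites.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ("a.pdf", "b.pdf", "unused.pdf"):
        (Path(src) / name).write_text(name)
    (Path(dst) / "b.pdf").write_text("b.pdf")  # placed in the destination beforehand

    # Two nodes reference a.pdf, one references b.pdf, none reference unused.pdf.
    copied, reused = copy_referenced_files(["a.pdf", "b.pdf", "a.pdf"], src, dst)
    in_destination = sorted(p.name for p in Path(dst).iterdir())

print("copied:", copied)
print("reused:", reused)
print("in destination:", in_destination)
```

In this sketch, the pre-placed file costs nothing at migration time, and the unreferenced file never reaches the destination.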

    Regarding the errors you mention, I would need more of the trace to see which part of my module they come from. At first glance, they may be due to your version of PHP or to the data you are trying to import.

    In any case, please investigate the error and, if you find that it is caused by the module, open a separate issue for it.

    I'm closing the issue, best regards and good luck with your migration.

  • If you are using DDEV with Docker Desktop on macOS, go to Settings > Resources and set the CPU limit to the maximum. You might want to raise some of the other resource limits too.

    There is also a replacement for Docker Desktop called OrbStack, which "should" run faster, although I haven't tried it yet.
