- Issue created by @pacproduct
- Open on Drupal.org โCore: 10.1.x + Environment: PHP 8.1 & MySQL 8last update
over 1 year ago Not currently mergeable. - @pacproduct opened merge request.
- last update
over 1 year ago 22 pass, 2 fail - @pacproduct opened merge request.
- Merge request !44Improve performances when using the --sync option. โ (Merged) created by pacproduct
- last update
over 1 year ago 22 pass, 2 fail - ๐ซ๐ทFrance pacproduct
I'm not sure what makes tests fail. Seems like the new database table is not found? When I install the module locally, it does get created correctly though.
What am I missing? Any help for fixing the CI would be appreciated :)
- last update
over 1 year ago 36 pass - Status changed to Needs review
over 1 year ago 2:38pm 30 July 2023 - ๐ซ๐ทFrance pacproduct
Yay! Tests pass :)
Switching status to "needs review". - ๐ซ๐ทFrance pacproduct
Just added a commit.
I noticed that the
migrate_tools_migrate_prepare_row
hook is running during the PRE_IMPORT phase, and also during the IMPORT phase. That was duplicating all SyncID entries although we need them during the PRE_IMPORT phase. So I added a little something so IDs get recorded during the PRE_IMPORT phase only. Then, they stay available in database during the actual migration phase just in case another module would need them, and they get cleaned during the POST_IMPORT phase.Also added some missing comments.
- last update
over 1 year ago 36 pass - last update
over 1 year ago 36 pass - last update
over 1 year ago 36 pass - last update
over 1 year ago 36 pass - ๐ซ๐ทFrance denix
dear all,
we are following closely this patch because when working with large datasets the operation is incredibly expensive in terms of performances. We are running tests on our side and we can only appreciate better performances and no drawbacks for the time been. - ๐บ๐ธUnited States Jorge Navarro
I tested it in a large migration and indeed, migration time become a fraction, looking forward to see it merged!
- Status changed to RTBC
over 1 year ago 8:22am 6 September 2023 - ๐ซ๐ทFrance simon georges Rouen, France
Considering several people have reported it working without drawbacks, let's try to move it forward ;-)
- ๐ซ๐ทFrance MaxPah
Just for information, with this patch (who is working fine) there is a log on reports/status page :
READ-COMMITTED
For this to work correctly, all tables must have a primary key. The following table(s) do not have a primary key: migrate_tools_sync_source_ids. See the setting MySQL transaction isolation level page for more information.I think it's just this line missing in hook_schema declaration:
'primary key' => ['migration_id'],
- ๐ซ๐ทFrance pacproduct
@MaxPah I cannot reproduce your issue. Also,
migration_id
cannot be used as primary key as this column doesn't contain unique entries.We could add an arbitrary serial 'id' as unique identifier and as primary key. Although it won't be of any use, I guess it's good practice anyway.
I'll rework that shortly.
- last update
over 1 year ago 36 pass - Status changed to Needs review
over 1 year ago 2:09pm 2 October 2023 - ๐ซ๐ทFrance pacproduct
Because branch `6.0.x` recently changed, I had to rebase my MR and resolve conflicts. However, now the MR itself doesn't apply anymore to version `6.0.2` of this module.
So here is an attached patch specfically for version `6.0.2`, so people can keep applying it easily if needed.
- ๐จ๐ฆCanada emixaam
Successfully tested Merge request 44 with migrate_tools 6.0.4 and migrate_plus 6.0.2, on Drupal 10.2.3. On a local environment, I tested removing 10000 items from the source of a custom migration and executing ยดdrush mim --syncยด. The items were correctly removed, and the time to complete the sync import was much faster than without the patch.
Was there more thoughts given into the discussion #mr44-note197623 about using MigrateTools as a service vs moving to a dedicated service? I think it would make sense but I'm new on the issue.
- ๐ซ๐ทFrance pacproduct
@emixaam I'm not sure what the best approach would be.
If the issue is that the service name is weird, I guess we could:
- Rename the class and its corresponding service. That could impact some code elsewhere in the project, I did not check.
- As the services currently declared by this module don't always match the name of their corresponding classes, another approach would be to rename the service only but not the class itself (e.g. "migrate_tools.helper").If the issue is that the source code is not well-enough organized, that would require a bit more rework but maybe we could:
-Move all these functions into a separate service, but I'm not sure how to call it: these are utility helper functions for tracking source IDs during the first phase of a migration, insuring the module knows which entries to delete. But there's already a service with a name suggesting it handles that sync task (among other things): "migrate_tools.migration_sync" (Drupal\migrate_tools\EventSubscriber\MigrationImportSync
).
- Therefore, another consideration would be to completely remove classDrupal\migrate_tools\MigrateTools
(which initially contained a small static functionbuildIdList
) and put everything into the existing service "migrate_tools.migration_sync"I feel like the decision is up to the maintainer on what the best approach for their module is at this point, but what do you guys think?
- ๐ง๐ชBelgium BramDriesen Belgium ๐ง๐ช
Uploading a static patch from the MR since the patch of #18 No longer applies to the latest release (6.0.4)
- ๐บ๐ธUnited States sassafrass
Using: drupal/migrate_tools": "6.0.x-dev@dev"
Cannot apply patch https://www.drupal.org/files/issues/2024-03-11/3378047-21.patch โ
Cannot apply patch https://www.drupal.org/files/issues/2024-03-11/3378047-21.patch โ State service uses a cache collector for performance since Drupal 10.3 https://www.drupal.org/node/3177901 โ
Though on existing sites this feature needs to be enabled explicitly through the settings flag
$settings['state_cache'] = TRUE;
, this is the default behavior for new sites (in default.settings.php), and in Drupal 11+ the setting is removed and state cache is permanently turned on.There are 2 queries presented in the change record to check current state usage and then decide to opt-in/out state cache depending on results.
Sites with more than 100 entries and/or a total value size/length of 100,000 should review their specific entries...
Size/length can easily exceed 100,000 on thousands migration items. Setting the issue priority to Major...
- ๐ซ๐ฎFinland vermario
Agree with the previous poster, this will become a problem in Drupal11 for sure. Sounds like it would be great to push this change forward.
- ๐ฌ๐งUnited Kingdom scott_euser
Thanks very much for this! We have tested this out quite thoroughly as we were similarly hitting an issue syncing roughly 40,000 entities from a CRM twice per hour and the sync was hanging. We have been running this for several days without issue as we figured its fairly safe since we could always remove the patch and re-sync the content the ~20 minute way.
We also did notice the issue of multiple migrate --sync's also having this problem:
Also as a bonus, I'm pretty sure the current system using states doesn't allow to run safely several migrations in parallel, so I've added the migration_id to the dedicated table so each migration has its own separated set of tracked IDs.
Which is sorted by this MR.
The only thing I can suggest further is a subsequent follow-up where we could add a cron job just to check if there are any closed migrate IDs in the new table that still have data in the table itself. Essentially the code without this patch was causing our database to pile up each time in the key-value whenever it timed out, with orphan key-value data since the clean-up phase was never reached. While this is far more performant, its I suppose still possible some failure leaves orphans in the table if the clean-up step in this MR is never reached (ie,
::cleanSyncData()
). But definitely a follow-up thing IMO. - ๐ฌ๐งUnited Kingdom scott_euser
Cleaned up patches so its clear what we tested (ie, the visible MR)
- ๐บ๐ธUnited States hestenet Portland, OR ๐บ๐ธ
This would be helpful for Drupal.org's site migrations and the time period that we have Modern Drupal and D7 running in parallel.
- ๐ช๐ธSpain fjgarlin
For context, our user migration is +2 million records, so holding that number of IDs in memory is not an option.
- First commit to issue fork.
-
heddn โ
committed 5cc29289 on 6.0.x authored by
pacproduct โ
Issue #3378047 by pacproduct, scott_euser, maxpah, heddn: drush...
-
heddn โ
committed 5cc29289 on 6.0.x authored by
pacproduct โ
Automatically closed - issue fixed for 2 weeks with no activity.
- Status changed to Fixed
27 days ago 4:24pm 24 December 2024 - ๐บ๐ธUnited States loopy1492
I just pulled our database from prod which is running migrate_tools version 6.0.4, then ran updates locally from version 6.0.4 to version 6.0.5 (according to composer.lock). `drush updb` is throwing:
> [notice] Update started: migrate_tools_update_10000
> [error] Table 'migrate_tools_sync_source_ids' already exists.
> [error] Update failed: migrate_tools_update_10000The only thing I could think of is that it was set to -dev recently, but that is not the case.
- ๐ฌ๐งUnited Kingdom scott_euser
You could have had the patch installed ages ago too, it existed over a year ago I believe.
Anyways probably you could open a follow-up issue to add a check if table not exists into the update hook.
- ๐บ๐ธUnited States loopy1492
I was able to sync the database from one environment to the other upstream and get a successful updb. I'm not entirely sure what the deal is with syncing down to local, but I'm guessing it's something with my provider or my computer. Or possibly disparate MySQL versions, though that's a pie in the sky guess. Thanks for your response.
- ๐ฌ๐งUnited Kingdom mjpa
I came across this issue after upgrading to 10.4 and looking into the state cache flag.
Our state values are all relatively small, except
migrate_tools_sync
that's pretty large.We've updated the module and ran updates etc which has created the new table - all fine there.
However, should the old state entry still exist? Should it not be removed as part of the update hook?
I can't find any code leftover that would remove it so it looks like it'll persist forever unless we delete it ourselves.
- heddn Nicaragua
Can you open a follow-up to remove the old state entry? We should get rid of it.
- ๐ฌ๐งUnited Kingdom mjpa
Yep, no problem. I've created ๐ Remove the old migrate_tools_sync state entry Active