- 🇫🇷 France andypost
I did a re-roll of the issue "Move memory management from MigrateExecutable to an event subscriber" (status: Needs review).
But is it still a blocker?
- 🇷🇺 Russia qzmenko Novosibirsk
This is still a problem; in our case, it affects a node migration.
We need to migrate ~2 million nodes. At the beginning of the migration, ~10 nodes per second are imported. After 50k imported nodes, the speed becomes ~2 nodes per second.
I tried changing the batch_size in the migration, but it did not affect the migration speed at first glance.
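For a rough sense of scale, here is the total run time for ~2 million nodes if either observed rate held for the whole migration. This is a back-of-the-envelope estimate, not a measurement: in practice the rate starts near the first figure and degrades toward the second.

$$
\frac{2\,000\,000\ \text{nodes}}{10\ \text{nodes/s}} = 200\,000\ \text{s} \approx 2.3\ \text{days},
\qquad
\frac{2\,000\,000\ \text{nodes}}{2\ \text{nodes/s}} = 1\,000\,000\ \text{s} \approx 11.6\ \text{days}
$$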
- 🇪🇸 Spain fjgarlin
I'm affected by this as well, in this case with a user migration (around 2 million users). Memory usage keeps creeping up.
I've tried different options with no luck. The last thing I am trying comes from this article, which runs the migration in a loop using the `--limit` option, as in the script it suggests.
This is currently running, so I don't know the result yet. It's still not ideal, because when using the `--limit` option, it still does some gathering from the previous runs. For example, if I run

```shell
drush migrate:import my_user_migration --limit 100
```

the output the first time is:

```
Migration my_user_migration [100 inserted, 0 updated...]
```

But then, on the second run of the same command, the output is:

```
Migration my_user_migration [0 inserted, 0 updated...]
Migration my_user_migration [100 inserted, 0 updated...]
```

Note the extra `0 inserted, 0 updated` line.
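The loop approach can be sketched in PHP as follows. This is a minimal sketch, not the article's script: the helper name `import_in_chunks()` is hypothetical, and deriving the number of chunks from a known total is an assumption made to avoid parsing drush's output. The idea it relies on is that each `drush migrate:import ... --limit N` call is a separate PHP process, so whatever memory a chunk accumulates is released when that process exits, and the next call picks up the next unprocessed rows (as the second `--limit 100` run above did).

```php
<?php
/**
 * Hypothetical helper: run a migration in fixed-size chunks, one drush
 * process per chunk, so memory is released when each process exits.
 *
 * @param callable $run Receives a shell command and returns its exit code.
 *
 * @return int Number of chunks that completed successfully.
 */
function import_in_chunks(string $migration, int $total, int $limit, callable $run): int {
  // Assumes each run imports the next $limit unprocessed rows.
  $chunks = (int) ceil($total / $limit);
  $command = sprintf('drush migrate:import %s --limit %d', escapeshellarg($migration), $limit);
  for ($i = 0; $i < $chunks; $i++) {
    if ($run($command) !== 0) {
      // Stop at the first failing chunk so it can be inspected.
      return $i;
    }
  }
  return $chunks;
}

// Example runner for real use (assumes drush is on the PATH):
// import_in_chunks('my_user_migration', 2000000, 100, function (string $cmd): int {
//   passthru($cmd, $exit);
//   return $exit;
// });
```

Passing the runner in as a callable keeps the loop testable without drush; a real script might instead stop once drush reports that nothing is left to import.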
I even tried a postSave event subscriber that clears some caches, but it made no difference. This is what I tried:
```php
// Injected services: '@config.factory', '@entity.memory_cache', '@entity_type.manager'.
$this->memoryCache->deleteAll();
$this->configFactory->clearStaticCache();
// Entity storage can blow up with caches, so clear them out.
foreach ($this->entityTypeManager->getDefinitions() as $id => $definition) {
  $this->entityTypeManager->getStorage($id)->resetCache();
}
```
- 🇨🇭 Switzerland berdir Switzerland
There might be some other module that keeps things in memory due to post-processing.
`resetCache()` is a persistent cache clear, so it's fairly expensive and adds costs of its own. It will not add anything useful on top of `@entity.memory_cache->deleteAll()`, which you call as well.
However, that can only release the references held by the entity storage itself; if anything else holds on to these objects, they will remain in memory. It's pretty much impossible to say what that would be in your case; it would probably require some kind of profiling with XHProf, Blackfire, or something similar. If it is specific to users, you could look for hook_user_presave(), hook_user_insert(), or hook_user_update() implementations.
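The point about other holders can be illustrated with plain PHP objects. This is a minimal sketch outside Drupal: `EntityMemoryCache` and `PostProcessingCollector` are hypothetical stand-ins for an entity storage's memory cache and for some other module that keeps saved entities around, and a `WeakReference` (PHP 7.4+) is used only to observe whether the object has actually been freed.

```php
<?php
// Hypothetical stand-in for an entity storage's static (memory) cache.
final class EntityMemoryCache {
  private array $items = [];

  public function set(int $id, object $entity): void {
    $this->items[$id] = $entity;
  }

  public function deleteAll(): void {
    $this->items = [];
  }
}

// Hypothetical stand-in for another module that keeps saved entities
// around, e.g. for post-processing later in the request.
final class PostProcessingCollector {
  public array $pending = [];

  public function onSave(object $entity): void {
    $this->pending[] = $entity;
  }
}

$cache = new EntityMemoryCache();
$collector = new PostProcessingCollector();

$user = new stdClass();
$user->uid = 1;
$cache->set($user->uid, $user);
$collector->onSave($user);

// A weak reference does not keep the object alive; it only lets us check it.
$weak = WeakReference::create($user);
unset($user);

// Clearing the cache alone does not free the object: the collector still
// holds a reference to it.
$cache->deleteAll();
$aliveAfterCacheClear = $weak->get() !== null;

// Only once every holder lets go is the object actually freed.
$collector->pending = [];
$aliveAfterCollectorClear = $weak->get() !== null;

// Prints bool(true), then bool(false).
var_dump($aliveAfterCacheClear, $aliveAfterCollectorClear);
```

This is why clearing caches in a post-save subscriber can leave memory usage unchanged: the memory only comes back when the last reference goes away, which is what profiling would need to find.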
- heddn Nicaragua
For a 2M user migration, I stripped down the user source plugin so it only pulls back uids. Then I moved the actual gathering of data into prepareRow(). It had an amazing effect on the speed and memory usage of the user migration. By default, the user source does what is essentially a `select * from users`. What you want is something more like `select uid from users`.
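The split between an IDs-only source query and a per-row lookup can be sketched outside Drupal with PDO and an in-memory SQLite table (assumes the pdo_sqlite extension is available). The table columns and the helpers `source_ids()` and `load_row()` are hypothetical; in a migrate source plugin they would correspond to `query()` and `prepareRow()`. The sketch shows the shape of the approach, not its memory profile: the source query only carries the small `uid` column, and each full record is fetched and used one at a time.

```php
<?php
// In-memory stand-in for the legacy users table (columns are hypothetical).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT, mail TEXT)');
$insert = $db->prepare('INSERT INTO users (uid, name, mail) VALUES (?, ?, ?)');
foreach ([0, 1, 2, 3] as $uid) {
  $insert->execute([$uid, "user$uid", "user$uid@example.com"]);
}

// Source query: only the IDs, skipping the anonymous user (uid 0).
function source_ids(PDO $db): array {
  $ids = $db->query('SELECT uid FROM users WHERE uid > 0 ORDER BY uid')
    ->fetchAll(PDO::FETCH_COLUMN);
  return array_map('intval', $ids);
}

// prepareRow() equivalent: fetch the full record for a single ID.
function load_row(PDO $db, int $uid): ?array {
  $stmt = $db->prepare('SELECT * FROM users WHERE uid = ?');
  $stmt->execute([$uid]);
  $row = $stmt->fetch(PDO::FETCH_ASSOC);
  return $row === false ? null : $row;
}

$imported = [];
foreach (source_ids($db) as $uid) {
  $row = load_row($db, $uid);
  if ($row !== null) {
    // Stand-in for saving the destination entity; only the name is kept.
    $imported[] = $row['name'];
  }
}
```

The trade-off is one extra query per row in exchange for a much lighter source result set, which is consistent with the speed staying roughly the same while memory behaves better on long runs.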
- 🇪🇸 Spain fjgarlin
@heddn - this is the migration and plugin that I am using:
- Migration: https://git.drupalcode.org/project/drupalorg_migrate/-/blob/1.0.x/migrat...
- User plugin: https://git.drupalcode.org/project/drupalorg_migrate/-/blob/1.0.x/src/Pl...

So, your suggestion would be to override the `User::query()` method to:

```php
public function query() {
  // Only fetch the UIDs; the rest of the data is loaded per row.
  return $this->select('users', 'u')
    ->fields('u', ['uid'])
    ->condition('u.uid', 0, '>');
}
```

Then in `prepareRow()`, do you do:
- A `select * from users where uid = $uid`
- And then several `$row->setSourceProperty()` calls, one for each property?

I am going to try the above locally, but I also wanted to ask about the approach to make sure I understood you correctly.
- 🇪🇸 Spain fjgarlin
For what it's worth, I am not seeing any significant increase in speed after doing the above.
- Before the change: around 1100 records per minute
- After the change: around 1150 records per minute

But this difference might just be noise in the migration's progress output, or me looking a second late or early.
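To put the two rates in perspective, a rough estimate for ~2 million users, assuming each rate held for the whole run (it likely won't, given the slowdown reported earlier in this thread):

$$
\frac{1150 - 1100}{1100} \approx 4.5\%,
\qquad
\frac{2\,000\,000}{1100\ \text{records/min}} \approx 1818\ \text{min} \approx 30.3\ \text{h},
\qquad
\frac{2\,000\,000}{1150\ \text{records/min}} \approx 1739\ \text{min} \approx 29.0\ \text{h}
$$

A difference of this size is small enough that reading the progress counter a little early or late could account for it.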
The code I wrote:
```php
// Assumes `use Drupal\migrate\Row;` and
// `use Drupal\migrate\Plugin\migrate\source\SourcePluginBase;` in the class file.
public function query() {
  // Query by UID earlier to speed up queries.
  return $this->select('users', 'u')
    ->fields('u', ['uid'])
    ->condition('u.uid', 0, '>');
}

public function prepareRow(Row $row) {
  // Try to determine early if this row needs to be skipped.
  $prepare_row = SourcePluginBase::prepareRow($row);
  if ($prepare_row) {
    $uid = $row->getSourceProperty('uid');
    // Set all properties here, as we only queried by UID earlier.
    $row_data = $this->select('users', 'u')
      ->fields('u')
      ->condition('u.uid', $uid)
      ->execute()
      ->fetchAssoc();
    foreach ($row_data as $field => $value) {
      $row->setSourceProperty($field, $value);
    }
    return parent::prepareRow($row);
  }
  return FALSE;
}
```
- heddn Nicaragua
Speed should be about the same, especially at the beginning of the migration. But by the time you get to the 1M row mark, your memory usage should be in a better place. That's where this alternative approach (which you outlined well) really starts to shine.
- 🇪🇸 Spain fjgarlin
Great. Thanks for the info.
I went ahead and committed the above (https://git.drupalcode.org/project/drupalorg_migrate/-/commit/044bdebd94...), and I will trigger the full migration again and monitor things.