- Issue created by @byrond
- 🇺🇸United States byrond
It looks like a separate API call is being made for every media entity that needs to be migrated. Could we patch the migration process to skip queueing the items and then run a full sync after the migration completes? It's my understanding that the new connector requires this sync to make sure the version_id and external_id are available: https://git.drupalcode.org/project/media_acquiadam/-/blob/2.x/src/Batch/...
- 🇺🇸United States byrond
The "Bad Gateway" response is happening while saving the entity, so skipping queueing won't likely change anything.
> [error] TypeError: Drupal\acquia_dam\Client\AcquiaDamClient::getAsset(): Argument #2 ($version_id) must be of type string, null given, called in /mnt/www/html/dhsinternetode10/docroot/modules/contrib/acquia_dam/src/Plugin/media/Source/Asset.php on line 234 in Drupal\acquia_dam\Client\AcquiaDamClient->getAsset() (line 170 of /mnt/www/html/dhsinternetode10/docroot/modules/contrib/acquia_dam/src/Client/AcquiaDamClient.php)
> #0 /mnt/www/html/dhsinternetode10/docroot/modules/contrib/acquia_dam/src/Plugin/media/Source/Asset.php(234): Drupal\acquia_dam\Client\AcquiaDamClient->getAsset()
> #1 /mnt/www/html/dhsinternetode10/docroot/core/modules/media/src/Entity/Media.php(438): Drupal\acquia_dam\Plugin\media\Source\Asset->getMetadata()
> #2 /mnt/www/html/dhsinternetode10/docroot/core/modules/media/src/MediaStorage.php(27): Drupal\media\Entity\Media->prepareSave()
> #3 /mnt/www/html/dhsinternetode10/docroot/core/lib/Drupal/Core/Entity/EntityBase.php(354): Drupal\media\MediaStorage->save()
> #4 /mnt/www/html/dhsinternetode10/docroot/modules/contrib/media_acquiadam/src/Batch/MediaTypeProcessBatch.php(475): Drupal\Core\Entity\EntityBase->save()
> #5 /mnt/www/html/dhsinternetode10/docroot/modules/contrib/media_acquiadam/src/Batch/MediaTypeProcessBatch.php(74): Drupal\media_acquiadam\Batch\MediaTypeProcessBatch::updateMediaItems()
- 🇺🇸United States byrond
Initially, we were migrating some types using the "embed" method. When we changed them to "sync" (the only method available and used by the old module), we stopped getting the Bad Gateway errors.
We did start seeing these errors related to the purger:
> [error] Drupal\purge\Plugin\Purge\Invalidation\Exception\TypeUnsupportedException exception during file invalidation. File id: 232251 . File url: public://acquia_dam_thumbnails/0968372d-a84a-45a5-9a90-c6d3f5cfe4da/1a3ac635-a96f-48a6-87e3-ef0c6faab7ad.png. Error: wildcardurl
It's possible this is due to our testing on an ODE (development environment) where not all of the files exist.
- 🇺🇸United States byrond
I suspect the version_id is being returned as null from getFinalizedVersion() because the API call is failing due to the Bad Gateway response.
From the method:
"Cannot get the version list from the API for asset of ID %asset_id. The error is: %message"Adding the patch you mentioned may allow the processing to continue, but I don't think it would really be successful.
The big question here is: Why is a separate API call being made for every item that needs to be migrated, and can it be done more efficiently to allow customers with large numbers of assets to complete a migration successfully?
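As an illustration of that suspicion, here is a minimal sketch of the kind of null guard that would avoid the TypeError above; the method names mirror the comments in this thread, but the exact signatures in Asset.php are assumptions:

```php
// Hypothetical guard in the media source's metadata lookup: bail out when
// the version lookup failed instead of passing NULL on to getAsset().
$version_id = $this->getFinalizedVersion($asset_id);
if ($version_id === NULL) {
  // The version-list API call failed (e.g. 502 Bad Gateway), so there is
  // no valid string to pass as the $version_id argument.
  return NULL;
}
$asset = $this->client->getAsset($asset_id, $version_id);
```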
- 🇮🇳India vipin.mittal18 Greater Noida
Hello Byron,
While Media: Acquia DAM does not retain asset versions and external IDs, Acquia DAM does. As a result, a cron job has been scheduled to call the API, retrieve the version ID and external ID, and store them in the table.

Could you please apply the solution suggested by Chandan and confirm whether any errors are preventing you from continuing with the migration?
- 🇺🇸United States byrond
Thanks. I have applied the update, refreshed the database from Prod, and started the migration. I was getting errors from Purge, so I disabled it during testing and just assume we can clear all caches manually after the process completes.
So far, I haven't seen any errors. However, my SSH session timed out after 4 hours during the remote Drush command. The command is still running on the remote environment, as I see items being queued in the log. It has queued about 8,500 out of 27,500 items after 13 hours. I'm not sure if Acquia will kill that process after a certain amount of time, but at that rate, it will take over 39 hours just to queue the items for migration. I suspect it will take days to finish processing the queue.
Is there any way to optimize this? The module documentation recommends against using both Media: Acquia DAM and Acquia DAM at the same time. With a migration this slow, however, it is inevitable that both will be active for a long period of time. What recommendations do you have for managing that? Disable the sync process for one or both modules? Anything else? What did you find in your pre-release testing with a large (but, I would think, fairly common) number of assets?
- 🇺🇸United States byrond
Our migration failed last night after 29.5 hours with a "MySQL server has gone away" error (presumably because the connection finally timed out after that long). It queued 17,039 items (more than 10,000 short of the total).
Has anyone reported a successful migration of a similar number of assets? I noticed that the dev version we are currently using for testing was just released as "stable" while this issue is open. This makes it seem like the migration tool is ready for production use. Are you sure that is the case?
- First commit to issue fork.
- 🇮🇳India rajeshreeputra Pune
Major Performance Improvements
- Chunked Processing: Instead of loading all media items at once, the code now processes them in smaller batches of 100 items.
- Proper Batch API Usage: The batch operations are now structured to handle each chunk separately, preventing memory exhaustion and timeouts.
- Improved Progress Tracking: Added detailed progress tracking using Drupal's state API, making the process more robust and providing better user feedback.
- Optimized Resource Usage: Service instances like time and current user are now retrieved once at the beginning of each batch operation rather than for each media item.
Implementation Details
- Added new batch operation methods (initializeMediaBatch, batchUpdateMediaItems, finalizeMediaBatch) to handle different phases of the migration process (see the sketch after the Expected Results list).
- Implemented paged database queries to fetch only the media items needed for each batch operation.
- Added proper progress tracking that persists between batch operations.
- Maintained all the existing functionality while making the process much more efficient.
Expected Results
With these changes, the migration should:
- Complete without timing out.
- Use significantly less memory.
- Provide better progress information to users.
- Likely reduce the total processing time.
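As a rough illustration of the chunked flow described above (not the MR's actual code; the chunk size constant, query details, and state key are assumptions), a batchUpdateMediaItems()-style operation could look like this:

```php
<?php

namespace Drupal\media_acquiadam\Batch;

/**
 * Illustrative sketch only; the real implementation lives in the MR.
 */
class MediaTypeProcessBatchSketch {

  const CHUNK_SIZE = 100;

  /**
   * Batch operation: processes one chunk of media items per invocation.
   */
  public static function batchUpdateMediaItems(string $media_type, array &$context): void {
    $storage = \Drupal::entityTypeManager()->getStorage('media');

    // First pass: count the total and reset the persisted progress marker.
    if (!isset($context['sandbox']['progress'])) {
      $context['sandbox']['progress'] = 0;
      $context['sandbox']['total'] = (int) $storage->getQuery()
        ->condition('bundle', $media_type)
        ->accessCheck(FALSE)
        ->count()
        ->execute();
      \Drupal::state()->set('media_acquiadam.migration_progress', 0);
    }

    // Paged query: fetch only the IDs needed for this chunk.
    $ids = $storage->getQuery()
      ->condition('bundle', $media_type)
      ->accessCheck(FALSE)
      ->sort('mid')
      ->range($context['sandbox']['progress'], self::CHUNK_SIZE)
      ->execute();

    foreach ($storage->loadMultiple($ids) as $media) {
      // Per-item migration updates go here.
      $context['sandbox']['progress']++;
    }

    // Persist progress via the State API so it survives between requests.
    \Drupal::state()->set('media_acquiadam.migration_progress', $context['sandbox']['progress']);

    // Tell the Batch API whether more chunks remain (1 means finished).
    $context['finished'] = $context['sandbox']['total'] > 0
      ? $context['sandbox']['progress'] / $context['sandbox']['total']
      : 1;
  }

}
```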
- Merge request !44: Improve migration batch for performance optimization → (Open), created by rajeshreeputra
- 🇺🇸United States byrond
Thank you! We are going to start testing this today and will report back.
- 🇺🇸United States byrond
We updated to the latest version and applied the patch from the MR in #14. We can confirm that items are processed in batches of 100, and progress is reported accordingly.
However, the per-item processing rate appears unchanged. We started a migration yesterday morning, and this morning it is still running, having processed only 12,900 items in about 21 hours. At that rate, the migration would take about 44 hours to finish (assuming the platform doesn't kill the process first).
We also noticed several instances of the following error in the log. Before the update referenced in #3, this would have stopped the migration process. It is still running after encountering the error, but those items may not have been processed correctly.
> Server error: `GET https://api.widencollective.com/v2/assets/d725e93b-8331-4615-8fd5-4032bcbf2e8f?expand=asset_properties,embeds,file_properties,metadata,metadata_info,metadata_vocabulary,security,thumbnails` resulted in a `500 Internal Server Error` response: { "error": true, "response_code": 500, "error_message": "An internal server error occurred.", "stack_trace": nul (truncated...)
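For context, continue-on-error behavior around a 5xx response like this is typically a try/catch at the API call site; here is a minimal sketch assuming the client wraps Guzzle (the module's actual handling may differ):

```php
use GuzzleHttp\Exception\ServerException;

try {
  $asset = $client->getAsset($asset_id, $version_id);
}
catch (ServerException $e) {
  // 5xx from the DAM API: log the failure and skip this item so the
  // batch keeps running instead of aborting the whole migration.
  \Drupal::logger('media_acquiadam')->error('Skipping asset @id: @message', [
    '@id' => $asset_id,
    '@message' => $e->getMessage(),
  ]);
}
```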
- 🇺🇸United States byrond
We tried with the latest commit in the MR. Attempting to start queueing for migration immediately resulted in:
> [notice] No queue items to process.
> [warning] Undefined variable $migrated_data MediaTypeProcessBatch.php:661
> [notice] Message:
> [notice] Message: Acquia DAM migration completed successfully.
- 🇮🇳India rajeshreeputra Pune
Updated the MR to use direct database operations instead of re-saving each media entity, reducing CPU overhead and improving memory efficiency. This leads to faster execution times and should significantly speed up the migration process (a rough sketch of the idea follows below).
Please review the changes and share your feedback.
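To illustrate the approach (the table and column names here are assumptions, not the MR's actual schema), a direct field-table update avoids Media::prepareSave() and the per-entity API calls it triggers:

```php
// Sketch: write the migrated asset reference straight to the field table
// instead of calling $media->save(), which re-triggers metadata lookups.
// 'media__acquia_dam_asset_id' and its column are assumed names.
\Drupal::database()->update('media__acquia_dam_asset_id')
  ->fields(['acquia_dam_asset_id_asset_id' => $asset_id])
  ->condition('entity_id', $media_id)
  ->execute();

// Reset the entity cache so the stored values are picked up on next load.
\Drupal::entityTypeManager()->getStorage('media')->resetCache([$media_id]);
```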
- 🇺🇸United States byrond
We did some initial testing of the latest updates here, and it seems to be vastly improved. It took minutes instead of hours/days to complete the migration running
drush acquiadam:migrate
We will do some further testing and code review early next week and report back here.
- 🇺🇸United States byrond
After testing, it seems like the migration is successful. We noticed that images aren't displaying with the correct image style, but that could be a separate issue.
I'm also curious how asset_id, version_id, and external_id are handled with this new migration method. Will a full sync need to be run manually after the migration to update them? I initially read somewhere that these are required for the new connector but can't seem to find that in the documentation now. Is it still true?
- 🇮🇳India rajeshreeputra Pune
Hello @byrond,
You are correct: version_id and external_id are not required during the migration process. After migration, the cron in the Acquia DAM module should handle updating version_id and external_id regularly and automatically.

Thank you for bringing this up!
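As a sketch of what such a post-migration sync could look like (hypothetical hook and field names; the Acquia DAM module's real cron and storage differ), a cron hook might queue media items whose version_id is still missing:

```php
/**
 * Implements hook_cron() in a hypothetical custom module.
 *
 * Sketch only: finds media items still missing a version_id and queues
 * them for a metadata refresh by a QueueWorker on later cron runs.
 */
function mymodule_cron(): void {
  $ids = \Drupal::entityQuery('media')
    ->notExists('acquia_dam_asset_id.version_id')
    ->accessCheck(FALSE)
    ->range(0, 100)
    ->execute();

  $queue = \Drupal::queue('mymodule_asset_refresh');
  foreach ($ids as $id) {
    $queue->createItem(['media_id' => $id]);
  }
}
```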