- Issue created by @luke.leber
- First commit to issue fork.
- Merge request !6929Avoid loading block content revisions and get the uid directly instead. β (Open) created by catch
- Status changed to Needs review
10 months ago 9:37pm 5 March 2024 - Status changed to RTBC
10 months ago 9:43pm 5 March 2024 - π¦πΊAustralia acbramley
I originally had a direct query running (with the min expression) but it was weirdly buggy and thankfully caught by the test coverage I added due to swapping back to a lower uid. https://git.drupalcode.org/project/drupal/-/merge_requests/5500#note_234004
I'm not sure why I didn't think of simply using a sort + limit of 1 but this makes a lot more sense.
Given that tests are passing I see this as an easy win.
- πΊπΈUnited States luke.leber Pennsylvania
Strangely enough with the MR changes applied I get:
> [notice] Update started: block_content_update_10301 > [notice] Installed uid field for BlockContent entities. > [notice] Update completed: block_content_update_10301 > [notice] Update started: block_content_post_update_set_owner > [error] SQLSTATE[42000]: Syntax error or access violation: 1140 In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'drupal.bcr.revision_user'; this is incompatible with sql_mode=only_full_group_by: SELECT "bcr"."revision_user" AS "uid", MIN(revision_id) AS "revision_id" > FROM > "block_content_revision" "bcr" > WHERE "id" = :db_condition_placeholder_0; Array > ( > [:db_condition_placeholder_0] => 21 > ) > > [error] Update failed: block_content_post_update_set_owner [error] Update aborted by: block_content_post_update_set_owner [error] Finished performing updates. real 0m11.731s user 0m6.789s sys 0m0.518s
Just using the off-the-shelf mysql configuration for Ubuntu 22.04; I've not ran this in a hosting provider environment.
- π¦πΊAustralia acbramley
@Luke.Leber the MR removes the MIN expression. Are you sure you're using the latest changes and it applied correctly?
- πΊπΈUnited States luke.leber Pennsylvania
I have some metrics. I don't know how meaningful they are, but...
The initial run that spurred me into making this issue:
real 28m32.144s user 9m47.665s sys 0m55.545s
Now that I've ran this a couple more times, I'm assuming that this long run can be attributed to Microsoft Defender running on the Linux environment that was running the update. (Don't ask me to explain why it's a good idea for Microsoft Defender to run on Linux environments -- ask the PSU security office...)
More closely monitored runs -- where MS Defender was not engaged:
Before the patch:
real 14m4.685s user 5m2.253s sys 0m11.007s
After the MR is applied:
real 16m12.257s user 6m41.996s sys 0m14.335s
I'm not sure that the optimization has a meaningful impact based on these numbers. Performance testing is pretty tough to nail down given the sample size and my ability / time to run upgrades against the environment.
- Status changed to Needs work
10 months ago 1:24am 6 March 2024 - πΊπΈUnited States luke.leber Pennsylvania
If (that's a big if) I have time tomorrow, I'll try to run this with xdebug profiling to help visualize what's taking the most time.
As somewhat of a side-note, I've found that the Entity API is not very well suited for crunching tons of data. I've found that using the database API for updates has been one of the most meaningful impacts on performance. Perhaps we can use the database API for setting the owner versus calling
$blockContent->setOwnerId('blah')->setSyncing(TRUE)->save()
to trim minutes off?Setting back to NW based on the (flaky?) real world test runs.
- π¦πΊAustralia acbramley
Testing on our project with 3.5k blocks (35k revisions)
Before:
> time drush updb -y .... > [notice] Processed Block Content Entities (3590/3590) > [notice] Update completed: block_content_post_update_set_owner [success] Finished performing updates. real 5m12.711s user 1m27.592s sys 0m6.600s
After:
> time drush updb -y .... > [notice] Update completed: block_content_post_update_set_owner [success] Finished performing updates. real 4m54.783s user 1m18.913s sys 0m5.651s
So yeah, very little improvement on that. What @Luke.Leber has said is correct - to get meaningful performance improvments with these types of updates you really have to bypass ->save() entirely.
On our project we've started heavily utilising direct db updates/inserts for content migrations. E.g we could do something like this:
$tableMapping = $blockContentStorage->getTableMapping(); foreach ($tableMapping->getAllFieldTableNames($fieldName) as $tableName) { $database->update($tableName) ->fields(['uid' => $uid]) ->condition('id', $blockContent->id()) ->execute(); }
With that we could go further and not even load the blocks to begin with. I'll push something up shortly to demonstrate.
- Status changed to Needs review
10 months ago 2:54am 6 March 2024 - π¦πΊAustralia acbramley
Testing with the latest changes:
> [notice] Processed Block Content Entities (3600/3590) [success] Finished performing updates. real 0m15.351s user 0m6.604s sys 0m0.914s
- Status changed to Needs work
10 months ago 3:24am 6 March 2024 - Status changed to Needs review
10 months ago 4:05am 6 March 2024 - π¦πΊAustralia acbramley
Bug was in the array_slice, should've been splice!
New timing, I assume the last was so fast because it was updating the same rows over and over.
real 1m5.031s user 0m7.018s sys 0m1.013s
- π¬π§United Kingdom catch
Direct db update makes sense here. Would be good to know if/how this affects the contrib mongodb driver. Kicked off sqlite and postgres tests. Ignore the 8.3 ones, we have 8.3-specific test failures.
- π¬π§United Kingdom longwave UK
For things like this I wonder if the database driver can hint whether we can do raw SQL updates, and we have both paths in the update hook, falling back to entity save in non-SQL cases?
- π¬π§United Kingdom catch
Could we just check if the entity storage is
SqlContentEntityStorage
?If it's not, then we'd also need to do the select queries with pure entity queries too (which should be doable with a small adjustment to https://git.drupalcode.org/project/drupal/-/merge_requests/6929/diffs?co...).
- π·π΄Romania amateescu
@acbramley, could you run those tests with the MR from π Return early in \Drupal\layout_builder\InlineBlockEntityOperations::handlePreSave if the entity is syncing Needs review instead of this one?
- π³π±Netherlands daffie
For things like this I wonder if the database driver can hint whether we can do raw SQL updates, and we have both paths in the update hook, falling back to entity save in non-SQL cases?
Sounds like a good idea to me. Raw queries are great for performance, only you need to know exactly how the entity data is stored in the database. Creating a fallback for non-sql databases is a great idea. It will be needed after we have a non-sql database driver like the one for MongoDB. It is not in core yet and a fallback will only needed for changes after there are sites running on MongoDB. For now, yust not needed. The database driver for MongoDB can mark the update test as skipped.
Could we just check if the entity storage is SqlContentEntityStorage?
Sorry, but no. The database driver for MongoDB uses SqlContentEntityStorage for its entity storage.
- Status changed to Needs work
10 months ago 10:12pm 7 March 2024 The Needs Review Queue Bot β tested this issue. It no longer applies to Drupal core. Therefore, this issue status is now "Needs work".
This does not mean that the patch necessarily needs to be re-rolled or the MR rebased. Read the Issue Summary, the issue tags and the latest discussion here to determine what needs to be done.
Consult the Drupal Contributor Guide β to find step-by-step guides for working with issues.
- π¦πΊAustralia acbramley
β¨ Add owner to the BlockContent entity type RTBC has been reverted so I guess we move these changes/discussions over there and close this off?
Re #18 - no there was basically no change in times after applying that patch.
- Status changed to Closed: outdated
10 months ago 12:02am 8 March 2024 - π¦πΊAustralia acbramley
Have applied the changes to https://git.drupalcode.org/project/drupal/-/merge_requests/5500