- Issue created by @Nikhil_110
- Status changed to Needs review
8 months ago 5:13pm 13 November 2023 - 🇺🇸United States greggles Denver, Colorado, USA
There's a patch here, so it needs a review - adjusting status.
- 🇺🇸United States brooke_heaton
This ticket could likely also use an update hook to purge the data on any previusly deleted user accounts.
- Merge request !11Add hook_user_delete and update hook to remove delted users from login_history. → (Open) created by brooke_heaton
- 🇺🇸United States brooke_heaton
Added update hook to remove data on deleted users form login_history.
- 🇺🇸United States greggles Denver, Colorado, USA
Thanks for that contirbution! Does the original patch contribution by Nikhil_110 look right to you?
I have some concerns about scalability of the solution. The first query on 112 would load all the records for all uids, even if there are multiple records (right?). And then I believe doing an entity load of the user object to test if the user exists can be pretty resource intense on both the database queries and the RAM of the data.
What do you think about making the first query a left join from login_history to the users table that looks for cases where users.uid is null? That query could also be made distinct to limit the trips through the loop deleting records. Then there's no need to test if the user load succeeds.
- 🇺🇸United States brooke_heaton
Sure, I can revise this to make this a left join instead. Let me see if I can rig that up. Might take a bit.
- 🇺🇸United States brooke_heaton
Ok, I pushed up a new approach that should vastly improve the speed and efficiency of this. I'm doing a leftJoin between login_history and the user table for any NULL values in the user table then I am able to fetch the array keys into one array and run a delete query with the set of deleted user IDs.
/** * Deletes records of deleted user from login_history. */ function login_history_update_8004() { $connection = Database::getConnection(); $query = $connection->select('login_history', 'lh'); $query->condition('lh.uid', 0, '<>')->addField('lh', 'uid'); $query->leftJoin('users', 'u', 'lh.uid = u.uid'); $query->isNull('u.uid'); $results = $query->distinct()->execute()->fetchAllAssoc('uid'); $results = array_keys($results); $count = count($results); if ($results) { $message = "Removing $count deleted user id from login_history table."; \Drupal::logger('login_history')->notice($message); $query = $connection->delete('login_history') ->condition('uid', $results, 'IN') ->execute(); } $message = "Removal complete."; \Drupal::logger('login_history')->notice($message); }
Feedback welcome on the PR.
- 🇺🇸United States greggles Denver, Colorado, USA
Sorry for not responding sooner on the question in comment #7, but it looks like you got it all sorted. Thanks!
That new select query looks much more performant and scalable to me compared to the original query and user load approach.
I believe the IN condition has a practical limitation and may fail on a site that has many thousands of uids to delete. I tried a site with 150 users and it worked fine, but tens of thousands or hundreds of thousands would probably fail. That would be a pretty big site to have that many DELETED users.
I think a more scalable approach would be to do these as single deletes and (in my experience) the performance of running many delete queries with a single = condition this is not much slower than running one big IN query. I've pushed a commit with that change.
I also think the indentation in login_history_user_delete was a little incorrect so I tried to fix that, but I'm also using a freshly setup IDE so I might have made a mistake there.
Can you give that a review, @brooke_heaton?
- 🇺🇸United States brooke_heaton
This all looks good to me @greggles. Appreciate the second look and fix on those indentations. Might have been me editing on Gitlab via the interface there. This is good to go from my end unless you think it needs more testing.