New Checks/NamePasswords does not scale with a large number of user accounts

Created on 28 January 2023, almost 2 years ago
Updated 26 April 2023, over 1 year ago

Problem/Motivation

The new check for users with a password matching their user name that was added in 2.0.1 ( ✨ test password strength by comparing the password to the username Fixed ) is consistently failing when trying to run it via the Drupal admin UI. The batch never completes, and probably due to some weirdness with Drupal's Batch API, no error gets logged to the database. If I had to guess, it's likely this that's causing the issue:

    // Fetch all users and test if their password is the same as their username.
    $user_ids = \Drupal::entityQuery('user')
      ->accessCheck()
      ->condition('uid', 0, '<>')
      ->execute();
    $users = User::loadMultiple($user_ids);

Why? The last line there loads literally every user entity on the site. On a small-ish site, this shouldn't be a problem, but we have roughly a thousand user accounts, and that number will only grow in the future, so even with Drupal's entity cache and other optimizations, it's very likely hitting resource limits for the PHP process.

Steps to reproduce

Run the above check via the Drupal admin UI on a site with a thousand or more user accounts.

Proposed resolution

I'm not that familiar with the codebase of this module, but from my experience with Drupal's Batch/Queue APIs, you should be able to load just a handful of user accounts per run, iterating over the total UIDs until you're done, rather than loading them all at once. A good example of how to do this can be found in the Warmer module β†’ .

Remaining tasks

Implement the above.

User interface changes

Maybe none? Maybe a field to specify how many accounts to load at once?

API changes

None?

Data model changes

None?

πŸ› Bug report
Status

Fixed

Version

2.0

Component

Code

Created by

πŸ‡¨πŸ‡¦Canada ambient.impact Toronto

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Comments & Activities

Not all content is available!

It's likely this issue predates Contrib.social: some issue and comment data are missing.

  • Issue created by @ambient.impact
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    We had to do something for the field check for a large amount of entries. Wonder if we could do something similar for this check too.

    In the mean time you can disable that specific check from the settings

  • πŸ‡¨πŸ‡¦Canada ambient.impact Toronto

    Already disabled the check, thanks. Might be a good opportunity to implement this in a reusable way so any check can iterate over all entities of a type, batched by a certain number at a time?

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    If someone wants to do that for sure. But probably out of scope of the time I would have for such a feature. And each check is unique, though some similar, that 1 size fit all could be difficult.

    For this I think the path forward is

    - config setting to choose how many users to load at once.
    - then use array_chunk to break the array of NIDs up

    Just tested this with 5000 users locally and it worked fine.

    But still need the config setting, schema update, and update path.

    Can't say when that will be done but if you wanted to use a local patch.

    $user_chunks = array_chunk($user_ids, 100);
        foreach ($user_chunks as $chunk) {
          $users = User::loadMultiple($chunk);
    
          /** @var \Drupal\user\UserAuth $user_auth */
          $user_auth = \Drupal::service('user.auth');
    
          /** @var \Drupal\user\Entity\User $user */
          foreach ($users as $user) {
            if ($user_auth->authenticate($user->getDisplayName(), $user->getDisplayName())) {
              $findings[] = $user->getDisplayName();
            }
          }
    

    This is roughly what I'm thinking.

    Am aiming for another release in March at the latest and this will be included.

  • πŸ‡¨πŸ‡¦Canada ambient.impact Toronto

    @smustgrave That could work, sure. I'd have to test it out when I get a chance.

    Just tested this with 5000 users locally and it worked fine.

    I'm fairly sure my local dev laptop can also handle that, but unless you're running with a production PHP config (more strict time limits and memory limit), it isn't going to be an accurate test. With our roughly thousand users database, it completed fine locally so that's why I only took a closer look at the code after deploying to production.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Cool let me know! I just started the Meta for the next release and included this ticket.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave

    Plan on working on this one soon. Were you able to test that snippet?

  • πŸ‡¨πŸ‡¦Canada ambient.impact Toronto

    @smustgrave Sorry, haven't had the time. I'll let you know if/when I can, but feel free to go ahead in the meantime. At a glance, the snippet seems fine from a memory usage point of view since you're not loading all accounts at once, but might still run into PHP's time limit; some kind of batching like Drupal's Batch API and/or Queue API would solve that problem but that's likely a good deal of work and out of scope.

  • Status changed to Needs review over 1 year ago
  • πŸ‡ΊπŸ‡ΈUnited States ednark

    Tested on 10.1.x-dev with a standard install
    module drupal/security_review:2.0.x-dev@dev
    branch 3337293-new-checksnamepasswords-does

    This branch definitely has better performance and I can run it on my own local system without issues until I get to about 15k users, then timeout occur. Even though there are still timeout issues I think this is a great incremental step and better than the previous code.

    I suggest this branch gets merged and maybe we open another issue for something better in the future.

    Possible further solutions:
    #1 More Queue. Each test is already being processed as a queued item, I'm not sure how we would then create and run another dependent queue as part of this tests execution. If the original test could populate a new queue of users, and then use it's time to process that queue, we could go as slow as we want and scale forever.

    #2 Skip all the user loading that costs overhead by leaving authService out of the picture. We could then pull only name/password from db and then jump straight to the important part of the code where the PasswordInterface's check() function is called. We could just call that ourselves and hopefully skip a bunch of intermediary function calls that might incur sql queries of their own.

    #3 Skip drupal by pushing the logic all into sql. Since the heart of the password check is an md5 hash comparison (ignoring old style D7 passwords), we can check the existing stored hash against and the username as part of the query directly. This might theoretically be faster but would have a fragile assumption that the md5 in use by the databaser is comparable to the md5 used by the php crypt library.

  • πŸ‡ΊπŸ‡ΈUnited States smustgrave
  • πŸ‡ΊπŸ‡ΈUnited States smustgrave
  • Status changed to Fixed over 1 year ago
  • Automatically closed - issue fixed for 2 weeks with no activity.

Production build 0.71.5 2024