"Periodic update" mechanism takes too long and kills cron

Created on 16 February 2023, about 2 years ago

My site cron is failing while waiting for ldap_user's "periodic update" cron task to complete.
I don't have a high volume of users - about 1300 total.

I haven't looked into how this process works, but it needs to batch the users or have a configurable timeout or something.
This is causing major issues.

πŸ› Bug report
Status

Active

Version

4.0

Component

Code

Created by

🇺🇸 United States AaronBauman Philadelphia


Merge Requests

Comments & Activities

  • Issue created by @AaronBauman
  • 🇺🇸 United States AaronBauman Philadelphia

    Appears to be related to 🐛 "More graceful handling in GroupUserUpdateProcessor on invalid configuration" (Fixed), where a duplicate email address confuses the ldap module and causes a fatal error during cron, killing the whole process.

  • 🇺🇸 United States AaronBauman Philadelphia

    There is a "timeout" of 10 seconds set up on the LDAP query, but the query appears to run for 60+ seconds and doesn't time out.

  • I'm running into this issue as well, with about 10,000 users. I'm on the latest dev branch, which fixes the issue of stopping when an email already exists, but the update still stops at random with no logged message every time, after only 5-10 minutes or so. Does anyone have any ideas? Thanks.

  • 🇺🇸 United States bluegeek9

    Which 'periodic update' feature are you using?

    There is an orphan processor that handles accounts that were provisioned from LDAP but are no longer in LDAP: the "Periodic orphaned accounts update mechanism". This has a limit on the number of users to check, along with a

    There is also an update that requires an LDAP query: the "Periodic user update mechanism". This one has an interval. What is your interval?

    You can also run cron with drush, which is not subject to a timeout.

  • 🇺🇸 United States AaronBauman Philadelphia

    This is the periodic update feature I'm referring to from ldap_user_cron():

    function ldap_user_cron() {
      ...
      $ldapUpdateQuery = \Drupal::config('ldap_user.settings')->get('userUpdateCronQuery');
      if (
        \Drupal::moduleHandler()->moduleExists('ldap_query') &&
        $ldapUpdateQuery != NULL &&
        $ldapUpdateQuery !== 'none'
      ) {
        /** @var \Drupal\ldap_user\Processor\GroupUserUpdateProcessor $processor */
        $processor = \Drupal::service('ldap.group_user_update_processor');
        if ($processor->updateDue()) {
          $processor->runQuery($ldapUpdateQuery);
        }
      }
    }
    

    Specifically, $processor->runQuery($ldapUpdateQuery); issues an LDAP query that may not return within various lower-level time limits (between 59 and 120 seconds on Pantheon, for example), which means my users don't get updated.

    If I lower the time limit for the query, and the query fails to complete in time, then my users still don't get updated.

    Is there a way to paginate the LDAP queries, or maybe update the mechanism to look for recent changes only, or another way you can think of that would reduce the footprint here?
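
One way to "look for recent changes only" is to narrow the LDAP filter by modification time. The following is a minimal sketch, assuming the directory exposes the standard `modifyTimestamp` operational attribute (Active Directory uses `whenChanged` with a slightly different time format) and that the time of the last successful run is stored somewhere. `buildRecentChangesFilter()` is a hypothetical helper for illustration, not part of ldap_user.

```php
<?php

/**
 * Wraps an existing LDAP filter so it only matches recently modified entries.
 *
 * Hypothetical helper for illustration; not part of the ldap_user module.
 * $baseFilter must already be a complete, parenthesized LDAP filter.
 */
function buildRecentChangesFilter(string $baseFilter, int $lastRunTimestamp): string {
  // LDAP generalized time in UTC, e.g. 20230216000000Z.
  $since = gmdate('YmdHis', $lastRunTimestamp) . 'Z';
  // AND the original filter with a "modified at or after" clause.
  return sprintf('(&%s(modifyTimestamp>=%s))', $baseFilter, $since);
}

// Example: entries changed since 16 February 2023, 00:00 UTC.
echo buildRecentChangesFilter('(objectClass=person)', gmmktime(0, 0, 0, 2, 16, 2023)), PHP_EOL;
```

A filter like this would only reduce the result set for updates; entries deleted from LDAP never match it, so orphan handling would still need its own pass.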

  • 🇺🇸 United States bluegeek9

    Query.php does have pagination features. It is not as simple as providing an offset; it appears to serialize the query object.

  • 🇺🇸 United States bluegeek9

    It looks like pagination is implemented like this:

    use Symfony\Component\Ldap\Ldap;
    use Symfony\Component\Ldap\Adapter\ExtLdap\Adapter;
    
    $config = [
        'host' => 'ldap.example.com',
        'port' => 389,
    ];
    
    $adapter = new Adapter($config);
    $ldap = new Ldap($adapter);
    
    $ldap->bind('uid=my_user,ou=users,dc=example,dc=com', 'my_password');
    
    // Enable pagination with a page size of 50. In symfony/ldap this is the
    // "pageSize" query option, passed as the third argument to query().
    $query = $ldap->query('ou=users,dc=example,dc=com', '(uid=my_user)', [
        'pageSize' => 50,
    ]);
    
    // The adapter fetches the pages internally; the collection yields entries.
    $results = $query->execute();
    
    foreach ($results as $entry) {
        echo $entry->getAttribute('cn')[0], PHP_EOL;
    }
    
  • 🇺🇸 United States AaronBauman Philadelphia

    OK, we don't need to use pagination; we can take advantage of the Drupal queue.

    Instead of iterating the entire LDAP query result set in \Drupal\ldap_user\Processor\GroupUserUpdateProcessor::runQuery, this method should create a queue entry for each item.

    Then the queue process can do the iteration without blocking cron, and it will pick up where it left off whenever a batch takes too long.

    Should actually be relatively straightforward to untangle these.
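
The split described above, enqueueing one item per LDAP entry and processing items separately, can be sketched in plain PHP. This is an illustration under assumptions, not the code from the merge request: `SplQueue` stands in for Drupal's persistent queue, and `enqueueEntries()`, `processEntry()`, and the sample entries are hypothetical.

```php
<?php

// Stand-in for a persistent queue; SplQueue keeps the sketch self-contained.
$queue = new SplQueue();

/**
 * Producer: enqueue one item per LDAP entry instead of processing it inline.
 *
 * @return int Number of items queued.
 */
function enqueueEntries(iterable $entries, SplQueue $queue): int {
  $count = 0;
  foreach ($entries as $entry) {
    $queue->enqueue($entry);
    $count++;
  }
  return $count;
}

/**
 * Consumer: handle a single queued entry (hypothetical per-user update).
 */
function processEntry(array $entry): string {
  return 'updated ' . $entry['dn'];
}

// Hypothetical query result; real code would iterate the LDAP result set.
$entries = [
  ['dn' => 'uid=alice,ou=users,dc=example,dc=com', 'mail' => 'alice@example.com'],
  ['dn' => 'uid=bob,ou=users,dc=example,dc=com', 'mail' => 'bob@example.com'],
];

// Cron only pays for the cheap enqueue step.
$queued = enqueueEntries($entries, $queue);

// Later, a worker drains the queue one item at a time.
$log = [];
while (!$queue->isEmpty()) {
  $log[] = processEntry($queue->dequeue());
}
```

Because each item is independent, a run that is interrupted loses at most the item in flight; everything still in the queue is handled by the next run.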

  • Pipeline finished with Failed
    about 1 month ago
    Total: 169s
    #435239
  • 🇺🇸 United States AaronBauman Philadelphia

    I'll have to keep an eye on how it performs, but I *think* this will solve the issue for me. (MR105)

    Rather than the initial LDAP user query, it's the subsequent processing that was taking the most time.
    The system queue processor has a timer to prevent overrunning cron, so this should be a nice and tidy solution.
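
The timer behaviour mentioned above can be illustrated with a time-boxed drain loop. This is a rough sketch of the idea only, not Drupal's actual queue runner: `drainWithBudget()`, the 50 ms budget, and the simulated slow worker are all assumptions made for the example.

```php
<?php

/**
 * Processes queued items until the queue is empty or the time budget is spent.
 *
 * @return int Number of items processed in this run.
 */
function drainWithBudget(SplQueue $queue, callable $worker, float $budgetSeconds): int {
  $deadline = microtime(true) + $budgetSeconds;
  $processed = 0;
  // The deadline is checked before each item, so a run may overshoot the
  // budget by at most the duration of one item.
  while (!$queue->isEmpty() && microtime(true) < $deadline) {
    $worker($queue->dequeue());
    $processed++;
  }
  return $processed;
}

$queue = new SplQueue();
foreach (range(1, 5) as $id) {
  $queue->enqueue($id);
}

// Simulated slow worker: each item takes about 20 ms.
$slowWorker = function (int $id): void {
  usleep(20000);
};

// First "cron run": a 50 ms budget is not enough for all five items.
$firstRun = drainWithBudget($queue, $slowWorker, 0.05);
// Second run: a generous budget finishes whatever was left over.
$secondRun = drainWithBudget($queue, $slowWorker, 5.0);

echo "first run: $firstRun, second run: $secondRun", PHP_EOL;
```

The first run stops partway through and the second finishes the remainder, which is the "pick up where it left off" behaviour the queue approach relies on.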

  • Pipeline finished with Failed
    about 1 month ago
    Total: 9892s
    #435245