Timeout on large sites nodes during install

Created on 25 October 2023, over 1 year ago

Problem/Motivation

On large sites (for example, one with 16k content nodes), the install fails because the module tries to populate the URL alias table in a single PHP process. During install I hit a 60-second timeout message, and on cloud environments such as Acquia the deployment logs show "killed".

Steps to reproduce

Have a large site, install the module.

Proposed resolution

Use the Batch API to split the work of populating the Views URL alias table into chunks.

Remaining tasks

User interface changes

API changes

Data model changes

πŸ› Bug report
Status

Active

Version

2.0

Component

Code

Created by

🇺🇸 United States NicholasS


Merge Requests

Comments & Activities

  • Issue created by @NicholasS
  • 🇺🇸 United States NicholasS

    I tried the Batch API. It works through the UI, but the table is not populated when the module is installed during a drush cim.

  • @nicholass opened merge request.
  • 🇺🇸 United States NicholasS

    I tested this locally. MR 16 contains only the Queue API change. I also need the fix from another issue I have open, so MR 17 includes that as well, since I can't have two patches patching the same lines.

    Please review MR 16

  • Status changed to Needs review over 1 year ago
  • 🇺🇸 United States NicholasS

    I have done a lot of testing on my site, and I think MR 17 should be reviewed: it fixes multiple things in this module and works as intended.

  • First commit to issue fork.
  • 🇨🇦 Canada joel_osc

    Great patch everyone! It is necessary on my site in order to use this module. I noticed that some nodes could not be found by alias. Looking into it, I found that the queueing code records the paths it has already handled and checks that record before queueing. The key it used did not include the langcode of the path, so I was only getting each node in one of two languages. Small fix committed above.
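The langcode problem described above can be illustrated with a minimal Python sketch. This is not the module's PHP code: AliasRow and rows_to_queue are hypothetical names, and the only assumption is that the queueing step tracks already-seen paths in a set. With the language left out of the key, the second translation of the same node is treated as a duplicate and skipped.

```python
from typing import Iterable, List, NamedTuple


class AliasRow(NamedTuple):
    """Hypothetical stand-in for one path alias record."""
    path: str      # internal path, e.g. "/node/1"
    langcode: str  # language of this alias, e.g. "en"
    alias: str     # the public alias


def rows_to_queue(rows: Iterable[AliasRow], with_langcode: bool = True) -> List[AliasRow]:
    """Return the rows that would be queued, skipping keys already seen."""
    seen = set()
    queued = []
    for row in rows:
        # The dedup key must include the language, otherwise translations collide.
        key = (row.path, row.langcode) if with_langcode else (row.path,)
        if key in seen:
            continue
        seen.add(key)
        queued.append(row)
    return queued


rows = [
    AliasRow("/node/1", "en", "/about"),
    AliasRow("/node/1", "fr", "/a-propos"),
]
fixed = rows_to_queue(rows)                       # both translations are kept
buggy = rows_to_queue(rows, with_langcode=False)  # the second translation is dropped
```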

  • 🇨🇦 Canada dstorozhuk Chicago 🇺🇸, Toronto 🇨🇦, Kyiv 🇺🇦

    The Queue option might not work for people who have cron disabled for some reason.
    I think the right option here is a Batch operation for views_url_alias_rebuild_path(). But module installation should also use batch somehow.

  • 🇧🇪 Belgium michaelsoetaert

    I've rerolled the changes from MR#17 on branch 8.x-2.x-with-issue-3396154 on the latest version of the 3.x branch.

    We needed multilingual support (different URL aliases for different translations), which branch 3.x provides, but we were also getting timeouts because of the size of the website (where the changes in this issue come into play).

  • 🇧🇪 Belgium michaelsoetaert

    Sadly, the patch in comment #13 still resulted in timeouts on our higher environments (due to different PHP values). The issue seemed to be the large amount of data being loaded in views_url_alias_rebuild_path, since it retrieves the complete entity object of each path alias.

    I decided to try the approach @dstorozhuk suggested (using the Batch API): load only the path alias IDs in views_url_alias_rebuild_path, split the list into chunks, and load the entity objects for only the given path alias IDs in each batch operation. That seems to have fixed the timeouts.

    Attached patch with the described functionality.
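The chunking idea in this comment can be sketched in Python as follows. This is an illustration of the technique only, not the attached patch: chunk_ids, rebuild_in_batches, load_many and index_one are hypothetical names, and the in-memory storage stands in for the path alias table. The point is that only IDs are held for the whole run, while full records are loaded one chunk at a time.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence


def chunk_ids(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def rebuild_in_batches(
    ids: Sequence[int],
    load_many: Callable[[List[int]], Iterable[dict]],
    index_one: Callable[[dict], None],
    size: int = 50,
) -> int:
    """Index every ID, loading full records for one chunk at a time."""
    processed = 0
    for chunk in chunk_ids(ids, size):
        # Only this chunk's records are in memory during the inner loop.
        for entity in load_many(chunk):
            index_one(entity)
            processed += 1
    return processed


# Tiny in-memory demo; the storage and index are stand-ins, not real tables.
storage: Dict[int, dict] = {i: {"id": i, "alias": f"/page-{i}"} for i in range(1, 8)}
index: Dict[int, str] = {}
largest_load = 0


def load_many(chunk: List[int]) -> List[dict]:
    global largest_load
    largest_load = max(largest_load, len(chunk))  # track the biggest single load
    return [storage[i] for i in chunk]


def index_one(entity: dict) -> None:
    index[entity["id"]] = entity["alias"]


total = rebuild_in_batches(sorted(storage), load_many, index_one, size=3)
```

In a real batch run each chunk would be a separate operation in its own request, which is what keeps any single request under the time limit; the loop here only shows how the work is divided.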

  • 🇬🇧 United Kingdom steven jones

    I'm evaluating this module for a feature that I need to implement for a site, so this might be a bit of a 'drive-by' contribution if I decide not to use it, but I would like to commend the approach in #14 of using the Batch API.

    The approach this module takes, maintaining a separate index table of data, feels a lot like the node access API (or maybe Search API), so can I suggest taking inspiration from those systems? In particular, I'd recommend no longer trying to do the work on install of the module and instead building a robust Batch API or queue process that does the indexing needed. Then, on install, show a message informing users that there's something else they need to do (unless there are no aliases in the DB), and add a hook_requirements message that also informs administrators.

    Also the patch in #14 doesn't apply to the latest 3.1.0 as far as I could tell, so this probably Needs Work.

  • Pipeline finished with Success
    2 months ago
    Total: 142s
    #403772
  • 🇬🇧 United Kingdom steven jones

    I've applied the patch from #14 to a 3.x-dev version and I'll open an MR shortly with those changes.

    From my testing, if you enable the module through the web UI you get a nice progress bar and batch process for building the index.
    If you install via Drush you don't get a progress bar, but it does work, and it uses the batch to build up the table. Nice! No timeouts etc.

    I suppose that for environments where that Drush command would time out, it's still not a great experience, tbh. Maybe we should move to setting some kind of 'rebuild' flag on module install, and then detect that flag and clear it when rebuilding.

  • Pipeline finished with Success
    2 months ago
    Total: 154s
    #403798
  • Pipeline finished with Success
    2 months ago
    Total: 152s
    #403804
  • 🇬🇧 United Kingdom steven jones

    Merge request !29 now contains a 'flag' version whereby it's super quick to install the module, but you have to run a one-time process in the web UI to build up the index table.

    Does this need a Drush command maybe?
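The 'flag' idea can be sketched in a few lines of Python. This is a hedged illustration, not the code in the merge request: IndexState, on_install, status_message and rebuild are hypothetical names, and the object stands in for whatever persistent state the module would use. Installing only records that a rebuild is pending; the expensive work happens later and clears the flag.

```python
from typing import Dict, Optional


class IndexState:
    """Hypothetical persistent state: a pending flag plus the index itself."""

    def __init__(self) -> None:
        self.needs_rebuild = False
        self.index: Dict[str, str] = {}


def on_install(state: IndexState, alias_count: int) -> None:
    # Install stays cheap: it only records that work is pending.
    # With no aliases there is nothing to build, so no flag is set.
    state.needs_rebuild = alias_count > 0


def status_message(state: IndexState) -> Optional[str]:
    """Message an administrator would see while the rebuild is pending."""
    if state.needs_rebuild:
        return "The alias index has not been built yet; run the rebuild."
    return None


def rebuild(state: IndexState, aliases: Dict[str, str]) -> None:
    # The one-time process: build the index, then clear the flag.
    state.index = dict(aliases)
    state.needs_rebuild = False


state = IndexState()
on_install(state, alias_count=2)
pending = status_message(state)
rebuild(state, {"/node/1": "/about", "/node/2": "/contact"})
after = status_message(state)
```

The same rebuild function could be called from a web UI batch or from a command-line entry point, which is one way to read the question about a Drush command.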

  • Pipeline finished with Success
    2 months ago
    Total: 145s
    #403852
  • 🇬🇧 United Kingdom steven jones

    steven jones → changed the visibility of the branch 8.x-2.x to hidden.

  • 🇬🇧 United Kingdom steven jones

    steven jones → changed the visibility of the branch 8.x-2.x-with-issue-3396154 to hidden.

  • Pipeline finished with Success
    2 months ago
    Total: 142s
    #403863