429 Too Many Requests

Created on 17 April 2024, 9 months ago

Problem/Motivation

When importing even a small sheet (50 rows), I am hitting the Google API rate limit. The migration errors out reporting a "429 Too Many Requests" response on HTTP requests. On a non-paying account, the Google Sheets limit is 300 requests per minute, but even when importing 50 items, I hit the limit almost immediately.

I discovered that when running a migration, the Google Sheet is requested once per row. Hitting the limit even with 50 items, my guess is that 300 req/minute is an average. When making ~15 requests per second, the request limit is hit.

Steps to reproduce

- Fresh D10 install
- Import a sheet into a simple node type like Basic Page with more than 50 rows
- The import has to be fast in order to hit the limit quickly. On my local it creates 35 nodes in about 3 seconds before it hits the rate limit.

Proposed resolution

Two things would help:

1) Import more than a single row per batch operation.
2) Cache the HTTP responses from Google, since the entire sheet is returned, it can be reused within a complete batch process.

πŸ› Bug report
Status

Active

Version

2.1

Component

Code

Created by

πŸ‡ΊπŸ‡ΈUnited States quicksketch

Live updates comments and jobs are added and updated live.
Sign in to follow issues

Merge Requests

Comments & Activities

  • Issue created by @quicksketch
  • πŸ‡ΊπŸ‡ΈUnited States quicksketch
  • πŸ‡ΊπŸ‡ΈUnited States quicksketch

    On further investigation, this might be an issue with Migrate Tools module. When running through the API, only a single item is imported per batch request. But running through Drush seems to circumvent the problem. Note when running through Drush, When using Drush, only 2 HTTP requests are made total.

  • πŸ‡ΊπŸ‡ΈUnited States quicksketch

    Having now spent a couple of weeks regularly running into this error of "Too many requests" from Google's API, and having failed to request a rate limit increase from Google themselves, I took a hand at resolving this issue by extending this module's plugin class. Having put it together, I think solving the problem here through a configuration option would be the best solution.

    I would suggest that a YAML configuration option be added like so:

      # Specify a cache lifetime in seconds. This will persist a local cache of
      # remote spreadsheet API request responses. This can be used to lower
      # the number of API requests to fit within rate limits and avoid a
      # "429 Too Many Requests" response from Google. Omit or set to 0 to
      # disable the cache.
      cache_lifetime: 10
    
  • πŸ‡ΊπŸ‡ΈUnited States karlshea Minneapolis πŸ‡ΊπŸ‡Έ

    Do you have the cache lifetime implemented in your plugin? If you open a MR I can take a look, I've seen the same myself when hitting through the UI but so far no issues just running with Drush.

  • πŸ‡ΊπŸ‡ΈUnited States quicksketch

    I've seen the same myself when hitting through the UI but so far no issues just running with Drush.

    Yeah this issue crops up very easily in the UI. For drush, you need to have a lot of migrations (20+) running via Drush for it to hit the limit, but we ran into that a fair amount also.

    Sorry this fell off my radar since we got it working for our project, but we may be revisiting this soon.

  • πŸ‡ΊπŸ‡ΈUnited States karlshea Minneapolis πŸ‡ΊπŸ‡Έ

    Sure! The project I'm using this on has 17 migrations so I'm not hitting rate limits, but I can take a look at whatever you've got!

  • πŸ‡ΊπŸ‡ΈUnited States karlshea Minneapolis πŸ‡ΊπŸ‡Έ
  • πŸ‡ΊπŸ‡ΈUnited States karlshea Minneapolis πŸ‡ΊπŸ‡Έ

    In the MR I changed getSourceData() to more closely match what the parent Json::getSourceData() is doing, but it looks like executing it through the UI is where the issues occur.

    When executing with Drush I'm only seeing two HTTP requests with or without these changes.

  • πŸ‡ΊπŸ‡ΈUnited States karlshea Minneapolis πŸ‡ΊπŸ‡Έ

    Adding two related issues about caching, in #2826938: Integrate HTTP fetcher with Guzzle Cache if available β†’ heddn suggests using the guzzle_cache project, but that would rely on cache headers and it looks like the Sheets API is returning Cache-Control: max-age=0

  • πŸ‡ΊπŸ‡ΈUnited States karlshea Minneapolis πŸ‡ΊπŸ‡Έ

    Caching is opt-in, cache_lifetime: n needs to be added to the source configuration.

Production build 0.71.5 2024