Add some kind of quota management system

Created on 1 February 2025, 3 months ago

In the agency world it's common for a web agency to manage the API keys for its clients' projects.
The clients pay a fixed fee and get a limited number of API requests for their specific key.

OpenAI, Gemini, ... all have a system of spending limits, but they don't offer a way to set limits per API key.
This makes it hard for an agency to deal with clients who exceed their allowance.

It would be really helpful if you could restrict the number of API calls for a specific API key. API calls and billing are not 100% correlated, but the call count is a good enough proxy to monitor and limit usage within a project.

On top of that, some kind of overview with the number of API calls per month would be needed as well.
We could provide some resources to help build this system, but it would be nice to have input and guidance from the maintainers and the community.
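To make the request concrete, here is a minimal, language-neutral sketch of the bookkeeping described above: a per-key call counter with a monthly limit and a per-month overview. It is written in Python for brevity and is not Drupal or AI module code. `MonthlyCallCounter`, the key labels, and the in-memory storage are all assumptions made for illustration.

```python
from datetime import date


class MonthlyCallCounter:
    """Counts API calls per (key label, month) and compares them to a limit.

    Hypothetical helper: a real module would persist counts instead of
    keeping them in memory.
    """

    def __init__(self, monthly_limits):
        # monthly_limits: assumed mapping of key label -> allowed calls per month.
        self.monthly_limits = dict(monthly_limits)
        self.calls = {}  # (key label, "YYYY-MM") -> number of calls

    def record(self, key_label, day):
        """Register one API call made on the given day."""
        bucket = (key_label, day.strftime("%Y-%m"))
        self.calls[bucket] = self.calls.get(bucket, 0) + 1

    def remaining(self, key_label, day):
        """Calls left for this key in the month that contains `day`."""
        used = self.calls.get((key_label, day.strftime("%Y-%m")), 0)
        return max(self.monthly_limits[key_label] - used, 0)

    def overview(self, key_label):
        """Calls per month for one key, e.g. for a usage report."""
        return {
            month: count
            for (label, month), count in sorted(self.calls.items())
            if label == key_label
        }


counter = MonthlyCallCounter({"client-a": 3})
counter.record("client-a", date(2025, 2, 1))
counter.record("client-a", date(2025, 2, 14))
counter.record("client-a", date(2025, 3, 2))
print(counter.remaining("client-a", date(2025, 2, 20)))
print(counter.overview("client-a"))
```

Counting calls rather than cost keeps the sketch provider-independent, which matches the point above that calls and billing only need to be roughly correlated.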

Feature request
Status: Active
Version: 1.1
Component: AI Core module
Created by: 🇧🇪Belgium aspilicious

Comments & Activities

  • Issue created by @aspilicious
  • 🇩🇪Germany marcus_johansson

    The PostGenerateResponseEvent and PostStreamingResponseEvent should more or less cover this; see https://project.pages.drupalcode.org/ai/developers/events/#example-2-pos... and https://project.pages.drupalcode.org/ai/developers/events/#example-3-str.... This means it could even be an external module. Both events expose metadata, which is where token usage usually resides, when available.

    There are, however, some problems I can see right away with this:

    • API keys are intentionally not available there. You would also need to listen to PreGenerateResponseEvent and use requestThreadId to connect it with the PostGenerateResponseEvent. The PreGenerateResponseEvent has the authentication mechanisms available.
    • API keys are not a generic concept that exists on all providers. The PreGenerateResponseEvent is also made for any logic where you might want to "load balance" requests across different endpoints, API keys, etc., so a solution would need a narrow scope.
    • This also touches on metadata, which we have not normalized at the moment. For the 2.0.0 release we want to look into normalizing common configuration such as temperature and common metadata such as input and output tokens.

    But a limited solution that works, for instance, with OpenAI, Gemini, Anthropic, Fireworks and some of the other "easy-to-setup" services would certainly be possible to create.
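The correlation step described in the first bullet can be sketched as follows. This is a language-neutral Python illustration, not the AI module's PHP API: `UsageCorrelator`, its two handler methods, and the `usage` / `input_tokens` / `output_tokens` metadata keys are assumptions, since the thread notes that metadata is not yet normalized across providers. Only the idea of joining the pre and post events on a request thread ID comes from the comment above.

```python
import hashlib


class UsageCorrelator:
    """Joins a pre-request event to its post-response event by thread ID."""

    def __init__(self):
        self._pending = {}       # request thread id -> key fingerprint
        self.tokens_by_key = {}  # key fingerprint -> total tokens seen

    def on_pre_request(self, request_thread_id, api_key):
        # Store a short fingerprint so the raw key is never kept around.
        fingerprint = hashlib.sha256(api_key.encode("utf-8")).hexdigest()[:12]
        self._pending[request_thread_id] = fingerprint

    def on_post_response(self, request_thread_id, metadata):
        # Without a matching pre-request event the usage cannot be attributed.
        fingerprint = self._pending.pop(request_thread_id, None)
        if fingerprint is None:
            return None
        # Assumed metadata shape; providers that report nothing count as 0.
        usage = metadata.get("usage") or {}
        tokens = usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
        self.tokens_by_key[fingerprint] = (
            self.tokens_by_key.get(fingerprint, 0) + tokens
        )
        return fingerprint


correlator = UsageCorrelator()
correlator.on_pre_request("thread-1", "sk-example-key")
key_id = correlator.on_post_response(
    "thread-1", {"usage": {"input_tokens": 10, "output_tokens": 5}}
)
print(key_id, correlator.tokens_by_key[key_id])
```

Removing the pending entry once the response arrives keeps the lookup table from growing, although requests that never produce a response would still need some expiry in a real implementation.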

  • 🇧🇪Belgium aspilicious

    @marcus thank you, this answer is really helpful. As we probably can't fix this for all providers, we will probably create a separate module.

    How would you block the requests when the limits are reached?

  • 🇬🇧United Kingdom MrDaleSmith

    I believe the easiest way would be to create a custom exception and throw it, allowing other code to react to the event if it needs to.
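A minimal sketch of that pattern, in Python for illustration only (the real implementation would be PHP inside a Drupal event subscriber). `QuotaExceededError`, `guard_request`, and `call_with_fallback` are hypothetical names, and the usage count is passed in directly rather than read from storage.

```python
class QuotaExceededError(Exception):
    """Raised before a request is sent when a key has used up its quota."""

    def __init__(self, key_label, limit):
        super().__init__(f"Quota of {limit} calls reached for {key_label}")
        self.key_label = key_label
        self.limit = limit


def guard_request(key_label, used, limit):
    """Hypothetical pre-request check: raise instead of letting the call go out."""
    if used >= limit:
        raise QuotaExceededError(key_label, limit)


def call_with_fallback(key_label, used, limit):
    """Shows how calling code can react to the exception."""
    try:
        guard_request(key_label, used, limit)
    except QuotaExceededError as error:
        return f"blocked: {error}"
    return "sent"


print(call_with_fallback("client-a", 2, 3))
print(call_with_fallback("client-a", 3, 3))
```

A dedicated exception type lets other code catch quota failures specifically, for example to show a friendly message, without swallowing unrelated provider errors.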

  • 🇩🇪Germany marcus_johansson

    @aspilicious - as Paul writes, an exception will work to stop the call. You can see how it's done in AI External Moderation here: https://git.drupalcode.org/project/ai/-/blob/1.0.x/modules/ai_external_m...

  • 🇩🇪Germany marcus_johansson

    The first step should be coming in 1.1.x; see the issue "Abstract token usage" (status: Active).

    I think the full implementation that actually stops requests needs to go into an external module. But this would at least provide a framework for it.
