Abstract token usage

Created on 6 April 2025

Problem/Motivation

Currently we only save the raw metadata dump, but the actual input and output token counts are of interest as normalized, comparable data.

Steps to reproduce

Proposed resolution

Add methods for providers to store
* total token usage
* input token usage
* output token usage
* reasoning token usage
* cached token usage

All of these should be optional.
The best place is the ChatOutput object (see the sketch below).

Add to logging.
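
A minimal sketch of what this could look like on the output object; the property and method names below are illustrative, not the final API:

    // Illustrative sketch only: nullable properties so "not reported by the
    // provider" (NULL) stays distinct from an actual count of 0.
    class ChatOutput {

      protected ?int $totalTokenUsage = NULL;
      protected ?int $inputTokenUsage = NULL;
      protected ?int $outputTokenUsage = NULL;
      protected ?int $reasoningTokenUsage = NULL;
      protected ?int $cachedTokenUsage = NULL;

      public function setInputTokenUsage(int $tokens): static {
        $this->inputTokenUsage = $tokens;
        return $this;
      }

      public function getInputTokenUsage(): ?int {
        return $this->inputTokenUsage;
      }

      // Same pattern for total, output, reasoning and cached token usage.

    }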

Remaining tasks

User interface changes

API changes

Data model changes

Category

Feature request

Status

Active

Version

1.1

Component

AI Core module

Created by

🇩🇪Germany marcus_johansson


Merge Requests

Comments & Activities

  • Issue created by @marcus_johansson
  • 🇧🇪Belgium aspilicious

    I'm investigating this at the moment.
    Have you started coding?

  • 🇧🇪Belgium aspilicious

    Here is a starting point; it lets us discuss whether this is the direction you want.

  • First commit to issue fork.
  • Pipeline finished with Failed
    24 days ago
    Total: 344s
    #474026
  • 🇬🇧United Kingdom MrDaleSmith

    Added as a fork to allow tests to run; there are some test failures, so this will need further work.

  • Pipeline finished with Failed
    24 days ago
    Total: 591s
    #474083
  • Pipeline finished with Failed
    24 days ago
    Total: 262s
    #474096
  • Pipeline finished with Failed
    24 days ago
    Total: 242s
    #474104
  • Pipeline finished with Success
    24 days ago
    Total: 235s
    #474113
  • 🇧🇪Belgium aspilicious

    I learned a lot about contributing 2.0.

    The token functions are only available at the chat level at the moment.
    If they're needed on other output classes, we should probably move them to a trait.
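
    For example, the accessors could live in a small shared trait (purely a sketch, the trait name is hypothetical):

        // Hypothetical trait so Chat and other output classes could share the
        // same token accessors instead of duplicating them.
        trait TokenUsageTrait {

          protected ?int $inputTokenUsage = NULL;
          protected ?int $outputTokenUsage = NULL;

          public function setInputTokenUsage(int $tokens): static {
            $this->inputTokenUsage = $tokens;
            return $this;
          }

          public function getInputTokenUsage(): ?int {
            return $this->inputTokenUsage;
          }

          // Repeat for total, reasoning and cached token usage.

        }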

  • 🇮🇳India vakulrai

    Just to add to the above: can we also think of a helper method to add tracking for retry token usage and retry reasons in AI responses?

    My thought is:
    we are tracking the properties mentioned above, but we do not explicitly track retries, which can silently increase token usage and costs when outputs are malformed or invalid (e.g., bad JSON, failed function/tool calls, hallucinated responses, timeouts, etc.).

    These retries consume additional tokens and can skew both performance and cost reporting if left untracked.

    While total input and output tokens might include retries, they don't tell you:

    • How many times a retry occurred
    • Why each retry happened
    • Which prompt caused it

    Could we do this as a feature in AI and take it forward in a separate ticket if it really turns out to be a good add-on?
    Open for suggestions.
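
    To illustrate the idea (a hypothetical data shape, not an existing API), the extra information could be as small as:

        // Hypothetical shape for per-request retry tracking.
        $retry_info = [
          'retry_count' => 2,
          'retries' => [
            ['reason' => 'invalid_json', 'input_tokens' => 512, 'output_tokens' => 230],
            ['reason' => 'tool_call_failed', 'input_tokens' => 512, 'output_tokens' => 198],
          ],
        ];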

    Thanks !

  • 🇧🇪Belgium aspilicious

    This merge request, together with https://www.drupal.org/project/ai_provider_openai/issues/3519302 (Abstract token usage support),
    allowed me to create this module: https://www.drupal.org/project/ai_usage_limits

  • 🇩🇪Germany marcus_johansson

    @vakulrai - we already have something in review that keeps track of a parent and child unique ID; maybe we should also create a retry ID or something similar, so you can track when multiple requests are re-done because of validation errors. This means errors that truly fail to write the response the way you want, not the case where an agent loop produces bad output quality and a validation agent asks it to retry (that falls under the normal parent/child hierarchy).

    Edit: I should also link :) https://www.drupal.org/project/ai/issues/3515879 (Add thread id and parent id to AI calls)

  • 🇩🇪Germany marcus_johansson

    Thanks @aspilicious - I added one small comment; could you have a look and fix it, and then we will merge.

  • 🇩🇪Germany marcus_johansson

    Set the tag to ddd2025, so we can track user contributions that happened during Leuven :)

  • Pipeline finished with Success
    21 days ago
    Total: 275s
    #476883
  • 🇩🇪Germany marcus_johansson

    Hi @aspilicious, based on your comment I added some comments. It's probably good if we allow null to be returned, so we can tell the difference between 0 being an actual value and the value not being set at all by the provider.
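
    In other words (a sketch, the method name is only an example), the getters would be nullable:

        // NULL means the provider never reported a value; 0 is a real count.
        public function getReasoningTokenUsage(): ?int {
          return $this->reasoningTokenUsage;
        }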

  • Pipeline finished with Failed
    4 days ago
    Total: 212s
    #489261
  • Pipeline finished with Failed
    4 days ago
    Total: 216s
    #489266
  • Pipeline finished with Failed
    4 days ago
    Total: 446s
    #489269
  • 🇧🇪Belgium aspilicious

    Tests are failing, but I don't think this patch is causing them.
