Tiktoken does not support every model

Created on 8 August 2024, 5 months ago
Updated 8 September 2024, 4 months ago

Problem/Motivation

Currently Tiktoken (https://github.com/yethee/tiktoken-php) doesn't support every model, and it never will be able to. This leads to errors when someone chooses an unsupported model for token counting.

Since the model selection only shows the providers/models available in the current setup, the chosen model could be, for instance, one from Ollama that Tiktoken doesn't recognize. A temporary solution is to fall back to 3.5 by default when Tiktoken throws an error here.
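The temporary fallback could look roughly like the sketch below. It is written in Python for illustration only and does not call tiktoken-php. The model list, the `UnknownModelError` exception, and both function names are hypothetical stand-ins, and it assumes that "3.5" means `gpt-3.5-turbo`.

```python
# Hypothetical set of models the tokenizer recognizes (not Tiktoken's real list).
KNOWN_MODELS = {"gpt-3.5-turbo", "gpt-4"}

# Assumption: "3.5" in the issue refers to gpt-3.5-turbo.
FALLBACK_MODEL = "gpt-3.5-turbo"


class UnknownModelError(Exception):
    """Raised by the stand-in lookup when a model is not recognized."""


def lookup_tokenizer_model(model: str) -> str:
    # Stand-in for the real tokenizer lookup, which fails for unknown models.
    if model not in KNOWN_MODELS:
        raise UnknownModelError(model)
    return model


def resolve_tokenizer_model(model: str) -> str:
    """Return the model to tokenize with, falling back when the lookup fails."""
    try:
        return lookup_tokenizer_model(model)
    except UnknownModelError:
        return FALLBACK_MODEL
```

With this, a model that the tokenizer does not know resolves to the fallback instead of raising, while known models pass through unchanged.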

Steps to reproduce

Proposed resolution

Some solutions I can think of:
1. Research which model produces the most tokens for a given amount of text and set it as the default.
2. Count words and add 5-10% padding so it doesn't become too large.
3. Only show Tiktoken's models in the dropdown list and let the user choose whichever is closest to the actual model.
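Option 2 can be sketched as follows, in Python for illustration only. The function name and the whitespace-based word split are assumptions. The default of 10% is taken from the upper end of the range above and is not a validated figure, since subword tokenizers often produce more tokens than there are words.

```python
def estimate_tokens(text: str, padding_percent: int = 10) -> int:
    """Estimate a token count as the word count plus a percentage of padding.

    Integer arithmetic is used so the result is rounded up without
    floating-point surprises.
    """
    if padding_percent < 0:
        raise ValueError("padding_percent must not be negative")
    # Assumption: a "word" is any run of non-whitespace characters.
    words = len(text.split())
    # Ceiling of words * (100 + padding_percent) / 100.
    return (words * (100 + padding_percent) + 99) // 100
```

For example, a ten-word text with the default padding is estimated at 11 tokens, and an empty text at 0.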

Remaining tasks

User interface changes

API changes

Data model changes

🐛 Bug report
Status

Fixed

Version

1.0

Component

AI Search

Created by

🇩🇪Germany marcus_johansson

