Discuss: Translation operator type

Created on 8 July 2024, 7 months ago

There are many generative AI translation services that take a text, an (optional) input language and an output language.

Takes

string
TranslationInput

Outputs

raw data
TranslationOutput

TranslationInput

getString()
setString()
getSourceLanguage() returns an ISO 639 language code
setSourceLanguage()
getTranslationLanguage()
setTranslationLanguage()
toString()

TranslationOutput

getRaw()
getNormalized() returns a text
getMetadata()
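
The shape above can be sketched roughly as follows. This is only an illustration in Python, not the module's actual API: the class names come from the proposal, while the field names, the `translate_text` helper and its fake "provider" are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TranslationInput:
    """Text to translate plus the language pair (mirrors the proposed getters/setters)."""
    string: str
    translation_language: str              # target language code, e.g. "en"
    source_language: Optional[str] = None  # None means "let the service detect it"

    def __str__(self) -> str:
        # toString() in the proposal: the input collapses to its text.
        return self.string


@dataclass
class TranslationOutput:
    """Result wrapper: provider-specific raw data plus a normalized text."""
    raw: Any                                      # getRaw()
    normalized: str                               # getNormalized()
    metadata: dict = field(default_factory=dict)  # getMetadata()


def translate_text(value, target="en") -> TranslationOutput:
    """Hypothetical operation that accepts a plain string or a TranslationInput."""
    if isinstance(value, str):
        value = TranslationInput(string=value, translation_language=target)
    # Stand-in for a real provider call: it only tags the text with the target.
    raw = {"translated": f"[{value.translation_language}] {value.string}"}
    return TranslationOutput(
        raw=raw,
        normalized=raw["translated"],
        metadata={"source": value.source_language},
    )
```

Accepting either a string or a `TranslationInput` matches the "Takes" list: the plain string is the convenient case, the object carries the language pair.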

🌱 Plan
Status

Active

Version

1.0

Component

AI Core module

Created by

🇩🇪Germany marcus_johansson


Comments & Activities

  • Issue created by @marcus_johansson
  • 🇱🇹Lithuania mindaugasd

    It is unclear what the goal is.
    Can this be a proper abstraction without detailed research into translation services?
    What is within the scope of the AI module, and why?
    For example, I often translated arrays of text, such as 1000 sentences in one batch.
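
A rough sketch of the batch case mentioned above, assuming a provider that accepts a list of strings per call. The names `translate_many`, `batched` and `uppercase_batch` are hypothetical, and the batch size is an arbitrary example value rather than any service's real limit.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_many(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 100,
) -> List[str]:
    """Translate a long list by sending it to the provider one batch at a time."""
    results: List[str] = []
    for chunk in batched(texts, batch_size):
        results.extend(translate_batch(chunk))
    return results


def uppercase_batch(chunk: List[str]) -> List[str]:
    """Fake provider used only for illustration: 'translates' by uppercasing."""
    return [text.upper() for text in chunk]
```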

  • 🇱🇹Lithuania mindaugasd

    The scope is probably:

    • automator string in, string out
    • ✨ ai_translate submodule Needs work for translating entities. But entities can have many fields to translate.
  • 🇩🇪Germany marcus_johansson

    I think the general usage is source language, destination language and the string to translate, for any abstracted version of it. The translation APIs have other configuration options, but none that they share.

    It's a little hard to say whether translation should be part of the AI module, since it is functionality that has historically worked without AI. But if we are talking about transformers and generative AI, translation is the reason they exist: the transformer came out of Google research for Google Translate. DeepL and Google Translate are much better tools thanks to transformers/AI.

    The other question is whether anyone ever needs to hotswap these, and I don't actually know. I can imagine that certain language combinations work better with certain services, and being able to set this up would be great. And for translating single text strings, or for simpler modules such as the ai_translate module, I think it can still have its purpose.
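
A minimal sketch of what hotswapping per language combination could look like, assuming each service is wrapped as a callable. `TranslationRouter`, `provider_a` and `provider_b` are hypothetical names for this example, and the fake providers only label the text so the routing is visible.

```python
from typing import Callable, Dict, Optional, Tuple

# A provider is any callable: (text, source, target) -> translated text.
Provider = Callable[[str, Optional[str], str], str]


class TranslationRouter:
    """Picks a provider per (source, target) pair, with a default fallback."""

    def __init__(self, default: Provider) -> None:
        self._default = default
        self._routes: Dict[Tuple[Optional[str], str], Provider] = {}

    def set_route(self, source: Optional[str], target: str, provider: Provider) -> None:
        """Register a provider for one specific language combination."""
        self._routes[(source, target)] = provider

    def translate(self, text: str, source: Optional[str], target: str) -> str:
        provider = self._routes.get((source, target), self._default)
        return provider(text, source, target)


def provider_a(text: str, source: Optional[str], target: str) -> str:
    """Fake general-purpose service."""
    return f"A:{target}:{text}"


def provider_b(text: str, source: Optional[str], target: str) -> str:
    """Fake service assumed to be better for one language pair."""
    return f"B:{target}:{text}"


router = TranslationRouter(default=provider_a)
router.set_route("en", "zh", provider_b)
```

With this setup, `router.translate("hi", "en", "zh")` goes to the pair-specific provider and every other combination falls back to the default.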

    At the same time, tmgmt already has a plugin system and is a great tool for more complex translation workflows.

    For AI Automator it has a purpose, but that doesn't necessarily need an abstraction and can work without it - it's not a dealbreaker. DeepL is enough for 99% of all purposes, and OpenAI covers the rest.

  • 🇬🇧United Kingdom yautja_cetanu

    My view is that we should include a couple of good example modules of obvious things you could do with AI. If an AI translation module gets better and more powerful in contrib we could move it out.

  • 🇱🇹Lithuania mindaugasd

    A common case can be a chain: translate to English before doing TextToImage, because textToImage models are not multi-lingual.

    I just tested dall-e-3 with languages.
    It converted a prompt to another prompt behind the scenes.

    • If I write "Kėdė ant stalo" in Lithuanian it translates by itself behind the scenes to English "Chair on the table"
    • But if I write "Chair on the table" it converts behind the scenes to "A wooden chair with intricately carved details on the backrest is precariously balanced on..."

    Providing it in English is still better to begin with. This is all the more important for Stable Diffusion.

    Still, the regular "chat" type can translate this fine, as already coded in the ai_translations module.
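
The chain described in this comment can be sketched like this. It is only an illustration under assumptions: `chain_translate_then_generate` is a hypothetical helper, and `fake_translate` and `fake_generate_image` are stand-ins that do not call any real translation or image service.

```python
from typing import Callable, Dict, Optional


def chain_translate_then_generate(
    prompt: str,
    source_language: Optional[str],
    translate: Callable[[str, Optional[str], str], str],
    generate_image: Callable[[str], Dict[str, str]],
) -> Dict[str, str]:
    """Translate the prompt to English first (unless it already is), then generate."""
    if source_language != "en":
        prompt = translate(prompt, source_language, "en")
    return generate_image(prompt)


def fake_translate(text: str, source: Optional[str], target: str) -> str:
    """Stand-in translator with a one-entry lookup table."""
    lookup = {("lt", "en", "Kėdė ant stalo"): "Chair on the table"}
    return lookup.get((source, target, text), text)


def fake_generate_image(prompt: str) -> Dict[str, str]:
    """Stand-in image step: reports which prompt it received."""
    return {"prompt_used": prompt}
```

Keeping the translation step explicit means the image model always receives an English prompt, instead of relying on whatever rewriting it does behind the scenes.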

  • 🇱🇹Lithuania mindaugasd

    I used to chain translations quite a lot, because AI models were poor at Lithuanian to the point of being unusable. But today, as we see, AI models do the chaining by themselves behind the scenes.

  • 🇱🇹Lithuania mindaugasd

    Are there more practical examples where people would use it?

  • 🇩🇪Germany marcus_johansson

    The question is where it would be used as you write - CKEditor would be one place, I think. Others could be a pure translate widget in the edit form, or using it together with an evaluation tool to figure out which service provides the best translations for your use case. But I'm just making stuff up here, without knowing if any of these will be used. Automator is the only place where I know it is already being used, for a Fortune 500 company.

    At the same time, OpenAI is better than the three big translation tools on major Western languages, but it "only" covers around 40 languages and not all combinations of these, while DeepL and Google Translate also cover less widely used languages. Also, Alibaba Translate is by most sources the best English-Chinese translation tool.

    In the end, an abstraction layer just makes it possible to do these things; it's not necessarily important that they are used from day one. Actually developing this takes 30 minutes. And arguing for it being an AI functionality is very easy, because it is the original AI transformers technology that made LLMs and more possible.

    I think maybe the most important argument though: https://huggingface.co/models?pipeline_tag=translation. In theory there are models that work for specific language combinations, built by generators, that you can run locally. I would argue that this is the main reason this should be available, because this might be a future/current use case.

  • 🇱🇹Lithuania mindaugasd

    The question is where it would be used as you write

    My first question was how extensive the functionality will be.

    Actually developing this takes 30 minutes. And arguing for it being an AI functionality is very easy

    My second question, whether it will be used, is indeed a dull one. I am convinced it should be done. On the other hand, as in recent years, we can expect AI to continue to get better, and the line between translation and AI will continue to blur even more, but for now the separation still makes sense.

  • First commit to issue fork.
  • Merge request !102 "New operation type: translate_text" (Merged) created by jhuhta
  • Pipeline finished with Failed
    4 months ago
    Total: 252s
    #293926
  • Pipeline finished with Failed
    4 months ago
    Total: 280s
    #293947
  • Pipeline finished with Success
    4 months ago
    Total: 216s
    #293958
  • Pipeline finished with Success
    4 months ago
    #293965
  • 🇫🇮Finland jhuhta

    Should be ready now.

  • 🇩🇪Germany marcus_johansson

    Looks good, getting merged.

  • Automatically closed - issue fixed for 2 weeks with no activity.
