Gettext LLM
Automated translations
Translating from one language to another requires the translator to be accurate and to maintain a consistent tone/persona and style. Scaling this across multiple languages is challenging. Automated translation with LLMs provides a "good enough" alternative in many use cases.
The gettext_llm library lets you translate all Gettext PO folders/files in your Elixir project using any LLM endpoint supported by LangChain.
The library provides several mix tasks that can be run directly in your Elixir/Phoenix project from the command line (i.e., locally on the dev machine) or as part of a CI/CD pipeline.
gettext_llm provides configurable tone/persona and style. This allows you to "shape" your resulting translations into something that is compatible with your app audience & brand.
Installation
The package can be installed by adding gettext_llm to your list of dependencies in mix.exs:
```elixir
def deps do
  [
    {:gettext_llm, "0.2.4", only: [:dev, :test]}
  ]
end
```
Usage
1. Use gettext to extract & merge
gettext_llm translates PO files. Use gettext to extract all translatable messages from your app into POT files and merge them into their respective PO files:
```shell
mix gettext.extract
mix gettext.merge priv/gettext --no-fuzzy
```
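After the merge, each locale's PO file contains entries with the source message and an empty msgstr waiting for translation. A minimal sketch of such an entry (the path and message text are illustrative, not taken from a real project):

```po
# priv/gettext/nl/LC_MESSAGES/default.po (illustrative)
msgid "I'm %{year} old"
msgstr ""
```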
2. Add the gettext_llm configuration in your config.exs
gettext_llm uses LangChain to call the LLM endpoints, so it can translate using any LLM endpoint supported by LangChain. gettext_llm reads the endpoint-specific config and passes it directly to LangChain.
Example configuration with OpenAI
```elixir
# General application configuration
import Config

config :gettext_llm, GettextLLM,
  # ignored_languages: ["en"] <- optional, but useful to skip translating your reference language
  persona:
    "You are translating messages for a website that connects people needing help with people that can provide help. You will provide translation that is casual but respectful and uses plain language.",
  style:
    "Casual but respectful. Uses plain language that can be understood by all age groups and demographics.",
  endpoint: LangChain.ChatModels.ChatOpenAI,
  endpoint_model: "gpt-4",
  endpoint_temperature: 0,
  endpoint_config: %{
    "openai_key" => "<YOUR_OPENAI_KEY>",
    "openai_org_id" => "<YOUR_ORG_ID>"
  }
```
Example configuration with Anthropic
```elixir
# General application configuration
import Config

config :gettext_llm, GettextLLM,
  # ignored_languages: ["en"] <- optional, but useful to skip translating your reference language
  persona:
    "You are translating messages for a website that connects people needing help with people that can provide help. You will provide translation that is casual but respectful and uses plain language.",
  style:
    "Casual but respectful. Uses plain language that can be understood by all age groups and demographics.",
  endpoint: LangChain.ChatModels.ChatAnthropic,
  endpoint_model: "claude-3-5-sonnet-latest",
  endpoint_temperature: 0,
  endpoint_config: %{
    "anthropic_key" => "<YOUR_ANTHROPIC_KEY>"
  }
```
3. Run the gettext_llm mix task for translation
Run using the default gettext location (i.e., priv/gettext):
```shell
mix gettext_llm.translate translate
```
Run using a specific gettext location:
```shell
mix gettext_llm.translate translate my_path/gettext
```
4. Run the gettext_llm mix task for validating variables in translations
Run using the default gettext location (i.e., priv/gettext):
```shell
mix gettext_llm.translate validate
```
Run using a specific gettext location:
```shell
mix gettext_llm.translate validate my_path/gettext
```
Other gettext_llm mix tasks
Check that your configuration is correct
```shell
mix gettext_llm.translate info
```
Display help:
```shell
mix help gettext_llm.translate
```
Why do we need validation?
LLMs are probabilistic, so there are cases where gettext variables get translated instead of being kept exactly as they appear in the original message.
For example, a message like <I'm %{year} old> should be translated into Dutch as <Ik ben %{year} oud>. This is both a correct translation and it keeps the original variable name, so the gettext message template also works with the translation.
In some cases the LLM can mistakenly translate the variable name as well, producing something like <Ik ben %{jaar} oud>. When that happens the translation template ends up broken and we get a runtime error when the template is used.
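At its core, this kind of validation compares the set of %{...} placeholders in the original message with the set in the translation. A minimal sketch of that idea in Elixir (this is an illustration, not the library's actual implementation):

```elixir
defmodule PlaceholderCheck do
  # Extract the set of %{...} variable names from a message.
  defp vars(message) do
    Regex.scan(~r/%\{([^}]+)\}/, message, capture: :all_but_first)
    |> List.flatten()
    |> MapSet.new()
  end

  # A translation is valid only if it uses exactly the same variables.
  def valid?(original, translation) do
    MapSet.equal?(vars(original), vars(translation))
  end
end

PlaceholderCheck.valid?("I'm %{year} old", "Ik ben %{year} oud")
# => true
PlaceholderCheck.valid?("I'm %{year} old", "Ik ben %{jaar} oud")
# => false
```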
What can you do when you discover these validation errors?
- tweak your translation prompt (persona/style) in the config
- as a last resort, fix the variable names by hand in the translated PO files
Documentation
Documentation can be found at https://hexdocs.pm/gettext_llm.
Misc
For some apps or languages LLMs are not good enough. In these cases you will probably be better off with a human translator. The human translator can work on their own or as part of a hybrid setup. A typical setup has the draft translation proposed by an LLM, with the final approval (and corrections) performed by the human. Good open source solutions for such a setup are Kanta or Weblate.
Thanks
Special thanks to Adrian Codausi & Goran Codausi for inspiring me to build this. They built an earlier prototype of similar functionality in another project.