Officially Supported Languages

Orama currently supports 30 languages out of the box, written in 8 different writing systems.
For each language, Orama provides a default tokenizer, a stop-word list, and a stemmer.
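
To make the three components concrete, here is a minimal sketch of how a tokenizer, a stop-word filter, and a stemmer fit together. This is a toy illustration and not Orama's implementation: the function names (`tokenize`, `removeStopWords`, `naiveStem`, `analyze`), the tiny English stop-word list, and the suffix rules are all assumptions made up for this example.

```typescript
// Toy illustration only: NOT Orama's tokenizer, stop-word list, or stemmer.
// Every name and rule below is hypothetical.

// A tiny English stop-word list (real lists are much longer).
const STOP_WORDS = new Set<string>(["the", "a", "an", "of", "and", "is", "are"]);

// Anything that is not a Unicode letter or digit separates tokens.
const SEPARATOR = new RegExp("[^\\p{L}\\p{N}]+", "u");

// Step 1: lowercase the text and split it into tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(SEPARATOR)
    .filter((token) => token.length > 0);
}

// Step 2: drop very common words that carry little search value.
function removeStopWords(tokens: string[]): string[] {
  return tokens.filter((token) => !STOP_WORDS.has(token));
}

// Step 3: reduce words to a rough stem by stripping a few English suffixes.
// Deliberately naive; real stemmers apply far more careful rules.
function naiveStem(token: string): string {
  for (const suffix of ["ing", "es", "s"]) {
    if (token.length > suffix.length + 2 && token.endsWith(suffix)) {
      return token.slice(0, -suffix.length);
    }
  }
  return token;
}

// The full pipeline: tokenize, filter stop-words, then stem.
function analyze(text: string): string[] {
  return removeStopWords(tokenize(text)).map(naiveStem);
}

console.log(analyze("The cats are chasing a ball"));
// → ["cat", "chas", "ball"]
```

Each step is language-specific: word boundaries, stop-word lists, and suffix rules all differ between languages, which is why a default for each is provided per language.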

Latin Alphabet

Language | Tokenizer | Stop-words | Stemmer
Danish
Dutch
English
Finnish
French
German
Hungarian
Indonesian
Irish
Italian
Norwegian
Portuguese
Romanian (*)
Serbian (**)
Slovenian
Spanish
Swedish
Turkish

(*) = also uses a few additional diacritic marks
(**) = uses both Cyrillic and Latin scripts

Cyrillic Alphabet

Language | Tokenizer | Stop-words | Stemmer
Bulgarian
Russian
Serbian (*)
Ukrainian

(*) = uses both Cyrillic and Latin scripts

Greek Alphabet

Language | Tokenizer | Stop-words | Stemmer
Greek

Devanagari Script

Language | Tokenizer | Stop-words | Stemmer
Hindi
Nepali
Sanskrit

Arabic Script

Language | Tokenizer | Stop-words | Stemmer
Arabic

Armenian Alphabet

Language | Tokenizer | Stop-words | Stemmer
Armenian

Tamil Script

Language | Tokenizer | Stop-words | Stemmer
Tamil

Chinese Characters (Logographic Script)

Language | Tokenizer | Stop-words | Stemmer
Chinese (Mandarin)
Japanese
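
The tables on this page can be captured as a small lookup structure, which also shows how the totals add up: Serbian appears under both Latin and Cyrillic but counts as one language. The sketch below is only an illustration. The names `LANGUAGES_BY_SCRIPT`, `scriptsFor`, and `supportedLanguages` are hypothetical, and the strings are the display names from the tables above, not necessarily the identifiers Orama expects in its configuration.

```typescript
// Hypothetical lookup built from the tables on this page.
// The strings are display names, not confirmed Orama language identifiers.
type Script =
  | "Latin"
  | "Cyrillic"
  | "Greek"
  | "Devanagari"
  | "Arabic"
  | "Armenian"
  | "Tamil"
  | "Chinese characters";

const LANGUAGES_BY_SCRIPT: Record<Script, string[]> = {
  Latin: [
    "Danish", "Dutch", "English", "Finnish", "French", "German",
    "Hungarian", "Indonesian", "Irish", "Italian", "Norwegian",
    "Portuguese", "Romanian", "Serbian", "Slovenian", "Spanish",
    "Swedish", "Turkish",
  ],
  Cyrillic: ["Bulgarian", "Russian", "Serbian", "Ukrainian"],
  Greek: ["Greek"],
  Devanagari: ["Hindi", "Nepali", "Sanskrit"],
  Arabic: ["Arabic"],
  Armenian: ["Armenian"],
  Tamil: ["Tamil"],
  "Chinese characters": ["Chinese (Mandarin)", "Japanese"],
};

const SCRIPTS = Object.keys(LANGUAGES_BY_SCRIPT) as Script[];

// Return every script a language is listed under (Serbian has two).
function scriptsFor(language: string): Script[] {
  return SCRIPTS.filter(
    (script) => LANGUAGES_BY_SCRIPT[script].indexOf(language) !== -1
  );
}

// Return the distinct languages across all scripts, sorted alphabetically.
function supportedLanguages(): string[] {
  const seen = new Set<string>();
  for (const script of SCRIPTS) {
    for (const language of LANGUAGES_BY_SCRIPT[script]) {
      seen.add(language);
    }
  }
  return Array.from(seen).sort();
}

console.log(SCRIPTS.length);               // → 8
console.log(supportedLanguages().length);  // → 30
console.log(scriptsFor("Serbian"));        // → ["Latin", "Cyrillic"]
```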