Right now, Orama supports 30 languages out of the box in 8 different alphabets.
For every language, Orama provides a default tokenizer, stop-words, and stemmer.
Latin Alphabet
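| Language      | Tokenizer | Stop-words | Stemmer |
| ------------- | --------- | ---------- | ------- |
| Danish        | ✅        | ✅         | ✅      |
| Dutch         | ✅        | ✅         | ✅      |
| English       | ✅        | ✅         | ✅      |
| Finnish       | ✅        | ✅         | ✅      |
| French        | ✅        | ✅         | ✅      |
| German        | ✅        | ✅         | ✅      |
| Hungarian     | ✅        | ✅         | ✅      |
| Indonesian    | ✅        | ✅         | ✅      |
| Irish         | ✅        | ✅         | ✅      |
| Italian       | ✅        | ✅         | ✅      |
| Norwegian     | ✅        | ✅         | ✅      |
| Portuguese    | ✅        | ✅         | ✅      |
| Romanian (*)  | ✅        | ✅         | ✅      |
| Serbian (**)  | ✅        | ✅         | ✅      |
| Slovenian     | ✅        | ✅         | ✅      |
| Spanish       | ✅        | ✅         | ✅      |
| Swedish       | ✅        | ✅         | ✅      |
| Turkish       | ✅        | ✅         | ✅      |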
(*) = also uses a few additional diacritic marks
(**) = uses both Cyrillic and Latin scripts
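
As a rough illustration, application code could mirror the Latin-alphabet list above in a small typed lookup and normalize user input before choosing a language. This is only a sketch: `LATIN_LANGUAGES` and `toLatinLanguage` are hypothetical names, not part of Orama, and the lowercase English identifiers are an assumption about how languages are named in configuration.

```typescript
// Hypothetical helper, not part of Orama: it mirrors the Latin-alphabet table above.
// Assumption: languages are identified by their lowercase English names.
const LATIN_LANGUAGES = [
  "danish", "dutch", "english", "finnish", "french", "german",
  "hungarian", "indonesian", "irish", "italian", "norwegian", "portuguese",
  "romanian", "serbian", "slovenian", "spanish", "swedish", "turkish",
] as const;

// Union of the string literals listed above.
type LatinLanguage = (typeof LATIN_LANGUAGES)[number];

// Trims and lowercases the input, then returns the matching entry,
// or undefined when the name is not in the Latin-alphabet list.
function toLatinLanguage(name: string): LatinLanguage | undefined {
  const normalized = name.trim().toLowerCase();
  return LATIN_LANGUAGES.find((lang) => lang === normalized);
}
```

For example, `toLatinLanguage(" Italian ")` returns `"italian"`, while `toLatinLanguage("klingon")` returns `undefined`. Languages written in the other alphabets are deliberately left out of this sketch.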