Stemming

Orama can analyze the input and stem it, reducing each word to its root form. This lets the engine match different forms of the same word at query time and keeps the index smaller.
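To make the idea concrete, here is a minimal sketch of why stemming shrinks an index. It is not Orama's code: `toyStem` and `buildIndex` are hypothetical helpers written for this illustration, and the suffix rules are deliberately naive compared with a real language-specific stemmer.

```typescript
// Toy suffix-stripping stemmer, for illustration only.
// This is NOT the algorithm Orama ships; real stemmers use
// far more careful, language-specific rules.
function toyStem(word: string): string {
  const lower = word.toLowerCase();
  for (const suffix of ["ing", "ed", "s"]) {
    // Keep at least three characters of stem so short words are left alone.
    if (lower.endsWith(suffix) && lower.length - suffix.length >= 3) {
      return lower.slice(0, -suffix.length);
    }
  }
  return lower;
}

// Hypothetical inverted index: token -> set of document ids.
function buildIndex(docs: string[], stem: boolean): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  docs.forEach((text, id) => {
    for (const raw of text.split(/\s+/).filter(Boolean)) {
      const token = stem ? toyStem(raw) : raw.toLowerCase();
      if (!index.has(token)) index.set(token, new Set());
      index.get(token)!.add(id);
    }
  });
  return index;
}

const docs = ["walk", "walks", "walked", "walking"];
const plain = buildIndex(docs, false);  // one index entry per word form
const stemmed = buildIndex(docs, true); // all forms share a single entry

console.log(plain.size, stemmed.size);
```

Without stemming, each word form gets its own index entry. With stemming, all four forms collapse into the single key `walk`, so a query for any one form (stemmed the same way) can reach all four documents.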

When stemming is enabled, Orama uses the English language analyzer by default. We can override this behavior by importing a stemmer for another language and setting the language property at database initialization.

import { create } from "@orama/orama";
import { stemmer, language } from "@orama/stemmers/italian";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stemming: true,
      language,
      stemmer,
    },
  },
});

Right now, Orama supports 30 languages out of the box, with a stemmer available for each one unless noted otherwise:

  • Arabic
  • Armenian
  • Bulgarian
  • Chinese (Mandarin - stemmer not supported)
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hindi
  • Hungarian
  • Indonesian
  • Irish
  • Italian
  • Mandarin (stemmer not supported)
  • Nepali
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Sanskrit
  • Serbian
  • Slovenian
  • Spanish
  • Swedish
  • Tamil
  • Turkish
  • Ukrainian