Stop-words

Orama provides support for stop-words removal via the @orama/stopwords package.

npm install @orama/stopwords

Enabling stop-words removal

By default, Orama does not remove any stop-words; removal is intended to be an explicit choice by the user. To enable it, set the stopWords property of the tokenizer component when creating a new Orama instance:

import { create } from "@orama/orama";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stopWords: ["foo", "bar"], // Enable custom stop-words
    },
  },
});
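
To make the effect of a stop-words list more concrete, the sketch below filters a tokenized sentence against the same custom list. It is a simplified illustration and not Orama's actual tokenizer: the `tokenize` and `removeStopWords` helpers, and the splitting rule they use, are assumptions made for this example.

```javascript
// Simplified illustration only: this is NOT Orama's tokenizer.
// `tokenize` and `removeStopWords` are hypothetical helpers.
function tokenize(text) {
  // Lowercase, then split on any run of characters that are not letters or digits.
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

function removeStopWords(tokens, stopWords) {
  // A Set gives constant-time membership checks.
  const blocked = new Set(stopWords.map((word) => word.toLowerCase()));
  return tokens.filter((token) => !blocked.has(token));
}

const tokens = tokenize("Foo is the new bar, they say");
const kept = removeStopWords(tokens, ["foo", "bar"]);
console.log(kept);
```

In this sketch, only the tokens that survive the filter would be considered for indexing or matching, which is the general idea behind stop-words removal.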

Using the default stop-words list

By installing the @orama/stopwords package, you can use the default stop-words list for a given language:

import { create } from "@orama/orama";
import { stopwords as englishStopwords } from "@orama/stopwords/english";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stopWords: englishStopwords,
    },
  },
});

Using the default stop-words list is the recommended way to enable stop-words removal, as it is the most efficient way to do so.

Extending the default stop-words list

You can customize the default stop-words list by adding or removing words. The following example adds two words to the Italian list:

import { create } from "@orama/orama";
import { stopwords as italianStopwords } from "@orama/stopwords/italian";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stopWords: [...italianStopwords, "ciao", "buongiorno"],
    },
  },
});
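
The example above only adds words. Since a stop-words list is a plain array of strings, removing words can be sketched with ordinary array operations. In the snippet below, `baseStopWords` is a short hypothetical stand-in for a list imported from @orama/stopwords, and `customizeStopWords` is a hypothetical helper, not part of Orama.

```javascript
// Hypothetical stand-in for a list imported from @orama/stopwords;
// the real lists are much longer.
const baseStopWords = ["il", "lo", "la", "non", "ma"];

// Hypothetical helper: returns a new array with the words in `add`
// appended and the words in `remove` filtered out.
function customizeStopWords(base, { add = [], remove = [] } = {}) {
  const removed = new Set(remove);
  // The Set drops duplicates while preserving insertion order.
  const merged = new Set([...base, ...add]);
  return [...merged].filter((word) => !removed.has(word));
}

const stopWords = customizeStopWords(baseStopWords, {
  add: ["ciao", "buongiorno"],
  remove: ["non"],
});
console.log(stopWords);
```

The resulting array can be passed as the stopWords value of the tokenizer component, in the same way as the lists in the examples above.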

Supported languages

As of now, Orama supports stop-words removal for 30 languages:

  • Arabic
  • Armenian
  • Bulgarian
  • Chinese (Mandarin - stemmer not supported)
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hindi
  • Hungarian
  • Indonesian
  • Irish
  • Italian
  • Nepali
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Sanskrit
  • Serbian
  • Slovenian
  • Spanish
  • Swedish
  • Tamil
  • Turkish
  • Ukrainian