Stop-words
Orama supports stop-words removal through the @orama/stopwords package, which you can install with npm, yarn, or pnpm:
npm install @orama/stopwords
yarn add @orama/stopwords
pnpm install @orama/stopwords
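Stop-words removal drops very common words (such as "the" or "is") from the token stream before terms are indexed, so they do not take up space in the index. The TypeScript below is a simplified, self-contained sketch of that idea. It is not Orama's tokenizer, and the tokenize and removeStopWords helpers are hypothetical names used only for this illustration.

```typescript
// Simplified illustration of stop-words removal; this is NOT Orama's tokenizer.
// tokenize and removeStopWords are hypothetical helpers.

function tokenize(text: string): string[] {
  // Lowercase, then split on any run of non-alphanumeric ASCII characters
  // (ASCII-only for brevity; real tokenizers handle other scripts too).
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

function removeStopWords(tokens: string[], stopWords: readonly string[]): string[] {
  // A Set gives fast membership checks for each token.
  const stopSet = new Set<string>(stopWords);
  return tokens.filter((token) => !stopSet.has(token));
}

const quote = "The only way to do great work is to love what you do";
const indexed = removeStopWords(tokenize(quote), ["the", "to", "is", "what", "you", "do"]);
console.log(indexed); // → ["only", "way", "great", "work", "love"]
```

Only the remaining tokens would be indexed, which is why the choice of stop-words list directly affects which terms are searchable.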
Enabling stop-words removal
By default, Orama does not remove any stop-words, since removal is meant to be an explicit choice by the user. To enable it, set the stopWords option of the tokenizer component (components.tokenizer.stopWords) when creating a new Orama instance:
import { create } from "@orama/orama";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stopWords: ["foo", "bar"], // Enable custom stop-words
    },
  },
});
Using the default stop-words list
By installing the @orama/stopwords package, you can use the default stop-words list for a given language:
import { create } from "@orama/orama";
import { stopwords as englishStopwords } from "@orama/stopwords/english";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stopWords: englishStopwords,
    },
  },
});
Using the default stop-words list is the recommended way to enable stop-words removal, as it is the most efficient way to do so.
Extending the default stop-words list
You can also customize a default list, for example by spreading it into a new array together with your own words:
import { create } from "@orama/orama";
import { stopwords as italianStopwords } from "@orama/stopwords/italian";

const db = create({
  schema: {
    author: "string",
    quote: "string",
  },
  components: {
    tokenizer: {
      stopWords: [...italianStopwords, "ciao", "buongiorno"],
    },
  },
});
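If you also want to drop words from a default list (for example, to keep a negation such as the Italian "non" searchable, if the list contains it), one approach is to build the final array yourself before passing it to stopWords. The customizeStopWords helper below is hypothetical and is not part of @orama/orama or @orama/stopwords; baseList is a made-up stand-in for the imported italianStopwords list.

```typescript
// Hypothetical helper (not part of @orama/orama or @orama/stopwords) that
// derives a custom stop-words list from a base list.
function customizeStopWords(
  base: readonly string[],
  add: readonly string[] = [],
  remove: readonly string[] = [],
): string[] {
  const removed = new Set<string>(remove);
  // Append the extra words, de-duplicate while keeping first-seen order,
  // then filter out the words that should stay searchable.
  return Array.from(new Set<string>([...base, ...add])).filter((word) => !removed.has(word));
}

// Made-up stand-in for italianStopwords, for illustration only.
const baseList = ["di", "a", "da", "non", "che"];
const list = customizeStopWords(baseList, ["ciao", "buongiorno"], ["non"]);
console.log(list); // → ["di", "a", "da", "che", "ciao", "buongiorno"]
```

In a real setup you would call customizeStopWords(italianStopwords, ["ciao", "buongiorno"], ["non"]) and pass the result as components.tokenizer.stopWords. Removing words that are not in the base list is harmless, since the filter simply finds nothing to drop.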
Supported languages
As of now, Orama supports stop-words removal for the following languages:
- Arabic
- Armenian
- Bulgarian
- Chinese (Mandarin; stemmer not supported)
- Danish
- Dutch
- English
- Finnish
- French
- German
- Greek
- Hindi
- Hungarian
- Indonesian
- Irish
- Italian
- Nepali
- Norwegian
- Portuguese
- Romanian
- Russian
- Sanskrit
- Serbian
- Slovenian
- Spanish
- Swedish
- Tamil
- Turkish
- Ukrainian