Skip to content

Using Chinese with Orama

At the time of writing, Orama supports Chinese (Mandarin) via a custom tokenizer, which is part of the @orama/tokenizers package.

To get started, make sure to install all the dependencies you need:

Terminal window
npm install @orama/orama @orama/tokenizers

If you want to add Mandarin stop-words as well, install the @orama/stopwords package too:

Terminal window
npm install @orama/stopwords

Now you’re ready to get started with Orama:

import { create, insert, search } from "@orama/orama";
import { createTokenizer } from '@orama/tokenizers/mandarin'
import { stopwords as mandarinStopwords } from "@orama/stopwords/mandarin";
const db = await create({
schema: {
name: "string",
},
components: {
tokenizer: await createTokenizer({
stopWords: mandarinStopwords,
}),
},
});
await insert(db, { name: "北京" }); // Beijing
await insert(db, { name: "上海" }); // Shanghai
await insert(db, { name: "广州" }); // Guangzhou
await insert(db, { name: "深圳" }); // Shenzhen
await insert(db, { name: "成都" }); // Chengdu
await insert(db, { name: "杭州" }); // Hangzhou
await insert(db, { name: "南京" }); // Nanjing
await insert(db, { name: "北京大学" }); // Peking University
await insert(db, { name: "上海交通大学" }); // Shanghai Jiao Tong University
await insert(db, { name: "广州中医药大学" }); // Guangzhou University of Chinese Medicine
const results = await search(db, {
term: "广州",
threshold: 0,
});
console.log(results);
// {
// "elapsed": {
// "raw": 89554625,
// "formatted": "89ms"
// },
// "hits": [
// {
// "id": "36666208-3",
// "score": 4.210224897276653,
// "document": {
// "name": "广州"
// }
// },
// {
// "id": "36666208-10",
// "score": 1.9335268122510698,
// "document": {
// "name": "广州中医药大学"
// }
// }
// ],
// "count": 2
// }