Components

Orama can be completely customized and extended by using its components architecture.

Depending on the case, a component can be a simple as a function or a slightly more complex interface. All components can be synchronous or return a promise and Orama will make sure everything is handled correctly.

When no components are specified, an Orama database is created with some defaults components which satisfies most common use cases:

English tokenizer with stemming disabled.
BM25 and Radix-Tree based index.
In memory documents store.

Providing your own components

It’s very easy to provide a custom component when creating a database: simply pass the components option when calling create.

For instance, this code:

import { create, insert, search } from "@orama/orama";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
    plot: "string",
    year: "number",
    isFavorite: "boolean",
  },
  components: {
    afterInsert() {
      console.log("INSERTED");
    },
  },
});

insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: "Chris Columbus",
  plot: "Harry Potter, an eleven-year-old orphan, discovers that he is a wizard and is invited to study at Hogwarts. Even as he escapes a dreary life and enters a world of magic, he finds trouble awaiting him.",
  year: 2001,
  isFavorite: false,
});

const results = search(movieDB, { term: "Harry" });
console.log(results.count);

Will lead to this output:

INSERTED
2

Supported components

`tokenizer`

The tokenizer is used to tokenize documents fields and search terms. To customize the tokenizer used by Orama, provide an object which has at least the following properties:

tokenize: A function that accepts the content to tokenize (string), the language (string) and the property name (string) and returns a list of tokens.
language (string): The language supported by the tokenizer.
normalizationCache (Map): It can used to cache tokens normalization.

In other words, a tokenizer must satisfy the following interface:

interface Tokenizer {
  language: string;
  normalizationCache: Map<string, string>;
  tokenize: (
    raw: string,
    language?: string,
    prop?: string
  ) => string[] | Promise<string[]>;
}

For instance, with the following configuration only the first character of each string will be indexed and only the first character of a term will be searched:

import { create } from "@orama/orama";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    tokenizer: {
      language: "english",
      normalizationCache: new Map(),
      tokenize(raw) {
        return raw[0];
      },
    },
  },
});

The Orama’s default tokenizer is exported via @orama/orama/components and can be customized:

import { create } from "@orama/orama";
import { tokenizer as defaultTokenizer } from "@orama/orama/components";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    tokenizer: await defaultTokenizer.createTokenizer({
      language: "english",
      stemming: false,
    }),
  },
});

Optionally you can pass the customization options without using createTokenizer:

import { create } from "@orama/orama";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    tokenizer: { language: "english", stemming: false },
  },
});

`index`

The index component is used to perform the indexing and searching of documents in Orama.

To customize the index used by Orama, provide an object which has at least the following properties:

create: A function that creates a new index. It receives the following arguments:
- orama: The Orama instance.
- mapper: The document IDs mapper (see Internal components section below).
- schema: The documents schema.
insert: A function that inserts a new document in the index. It receives the following arguments:
- implementation: The current index implementation.
- index: The index.
- prop: The property that it is currently considered.
- id: The ID of the document being inserted.
- value: The value of the property in the document.
- expectedType: The type of the property in the document according with the schema.
- language: The language of the document.
- tokenizer: The tokenizer associated with the current database.
- docsCount: The number of documents in the documents store before the action is performed.
remove: A function that removes a document from the index. It receives the same arguments as insert.
insertDocumentScoreParameters: A function that inserts document information into the index for future results score calculation. It should be typically invoked within insert. It receives the following arguments:
- index: The index.
- prop: The property that is currently considered.
- id: The ID of the document being inserted.
- tokens: The list of the tokens found in the document.
- docsCount: The number of documents in the documents store before the action is performed.
insertTokenScoreParameters: A function that inserts token information into the index for future results score calculation. It should be typically invoked within insert. It receives the following arguments:
- index: The index.
- prop: The property that it is currently considered.
- id: The ID of the document being inserted.
- token: The token.
- tokens: The list of the tokens found in the document.
removeDocumentScoreParameters: A function that removes document scores information from the index. It should be typically invoked within remove. It receives the following arguments:
- index: The index.
- prop: The property that is currently considered.
- id: The ID of the document being inserted.
- docsCount: The number of documents in the documents store before the action is performed.
removeTokenScoreParameters: A function that removes token score information from the index. It should be typically invoked within remove. It receives the following arguments:
- index: The index.
- prop: The property that is currently considered.
- id: The ID of the document being inserted.
- token: The token.
calculateResultScores: A function that calculates the score for the results of the current search. It should be typically invoked within search. It receives the following arguments:
- context: A search context with various useful information about the search.
- index: The index.
- prop: The property search.
- term: The term used to search.
- ids: The list of document IDs matched by the search.
search: A function that searches documents in index data and returns matching IDs with scores. It receives the following arguments:
- context: A search context with various useful information about the search.
- index: The index.
- prop: The property to search into.
- term: The term to search for.
searchByWhereClause: A function that searches in boolean and numeric indexes and returns a list of matching IDs. It receives the following arguments:
- context: A search context with various useful information about the search.
- index: The index.
- filters: An object where keys are the properties to match and the values are search operators as described in the filters page.
getSearchableProperties: A function that returns a list of all searchable properties in the index. It receives the index as the only argument.
getSearchablePropertiesWithTypes: A function that returns an object where keys are the searchable properties in the index and the values are the type of the index for a property. It receives the index as the only argument.
load: A function that deserializes an index from a JavaScript object. It receives The document IDs mapper and a JavaScript object as its only argument and must return an index.
save: A function that serializes the index into a JavaScript object. It receives the index as the only argument and must return a JavaScript object.

The following functions are optional:

beforeInsert or afterInsert: Functions invoked before or after insert. They accept the same arguments as insert except the first one.
beforeRemove or afterRemove: Functions invoked before or after remove. They accept the same arguments as remove except the first one.

For the more formal interface information, look for the IIndex interface in src/types.ts in Orama’s source code.

The Orama’s default index is based on BM25, Radix Trees and AVL trees. All its functions are exported via @orama/orama/components and can be composed to create a custom index:

import { create } from "@orama/orama";
import { index as defaultIndex } from "@orama/orama/components";

const index = await defaultIndex.createIndex();
const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    // This index will only customize the deserialization
    index: {
      ...index,
      load(documentsIdsMapper, raw) {
        // Do something here
      },
    },
  },
});

`documentsStore`

The documentsStore component is used to store the documents in Orama.

To customize the documents store used by Orama, provide an object which has at least the following properties:

create: A function that creates a new document store. It receives the following arguments:
- orama: The Orama instance.
- mapper: The document IDs mapper (see Internal components section below).
get: A function that returns a document from the store. It receives the following arguments:
- The documents store.
- The ID of the document to get.
getAll: A function that returns all documents from the store. Note that the IDs in the returned object are the mapped IDs from the mapper component. It receives the following arguments:
- The documents store.
getMultiple: A function that returns multiple documents from the store. It receives the following arguments:
- The documents store.
- A list of IDs of the documents to get.
getAll: A function that returns all the documents from the store. It receives the following arguments:
- The documents store.
store: A function that stores a new document in the documents store. It receives the following arguments:
- The documents store.
- The ID of the new document to store.
- The document to store.
remove: A function that removes a document from the documents store. It receives the following arguments:
- The documents store.
- The ID of the new document to remove.
count: A function that returns the count of the documents currently stored. It receives the current documents store as the only argument.
load: A function that deserializes a documents store from a JavaScript object. It receives The document IDs mapper and a JavaScript object as its only argument and must return a documents store.
save: A function that serializes the documents store into a JavaScript object. It receives the current documents store as the only argument and must return a JavaScript object.

For the more formal interface information, look for the IDocumentsStore interface in src/types.ts in Orama’s source code.

The Orama’s default documents store is based on simple JavaScript object. All its functions are exported via @orama/orama/components and can be composed to create a custom documents store:

import { create } from "@orama/orama";
import { documentsStore as defaultDocumentsStore } from "@orama/orama/components";

const store = await defaultDocumentsStore.createDocumentsStore();
const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    // override partially the default documents store
    documentsStore: {
      ...store,
      remove(s, id) {
        // Apply custom logic
        return store.remove(s, id);
      },
    },
  },
});

`sorter`

The sorter component is used to store the documents in Orama.

To customize the documents sort used by Orama, provide an object which has at least the following properties:

create: A function that creates a new sorter. It receives the following arguments:
- mapper: The document IDs mapper (see Internal components section below).
- schema: The documents schema.
- configuration: The sorter configuration.
insert: A function that inserts a new document in the sorter. It receives the following arguments:
- sorter: The sorter returned by the create function.
- prop: The property that is currently considered.
- id: The ID of the document being inserted.
- value: The value of the property in the document.
- schemaType: The type of the property in the document according with the sort schema.
- language: The language of the document.
remove: A function that removes a document from the index. It receives the following arguments:
- sorter: The sorter returned by the create function.
- prop: The property that is currently considered.
- id: The ID of the document being inserted.
sortBy: A function that inserts document information into the index for future results score calculation. It should be typically invoked within insert. It receives the following arguments:
- sorter: The sorter returned by the create function.
- docIds: A [string, number] array contains for each id the weight.
- by: The SortParameters specified during the search
getSortableProperties: A function that returns a list of all sortable properties in the sorter. It receives the index as the only argument.
getSortablePropertiesWithTypes: A function that returns an object where keys are the sortable properties in the sorter and the values are the type of the sort for a property. It receives the index as the only argument.
load: A function that deserializes a sorter from a JavaScript object. It receives The document IDs mapper and a JavaScript object as its only argument and must return a sorter.
save: A function that serializes the sorter into a JavaScript object. It receives the sorter as the only argument and must return a JavaScript object.

import { create } from "@orama/orama";
import { sorter as defaultSorter } from "@orama/orama/components.js";

const s = await defaultSort.createSorter();
const db = create({
  schema: {
    number: "number",
  },
  components: {
    sorter: {
      // override partially the default sorter
      ...s,
      async remove(sort, prop, id) {
        // Apply custom logic here
        return s.remove(sort, prop, id);
      },
    },
  },
});

General purpose components

The components in this category are simple functions which are internally used by Orama. Depending on the use case the component must return a value. Orama will await if a Promise is returned.

`validateSchema`

The component is used to validate a document against the schema. The function should return undefined if the document is valid according to the schema, the path of the invalid property otherwise.

The function will receive two arguments:

The document that is being validated.
The schema provided to create.

import { create, insert } from "@orama/orama";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    validateSchema(doc) {
      return typeof doc.name === "string" && typeof doc.director === "string";
    },
  },
});

// This will throw
insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: 42,
});

`getDocumentIndexId`

The component is used to extract or generate a unique ID for a document. The returned value must be string.

The function will receive one argument:

The document for which an ID is being generated.

import { create, insert } from "@orama/orama";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    getDocumentIndexId(doc) {
      return doc.id ?? Date.now().toString();
    },
    afterInsert(_orama, _doc, id) {
      console.log(id);
    },
  },
});

// This will print something like "1679476550629"
insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: "Chris Columbus",
});

`getDocumentProperties`

The component is used to extract indexable properties from a document.

The function receives two arguments:

The document that is being read.
A list of properties paths (using dotted syntax) to extract.

The function must return an object where the keys are the paths received as argument.

import { create } from "@orama/orama";
import { get } from "lodash/get";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    getDocumentProperties(doc, paths) {
      return Object.fromEntries(
        paths.map((path) => {
          return [path, get(doc, path)];
        })
      );
    },
  },
});

`formatElapsedTime`

The component is used to format the elapsed property in the search results. The return value can be a number, a string or an object.

The function receives a single argument: the search elapsed time as BigInt.

import { create, insert, search } from "@orama/orama";

const movieDB = create({
  schema: {
    title: "string",
    director: "string",
    plot: "string",
    year: "number",
    isFavorite: "boolean",
  },
  components: {
    formatElapsedTime(n) {
      return `${Number(n)} - custom`;
    },
  },
});

insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: "Chris Columbus",
  plot: "Harry Potter, an eleven-year-old orphan, discovers that he is a wizard and is invited to study at Hogwarts. Even as he escapes a dreary life and enters a world of magic, he finds trouble awaiting him.",
  year: 2001,
  isFavorite: false,
});

const results = search(movieDB, { term: "Harry" });

// This will print something like: 100 - custom
console.log(results.elapsed);

Internal components

Documents IDs mapper

In order to improve performance, Orama uses an internal ID for each document. The documents IDs mapper component is used to maintain a between the user document ID and the Orama document ID.

This component is internal and cannot be replaced by the developer. The component is passed to the customizable components (like documentsStore) and must be treated as an opaque object.

When writing or reading documents in the data section of The Orama instance, such as orama.data.index, make sure you always use a internal ID.

Orama exports two helpers which will help dealing with this operations:

getInternalDocumentId: A function that receives a documents mapper object and an external document ID and returns an internal document ID.
getDocumentIdFromInternalId: A function that receives a documents mapper object and an internal document ID and returns an external document ID.

Extending Orama

As Orama is an Open Source Software, we gladly accept proposal for new functionalities.

This obviously include the definition of new components which are used internally by Orama and that can be customized by the users.

Step 1: Define a new interface

If you want to create a new component, you first have to define your component in the ObjectComponents or FunctionComponents interfaces in src/types.ts.

In case of object components, the definition should be a new interface defined in the same file. As convention, start the interface with the letter I.

Example:

 interface IShiningDetector {
   isShining(): SyncOrAsyncValue<boolean>
 }

export interface ObjectComponents {
  tokenizer: Tokenizer | TokenizerConfig
  index: IIndex
  documentsStore: IDocumentsStore
 shiningDetector: IShiningDetector
}

Remember that all functions used in Orama’s components can be async, so we advise to use the SyncOrAsyncValue for their return value. This also implies that when you invoke this function you should always use await to make sure the function is correctly handled whether it is async or not.

Step 2: Define and store data in the database

If your component needs to store some data in the orama database, you have to add a new field in the Data interface in src/types.ts.

Example:

 export interface IShiningDetector {
   isShining(shining: Record<string, number>): SyncOrAsyncValue<number>
 }

export interface ObjectComponents {
  tokenizer: Tokenizer | TokenizerConfig
  index: IIndex
  documentsStore: IDocumentsStore
 shiningDetector: IShiningDetector
}

interface Data<I extends OpaqueIndex, D extends OpaqueDocumentStore> {
  index: I
  docs: D
 shining: Record<string, number>
}

Step 3: Define your component default implementation

Create a file (or folder if appropriate) under the src/components folder with appropriate exports.

Example (../components/shiningDetector.js):

import { IShiningDetector } from '../src/types.js'

export function isShining(shining: Record<string, number>, subject: string): number {
  return shining[subject] ?? 0
}

export function createShiningDetector(): IShiningDetector {
  return { isShining }
}

Note that exporting a create* factory function is not strictly needed but it helps to isolate initialization tasks.

Step 4: Update the `create` method

Update the create method in src/methods/create.ts to use the component provided in the options, or create a new one using the object or factory function defined in the previous step.

If you added a field to the Data interface, also provide the initial value.

Example:

import { createShiningDetector } from '../components/shiningDetector.js'

export async function create({ schema, language, components }: CreateArguments): Promise<Orama> {

  // ...

  const orama = {
    data: {},
    caches: {},
    // ...
   shiningDetector: components.shiningDetector ?? createShiningDetector()
  } as Orama

  orama.data = {
    index: await orama.index.create(orama, schema),
    docs: await orama.documentsStore.create(orama),
    shining: { Paolo: 10 }
  }

Step 5: Have fun!

Call your component’s functions where appropriate!

Example:

export async function search(orama: Orama, params: SearchParams, language?: string): Promise<Results> {
   console.log('Shining level', await orama.shiningDetector.isShining(params.term))

  // ...

Components

Providing your own components

Supported components

tokenizer

index

documentsStore

sorter

General purpose components

validateSchema

getDocumentIndexId

getDocumentProperties

formatElapsedTime

Internal components

Documents IDs mapper

Extending Orama

Step 1: Define a new interface

Step 2: Define and store data in the database

Step 3: Define your component default implementation

Step 4: Update the create method

Step 5: Have fun!

`tokenizer`

`index`

`documentsStore`

`sorter`

`validateSchema`

`getDocumentIndexId`

`getDocumentProperties`

`formatElapsedTime`

Step 4: Update the `create` method