Components
Orama can be completely customized and extended by using its components architecture.
Depending on the case, a component can be as simple as a function or a slightly more complex interface. All components can be synchronous or return a promise, and Orama will make sure everything is handled correctly.
When no components are specified, an Orama database is created with default components that satisfy the most common use cases:
- English tokenizer with stemming disabled.
- BM25 and Radix-Tree based index.
- In-memory documents store.
Providing your own components
It’s very easy to provide a custom component when creating a database: simply pass the components option when calling create.
For instance, this code:
Will lead to this output:
Supported components
tokenizer
The tokenizer is used to tokenize document fields and search terms. To customize the tokenizer used by Orama, provide an object which has at least the following properties:
- tokenize: A function that accepts the content to tokenize (string), the language (string) and the property name (string) and returns a list of tokens.
- language (string): The language supported by the tokenizer.
- normalizationCache (Map): It can be used to cache token normalization.
In other words, a tokenizer must satisfy the following interface:
For instance, with the following configuration only the first character of each string will be indexed and only the first character of a term will be searched:
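A configuration of this kind could look roughly like the sketch below. The tokenizer object is written standalone so that it runs without Orama installed; the name firstCharTokenizer and the empty-string guard are assumptions made for illustration, and the commented-out create call only indicates where the object would be passed.

```javascript
// A tokenizer that keeps only the first character of its input.
// It exposes the three properties listed above.
const firstCharTokenizer = {
  language: 'english',
  normalizationCache: new Map(),
  // `language` and `prop` are accepted to match the described signature,
  // but this simple implementation does not use them.
  tokenize(raw, language, prop) {
    // Guard against empty input (an assumption, not part of the original text).
    return raw.length > 0 ? [raw[0]] : []
  }
}

console.log(firstCharTokenizer.tokenize('hello', 'english', 'title')) // [ 'h' ]

// Passing it to Orama (requires @orama/orama, not executed here):
// const db = await create({
//   schema: { title: 'string' },
//   components: { tokenizer: firstCharTokenizer }
// })
```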
Orama’s default tokenizer is exported via @orama/orama/components and can be customized:
Optionally, you can pass the customization options without using createTokenizer:
index
The index component is used to perform the indexing and searching of documents in Orama.
To customize the index used by Orama, provide an object which has at least the following properties:
- create: A function that creates a new index. It receives the following arguments:
  - orama: The Orama instance.
  - mapper: The document IDs mapper (see the Internal components section below).
  - schema: The documents schema.
- insert: A function that inserts a new document in the index. It receives the following arguments:
  - implementation: The current index implementation.
  - index: The index.
  - prop: The property that is currently considered.
  - id: The ID of the document being inserted.
  - value: The value of the property in the document.
  - expectedType: The type of the property in the document according to the schema.
  - language: The language of the document.
  - tokenizer: The tokenizer associated with the current database.
  - docsCount: The number of documents in the documents store before the action is performed.
- remove: A function that removes a document from the index. It receives the same arguments as insert.
- insertDocumentScoreParameters: A function that inserts document information into the index for future result score calculation. It should typically be invoked within insert. It receives the following arguments:
  - index: The index.
  - prop: The property that is currently considered.
  - id: The ID of the document being inserted.
  - tokens: The list of the tokens found in the document.
  - docsCount: The number of documents in the documents store before the action is performed.
- insertTokenScoreParameters: A function that inserts token information into the index for future result score calculation. It should typically be invoked within insert. It receives the following arguments:
  - index: The index.
  - prop: The property that is currently considered.
  - id: The ID of the document being inserted.
  - token: The token.
  - tokens: The list of the tokens found in the document.
- removeDocumentScoreParameters: A function that removes document score information from the index. It should typically be invoked within remove. It receives the following arguments:
  - index: The index.
  - prop: The property that is currently considered.
  - id: The ID of the document being removed.
  - docsCount: The number of documents in the documents store before the action is performed.
- removeTokenScoreParameters: A function that removes token score information from the index. It should typically be invoked within remove. It receives the following arguments:
  - index: The index.
  - prop: The property that is currently considered.
  - id: The ID of the document being removed.
  - token: The token.
- calculateResultScores: A function that calculates the score for the results of the current search. It should typically be invoked within search. It receives the following arguments:
  - context: A search context with various useful information about the search.
  - index: The index.
  - prop: The property being searched.
  - term: The term used to search.
  - ids: The list of document IDs matched by the search.
- search: A function that searches documents in index data and returns matching IDs with scores. It receives the following arguments:
  - context: A search context with various useful information about the search.
  - index: The index.
  - prop: The property to search into.
  - term: The term to search for.
- searchByWhereClause: A function that searches in boolean and numeric indexes and returns a list of matching IDs. It receives the following arguments:
  - context: A search context with various useful information about the search.
  - index: The index.
  - filters: An object where keys are the properties to match and the values are search operators as described in the filters page.
- getSearchableProperties: A function that returns a list of all searchable properties in the index. It receives the index as the only argument.
- getSearchablePropertiesWithTypes: A function that returns an object where keys are the searchable properties in the index and the values are the type of the index for a property. It receives the index as the only argument.
- load: A function that deserializes an index from a JavaScript object. It receives the document IDs mapper and a JavaScript object as its arguments and must return an index.
- save: A function that serializes the index into a JavaScript object. It receives the index as the only argument and must return a JavaScript object.
The following functions are optional:
- beforeInsert or afterInsert: Functions invoked before or after insert. They accept the same arguments as insert except the first one.
- beforeRemove or afterRemove: Functions invoked before or after remove. They accept the same arguments as remove except the first one.
For more formal interface information, see the IIndex interface in src/types.ts in Orama’s source code.
Orama’s default index is based on BM25, Radix Trees and AVL trees. All its functions are exported via @orama/orama/components and can be composed to create a custom index:
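The general shape of such a composition can be sketched as follows. To keep the example runnable without Orama, baseIndex is a hypothetical stand-in with a single stubbed insert function; in a real project the functions would come from @orama/orama/components, and the bookkeeping in the wrapper is purely illustrative.

```javascript
// Hypothetical stand-in for the default index functions (NOT Orama's code).
// Only `insert` is stubbed, using the argument order described above.
const baseIndex = {
  insert(implementation, index, prop, id, value, expectedType, language, tokenizer, docsCount) {
    index[prop] = index[prop] || {}
    index[prop][id] = value
  }
}

// Compose a custom index: reuse everything from the base and wrap `insert`
// to record which document IDs were indexed.
const insertedIds = []
const customIndex = {
  ...baseIndex,
  insert(implementation, index, prop, id, ...rest) {
    insertedIds.push(id)
    return baseIndex.insert(implementation, index, prop, id, ...rest)
  }
}

const indexData = {}
customIndex.insert(customIndex, indexData, 'title', '1', 'Hello', 'string', 'english', null, 0)
console.log(insertedIds, indexData)
```

Spreading the base object first and overriding single functions afterwards keeps every function that is not customized untouched.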
documentsStore
The documentsStore component is used to store the documents in Orama.
To customize the documents store used by Orama, provide an object which has at least the following properties:
- create: A function that creates a new documents store. It receives the following arguments:
  - orama: The Orama instance.
  - mapper: The document IDs mapper (see the Internal components section below).
- get: A function that returns a document from the store. It receives the following arguments:
  - The documents store.
  - The ID of the document to get.
- getAll: A function that returns all documents from the store. Note that the IDs in the returned object are the mapped IDs from the mapper component. It receives the following arguments:
  - The documents store.
- getMultiple: A function that returns multiple documents from the store. It receives the following arguments:
  - The documents store.
  - A list of IDs of the documents to get.
- store: A function that stores a new document in the documents store. It receives the following arguments:
  - The documents store.
  - The ID of the new document to store.
  - The document to store.
- remove: A function that removes a document from the documents store. It receives the following arguments:
  - The documents store.
  - The ID of the document to remove.
- count: A function that returns the count of the documents currently stored. It receives the current documents store as the only argument.
- load: A function that deserializes a documents store from a JavaScript object. It receives the document IDs mapper and a JavaScript object as its arguments and must return a documents store.
- save: A function that serializes the documents store into a JavaScript object. It receives the current documents store as the only argument and must return a JavaScript object.
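As a rough illustration of the properties listed above, here is a minimal in-memory documents store. It is a sketch, not Orama's default implementation: the internal layout ({ docs, count }), the boolean return values of store and remove, and the name simpleDocumentsStore are all assumptions.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

// Hypothetical minimal documents store, keyed by internal document ID.
const simpleDocumentsStore = {
  // `orama` and `mapper` are accepted but unused in this sketch.
  create(orama, mapper) {
    return { docs: {}, count: 0 }
  },
  get(store, id) {
    return store.docs[id]
  },
  getMultiple(store, ids) {
    return ids.map((id) => store.docs[id])
  },
  getAll(store) {
    return store.docs
  },
  store(store, id, doc) {
    if (has(store.docs, id)) return false // already stored
    store.docs[id] = doc
    store.count++
    return true
  },
  remove(store, id) {
    if (!has(store.docs, id)) return false // nothing to remove
    delete store.docs[id]
    store.count--
    return true
  },
  count(store) {
    return store.count
  },
  // `mapper` is accepted but unused in this sketch.
  load(mapper, raw) {
    return { docs: { ...raw.docs }, count: raw.count }
  },
  save(store) {
    return { docs: { ...store.docs }, count: store.count }
  }
}

const myStore = simpleDocumentsStore.create(null, null)
simpleDocumentsStore.store(myStore, 1, { title: 'The Prestige' })
simpleDocumentsStore.store(myStore, 2, { title: 'Big Fish' })
console.log(simpleDocumentsStore.count(myStore)) // 2
```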
For more formal interface information, see the IDocumentsStore interface in src/types.ts in Orama’s source code.
Orama’s default documents store is based on a simple JavaScript object. All its functions are exported via @orama/orama/components and can be composed to create a custom documents store:
sorter
The sorter component is used to sort the documents in Orama.
To customize the sorter used by Orama, provide an object which has at least the following properties:
- create: A function that creates a new sorter. It receives the following arguments:
  - mapper: The document IDs mapper (see the Internal components section below).
  - schema: The documents schema.
  - configuration: The sorter configuration.
- insert: A function that inserts a new document in the sorter. It receives the following arguments:
  - sorter: The sorter returned by the create function.
  - prop: The property that is currently considered.
  - id: The ID of the document being inserted.
  - value: The value of the property in the document.
  - schemaType: The type of the property in the document according to the sort schema.
  - language: The language of the document.
- remove: A function that removes a document from the sorter. It receives the following arguments:
  - sorter: The sorter returned by the create function.
  - prop: The property that is currently considered.
  - id: The ID of the document being removed.
- sortBy: A function that sorts the documents matched by a search according to the given sort parameters. It receives the following arguments:
  - sorter: The sorter returned by the create function.
  - docIds: An array of [string, number] pairs containing, for each document ID, its weight.
  - by: The SortParameters specified during the search.
- getSortableProperties: A function that returns a list of all sortable properties in the sorter. It receives the sorter as the only argument.
- getSortablePropertiesWithTypes: A function that returns an object where keys are the sortable properties in the sorter and the values are the type of the sort for a property. It receives the sorter as the only argument.
- load: A function that deserializes a sorter from a JavaScript object. It receives the document IDs mapper and a JavaScript object as its arguments and must return a sorter.
- save: A function that serializes the sorter into a JavaScript object. It receives the sorter as the only argument and must return a JavaScript object.
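To make the roles of insert, remove and sortBy more concrete, here is a deliberately small sketch of a sorter. It is not Orama's default implementation: the internal layout, the name simpleSorter, and the assumed shape of the sort parameters ({ property, order } with order being 'ASC' or 'DESC') are illustrative assumptions, and load, save and getSortablePropertiesWithTypes are omitted.

```javascript
// Hypothetical minimal sorter: for each sortable property it keeps a Map
// from document ID to the value to sort on.
const simpleSorter = {
  // `mapper` and `configuration` are accepted but unused in this sketch.
  create(mapper, schema, configuration) {
    const values = {}
    for (const prop of Object.keys(schema)) values[prop] = new Map()
    return { values }
  },
  insert(sorter, prop, id, value, schemaType, language) {
    if (sorter.values[prop]) sorter.values[prop].set(id, value)
  },
  remove(sorter, prop, id) {
    if (sorter.values[prop]) sorter.values[prop].delete(id)
  },
  // docIds: array of [id, weight] pairs; by: { property, order } (assumed shape).
  // Returns a new array; the input is left untouched.
  sortBy(sorter, docIds, by) {
    const values = sorter.values[by.property]
    if (!values) throw new Error(`Cannot sort by "${by.property}"`)
    const direction = by.order === 'DESC' ? -1 : 1
    return [...docIds].sort(([a], [b]) => {
      const va = values.get(a)
      const vb = values.get(b)
      if (va === vb) return 0
      return (va < vb ? -1 : 1) * direction
    })
  },
  getSortableProperties(sorter) {
    return Object.keys(sorter.values)
  }
}

const yearSorter = simpleSorter.create(null, { year: 'number' }, {})
simpleSorter.insert(yearSorter, 'year', 1, 2001, 'number', 'english')
simpleSorter.insert(yearSorter, 'year', 2, 1999, 'number', 'english')
simpleSorter.insert(yearSorter, 'year', 3, 2010, 'number', 'english')

const matches = [[1, 0.5], [2, 0.9], [3, 0.1]]
console.log(simpleSorter.sortBy(yearSorter, matches, { property: 'year', order: 'ASC' }))
```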
General purpose components
The components in this category are simple functions that Orama uses internally. Depending on the use case, the component must return a value. Orama will await the result if a Promise is returned.
validateSchema
The component is used to validate a document against the schema.
The function should return undefined if the document is valid according to the schema, or the path of the invalid property otherwise.
The function will receive two arguments:
- The document that is being validated.
- The schema provided to create.
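A simplified validator with this contract might look like the sketch below. It only understands the scalar types 'string', 'number' and 'boolean' plus nested objects, and it treats missing properties as valid; both are assumptions made to keep the example short. The third parameter is an internal helper for recursion and is not part of the two-argument signature described above.

```javascript
// Returns undefined when `doc` matches `schema`, otherwise the dotted path
// of the first invalid property found.
function validateSchema(doc, schema, prefix = '') {
  for (const [prop, type] of Object.entries(schema)) {
    const path = prefix + prop
    const value = doc[prop]

    // Missing properties are treated as optional (assumption).
    if (value === undefined) continue

    if (typeof type === 'object') {
      // Nested schema: the value must be an object and is validated recursively.
      if (typeof value !== 'object' || value === null) return path
      const invalid = validateSchema(value, type, path + '.')
      if (invalid !== undefined) return invalid
    } else if (typeof value !== type) {
      // Scalar types are compared against the result of `typeof`.
      return path
    }
  }
  return undefined
}

const movieSchema = { title: 'string', meta: { rating: 'number' } }
console.log(validateSchema({ title: 'Big Fish', meta: { rating: 8 } }, movieSchema)) // undefined
console.log(validateSchema({ title: 'Big Fish', meta: { rating: 'high' } }, movieSchema)) // meta.rating
```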
getDocumentIndexId
The component is used to extract or generate a unique ID for a document. The returned value must be a string.
The function will receive one argument:
- The document for which an ID is being generated.
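One possible implementation is sketched below. It assumes that documents may carry their own id property and falls back to a simple counter otherwise; the counter-based scheme and the generated- prefix are illustrative choices, not Orama's behavior.

```javascript
let generatedIdCounter = 0

// Uses the document's own `id` when present, otherwise generates one.
function getDocumentIndexId(doc) {
  if (doc.id !== undefined) {
    if (typeof doc.id !== 'string') {
      throw new TypeError(`Document id must be a string, got ${typeof doc.id}`)
    }
    return doc.id
  }
  generatedIdCounter += 1
  return `generated-${generatedIdCounter}`
}

console.log(getDocumentIndexId({ id: 'movie-1', title: 'Big Fish' })) // movie-1
```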
getDocumentProperties
The component is used to extract indexable properties from a document.
The function receives two arguments:
- The document that is being read.
- A list of property paths (using dotted syntax) to extract.
The function must return an object where the keys are the paths received as argument.
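The dotted-path lookup can be sketched as follows. Returning undefined for paths that do not exist in the document is an assumption of this example.

```javascript
// Extracts the values at the given dotted paths from `doc`.
// The result is keyed by the paths exactly as they were received.
function getDocumentProperties(doc, paths) {
  const properties = {}
  for (const path of paths) {
    let current = doc
    for (const key of path.split('.')) {
      // Stop descending as soon as there is nothing left to look into.
      if (current === null || typeof current !== 'object') {
        current = undefined
        break
      }
      current = current[key]
    }
    properties[path] = current
  }
  return properties
}

const movie = { title: 'Big Fish', meta: { rating: 8 } }
console.log(getDocumentProperties(movie, ['title', 'meta.rating']))
// { title: 'Big Fish', 'meta.rating': 8 }
```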
formatElapsedTime
The component is used to format the elapsed property in the search results. The return value can be a number, a string or an object.
The function receives a single argument: the search elapsed time as BigInt.
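For example, a formatter that returns an object could look like the sketch below. It assumes the BigInt value is expressed in nanoseconds, which is not stated in the text above; the raw and formatted field names are illustrative.

```javascript
// Converts the elapsed time (assumed to be nanoseconds, as a BigInt)
// into an object holding both a numeric and a human-readable value.
function formatElapsedTime(elapsed) {
  const milliseconds = Number(elapsed) / 1e6
  return {
    raw: Number(elapsed),
    formatted: `${milliseconds.toFixed(2)}ms`
  }
}

console.log(formatElapsedTime(1500000n).formatted) // 1.50ms
```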
Internal components
Documents IDs mapper
In order to improve performance, Orama uses an internal ID for each document. The documents IDs mapper component is used to maintain a mapping between the user document ID and the Orama document ID.
This component is internal and cannot be replaced by the developer. The component is passed to the customizable components (like documentsStore) and must be treated as an opaque object.
When writing or reading documents in the data section of the Orama instance, such as orama.data.index, make sure you always use an internal ID.
Orama exports two helpers that help with these operations:
- getInternalDocumentId: A function that receives a documents mapper object and an external document ID and returns an internal document ID.
- getDocumentIdFromInternalId: A function that receives a documents mapper object and an internal document ID and returns an external document ID.
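The idea behind the two helpers is a two-way lookup between external and internal IDs. The sketch below illustrates that idea with a hypothetical stand-in mapper; Orama's real mapper is opaque and may work differently, and the names createMapper, toInternalId and toExternalId, as well as the use of sequential numbers as internal IDs, are assumptions made for this example.

```javascript
// Hypothetical stand-in for the documents IDs mapper (NOT Orama's implementation).
function createMapper() {
  return { externalToInternal: new Map(), internalToExternal: [] }
}

// Plays the role of getInternalDocumentId: returns the internal ID for an
// external one, assigning a new sequential ID on first use.
function toInternalId(mapper, externalId) {
  const known = mapper.externalToInternal.get(externalId)
  if (known !== undefined) return known
  mapper.internalToExternal.push(externalId)
  const internalId = mapper.internalToExternal.length // internal IDs start at 1
  mapper.externalToInternal.set(externalId, internalId)
  return internalId
}

// Plays the role of getDocumentIdFromInternalId.
function toExternalId(mapper, internalId) {
  return mapper.internalToExternal[internalId - 1]
}

const idMapper = createMapper()
const internal = toInternalId(idMapper, 'movie-42')
console.log(internal, toExternalId(idMapper, internal)) // 1 movie-42
```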
Extending Orama
As Orama is open source software, we gladly accept proposals for new functionality.
This includes the definition of new components which are used internally by Orama and can be customized by users.
Step 1: Define a new interface
If you want to create a new component, you first have to define your component in the ObjectComponents or FunctionComponents interfaces in src/types.ts.
In the case of object components, the definition should be a new interface defined in the same file. By convention, start the interface name with the letter I.
Example:
Remember that all functions used in Orama’s components can be async, so we advise using SyncOrAsyncValue for their return value.
This also implies that when you invoke such a function you should always use await, to make sure it is handled correctly whether it is async or not.
Step 2: Define and store data in the database
If your component needs to store some data in the Orama database, you have to add a new field to the Data interface in src/types.ts.
Example:
Step 3: Define your component default implementation
Create a file (or folder, if appropriate) under the src/components folder with the appropriate exports.
Example (../components/shiningDetector.js):
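A default implementation for such a file might be as small as the sketch below. The shiningDetector component is hypothetical, as are the isShining function and the rule it applies (a document counts as shining when its shining property is true). In the real file both functions would be exported; the export keyword is left out here so the snippet runs as a plain script.

```javascript
// Hypothetical default implementation of a "shining detector" component.

// A document is considered shining when its `shining` flag is true
// (rule invented for illustration).
function isShining(doc) {
  return doc.shining === true
}

// Factory that builds the component object. Any initialization work
// the component needs would go here.
function createShiningDetector() {
  return { isShining }
}

const detector = createShiningDetector()
console.log(detector.isShining({ title: 'Big Fish', shining: true })) // true
```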
Note that exporting a create* factory function is not strictly needed, but it helps to isolate initialization tasks.
Step 4: Update the create method
Update the create method in src/methods/create.ts to use the component provided in the options, or create a new one using the object or factory function defined in the previous step.
If you added a field to the Data interface, also provide the initial value.
Example:
Step 5: Have fun!
Call your component’s functions where appropriate!
Example:
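Putting Steps 4 and 5 together, the overall flow could be sketched as follows. Everything here is a stand-in: createDatabase mimics only the relevant part of create, recordShining represents whichever Orama method would call the component, and the shining data field is a hypothetical addition to Data. The point of the sketch is the use of await, which works whether the component function is synchronous or asynchronous.

```javascript
// Hypothetical default component, mirroring the kind of object Step 3 produces.
const defaultShiningDetector = { isShining: (doc) => doc.shining === true }

// Stand-in for the relevant part of `create`: use the component provided in
// the options, or fall back to the default one, and initialize its data.
function createDatabase(options = {}) {
  const components = options.components || {}
  return {
    shiningDetector: components.shiningDetector || defaultShiningDetector,
    data: { shining: {} } // initial value of the hypothetical new Data field
  }
}

// Stand-in for a method that calls the component where appropriate.
// `await` handles both synchronous and asynchronous implementations.
async function recordShining(db, id, doc) {
  db.data.shining[id] = await db.shiningDetector.isShining(doc)
  return db.data.shining[id]
}

// A user-provided asynchronous component replaces the default one.
const customDb = createDatabase({
  components: { shiningDetector: { isShining: async (doc) => doc.stars > 4 } }
})
recordShining(customDb, '1', { stars: 5 }).then((result) => console.log(result)) // true
```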