Changing Default Search Algorithm
Since version 3.0.0
, Orama allows you to change the default search algorithm (BM25) with two new plugins: QPS (Quantum Proximity Scoring) and PT15 (Positional Token 15).
This guide will help you understand the differences between these algorithms and choose the best one for your search needs.
To change the default search algorithm, you can install the QPS or PT15 plugin in your Orama instance:
And that’s it! The Orama plugin will do the rest for you.
BM25, QPS, and PT15: A Comparison
Each algorithm has its unique approach to scoring and relevance, making them suitable for different search tasks. Here’s a comparison of the three algorithms to help you choose the best one for your use case.
BM25 (Best Matching 25)
BM25 (Best Matching 25) is a probabilistic information retrieval algorithm developed by Stephen E. Robertson and Karen Spärck Jones in the 1990s.
It’s a ranking function that scores documents based on the frequency of query terms (term frequency), the rarity of terms across the document set (inverse document frequency), and the length of the document.
BM25 is widely used in search engines and information retrieval systems due to its ability to provide relevant results by balancing term frequency and document length, ensuring that longer documents aren’t unfairly penalized.
- Primary Focus: Term frequency and document length normalization.
- Scoring Method: Based on term frequency (TF) and inverse document frequency (IDF).
- Proximity Consideration: No direct proximity calculation.
- Relevance Criteria: Depends on term frequency, document length, and rarity of terms.
- Handling Long Documents: Performs well, handles documents of any length.
- Token Matching: Exact match by default, supports fuzzy matching via typo tolerance and stemming.
- Complexity: Moderate; widely implemented in search systems.
- Memory Usage: Low to moderate.
- Performance: Fast and efficient, especially for large datasets.
- Ideal Use Case: General-purpose search with term weighting.
- Strengths: Simple, fast, and effective for most search tasks.
- Weaknesses: Ignores token proximity, which may reduce relevance for proximity-sensitive searches.
QPS (Quantum Proximity Scoring)
QPS (Quantum Proximity Scoring) is a search algorithm developed by the Orama team in 2024. It was designed to improve search performance, particularly in browser-based environments and memory-constrained systems.
QPS enhances relevance by focusing on the proximity of search terms within documents, assigning higher scores to terms that appear close together.
By leveraging token proximity and efficient memory usage, QPS is optimized for fast and relevant search results in scenarios where resource limitations, such as memory and processing power, are a concernm like web browsers and edge networks.
- Primary Focus: Token proximity and quantization.
- Scoring Method: Based on the proximity of tokens and their occurrence within predefined “quantums.”
- Proximity Consideration: High priority on token proximity.
- Relevance Criteria: Documents with closely located matching tokens are prioritized.
- Handling Long Documents: Handles long documents while maintaining proximity-based relevance.
- Token Matching: Supports exact and proximity-based fuzzy matching.
- Complexity: High, due to quantum and bitmask operations.
- Memory Usage: Higher, as it stores token proximity data.
- Performance: Efficient for proximity-focused queries but may incur overhead.
- Ideal Use Case: Proximity-sensitive searches where token location within text is crucial.
- Strengths: High relevance for queries involving closely related tokens.
- Weaknesses: May over-prioritize proximity when it’s not critical, and could increase memory overhead.
PT15 (Positional Token 15)
PT15 (Positional Token 15) is a search algorithm developed by the Orama team in 2024, inspired by Thomas Wilkerling’s work on Flexsearch.
The algorithm prioritizes the position of tokens within a document, assigning higher relevance to terms that appear earlier or in key positions.
PT15 uses a fixed set of 15 positional buckets to store tokens, allowing it to efficiently track token locations. Designed for performance in structured text searches, PT15 is particularly effective for cases where the order and placement of terms play a crucial role in determining relevance.
- Primary Focus: Token position within a document.
- Scoring Method: Tokens are stored and scored based on their relative position in 15 fixed buckets.
- Proximity Consideration: Indirect; focuses on token positions rather than proximity between tokens.
- Relevance Criteria: Tokens in earlier positions are given higher scores.
- Handling Long Documents: Scales positions for documents longer than 15 tokens but loses some granularity.
- Token Matching: Supports partial token matching (prefixes).
- Complexity: Moderate, with fixed positional storage.
- Memory Usage: Moderate due to the use of 15-position storage.
- Performance: Fast for small to medium datasets, but may slow down with very long documents.
- Ideal Use Case Position-sensitive searches where token placement is important (e.g., titles, structured text).
- Strengths: Prioritizes key positions in a document, making it effective for structured queries.
- Weaknesses: Limited to 15 positional buckets, which may oversimplify token positions in longer documents.
How to choose
At Orama, we developed QPS after a long process of evaluation and observation on the usage of BM25.
We believe that it could be extremely beneficial for most applications, especially for product documentation, e-commerce, and content management systems.
PT-15, inspired by Thomas Wilkerling’s work on Flexsearch, is another good algorithm to consider for its efficiency and position-aware scoring.
While BM25 is an industry standard and a reliable choice for general-purpose search, QPS and PT15 offer unique advantages especially when used in a browser or memory-constrained environment, where speed and memory usage are critical.
Before taking a final decision on which algorithm to use, we recommend testing each one with your dataset and queries to see which one provides the best results for your specific use case.
If you want to know what search queries do your users perform, you can install the Search Analytics plugin to track and analyze search queries in your Orama instance, for free.