A few days ago, the OpenSearch Software Foundation, backed by the Linux Foundation, announced the launch of OpenSearch 3.0, a release that marks a milestone in the evolution of the project born as a fork of Elasticsearch and Kibana.
Since its inception in 2021, OpenSearch has positioned itself as a truly open-source alternative to the Elastic ecosystem, operating under the Apache 2.0 license, in contrast to Elasticsearch's shift to the AGPLv3 license.
Key new features in OpenSearch 3.0
The most notable new feature of this version is the OpenSearch Vector Engine, designed to handle the vector data used in machine learning and semantic search systems. The engine enables GPU-accelerated vector search, delivering significant performance improvements: 9.3x faster indexing and a 3.75x reduction in operating costs compared to purely CPU-based solutions.
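As a minimal sketch of what working with the Vector Engine looks like (index and field names here are illustrative, and GPU acceleration, where available, is configured at the deployment level rather than in these request bodies), a k-NN index mapping and a basic vector query can be built as JSON payloads:

```python
import json

# Hypothetical index mapping for a k-NN vector field; "faiss" is one of the
# engines the OpenSearch k-NN plugin supports.
index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 4,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            }
        }
    },
}

# A basic k-NN query body: return the 3 nearest neighbors of a query vector.
query_body = {
    "size": 3,
    "query": {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3, 0.4], "k": 3}}},
}

print(json.dumps(query_body))
```

These bodies would be sent with `PUT /<index>` and `POST /<index>/_search` respectively against a running cluster.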
The system also supports the Model Context Protocol (MCP), which allows OpenSearch to be integrated with AI agents and LLM ecosystems, including Anthropic, LangChain, and OpenAI, opening the door to new use cases focused on artificial intelligence and conversational systems.
OpenSearch 3.0 incorporates several optimizations that boost the overall performance of the engine. One of the most notable is improved range queries, now 25% faster thanks to a more efficient strategy for handling numeric and date fields. For high-cardinality cases, execution hints for aggregations have been introduced, reducing p75 latency by 90% in benchmark tests compared with previous versions.
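To make the two optimizations above concrete, here is a sketch of the request bodies involved (index and field names are illustrative, and the available execution-hint values should be checked against the documentation for your version): a date range query, and a terms aggregation carrying an execution hint for a high-cardinality field.

```python
# A range query over a date field: the kind of query that benefits from the
# faster numeric/date handling described above.
range_query = {
    "query": {
        "range": {"timestamp": {"gte": "2025-01-01", "lt": "2025-02-01"}}
    }
}

# A terms aggregation with an execution hint. "map" builds the buckets from
# field values directly, which can help on high-cardinality fields.
agg_query = {
    "size": 0,
    "aggs": {
        "by_user": {
            "terms": {
                "field": "user_id",
                "execution_hint": "map",
            }
        }
    },
}
```

Both payloads would go to the `_search` endpoint of the target index.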
In addition, the separation of indexing and search traffic is one of the key features for clusters with remote storage: it allows the two workloads to scale independently, isolates faults, and optimizes read-only configurations through the new _scale API. Additionally, support for star-tree structures improves aggregations in high-cardinality scenarios, reducing query workload by up to 100 times.
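As a hedged illustration of the read-only scaling flow (the endpoint path and body shape below are assumptions based on the description above, and the index name is hypothetical; consult the 3.0 documentation for the exact contract), a request to put an index into search-only mode might be assembled like this:

```python
# Hypothetical _scale request: switch an index to serving search traffic only,
# so that search replicas can scale independently of the indexing path.
index_name = "logs-2025"
scale_endpoint = f"/{index_name}/_scale"
scale_body = {"search_only": True}  # assumption: flag name per the feature description
```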
Improvements to search types
In vector search, a new explanation parameter has been added for Faiss, which breaks down k-NN query scores and helps users understand how results are ranked. This complements the update to the BM25 scoring function, which now uses BM25Similarity by default to align with current Apache Lucene optimizations. Additionally, segment size optimizations have contributed to a 20% reduction in tail latencies.
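A quick sketch of how such a score breakdown might be requested (this assumes the Faiss explanation is surfaced through the familiar `explain` flag on the search body; the field name and vector are illustrative):

```python
# Hedged example: a k-NN search that also asks for score explanations, so the
# response details how each hit's score was computed for the Faiss engine.
explain_query = {
    "explain": True,
    "size": 5,
    "query": {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3, 0.4], "k": 5}}},
}
```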
In hybrid search, statistical normalization techniques such as Z-score normalization and new min-max thresholds have been implemented, which help produce more consistent results and avoid amplifying irrelevant scores.
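Hybrid search normalization is configured through a search pipeline. The sketch below shows such a pipeline definition; the `"z_score"` technique name is an assumption inferred from the text, while `"min_max"` is the long-standing default, so verify the exact identifiers against the documentation:

```python
# Hedged sketch of a search pipeline for hybrid queries: normalize the scores
# from each sub-query, then combine them. Technique names are assumptions.
pipeline_body = {
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "z_score"},
                "combination": {"technique": "arithmetic_mean"},
            }
        }
    ]
}
```

The pipeline would be registered via `PUT /_search/pipeline/<name>` and referenced when running a hybrid query.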
Among other improvements, OpenSearch 3.0 includes:
- The PPL language has been extended with join and subquery commands, improving data exploration through record correlation and advanced filtering.
- The new Live Query API enables real-time monitoring, while the observability experience is enriched with optimized flows for anomaly detection, making it easy to trigger them contextually from the main dashboard.
- The traditional Java Security Manager has been replaced with a Java agent, which intercepts privileged calls and verifies permissions more efficiently. This improves cluster performance and reduces internal overhead.
- A new PGP public key has been added to strengthen artifact verification starting with version 3.0.
- Lucene updated to version 10, which improves parallel processing and full-text indexing.
- Support for Java Platform Module System, with Java 21 as the minimum required version, allowing modularization of system components.
- Native support for MCP, which strengthens the integration of AI agents into business flows.
- Introduction of direct data extraction mode from streams such as Apache Kafka and Amazon Kinesis, facilitating real-time analysis.
- A planning-execution-reflection agent, designed to tackle complex tasks through iterative steps, very useful in autonomous environments or self-service systems.
- Enabling by default the segment-wise parallelization mode for k-NN vectors, with up to a 2.5x increase in query performance.
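For instance, the extended PPL join support mentioned above might be exercised with a query along these lines (index and field names are hypothetical, and the precise join grammar should be checked against the PPL reference):

```python
# Hypothetical PPL query correlating records across two indices; the join
# syntax shown here is an illustrative assumption.
ppl_query = (
    "source = orders "
    "| join on orders.customer_id = customers.customer_id customers "
    "| where customers.country = 'ES' "
    "| fields orders.order_id, customers.name"
)

# PPL queries are submitted as a JSON body to the plugin's query endpoint.
request_body = {"query": ppl_query}
```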
Finally, if you are interested in learning more, you can check the details at the following link