Yandex released the source code of its DBMS «YDB»

Yandex has recently released the source code of its DBMS, «YDB», which supports an SQL dialect and ACID transactions.

The DBMS was built from the ground up and was designed from the start for fault tolerance, automatic failover, and scalability. Yandex already runs production YDB clusters with more than 10,000 nodes, which store hundreds of petabytes of data and serve millions of distributed transactions per second.

Main features of YDB

Among YDB's standout features is its relational data model based on tables. YQL (YDB Query Language), an SQL dialect adapted to work with large distributed databases, is used both to query data and to define the data schema. When creating a storage schema, tables can be grouped into a tree-like hierarchy that resembles the directories of a file system. An API is also provided for working with data in JSON format.
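
To make the directory-like grouping of tables more concrete, here is a minimal Python sketch of a hierarchical namespace in which tables live under slash-separated paths. The `SchemaTree` class and its methods are hypothetical and are not part of the YDB SDK; they only model the idea of organizing tables like files in folders.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """A node in the (hypothetical) schema tree: subdirectories plus tables."""
    subdirs: dict[str, "Directory"] = field(default_factory=dict)
    tables: dict[str, list[str]] = field(default_factory=dict)  # table name -> column names


class SchemaTree:
    """Hypothetical model of a directory-like grouping of tables, not a YDB API."""

    def __init__(self) -> None:
        self.root = Directory()

    def _walk(self, parts: list[str], create: bool) -> Directory:
        # Descend through the path components, optionally creating missing directories.
        node = self.root
        for part in parts:
            if part not in node.subdirs:
                if not create:
                    raise KeyError(f"no such directory: {part}")
                node.subdirs[part] = Directory()
            node = node.subdirs[part]
        return node

    def create_table(self, path: str, columns: list[str]) -> None:
        *dirs, name = [p for p in path.split("/") if p]
        parent = self._walk(dirs, create=True)
        if name in parent.tables:
            raise ValueError(f"table already exists: {path}")
        parent.tables[name] = columns

    def list_dir(self, path: str) -> list[str]:
        node = self._walk([p for p in path.split("/") if p], create=False)
        # Directories are listed with a trailing slash, tables without one.
        return sorted([d + "/" for d in node.subdirs] + list(node.tables))


tree = SchemaTree()
tree.create_table("/shop/orders", ["order_id", "customer_id", "total"])
tree.create_table("/shop/archive/orders_2021", ["order_id", "total"])
print(tree.list_dir("/shop"))  # ['archive/', 'orders']
```

Modelling the schema as a tree keeps related tables together and lets a listing operation work one level at a time, much like `ls` on a file system.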

YDB also makes it possible to build fault-tolerant configurations that keep working when disks, nodes, racks, or even entire data centers fail. It supports deployment across three availability zones with synchronous replication, so the cluster keeps operating if one of the zones fails.
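
The general idea behind surviving the loss of one of three zones can be illustrated with a majority quorum: a write is written synchronously to every zone and counts as committed only if at least two of the three acknowledge it. The Python sketch below is a generic illustration under that assumption, not YDB's actual replication protocol; the `Zone` class and the `replicated_write` function are hypothetical.

```python
class Zone:
    """A hypothetical availability zone holding one replica of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self.data: dict[str, str] = {}

    def store(self, key: str, value: str) -> bool:
        # A failed zone cannot acknowledge the write.
        if not self.available:
            return False
        self.data[key] = value
        return True


def replicated_write(zones: list[Zone], key: str, value: str) -> bool:
    """Write to every zone synchronously; commit only if a majority acknowledges.

    Simplification: replicas written during a failed (non-quorum) write are not
    rolled back here; a real system would have to repair or discard them.
    """
    acks = sum(zone.store(key, value) for zone in zones)
    quorum = len(zones) // 2 + 1  # 2 out of 3 zones
    return acks >= quorum


zones = [Zone("zone-a"), Zone("zone-b"), Zone("zone-c")]
print(replicated_write(zones, "k1", "v1"))  # True: all three zones acknowledge
zones[2].available = False                  # one zone goes down
print(replicated_write(zones, "k2", "v2"))  # True: two of three is still a majority
zones[1].available = False                  # a second zone goes down
print(replicated_write(zones, "k3", "v3"))  # False: no majority left
```

With three zones, the majority is two, which is why losing a single zone does not stop writes while losing two does.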

Data can also be accessed with scan queries, which are designed for ad-hoc analytical queries on the database, run in read-only mode, and return their results as a gRPC stream.
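
Because scan query results arrive as a stream rather than as one large response, the client can start processing rows before the whole query finishes. The plain-Python sketch below models that idea with a generator that yields matching rows in fixed-size batches. The `scan_query` function and its `batch_size` parameter are hypothetical and unrelated to the actual YDB SDK or its gRPC API.

```python
from typing import Callable, Iterator

Row = dict[str, object]


def scan_query(
    table: list[Row],
    predicate: Callable[[Row], bool],
    batch_size: int = 2,
) -> Iterator[list[Row]]:
    """Hypothetical read-only scan: yield matching rows in batches, like a stream."""
    batch: list[Row] = []
    for row in table:
        if predicate(row):
            # Copy each row so the consumer cannot modify the source table.
            batch.append(dict(row))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # flush the final, possibly partial batch


orders = [
    {"order_id": 1, "total": 120},
    {"order_id": 2, "total": 40},
    {"order_id": 3, "total": 300},
    {"order_id": 4, "total": 75},
    {"order_id": 5, "total": 210},
]

# An ad-hoc analytical query: sum all totals above 50, consuming the stream batch by batch.
running_total = 0
for batch in scan_query(orders, lambda r: r["total"] > 50):
    running_total += sum(r["total"] for r in batch)
print(running_total)  # 705
```

The consumer never holds the full result set in memory at once, which is the main reason streaming suits large analytical scans.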

Another notable feature is that data is stored directly on block devices, using the native PDisk component and the VDisk layer. Alongside VDisk runs DSProxy, which monitors the availability and performance of disks and excludes them if problems are detected.
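
The role described for DSProxy, watching disk availability and performance and excluding problematic disks, can be illustrated with a simple health check. The Python sketch below excludes a disk if it is unreachable or if its average latency over recent requests exceeds a threshold. The `DiskStats` class, the 50 ms limit, and the averaging rule are assumptions chosen for illustration; the article does not describe the criteria YDB actually uses.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class DiskStats:
    """Hypothetical health data collected for one disk."""
    disk_id: str
    reachable: bool = True
    latencies_ms: list[float] = field(default_factory=list)  # recent request latencies


LATENCY_LIMIT_MS = 50.0  # assumed threshold for this sketch, not a YDB setting


def healthy_disks(disks: list[DiskStats]) -> list[str]:
    """Return the IDs of disks that should keep receiving requests."""
    healthy = []
    for disk in disks:
        if not disk.reachable:
            continue  # exclude unavailable disks
        if disk.latencies_ms and mean(disk.latencies_ms) > LATENCY_LIMIT_MS:
            continue  # exclude disks that are too slow on average
        healthy.append(disk.disk_id)
    return healthy


disks = [
    DiskStats("disk-1", latencies_ms=[4.0, 6.5, 5.1]),
    DiskStats("disk-2", latencies_ms=[80.0, 95.0, 120.0]),
    DiskStats("disk-3", reachable=False),
]
print(healthy_disks(disks))  # ['disk-1']
```

A disk with no latency samples yet is treated as healthy here; a production system would likely need a more careful policy, for example a warm-up period or percentile-based latency checks.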

Other notable features include:

  • A flexible architecture that makes it possible to build various services on top of YDB, even virtual block devices and persistent queues. It is suitable for different workload types: OLTP and OLAP (analytical queries).
  • Support for multi-user (multi-tenant) and serverless configurations.
  • Client authentication. Users can create their own virtual clusters and databases on shared infrastructure, with resource usage accounted for either by the number of requests and the data size, or by renting/reserving specific computing resources and storage space.
  • A configurable time to live (TTL) for records, so that obsolete data is deleted automatically.
  • Interaction with the DBMS and query submission are done through the command-line interface, the built-in web interface, or the YDB SDK, which provides libraries for C++, C# (.NET), Go, Java, Node.js, PHP and Python.
  • Automatic recovery from failures with minimal delay for applications, and automatic maintenance of the specified redundancy for stored data.
  • Automatic indexing of the primary key, plus the ability to define secondary indexes for efficient access by other columns.
  • Horizontal scalability. As the load and the volume of stored data grow, the cluster can be expanded simply by adding new nodes. The compute and storage tiers are separate, so each can be scaled independently. The DBMS itself keeps data and load evenly distributed, taking the available hardware resources into account. Geographically distributed configurations spanning multiple data centers in different parts of the world are also possible.
  • Strong consistency and ACID transactions for queries that span multiple nodes and tables. To improve performance, consistency checks can be selectively disabled.
  • Automatic data replication, automatic partitioning (sharding) as data size or load grows, and automatic balancing of data and load across nodes.
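
Automatic partitioning as data grows can be sketched as follows: a table's key space is divided into shards covering contiguous key ranges, and any shard that exceeds a size limit is split at its median key into two shards. The Python code below is a simplified model under that assumption; the `PartitionedTable` class, the row-count limit, and the median-split rule are illustrative choices, not YDB's documented splitting policy, and load-based splitting and rebalancing across nodes are left out.

```python
from bisect import insort
from typing import Optional


class Shard:
    """A hypothetical shard holding a sorted, contiguous range of primary keys."""

    def __init__(self, keys: Optional[list[int]] = None) -> None:
        self.keys = sorted(keys or [])


class PartitionedTable:
    """Hypothetical table that splits any shard growing beyond max_rows."""

    def __init__(self, max_rows: int) -> None:
        self.max_rows = max_rows
        self.shards = [Shard()]

    def _shard_index(self, key: int) -> int:
        # Pick the last shard whose first key is <= key; shard 0 covers everything below.
        for i in range(len(self.shards) - 1, 0, -1):
            if self.shards[i].keys and self.shards[i].keys[0] <= key:
                return i
        return 0

    def insert(self, key: int) -> None:
        i = self._shard_index(key)
        insort(self.shards[i].keys, key)  # keep keys sorted inside the shard
        if len(self.shards[i].keys) > self.max_rows:
            self._split(i)

    def _split(self, i: int) -> None:
        # Replace the oversized shard with two halves split at its median key.
        keys = self.shards[i].keys
        mid = len(keys) // 2
        self.shards[i : i + 1] = [Shard(keys[:mid]), Shard(keys[mid:])]


table = PartitionedTable(max_rows=4)
for key in [10, 20, 30, 40, 50, 60, 70, 80]:
    table.insert(key)
print([s.keys for s in table.shards])  # [[10, 20], [30, 40], [50, 60, 70, 80]]
```

Splitting at the median keeps both new shards roughly the same size, and because shards stay ordered by key range, a lookup only needs to find the one shard whose range contains the key.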

Finally, YDB is used in Yandex's own projects. The code is written in C/C++ and distributed under the Apache 2.0 license; the source code and more details are available at the following link.

