Databricks released the code for Delta Lake and MLflow

During the Data + AI Summit, Databricks announced that it will open source the entire Delta Lake storage framework under the stewardship of the Linux Foundation.

Delta Lake has been a Linux Foundation project since October 2019. It is an open storage layer that brings reliability and performance to data lakes through the “lakehouse” architecture, which combines the best of data warehouses and data lakes under one roof.

Over the past three years, the lakehouse has become an attractive solution for data engineers, analysts, and data scientists who want the flexibility to run different workloads on the same data, from data analytics to machine learning development, with minimal complexity and no duplication. Delta Lake is the most widely used lakehouse format in the world and currently sees more than 7 million downloads per month (and growing).

“From the beginning, Databricks has been committed to open standards and the open source community. We have created, contributed, fostered the growth of, and donated some of the most impactful innovations in modern open source technology,” said Ali Ghodsi.

This means there will no longer be any functional differences between Databricks' own version of Delta Lake and the open source version. The company said it will likewise release its recent enhancements to the MLflow machine learning operations platform and to the open source Apache Spark analytics framework. Databricks has also rolled out several new features for its core Lakehouse platform.

“Before Delta Lake, technologies like Spark processed huge amounts of data; Delta Lake allows you to process small deltas, with all changes stored in history so you can go back and forth,” said Ali Ghodsi, co-founder and CEO of Databricks. “This is important for audit trails and compliance, so you can go back and find the decisions you made a year ago.”
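To make the idea of processing small deltas and going “back and forth” in history more concrete, here is a toy sketch in Python. It is not Delta Lake's implementation or API: the class `ToyDeltaTable` and its `commit` and `snapshot` methods are hypothetical, and the sketch only models the general idea of storing every change as a numbered commit and rebuilding any earlier version by replaying that log. Delta Lake itself tracks changes at the level of data files and exposes earlier versions through read options such as `versionAsOf`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical toy model, NOT Delta Lake's API: each commit records only the
# rows it changes, and any past version is rebuilt by replaying the log.
@dataclass
class ToyDeltaTable:
    # One entry per commit: row key -> new value (None means "row deleted").
    log: List[Dict[str, Optional[str]]] = field(default_factory=list)

    def commit(self, changes: Dict[str, Optional[str]]) -> int:
        """Append a commit to the history and return its version number."""
        self.log.append(dict(changes))
        return len(self.log) - 1

    def snapshot(self, version: int) -> Dict[str, str]:
        """Return the table contents as of `version` by replaying commits 0..version."""
        if not 0 <= version < len(self.log):
            raise ValueError(f"unknown version: {version}")
        state: Dict[str, str] = {}
        for changes in self.log[: version + 1]:
            for key, value in changes.items():
                if value is None:
                    state.pop(key, None)  # deletion recorded in the history
                else:
                    state[key] = value  # insert or update
        return state


# Three small deltas instead of rewriting the whole table each time.
table = ToyDeltaTable()
v0 = table.commit({"order-1": "pending", "order-2": "pending"})
v1 = table.commit({"order-1": "shipped"})
v2 = table.commit({"order-2": None})

print(table.snapshot(v2))  # {'order-1': 'shipped'}
print(table.snapshot(v0))  # {'order-1': 'pending', 'order-2': 'pending'}
```

Replaying every commit from version 0 keeps the sketch short; a production log would typically also write periodic checkpoints so that reading an old version does not require replaying the entire history.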

The new Delta Lake 2.0 also brings better query performance and a foundation built on open standards. A release candidate is available now, and general availability is expected later this year.

Databricks said the update reflects contributions from more than 6,400 developers and noted that over the last year total commits grew 95%, while the average number of lines of code per commit increased 900%.

The company also announced version 2.0 of MLflow, its platform for managing machine learning projects. The release includes Pipelines, a new feature that speeds up and simplifies the deployment of machine learning models. Pipelines gives data scientists predefined, production-ready templates, based on the type of model they are building, so they can develop models faster and more reliably without needing intervention from production engineers.

Users can define the pipeline's steps in a configuration file, and MLflow Pipelines handles their execution automatically, the company said. Databricks has also added serverless model endpoints to host production models directly, as well as built-in model monitoring dashboards that help teams analyze how their models perform in the real world.

“The Delta Lake project is experiencing phenomenal activity and growth trends that indicate the developer community wants to be a part of the project. Contributor strength has grown 60% over the last year, total commits have grown 95%, and the average lines of code per commit have grown 900%. We are seeing this upward velocity from contributing organizations such as Uber Technologies, Walmart, and CloudBees, Inc., among others,” said Jim Zemlin, executive director of the Linux Foundation.

If you want to know more, you can check the details at the following link.