Databricks has announced the release of Delta, a data management platform that combines the most sought-after features of data lakes, data warehouses, and streaming systems. Delta will be a component of the company’s Unified Analytics Platform. It was purpose-built so organizations can avoid cumbersome extract, transform, and load (ETL) processes. According to Databricks, the tool offers the cost benefits of a data lake, the reliability of a data warehouse, and the low latency of an ingest system.
The tool can simplify data pipelines by allowing Delta tables to be used as both a data source and a sink. Tables provide transactional guarantees across multiple concurrent writers, including batch and streaming jobs. In addition, Delta acts as a streaming data warehouse that returns a recent, consistent view of the writes. Upserts in Delta provide a clean way to change data after it has been written, instead of running the entire job again.
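To make the upsert idea concrete, here is a minimal sketch in plain Python. It is not Delta's API: the `upsert` function, the `id` key, and the sample rows are all hypothetical, and the only point is to show "update the rows that match, insert the ones that do not" without reprocessing the whole dataset.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def upsert(table: Dict[Any, Row], changes: Iterable[Row], key: str = "id") -> Dict[Any, Row]:
    """Return a new table with `changes` merged in by `key` (illustrative only)."""
    merged = dict(table)  # copy, so readers of the old version are unaffected
    for row in changes:
        if row[key] in merged:
            # Matched: update the existing row with the new column values.
            merged[row[key]] = {**merged[row[key]], **row}
        else:
            # Not matched: insert the row as new.
            merged[row[key]] = dict(row)
    return merged


# Hypothetical sample data.
table = {1: {"id": 1, "city": "Berlin"}, 2: {"id": 2, "city": "Paris"}}
changes = [{"id": 2, "city": "Lyon"}, {"id": 3, "city": "Rome"}]
updated = upsert(table, changes)
```

Only the two changed rows are touched; row 1 is carried over as-is, which is the contrast with rerunning an entire job.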
Delta automates performance management: a self-optimizing data layout ensures that data queried together is stored together, and small files are compacted automatically for efficient reading. Intelligent data skipping and indexing capabilities bypass unneeded data, and automated caching speeds up subsequent reads.
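Data skipping can be pictured with a small Python sketch. The announcement does not describe the mechanism, so this assumes one common approach: keeping a minimum and maximum value per file and reading only the files whose range overlaps the query. The `FileStats` class, `files_to_scan` function, and file names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Per-file statistics for one column (assumed layout, for illustration)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files: List[FileStats], lo: int, hi: int) -> List[str]:
    """Keep only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [f.path for f in files if f.max_value >= lo and f.min_value <= hi]


# Hypothetical files, each covering a disjoint range of values.
files = [
    FileStats("part-0", 0, 99),
    FileStats("part-1", 100, 199),
    FileStats("part-2", 200, 299),
]
selected = files_to_scan(files, 120, 210)
```

Here a query for values between 120 and 210 never opens `part-0`. The benefit grows when related data is stored together, because the per-file ranges then overlap less.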
In a press statement, the vendor’s co-founder and chief executive Ali Ghodsi explained: “With this unified management system, enterprises now benefit from a simplified data architecture, up to 100x increase in query performance, and faster access to relevant data – increasing their ability to make decisions that drive results. We have solved a massive struggle facing organizations that are on a mission to run their business in real-time.”
The announcement was made at Spark Summit Europe 2017.