Apache Software Foundation Announces Arrow, a Top-Level Project

By Tim King, Executive Editor at Solutions Review
Data Integration News

In a recent press release, the Apache Software Foundation announced a new top-level project: Apache Arrow. According to the foundation, Arrow is a high-performance cross-system data layer for columnar in-memory analytics. Arrow is expected to accelerate analytical workloads, in some cases by more than 100 times. In addition, the Big Data tool will enable multi-system workloads by eliminating cross-system communication overhead.

Arrow was initially seeded with code from another project, Apache Drill. However, Arrow was built on top of a number of open source collaborations and establishes a de facto standard for columnar in-memory processing and interchange. Code committers to Apache Arrow include developers from a variety of other Big Data projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Impala, Phoenix, Spark and others.

Jacques Nadeau, Vice President of Apache Arrow and Vice President of Apache Drill, adds: “The Open Source community has joined forces on Apache Arrow. Developers from 13 major Open Source Big Data projects are already on board – by introducing a new era of columnar in-memory analytics, we anticipate the majority of the world’s data will be processed through Arrow within the next few years.”

In many workloads, 70 to 80 percent of CPU cycles are spent serializing and deserializing data. Apache Arrow addresses this problem by enabling data to be shared between systems and processes with no serialization, deserialization or memory copies. Arrow also supports complex data with dynamic schemas. One example is JSON data, which is commonly used in IoT workloads, modern applications and log files. Implementations are also available for a number of programming languages, including Java, C++ and Python, to allow greater interoperability among Big Data solutions.
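To make the serialization point concrete, here is a minimal sketch that uses only Python's standard library, not Arrow itself. It contrasts a serialize/deserialize round trip with sharing one in-memory buffer inside a single process. The `readings` column and its values are hypothetical, and the buffer sharing shown here is only an analogy for what Arrow aims to do across systems with a standardized format.

```python
import json
from array import array

# Hypothetical column of sensor readings, stored contiguously in memory.
readings = array("d", [20.5, 21.0, 19.8, 22.3])

# Path 1: serialize to text and parse it back. Every value is encoded,
# decoded, and copied into a brand-new array.
payload = json.dumps(readings.tolist())
copied = array("d", json.loads(payload))

# Path 2: expose the same underlying buffer. Nothing is encoded or copied.
shared = memoryview(readings)

# A write to the original column is visible through the shared view,
# but the deserialized copy still holds the old value.
readings[0] = 99.0
shared_first = shared[0]
copied_first = copied[0]
```

After the write, `shared_first` reflects the new value while `copied_first` does not, which is the practical difference between handing over a reference to a buffer and handing over a re-encoded copy.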

Parth Chandra, member of the Apache Arrow and Apache Drill Project Management Committees, notes: “Real world use cases often include complex combinations of structured and rapidly growing complex data. Already tested with Apache Drill, the efficient in-memory columnar representation and processing in Arrow will enable users to enjoy the performance of columnar processing with the flexibility of JSON.”
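As a rough illustration of what a columnar representation of JSON-like data can look like, the sketch below pivots records with varying fields into one list per field. It uses only Python's standard library. The sample records and the `to_columns` helper are hypothetical and are not part of Arrow or Drill, and Arrow's actual memory layout is considerably more elaborate.

```python
import json

# Hypothetical log records; the set of fields varies from record to record.
raw = [
    '{"device": "a1", "temp": 20.5}',
    '{"device": "b2", "temp": 21.0, "humidity": 40}',
    '{"device": "c3"}',
]

def to_columns(lines):
    """Pivot JSON records into one list per field, with None for missing values."""
    records = [json.loads(line) for line in lines]
    # Collect field names in first-seen order, so the schema grows
    # whenever a record introduces a new key.
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    return {name: [record.get(name) for record in records] for name in fields}

columns = to_columns(raw)

# An analytical scan reads only the column it needs.
temps = [t for t in columns["temp"] if t is not None]
mean_temp = sum(temps) / len(temps)
```

Here the average temperature is computed from the `temp` column alone, without visiting the other fields of each record, which is the access pattern columnar layouts are designed to favor.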

You can witness Apache Arrow live in the wild at this year’s Strata + Hadoop World in sunny San Jose, California, in March.

For Apache’s full press release, click here.


This article was written by Tim King on February 24, 2016

Tim is Solutions Review's Executive Editor and leads coverage on data management and analytics. A 2017 and 2018 Most Influential Business Journalist and 2021 "Who's Who" in Data Management, Tim is a recognized industry thought leader and changemaker. Story? Reach him via email at tking@solutionsreview dot com.
