By Yair Weinberger
Data-Driven Businesses Are Replacing Batch ETL Uploads with Pipelines to Transform Analytics and Business Intelligence
Most organizations struggle to extract real-time intelligence from their disparate data silos, intelligence they need to improve business agility, decision-making and competitiveness. While many are acutely aware that their old, rigid batch-upload architectures are inadequate, they are at a loss to find a solution.
Meanwhile, many organizations have moved their data warehouses to the cloud to benefit from on-demand storage, flexible computing power, and lower costs. Migrating to the cloud, however, does not solve their data pipeline issues.
That’s because traditional batch upload processes are unable to meet the continuous, real-time, data-to-data needs of warehouses and business intelligence applications. Batch processes that upload data once a day, twice a day, or even once an hour have been pushed beyond their life expectancy by today’s real-time environments.
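To put rough numbers on that gap, the following is a minimal sketch of how stale data is by the time a batch lands. It assumes records arrive evenly across the interval and that the upload itself takes no time, so real delays would be longer. The function name `batch_staleness_minutes` is purely illustrative.

```python
from typing import Tuple


def batch_staleness_minutes(interval_minutes: float) -> Tuple[float, float]:
    """Return (average, worst-case) age of a record when its batch is uploaded.

    Assumption: records arrive uniformly over the interval and the upload
    is instantaneous, which understates real-world delay.
    """
    worst = interval_minutes        # a record created just after the previous upload
    average = interval_minutes / 2  # midpoint of a uniform arrival window
    return average, worst


if __name__ == "__main__":
    schedules = [("once a day", 24 * 60), ("twice a day", 12 * 60), ("once an hour", 60)]
    for label, interval in schedules:
        avg, worst = batch_staleness_minutes(interval)
        print(f"{label}: average {avg:.0f} min old, worst case {worst:.0f} min old")
```

Even the hourly schedule leaves the average record half an hour old before anyone can query it.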
Data Streaming: A Disruptive New Approach
Data streaming is a powerful and innovative solution to organizations’ data pipeline headaches. Many of today’s data streaming capabilities are powered by open source technologies such as Apache Kafka, a distributed streaming platform. A robust and proven solution creates secure pipelines that stream data in real time from various sources (notably databases, applications, and APIs) to leading cloud data warehouse platforms.
Leading organizations in virtually every industry are moving to a real-time data model to be more agile and achieve a competitive edge through faster and better decision making.
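The streaming pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory `queue.Queue` stands in for a distributed streaming platform such as Apache Kafka, a plain list stands in for the cloud data warehouse, and `run_pipeline` is a hypothetical name, not part of any product.

```python
import queue
import threading


def run_pipeline(sources, sink):
    """Forward every event to the sink as soon as a source produces it.

    sources: dict mapping a source name to an iterable of events.
    sink: a list standing in for the warehouse (assumption for this sketch).
    """
    topic = queue.Queue()  # in-memory stand-in for a streaming topic
    done = object()        # sentinel a producer sends when it has finished

    def produce(name, events):
        for event in events:
            topic.put({"source": name, "event": event})
        topic.put(done)

    threads = [
        threading.Thread(target=produce, args=(name, events))
        for name, events in sources.items()
    ]
    for thread in threads:
        thread.start()

    finished = 0
    while finished < len(threads):
        item = topic.get()  # blocks until the next event arrives
        if item is done:
            finished += 1
        else:
            sink.append(item)  # loaded one event at a time, not in a nightly batch

    for thread in threads:
        thread.join()
    return sink


if __name__ == "__main__":
    warehouse = run_pipeline(
        {"database": ["row inserted", "row updated"], "api": ["webhook received"]},
        [],
    )
    for record in warehouse:
        print(record)
```

The point of the design is that the consumer never waits for a schedule: each event is loaded as it arrives, whichever source produced it.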
Data Streaming in a Straightforward Two-Step Process
Step one involves importing data into the platform from all sources, such as: transactional databases (e.g. Oracle, PostgreSQL); Salesforce (account information, stage, ownership, etc.); website tracking (all web event data); web servers (customer activity such as adding inputs and deploying new code in the code engine); backend logs (internal platform events such as data being loaded to the output or a new table being created); and monitoring systems (to capture system issues such as input connections and latency). This is accomplished by selecting the desired inputs, authenticating user credentials and setting up how files will be imported.
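As a rough illustration of step one, the sketch below models the three actions named above: selecting inputs, supplying credentials and choosing how data is imported. Every name in it (`InputConfig`, `SUPPORTED_KINDS`, `credential_ref`, `import_mode`, `validate`) is hypothetical and not taken from any specific platform.

```python
from dataclasses import dataclass

# Hypothetical source categories, mirroring the kinds of inputs listed above.
SUPPORTED_KINDS = {
    "database", "crm", "web_tracking", "web_server", "backend_logs", "monitoring",
}
IMPORT_MODES = {"incremental", "full"}


@dataclass
class InputConfig:
    name: str
    kind: str
    credential_ref: str               # name of a stored secret, never the secret itself
    import_mode: str = "incremental"  # assumed default for this sketch


def validate(inputs):
    """Return a list of problems; an empty list means the setup looks usable."""
    problems = []
    seen = set()
    for cfg in inputs:
        if cfg.kind not in SUPPORTED_KINDS:
            problems.append(f"{cfg.name}: unsupported kind '{cfg.kind}'")
        if not cfg.credential_ref:
            problems.append(f"{cfg.name}: missing credentials")
        if cfg.import_mode not in IMPORT_MODES:
            problems.append(f"{cfg.name}: unknown import mode '{cfg.import_mode}'")
        if cfg.name in seen:
            problems.append(f"{cfg.name}: duplicate input name")
        seen.add(cfg.name)
    return problems


if __name__ == "__main__":
    selected = [
        InputConfig("orders_db", "database", "ORDERS_DB_SECRET"),
        InputConfig("salesforce", "crm", "SALESFORCE_TOKEN", import_mode="full"),
        InputConfig("legacy_ftp", "ftp", ""),
    ]
    for problem in validate(selected):
        print(problem)
```

Checking the configuration before any data moves catches a mistyped source or a missing credential at setup time rather than in the middle of an import.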
Step two is integration. From here, the data can be analyzed using popular business intelligence tools.
Achieving Real-Time Intelligence
Many enterprises lack a central warehouse for the data available from backend relational databases, online events and metrics, support services, and other internal and external sources, primarily because that data is spread across multiple sources and different systems in different formats. Some of it is flat, some is relational, some is JSON, and writing custom scripts to integrate it all is beyond the resources of most companies. Without centralization, however, analytics are both piecemeal and siloed. This makes it difficult, if not impossible, to produce real-time intelligence.
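To make the format problem concrete, here is a minimal sketch of what "integrating it all" involves for just three shapes of data: a flat CSV export, rows from a relational query, and nested JSON events. The helper names (`from_csv`, `from_rows`, `from_json_lines`) and the sample fields are assumptions for illustration; a real integration would also have to handle types, schema changes and errors.

```python
import csv
import io
import json


def from_csv(text):
    """Flat file: each line becomes a dict keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_rows(columns, rows):
    """Relational result set: pair each tuple with the column names."""
    return [dict(zip(columns, row)) for row in rows]


def from_json_lines(text):
    """JSON events, one per line: flatten nested objects into dotted keys."""
    def flatten(obj, prefix=""):
        flat = {}
        for key, value in obj.items():
            name = f"{prefix}{key}"
            if isinstance(value, dict):
                flat.update(flatten(value, name + "."))
            else:
                flat[name] = value
        return flat

    return [flatten(json.loads(line)) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    records = (
        from_csv("id,plan\n1,pro\n")
        + from_rows(["id", "plan"], [(2, "free")])
        + from_json_lines('{"id": 3, "account": {"plan": "pro"}}')
    )
    for record in records:
        print(record)
```

Even in this toy case the three sources disagree on value types and field names, which hints at why hand-written scripts for every source quickly become unmanageable.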
Yair Weinberger is the Co-founder and CTO at Alooma. He’s an expert in data integration, real-time data platforms, big data and data warehousing. Previously, he led development for ConvertMedia (later acquired by Taboola). Yair began his career with the Israel Defense Forces (IDF) where he managed cyber security and real-time support systems for military operations.