How to Build a Modern Data Architecture to Meet Real-Time Demands
Solutions Review’s Premium Content Series is a collection of contributed articles written by industry experts in enterprise software categories. In this feature, Aerospike Founder and CTO Srini V. Srinivasan reveals how to build a modern data architecture to meet real-time demands.
Behind every fraud flag at PayPal and personalized shopping experience at Wayfair is data, copious amounts of it, serving as the backbone of every transaction. With ever-larger volumes of data arriving from a growing number of sources, businesses face mounting pressure to deliver instant experiences in a market where milliseconds matter. In today’s economy, consumers expect immediate results, and companies must deliver in real time or risk falling behind. With real-time decisioning, organizations can process more instant payments with less fraud; e-commerce sites can grow shopping carts while decreasing cart abandonment; and AdTech companies can match an advertiser’s content to a user’s interests in 50 milliseconds, 640 billion times a day.
According to IDC, more than 43 billion devices were connected to the Internet at the end of 2020, creating or replicating over 64 zettabytes of data. IDC further estimates that by 2025 there will be nearly 52 billion connected devices and more than 180 zettabytes of created or replicated data. In the sprint to use data in a wide array of digital applications, organizations struggle in two crucial areas: real-time deliverability and scalability. But if organizations implement the elements of a modern data architecture, they’ll be on the path to staying competitive when moments matter.
Modern Data Architecture
Traditional data architectures fall short of real-time data needs because the legacy technologies still common in finance, government and telecommunications today weren’t built for tomorrow. Organizations must upgrade their data stacks to decision-ready architectures that support speed, performance and scalability. One way organizations with legacy systems can compete effectively is to undertake an application modernization process. This requires a real-time database, for example one with a robust JSON document store, that maintains sub-millisecond performance at any scale while letting developers work in a familiar environment and language.
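As a concrete illustration, here is a minimal sketch of storing and reading a nested, JSON-style document with the open-source Aerospike Python client. It assumes a locally running Aerospike server and the `aerospike` package; the host address, "test" namespace, set name and record contents are placeholders, not a prescribed setup.

```python
import aerospike

# Assumes a local Aerospike server; host and port are placeholders.
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

# Keys are (namespace, set, user-key) tuples; "test"/"users" are examples.
key = ("test", "users", "user-42")

# Nested maps let a record hold a JSON-style document directly.
client.put(key, {"profile": {"name": "Ada", "tier": "gold", "carts": [101, 102]}})

_, _, record = client.get(key)
print(record["profile"]["name"])  # -> "Ada"

client.close()
```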
Operate as One, But Also as Parts
One foundational requirement of modern data architecture is massive parallelism. Individual components should operate independently and successfully as units while integrating into a larger, distributed system. Parallelism takes many forms: multiple storage devices within each cluster node, multiple cluster nodes, multi-threading for both transactions and queries, queries run in parallel across sub-parts of the database, and multiple parallel network queues. It also helps to dedicate separate clusters to different workloads, for example one database cluster for the edge and one for the core, so each can process data independently before being aligned for full functionality. A parallel architecture is equally important for mixed workloads (e.g., transactional data processing, analytical queries and streaming pipelines).
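To make one of these forms concrete, the self-contained Python sketch below fans the same query out across independent sub-parts (shards) of a dataset and merges the partial results, the same pattern a distributed database applies across nodes and threads. The shard layout and record fields are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: in a real cluster these would live on different nodes.
SHARDS = [
    [{"id": 1, "amount": 120}, {"id": 2, "amount": 540}],
    [{"id": 3, "amount": 75}, {"id": 4, "amount": 300}],
    [{"id": 5, "amount": 980}],
]

def scan_shard(shard, predicate):
    """Scan one shard independently; each shard is a self-contained unit of work."""
    return [rec for rec in shard if predicate(rec)]

def parallel_query(predicate):
    """Fan the same query out across all shards, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: scan_shard(shard, predicate), SHARDS)
        return [rec for partial in partials for rec in partial]

if __name__ == "__main__":
    # All records with amount > 100, computed shard-by-shard in parallel.
    print(parallel_query(lambda rec: rec["amount"] > 100))
```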
There’s Nothing Secondary About Secondary Indexing
Once you’ve put this parallel architecture in place and are processing data at scale, the next step is optimizing network bandwidth so that large amounts of data can move from edge to core to cloud. In a distributed system, it’s important to account for the distance data needs to travel; if the system isn’t configured appropriately, it can suffer lag and poor performance. To combat this, teams should combine massively parallel query processing with secondary indexing, which can reduce, or even eliminate, the amount of data bouncing from component to component.
A secondary index is a data structure that locates all of the records in a database, or in a set (or table) within it, that match a given field value. When part of a record is updated, any applicable secondary index entries are automatically updated in the same operation, so later queries that use the index remain fast and consistent. For quicker lookups, secondary indexes can be stored in dynamic random access memory (DRAM), Intel Optane persistent memory or flash storage (SSD).
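The toy Python sketch below shows the mechanics: a store that maintains a secondary index mapping a field value to primary keys and updates it in the same step as each record write, so a query by field value touches only the matching records instead of scanning everything. It is an in-memory illustration of the concept, not any particular database’s implementation.

```python
from collections import defaultdict

class Store:
    """Toy record store with a secondary index on one field."""

    def __init__(self, indexed_field):
        self.records = {}                 # primary key -> record
        self.indexed_field = indexed_field
        self.index = defaultdict(set)     # field value -> set of primary keys

    def put(self, pk, record):
        old = self.records.get(pk)
        if old is not None:
            # Remove the stale index entry so updates stay consistent.
            self.index[old[self.indexed_field]].discard(pk)
        self.records[pk] = record
        self.index[record[self.indexed_field]].add(pk)

    def query(self, value):
        """O(matches) lookup via the index instead of scanning every record."""
        return [self.records[pk] for pk in self.index[value]]

store = Store("country")
store.put("u1", {"name": "Ada", "country": "UK"})
store.put("u2", {"name": "Linus", "country": "FI"})
print(store.query("UK"))  # -> [{'name': 'Ada', 'country': 'UK'}]
```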
The Need for Speed … & Scale
Modern data architecture is crucial in today’s data-driven world because a significant portion of the data generated, transferred, stored and consumed is in document formats like JSON, which has emerged as the default data model for the web. As data volumes grow, JSON data grows with them. Document-oriented applications built on JSON are increasingly prevalent and in demand, and they require components and configurations that enable both fast access to data and scalability. Here are a few suggestions to keep in mind when incorporating large-scale JSON applications into a database:
- Ensure that JSON documents are organized efficiently in the database for optimal storage and access. Look for databases that offer efficient access paths to document-oriented data while exposing the documents through familiar interfaces such as JSON, the Spring Framework, etc.
- Denormalize record keys by combining components such as a collection-id, group-id and object-id (see the sketch after this list). Such schema mapping provides efficient access paths to data, and a database that performs fast lookups and supports heavy ingestion rates can deliver the fast id-mapping response needed for efficient operation.
- For event-id-based documents, group multiple event objects into a single document. This batches related data and achieves scalability without an explosion in the sheer number of individual objects.
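The short sketch below illustrates the last two suggestions together: a denormalized key assembled from hypothetical collection-id, group-id and object-id components, and several related events grouped into a single document stored under that key. The naming scheme and the plain dict standing in for a document store are assumptions for illustration only.

```python
def make_key(collection_id, group_id, object_id):
    """Denormalized record key: one direct lookup instead of several joins.

    The component names mirror the schema-mapping idea above; they are
    illustrative, not a fixed convention.
    """
    return f"{collection_id}:{group_id}:{object_id}"

# Group related events into one document rather than one record per event.
doc_key = make_key("events", "checkout", "session-9f3")
document = {
    "session": "session-9f3",
    "events": [
        {"event_id": 1, "type": "view",     "ts": 1700000001},
        {"event_id": 2, "type": "add_cart", "ts": 1700000005},
        {"event_id": 3, "type": "purchase", "ts": 1700000042},
    ],
}

db = {doc_key: document}  # stand-in for a real document store
print(len(db["events:checkout:session-9f3"]["events"]), "events in one record")
```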
Organizations that leverage real-time data for innovation — and optimize for deliverability and scalability — are on track for accelerated growth. Without embracing accurate, real-time decisioning through parallel and concurrent access to an adequate amount of recent data, companies will rapidly lose their pace and place in the real-time global economy.