This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, CelerData Vice President of Strategy Li Kang offers an overview of real-time analytics use cases so you can stay informed.
In high tech, there is often a gap between hype and reality that shapes user expectations. Speaking about artificial intelligence, Stanford University Professor David Cheriton recently said, “When people have asked me over the last 35 years in Stanford what I thought of AI, I say well, it’s a very promising technology. It’s been promising ever since I encountered it and continues to promise, but I think it suffers from being over-promising.”
Cheriton’s observation about AI applies equally to expectations around data analytics, where artificial intelligence and machine learning have made it possible to process enormous amounts of data in real time, at least under ideal conditions. But for many users, when those capabilities are put to the test at scale in commercial environments and supported by traditional data architectures, reality doesn’t match expectations, because legacy systems weren’t designed to move data at the speeds real-time analytics requires.
Friction Slows Progress
In those circumstances, the latency involved in ingesting data into a data warehouse, processing queries, and running data models can frustrate users who expect instantaneous results. Instead of answers, they get delays that slow analytics programs and create friction between users and IT. At a time when 73 percent of 1,500 business leaders surveyed by IDC say purchases of analytics and associated technologies will outpace other software over the next 12 to 18 months, this friction risks undermining those investments.
The good news is that real-time analytics is possible, provided those investments include rearchitecting underlying systems to remove the friction of legacy processes. Closing the gap between expectations and reality means combining a massively parallel processing (MPP) query engine with a streamlined architecture that optimizes data movement between the components data flows through. It’s also important to separate backend tasks such as storage, formatting, and organization from frontend tasks such as metadata management, query planning, and scheduling, so that friction stays low even as data volumes scale.
An optimized architecture provides additional efficiencies, such as eliminating redundant processing and reducing dependence on other resources whose delays can render results obsolete. These improvements are not merely theoretical: some organizations have already made these changes and significantly improved their ability to deliver real-time analytics in some of the most demanding contexts.
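To make the MPP idea more concrete, here is a minimal Python sketch of the scatter-gather pattern commonly used for a grouped aggregation: a coordinator (playing the frontend role of planning and scheduling) hash-partitions rows across workers, each worker (the backend role) aggregates only its own slice, and the coordinator merges the partial results. The function names are hypothetical, and threads stand in for what would be separate nodes in a real cluster; this is an illustration of the pattern, not how any particular engine is implemented.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows):
    """Backend worker: compute SUM(value) GROUP BY key over its own slice only."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def mpp_group_sum(rows, num_workers=4):
    """Coordinator: scatter rows to workers, aggregate in parallel, then gather and merge."""
    # Hash-partition rows so each worker owns a disjoint slice of the keys.
    slices = [[] for _ in range(num_workers)]
    for key, value in rows:
        slices[hash(key) % num_workers].append((key, value))

    # Run the partial aggregations concurrently (threads stand in for cluster nodes).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_aggregate, slices))

    # Merge step: Counter.update adds counts for keys seen in several partials.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(mpp_group_sum(rows))  # {'a': 4, 'b': 7, 'c': 4} (key order may vary)
```

Because rows are hash-partitioned by key, each group lands on exactly one worker; with round-robin distribution instead, the same merge would still be correct because `Counter.update` adds values rather than replacing them.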
Real-Time Analytics Use Cases
Social Media Platform Monitoring
A large social media and messaging platform with more than one billion users needed faster analysis of two kinds of data: performance metrics from its backend systems, and the subscriber transactions taking place on its platform. That analysis was how it ensured calls, messages, payments, and other functions ran efficiently and met subscriber expectations.
As users, data volume, and the number of supported services and applications grew, the platform’s traditional data architecture, built around simple dimensions, began to suffer growing latency and protracted analytics cycles. Increasingly complex processes meant the legacy systems could no longer efficiently handle the volumes the network was generating: as many as 30 million rows per minute, with 100,000 dimension combinations.
The platform’s dimension tables held huge amounts of data, loaded at up to 3.3 billion records per minute and 3 trillion records per day. Meanwhile, under the old architecture, the analytics system could handle at most 330,000 concurrent calls per minute from the anomaly detection platform during peak hours.
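A quick back-of-the-envelope check puts these figures in proportion. Assuming the daily total is spread across a full 24-hour day (1,440 minutes), the average load rate is roughly two thirds of the quoted peak, and the 330,000-calls-per-minute ceiling works out to 5,500 calls per second:

```latex
\[
\frac{3 \times 10^{12}\ \text{records/day}}{1440\ \text{min/day}} \approx 2.08 \times 10^{9}\ \text{records/min},
\qquad
\frac{2.08 \times 10^{9}}{3.3 \times 10^{9}} \approx 0.63
\]
\[
\frac{330{,}000\ \text{calls/min}}{60\ \text{s/min}} = 5{,}500\ \text{calls/s}
\]
```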
The social media platform updated its data architecture to support four key functions:
- Flexible data modeling to optimize query performance;
- Partitioning and bucketing to accelerate query performance;
- Intelligent materialized views to aggregate frequently changed data; and,
- Automated, logic-based view selection for queries.
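The partitioning, bucketing, and automated view-selection items above can be illustrated with a minimal Python sketch. It assumes a table partitioned by day and bucketed by a hash of a key column, plus a set of pre-aggregated materialized views; `bucket_of`, `tablets_to_scan`, `MaterializedView`, and `choose_view` are hypothetical names, and the "smallest covering view" rule is one simple strategy, not the platform’s actual planner logic. It also assumes the query’s aggregates (such as SUM or COUNT) can be recomputed from a view’s coarser groups.

```python
import zlib
from dataclasses import dataclass
from datetime import date

NUM_BUCKETS = 8  # hypothetical number of buckets per partition


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a bucketing-key value to a bucket (CRC32 is stable across processes)."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def tablets_to_scan(partitions, day, key=None):
    """List the (partition, bucket) pairs a query must read.

    A filter on the partition column (day) prunes every other partition;
    an equality filter on the bucketing key prunes to a single bucket.
    """
    buckets = [bucket_of(key)] if key is not None else list(range(NUM_BUCKETS))
    return [(p, b) for p in partitions if p == day for b in buckets]


@dataclass(frozen=True)
class MaterializedView:
    name: str
    dimensions: frozenset  # columns the view is grouped by
    row_count: int         # approximate size, used as a cost estimate


def choose_view(views, query_columns):
    """Pick the smallest view whose dimensions cover every column the query groups or filters by.

    Returns None when no view qualifies, meaning the base table must be scanned.
    """
    candidates = [v for v in views if set(query_columns) <= v.dimensions]
    return min(candidates, key=lambda v: v.row_count, default=None)


if __name__ == "__main__":
    days = [date(2022, 10, 26), date(2022, 10, 27), date(2022, 10, 28)]
    # One day plus one key: only 1 of the 24 (partition, bucket) pairs is read.
    print(tablets_to_scan(days, date(2022, 10, 28), key="user-42"))
    views = [
        MaterializedView("mv_by_service_minute", frozenset({"service", "minute"}), 5_000_000),
        MaterializedView("mv_by_service", frozenset({"service"}), 2_000),
    ]
    print(choose_view(views, {"service"}).name)  # mv_by_service
    print(choose_view(views, {"region"}))        # None
```

Choosing the smallest qualifying view keeps the sketch simple; a real optimizer would weigh more factors, such as data freshness and filter selectivity.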
Today the platform easily handles current and anticipated data volumes with capacity to spare. With reduced complexity and an improved architecture, queries and analytics now run in real time, so the platform performs well even at peak usage.
Targeted Digital Marketing
The digital marketing engine of a microblogging and social networking site was overwhelmed by the data volume and concurrency generated by millions of daily active users, which hindered the site’s ability to identify target audiences among its subscribers and deliver personalized marketing to them. As a result, the site risked being unable to meet its partners’ demanding needs and their expectation that targeted campaigns would reach the right users quickly enough to maximize response and conversion rates.
The site’s data models put a heavy load on its data clusters, required complex maintenance, and did not support aggregated data and analytics operations. Furthermore, as the user base kept growing, rising concurrency increased latency, resulting in high development costs and limited extensibility. The site updated its architecture to support:
- Standard SQL;
- Multi-table join queries and aggregation;
- Bitmap data structures supporting set calculations, exact distinct counts, and column conversion;
- High-concurrency queries on detailed data; and,
- A simplified architecture with easy operations and maintenance.
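The bitmap item above refers to a standard technique: each audience segment is stored as a bitmap with one bit per user ID, so set calculations become bitwise AND, OR, and AND-NOT, and an exact distinct count is simply the number of set bits. The minimal Python sketch below uses plain integers as uncompressed bitmaps and assumes small non-negative integer user IDs; the function and segment names are hypothetical, and production engines typically use compressed formats such as Roaring bitmaps instead.

```python
def to_bitmap(user_ids):
    """Encode non-negative integer user IDs as a bitmap stored in a Python int."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid  # setting the same bit twice leaves it set, so duplicates collapse
    return bitmap


def distinct_count(bitmap):
    """Exact distinct count: the number of set bits."""
    return bin(bitmap).count("1")


def members(bitmap):
    """Decode a bitmap back into a sorted list of user IDs."""
    ids, position = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


likes_sports = to_bitmap([1, 3, 5, 7, 7, 9])  # duplicate 7 is counted once
active_last_week = to_bitmap([3, 4, 5, 9, 11])

print(distinct_count(likes_sports))                     # 5
print(members(likes_sports & active_last_week))         # [3, 5, 9]  (intersection)
print(distinct_count(likes_sports | active_last_week))  # 7          (union)
print(members(likes_sports & ~active_last_week))        # [1, 7]     (difference)
```

Unlike approximate counters such as HyperLogLog, the bitmap count is exact, which matters when partners are billed or targeted by audience size.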
As a result, the site can now run sophisticated analytics to deliver its partners’ messages to a highly targeted audience in real time, leading to higher conversion rates, greater partner retention, and stronger revenue.
Fraud Detection and Prevention
A large, well-known e-commerce hospitality brand with more than four million hosts, six million properties, and more than a billion annual users wanted to implement real-time fraud detection and prevention to protect the rights of its host members, build brand trust, prevent revenue losses, and improve the overall user experience. To do this, the company’s data scientists needed detection models that could identify violations on the platform in real time, but the existing data architecture was too slow to run powerful analytics and deliver timely results.
The company’s flat-table schema was too slow and complex to manage the required data volume, and data freshness was also a problem because its fraud detection models needed the most recent data available. To address these needs, the company updated its data architecture to include:
- Metrics store implementation;
- Optimized query processing;
- Batch, transactional, and real-time data import; and,
- Support for the MySQL protocol and ANSI SQL, with 98 percent SQL syntax compatibility.
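To illustrate why combining batch and real-time imports matters for data freshness, here is a minimal Python sketch of a metrics store that accepts both bulk loads and streamed updates and keeps, for each (entity, metric) pair, the value with the newest event timestamp, so a late-arriving batch record never overwrites fresher streamed data. The `MetricsStore` class, its methods, and the `max_age` freshness check are hypothetical illustrations, not the company’s actual design, and the sketch does not model transactional (all-or-nothing) loads.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsStore:
    """Hypothetical in-memory metrics store keyed by (entity_id, metric)."""

    _values: dict = field(default_factory=dict)  # (entity_id, metric) -> (event_ts, value)

    def upsert(self, entity_id, metric, event_ts, value):
        """Real-time path: apply one update, keeping whichever event is newest."""
        key = (entity_id, metric)
        current = self._values.get(key)
        if current is None or event_ts >= current[0]:
            self._values[key] = (event_ts, value)

    def load_batch(self, rows):
        """Batch path: apply many (entity_id, metric, event_ts, value) rows with the same rule."""
        for entity_id, metric, event_ts, value in rows:
            self.upsert(entity_id, metric, event_ts, value)

    def get(self, entity_id, metric, now, max_age):
        """Return the latest value, or None if it is missing or older than max_age."""
        entry = self._values.get((entity_id, metric))
        if entry is None or now - entry[0] > max_age:
            return None
        return entry[1]


store = MetricsStore()
store.load_batch([("host-1", "bookings_24h", 100, 3), ("host-2", "bookings_24h", 100, 1)])
store.upsert("host-1", "bookings_24h", 250, 9)         # streamed update arrives
store.load_batch([("host-1", "bookings_24h", 90, 2)])  # late batch row is older, so it is ignored
print(store.get("host-1", "bookings_24h", now=260, max_age=60))  # 9
print(store.get("host-2", "bookings_24h", now=260, max_age=60))  # None (stale)
```

Returning `None` for stale values lets a fraud model fall back to a default or flag the record, rather than silently scoring on outdated data.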
These improvements streamlined data flow, delivering sub-second query performance and cutting response times from more than ten minutes to seconds. The company can now run its fraud detection models in real time, helping protect all parties against loss.
Real-Time Decisions at Any Scale
In the past, decisions were made by collecting and analyzing historical data to identify trends over time and project them into the future. Such analytics were slow, resource-intensive, and costly, so they might be conducted only annually or quarterly.
Today, the speed of business demands business intelligence generated moment to moment from current conditions, so decisions can be made as events require. Staying competitive, and making decisions with the greatest possible clarity and confidence, requires an operational data architecture that supports real-time analytics and gives organizations the flexibility to choose the data models and BI tools that best meet their needs, even as they scale.
- Four Essential Real-Time Analytics Use Cases to Know - October 28, 2022