Sisu Data Founder and CEO Peter Bailis walks us through three data and analytics trends that the COVID-19 pandemic is accelerating.
There’s no need to restate that the first half of 2020 has been an uncertain time for businesses, and that the uncertainty will continue. We’ve learned how to collaborate remotely and connect with our customers in new and meaningful ways. What we have not discussed is how we must change the way we measure the impact of these changes. As we adapt to the pressures of a rapidly changing economy, we see three data and analytics trends accelerating in the current environment.
First, proactive diagnostic analytics will overtake predictive modeling and data science as the path forward for successful companies. Second, we see fully cloud-native data architectures overtaking the current hybrid model. And third, we see a future where diagnostic analytics stops looking like manual exploration and starts looking a lot more like search.
Proactive analytics will overtake predictive modeling
Trusting predictions is a treacherous path forward for companies that need to make rapid, high-risk decisions. Moreover, rebuilding data models around a new set of assumptions creates several problems for teams that need to respond to business changes quickly.
First, the cost of acquiring sufficient data to tune new models is often prohibitive, and the timeline to build, test, and tune new models is not short. Second, for that cost, the accuracy you can expect from a new model will be too low to show material ROI. A general rule of thumb is that it takes exponentially more training data to deliver incremental improvements in accuracy. And most importantly, the rate of change we expect to see in markets and customer behaviors over the next few months makes “prediction” challenging, no matter what the method.
Although the value of predictive models decays rapidly, that doesn’t mean analytics teams lack the data they need to inform decisions across the business. Companies capture the rich transaction, customer, and operations data they need every day, from data sources that are accurate to the minute.
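One way to make the rule of thumb above concrete is a simple illustrative model. This is an assumption for the sake of the example, not a measured result: suppose accuracy $A$ grows logarithmically with the number of training examples $n$, with constants $a$ and $b > 0$ that depend on the problem.

$$
A(n) = a + b \ln n
\quad\Longrightarrow\quad
A(n') - A(n) = \Delta A
\iff
n' = n \, e^{\Delta A / b}
$$

Under that assumption, every fixed gain $\Delta A$ multiplies the required data by the same factor $e^{\Delta A / b}$, so $k$ such gains require $e^{k \Delta A / b}$ times as much data, which is exponential in $k$.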
Fortunately, the technology is available today to automate the exploration of these vast tables and rapidly test millions of hypotheses. Instead of requiring an analyst to manually test features or carefully construct complex queries, these platforms start with a KPI and diagnose which populations in the data matter — and then proactively recommend to the analyst where they should prioritize their attention.
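As a rough illustration of starting from a KPI and working back to the populations that matter, here is a minimal Python sketch. It is not how any particular platform works: the function name `rank_populations`, the `period` field, and the sample rows are all hypothetical, and it only checks single-attribute populations, where a real tool would also test combinations and statistical significance.

```python
from collections import defaultdict


def rank_populations(rows, kpi, dimensions):
    """Rank single-attribute populations by how much they moved a KPI total.

    Each row is a dict with a "period" key ("before" or "after"), the KPI
    value, and one key per dimension. All of these names are hypothetical.
    """
    totals = defaultdict(lambda: {"before": 0.0, "after": 0.0})
    for row in rows:
        for dim in dimensions:
            # Accumulate the KPI for the population "dim = value" per period.
            totals[(dim, row[dim])][row["period"]] += row[kpi]

    ranked = [
        (dim, value, t["after"] - t["before"])
        for (dim, value), t in totals.items()
    ]
    # Largest absolute change first, so the analyst sees it at the top.
    ranked.sort(key=lambda item: abs(item[2]), reverse=True)
    return ranked


# Made-up sample data, for illustration only.
rows = [
    {"period": "before", "region": "EU", "channel": "web", "revenue": 100.0},
    {"period": "before", "region": "US", "channel": "web", "revenue": 120.0},
    {"period": "before", "region": "US", "channel": "retail", "revenue": 80.0},
    {"period": "after", "region": "EU", "channel": "web", "revenue": 105.0},
    {"period": "after", "region": "US", "channel": "web", "revenue": 130.0},
    {"period": "after", "region": "US", "channel": "retail", "revenue": 20.0},
]

ranked = rank_populations(rows, "revenue", ["region", "channel"])
for dim, value, delta in ranked:
    print(f"{dim}={value}: {delta:+.1f}")
```

In this toy data the retail channel surfaces first, because it accounts for the largest swing in revenue. The analyst never had to guess that retail was the place to look.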
When the pace of decision-making is accelerating, businesses need to act proactively based on their data. These new proactive platforms not only augment a data team’s ability to work effectively with all their data, but they can inform more daily decisions with data and prevent the inevitable missteps of “gut feel” decision-making.
Cloud-native will overtake hybrid architectures
IDC’s David Reinsel has postulated that by this year, “more data will be stored in core enterprise storage environments than in all the world’s existing endpoints.” But without fundamental changes to how analytics teams and operational leaders access, investigate, and explain this data, enterprises will miss the opportunity to gain any advantage from these rich stores of information.
While hybrid data models have allowed large organizations to bridge the gap between legacy systems and more flexible data tools, their complexity has unfortunately opened a chasm between business users and the data they need to make informed decisions. We see the most effective collaborations between data teams and the business in organizations that have decided to build a fully cloud-native stack from the ground up.
It’s a far simpler architecture, and this simplicity makes it easier both to handle massive transactional datasets and to make the information readily available to engineering, data science, analytics, and the business. The specifics vary, but the blueprint looks a lot like this:
- Cloud-based business applications (Salesforce, Heap, Amplitude)
- Pipeline tools like Fivetran and Stitch that extract and load data into a single cloud warehouse
- At the center, a cloud-native warehouse like Snowflake, Redshift, BigQuery, or Synapse
- Transformation tools like Matillion and dbt that prepare the data for diagnostics and reporting
- Tools like Sisu and Databricks for rapid analysis of rich datasets
Exploratory analytics begins to resemble search
A company’s ability to effectively explore its data and find useful facts is a true competitive advantage. But when analyst resources are at a premium, organizations need tools that can proactively seek out and recommend the data that’s having an impact. With the rich breadth of data available and the added pressure of fast-moving markets, the current model of manually digging through data, testing individual hypotheses, and generating static reports is slow and labor-intensive, and it fails to scale to the hundreds of columns and millions of rows found in most datasets.
In response, I see an emerging class of tools that are tackling this diagnostic task from a completely different angle. Instead of starting from the raw data and making biased guesses about where to start, these new platforms take a declarative approach. They start from the metric that matters and then proactively explore the hypotheses that explain any change.
To draw a familiar parallel, look no further than the modern search engine. Search allows users to locate highly relevant content, drawn from a corpus of billions of documents, with only a few simple search terms. In addition, these engines improve over time, learning from behavioral signals like clickthrough rates and related pages.
The advantage of this approach is also rooted in the lessons of search. Results from this analysis can be rapidly stack-ranked: interesting populations rise to the surface, and competing hypotheses are tested and presented in parallel. The upshot is that analysts now spend more time making the answers actionable and less time digging through the data.
Addressing the analyst’s dilemma head-on
The analyst’s dilemma is simple: there’s too much data, and too little time. Our ability to collect rich, structured tables has outstripped our ability to find useful answers in them.
Today, companies need to make critical decisions about immediate sustainability, and those that can leverage the data they have in the moment will separate themselves from those who can’t. The most successful leaders will seize on any opportunity to proactively use detailed facts about the current performance of their business to ensure they have a path forward to their future.
Want more data and analytics trends to consider? Read Five Can’t-Miss Analytics and Business Intelligence Trends for 2020.
Latest posts by Peter Bailis
- Three Accelerating Data and Analytics Trends Due to COVID-19 - June 26, 2020