Ferraro Consulting POV - Cloudera’s Time Has Come
Coverage from Cloudera EVOLVE24 Milano – October 10, 2024.
Cloudera’s time has come. With more than 25 exabytes of data under management, they rival some of the hyperscalers with their amassed customer base and data expertise. Their customer relationships appear alive and well, and their data platform product is solid thanks to years of investment and to market advances in key open source innovations. Their go-to-market strategy seems to be in full swing with the right consulting partners, the right concern for customers, and a full portfolio of accelerators to speed deployment. Now, with big bets and big investments in AI, they may be poised to be a leader in the great AI revolution.
At the Cloudera EVOLVE24 event in Milan, Italy, I had a chance to take a much deeper look at the product and company strategy. Four years ago, I began seeing the company move from being a big data vendor to positioning itself as an enterprise data platform. At the time, I remember looking at their marketecture slides and realizing immediately that an open data platform was the right vision, but the product was not yet there.
Around the same time, I was having conversations with MapR as they were being acquired by HPE in 2019. It was clear that the promises of big data were waning, and the true value of open-source data platforms was emerging. My suggestion at the time was to begin moving toward the concept of a data operating system: a data OS where all the complexities of data engineering would be abstracted away, much like the operating system of a laptop computer or phone, so that users would no longer have to concern themselves with the layers below. Customers would then be free to choose an operating system based on the simplicity of the user interface and the breadth of applications that could be deployed on the platform.
It was also in 2020 that I saw the worlds of the data warehouse and data lake colliding to form what I called “unified analytics.” While the data OS was emerging from beneath, customers were tired of managing both a data lake and a data warehouse. Data lake vendors were adding data warehouse capabilities, data warehouse vendors were adding data lake capabilities, and access vendors were unifying analytics for data across all sources.
Cloudera was unique at the time because their vision was to build out both the data OS and unified analytics. Fast forward to 2024, and open source software has matured beyond what we could ever have imagined. Everything is now in place to realize the Cloudera vision, and they are making it real. Because they began pursuing this vision in 2019-2020, their progress on the data OS enables them to make data ready for AI, and their unified analytics make them ready to operationalize generative AI, AI applications, and AI agents.
Recent Cloudera Announcements
The Cloudera event in Milan highlighted several recent announcements that solidified their commitment to their vision and seemed to resonate with the Italian and Southern European market present at the event. The announcements included a joint AI offering with NVIDIA, new ecosystem partnerships, proof points that they are delivering on their promises around being a “hybrid” platform, and a deepening relationship with Snowflake, especially around Iceberg. Taken together, these announcements demonstrate a strong commitment to a singular vision that Cloudera summarizes under the “hybrid” moniker. However, I think the Cloudera vision is better understood as a unified analytics platform built on three pillars of product strategy: open, accessible, and hybrid.
Optimized Model Deployment Across Hybrid Environments
At the core of Cloudera’s first announcement is NVIDIA NIM™, a set of easy-to-use inference microservices for accelerating the deployment of foundation models on any cloud or data center and helping to keep data secure. An AI inference service orchestrates all that is necessary to run AI models on NVIDIA hardware, so that users do not have to worry about the technical details of setting up and running the underlying hardware and software. The joint service is integrated with Cloudera’s AI Model Registry, enhancing security and governance by managing access controls for both model endpoints and operations. Users benefit from a unified platform where all models, whether LLM deployments or traditional models, are managed under a single service. So what? According to Cloudera, customers can now deploy models at 36 percent lower cost.
Expanded AI Ecosystem for Unified Customer Solutions
Adding to their existing AI ecosystem partners, including NVIDIA, Amazon Web Services, and Pinecone, Cloudera announced new and deeper partnerships with Google Cloud, Anthropic, and Snowflake. Through these partnerships, Cloudera continues to expand its portfolio of accelerators called AMPs. An AMP is an Accelerator for Machine Learning Projects.
With Google Cloud, Cloudera’s DataHub platform, which serves as the data foundation for building AI applications, now runs on Google Cloud infrastructure. In addition, since Vertex AI is at the core of Google’s AI offering, especially around the Vertex AI Model Garden, Cloudera released an AMP entitled “Summarization with Gemini from Vertex AI” to help customers quickly deploy a use case that takes advantage of the cost-effectiveness and performance of Gemini Pro models accessed from the Vertex AI Model Garden via API. This particular AMP should speed the time to insight for customers who want to run their generative AI applications on the same core AI engine as the rest of their AI applications.
With Anthropic, Cloudera customers can leverage Claude large language models (LLMs) for code generation, vision analysis, data insight, and text generation use cases. Built on Anthropic’s LLMs, Cloudera released an AMP entitled “Image Analysis with Anthropic’s Claude LLM” that condenses the time it takes to develop a production image analysis application. In addition, Cloudera will utilize Claude as its default foundational model for the Cloudera AI Coding Co-pilot.
Reduced Snowflake Total Cost of Ownership with Iceberg-as-a-Service
In a strategic, cost-cutting move for Snowflake customers, Cloudera has extended access to the company’s Open Data Lakehouse via its Apache Iceberg REST Catalog, reducing complexity and cost related to data preparation and data streaming technologies used today. Snowflake users can access Iceberg tables directly on the Cloudera platform, without having to replicate data. In addition, Snowflake users can now query data stored on Cloudera’s object storage solution, directly from Snowflake. Because the integration of the two platforms is bidirectional, Cloudera customers can also utilize Snowflake’s high-performance business intelligence capability.
Continued Commitment and Delivery on “True Hybrid”
Cloudera claims to be the only “true hybrid” platform with the ability to move workloads freely across all location types, including multi-cloud and on-premises infrastructure. The company demonstrated investment, progress, and a roadmap in six areas associated with “true hybrid”: a unified platform, a hybrid control plane, an open data lakehouse, unified security and governance, federated data access, and support for ARM-based processors. With this announcement, Cloudera continues to show the market its ability to wrap a broad range of open source and multi-cloud technologies with its enterprise-class data platform capabilities. Customers can expect that any new product brought into the fold will have the same enterprise capabilities they have come to expect from Cloudera’s development, all the way back to the big data days.
While Cloudera is making significant progress on their product and platform, they have faced one other challenge for the past several years. How do you take a company known for Hadoop and big data, then change the perception of buyers to see you as a data platform and platform of the future for AI?
At the event, a utility customer took the stage with a powerful set of data illustrating the coming tsunami of power consumption being ushered in by AI. While the audience was trying to grasp the magnitude of the power increase required by an AI revolution, the presenters casually projected a slide of their journey with Cloudera, starting all the way back in the days of Hadoop and big data. They described the entire journey as “a story of optimization,” with each step streamlining data and intelligence operations. The customer has gone from Hadoop, to columnar, to cloud, to autoscaling, to Spark, to Iceberg, with every step improving on the one before, and all on Cloudera. For me, it was a cathartic experience. It was the first time I realized how Cloudera had modernized and maintained relevance in the eyes of their customers.
While little was mentioned regarding the longer-term Cloudera vision, it was clear that they want to become the de facto hybrid platform for AI deployments. Once again, while there is more work to be done to become “de facto,” it is the right vision. I have seen Cloudera execute on their vision before, and they have the potential to do it again.