From Data Lake to Data Hub: What is the Road Forward?
By Emily Washington
The novelty of data lakes may have worn off, but their value for data consumers continues to grow. Data lakes’ intrinsic ability to house various types of data, along with their diversity of usage, makes them a must in the digital age. They help businesses solve scalability and data duplication issues, which increases information use and sharing and reduces costs through fewer servers and licenses.
However, while conventional data lakes are still the norm for many organizations, data storage technologies continue to evolve rapidly, with traditional Apache Hadoop architectures being supplanted by cloud-hosted platforms such as Amazon Web Services (AWS) and Microsoft Azure. These new “go-to” storage systems give organizations even greater flexibility and scalability to meet modern data storage needs.
As the data storage landscape rapidly changes, it is critical for businesses to understand the benefits of cloud-hosted storage systems and how they can use the cloud to create a data hub.
The Benefits of Migrating Data Storage to the Cloud
Businesses are launching more artificial intelligence (AI) and machine learning projects for predictive analytics to gain an advantage in a competitive landscape. Organizations now require support for systems beyond the Hadoop Distributed File System (HDFS), and need the ability to process additional data sources such as Amazon S3. Cloud platforms typically offer organizations easier, more affordable and more agile options for data storage than “traditional” big data storage solutions like Hadoop.
As organizations shift from viewing data as a static resource to viewing it as data in motion, cloud-hosted data storage systems help them reduce costs and scale elastically. In addition, by leveraging cloud-hosted platforms and creating a data hub, businesses can blend and distribute data in various formats instead of simply storing it all in one place.
From Data Lake to Data Hub
Traditional Hadoop data lakes store data of all formats in one place for availability, but require data users to process and derive value from that data. Data hubs allow businesses to collect and store data from multiple sources in a central repository, with that data organized for distribution, sharing and advanced analytics. Users can also further enhance the value of data sets through de-duplication, data quality scores and a standardized enterprise data exchange.
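As a rough illustration of the de-duplication and data quality scoring mentioned above, here is a minimal Python sketch. The record shape, the field names and the completeness-based score are all assumptions made for this example; a real data hub would likely use richer matching rules and quality dimensions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record shape; field names are assumptions for illustration.
    customer_id: str
    email: Optional[str]
    country: Optional[str]


def deduplicate(records):
    """Keep the first record seen for each customer_id."""
    seen = {}
    for record in records:
        seen.setdefault(record.customer_id, record)
    return list(seen.values())


def quality_score(records):
    """Fraction of non-key fields that are populated (a simple completeness score)."""
    if not records:
        return 0.0
    fields = ("email", "country")
    filled = sum(1 for r in records for f in fields if getattr(r, f))
    return filled / (len(records) * len(fields))


raw = [
    CustomerRecord("c1", "a@example.com", "US"),
    CustomerRecord("c1", "a@example.com", "US"),  # duplicate arriving from a second source
    CustomerRecord("c2", None, "DE"),             # missing email lowers the score
]
clean = deduplicate(raw)
score = quality_score(clean)
print(len(clean), score)
```

Publishing a score like this alongside each data set is one way to let consumers judge whether the data is fit for their purpose before they request it.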
Data hubs provide a centralized data resource within an enterprise, helping to minimize the many data requests from business users to IT. With a data hub, businesses can organize and visualize crucial details of an organization’s data assets from business terms to reports. The result is a comprehensive view of their data landscape, allowing users to define and easily understand data and associated business terms, quickly track data lineage and efficiently manage all aspects of their data assets. However, a data hub doesn’t just come together overnight.
Building a Data Hub
When a business develops an enterprise data hub, data governance is a critical element. As an enterprise data hub is planned and executed, organizations must invest significant time and energy defining what their data means, where it comes from and what kind of transformation it needs to go through before mapping it to the data hub. If that data is not governed simultaneously, all the metadata involved quickly grows stale. Data governance captures and curates the metadata while the enterprise data hub is being built.
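To make the idea of captured and curated metadata more concrete, the sketch below models a catalog entry that records what a data element means, where it comes from and how it is transformed, then flags entries that have not been reviewed recently. The `MetadataEntry` fields, the sample entries and the 90-day threshold are hypothetical and not drawn from any particular governance product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MetadataEntry:
    # Hypothetical catalog entry; all field names are assumptions.
    name: str
    definition: str       # what the data means
    source_system: str    # where it comes from
    transformation: str   # what happens before it is mapped to the hub
    last_reviewed: date


def stale_entries(entries, today, max_age_days=90):
    """Return the names of entries not reviewed within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e.name for e in entries if e.last_reviewed < cutoff]


catalog = [
    MetadataEntry("customer_email", "Primary contact email", "crm",
                  "lowercased and trimmed", date(2024, 1, 10)),
    MetadataEntry("order_total", "Order value in USD", "billing",
                  "currency converted", date(2024, 5, 20)),
]
stale = stale_entries(catalog, today=date(2024, 6, 1))
print(stale)
```

A periodic check along these lines is one possible way to surface metadata that is going stale while the hub is still being built.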
Data challenges will arise in every enterprise data hub initiative, regardless of organization or industry. With a comprehensive data governance framework in place, organizations can proactively manage and mitigate data issues and solve any problems before they impact the business. With proper governance and a properly constructed data hub, organizations can further empower their business users to turn data assets into actionable insights and competitive advantage.
Emily Washington is senior vice president of product management at Infogix, where she is responsible for driving product strategy, product roadmaps, product marketing and vertical solution initiatives. Since joining Infogix in 2002, Emily has worked closely with product development teams and customers to drive introduction and adoption of all new products. Connect with her on LinkedIn.