Big Data in the Blender and Soon it Will Render a Virtual Concoction that Helps You Hang On
One of our favorite bloggers on the subject of data integration is Loraine Lawson at IT Business Edge. Last week, Ms. Lawson focused on the concept of “data blending,” which some experts see as a possible replacement for classic ETL (Extract, Transform and Load). In the piece, Ms. Lawson highlights how some data integration solution providers, such as Tableau Software, use the term data blending to describe a process for “blending data from multiple data sources on a single worksheet.”
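To make the idea concrete, here is a toy sketch (not Tableau's actual implementation, and all field names are invented for illustration) of what “blending data from multiple data sources on a single worksheet” amounts to: rows from a primary source are matched against a secondary source on a shared linking field.

```python
# Source 1: rows from a hypothetical transactional system.
sales = [
    {"region": "East", "revenue": 1200},
    {"region": "West", "revenue": 800},
]

# Source 2: a hypothetical planning spreadsheet, keyed by region.
targets = {
    "East": 1000,
    "West": 900,
}

# Blend: left-join the secondary source onto the primary via the
# linking field "region", as a worksheet blend would.
blended = [{**row, "target": targets.get(row["region"])} for row in sales]
print(blended)
# → [{'region': 'East', 'revenue': 1200, 'target': 1000},
#    {'region': 'West', 'revenue': 800, 'target': 900}]
```

The point of the sketch is that the blend happens at presentation time, row by row, rather than in an upstream ETL pipeline.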
Viewing data blending through the lens of “Big Data” – a popular topic for Ms. Lawson – she notes that the concept “is generating a lot of discussion as a next step for companies investing in Big Data.” A case in point is the recent release of Pentaho’s new Business Analytics 5.0 platform.
From the Pentaho perspective, “blending big data ‘at the source’ maintains the appropriate level of data governance and security necessary for accurate and reliable analysis. In contrast, the more common end user blending ‘away from the source’ approach lacks the ability to audit and cannot ensure correct inferences from the data. Pentaho 5.0 enables analysts to create cleansed, architected blends directly from diverse big data sources with the ease of use and real time access demanded in today’s agile analytics environments.”
The solution Pentaho uses is quite interesting: “the main problem we (Pentaho) faced early on was that the default language used under the covers, in just about any business intelligence user-facing tool, is SQL. So we figured that it might be easiest if we would translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, not directed by manually designed transformations but by SQL.” It is an interesting proposition, and one that results in the creation of a virtual “database with tables”.
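The “virtual database with tables” idea can be sketched in miniature. This is not Pentaho's implementation – just an assumed illustration using Python's built-in sqlite3 module, with invented table and column names – showing how data staged from two different systems can be exposed behind one SQL surface that a BI tool could query directly:

```python
import sqlite3

# Hypothetical rows pulled from two separate systems,
# e.g. a CRM export and a web-analytics feed.
crm_rows = [("alice", "enterprise"), ("bob", "smb")]
web_rows = [("alice", 42), ("bob", 7), ("carol", 3)]

# Stage both sources as tables in one in-memory database: the
# "virtual database with tables" a BI tool could point its SQL at.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm (customer TEXT, segment TEXT)")
conn.execute("CREATE TABLE web (customer TEXT, visits INTEGER)")
conn.executemany("INSERT INTO crm VALUES (?, ?)", crm_rows)
conn.executemany("INSERT INTO web VALUES (?, ?)", web_rows)

# A BI tool's SQL now runs against the blended view, even though
# the underlying rows came from different systems.
blended = conn.execute(
    """SELECT w.customer, c.segment, w.visits
       FROM web w LEFT JOIN crm c ON w.customer = c.customer
       ORDER BY w.customer"""
).fetchall()
print(blended)
# → [('alice', 'enterprise', 42), ('bob', 'smb', 7), ('carol', None, 3)]
```

In the real product, the translation runs the other way – SQL issued by the BI tool is rewritten into data integration transformations against the live sources – but the contract is the same: the analyst sees ordinary tables and writes ordinary SQL.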