Data warehousing is a set of techniques and technologies that aggregate data from one or more operational systems. They provide the unique ability for analyzing data from multiple sources and typically play host to relational database technologies. They consist of data that has already been integrated, but they are limited in the size and scope to what they can actually hold. Data warehouses are stored in an enterprise mainframe server that can either by physical or virtual (cloud).
There are several methods for integrating data from the source to the data warehouse, but data warehouses are the final destination for data before its mined by a BI or analytics solution. Data warehouses hold structured data that is most often transactional in nature. Companies commonly set up their data warehouses to be subject oriented so that analysts have a defined location to consult when searching for specific information.
Data warehouses allow enterprises to run analysis on the data they collect. Databases give organizations the ability to funnel structured data into silos so that they have access to it on-demand. However, the need for warehousing often becomes prominent when analytic requirements become too burdensome on an operational database. Data warehousing is a highly governed way of storing data, as users have control over segmenting data into stores based on type.
Data isn’t loaded until users have a defined use for it. This is positive for data architects in that the data is easy to understand and formatted in a way that helps them easily answer the questions they’re asking. Data warehouse technologies transform data as it is being injected into the database, a process known as schema on-write. Data warehouses have trouble hosting data from sources that are not easily structured. Non-traditional data types, such as those that come from social media websites or embedded product sensors are two examples.