Data lineage can be defined as the lifecycle of data: where it originates and where it moves over the course of its life. By tracking this lifecycle, organizations can gain a better understanding of what happens to data as it moves between locations, which gives them more visibility for analytics. Companies can also use data lineage to trace the sources of specific business data, which enables them to track down errors and apply more consistent data governance protocols.
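To make the idea of tracing sources concrete, here is a minimal sketch that treats lineage as a graph and walks upstream from a report to its original sources. The dataset names and the `trace_sources` helper are hypothetical and chosen only for illustration; real lineage tools store far richer metadata.

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it is derived from.
LINEAGE = {
    "revenue_report": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
}


def trace_sources(dataset, lineage):
    """Return the root sources (datasets with no parents) upstream of `dataset`."""
    sources = set()
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents:
            # Nothing upstream of this dataset, so it is an original source.
            sources.add(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources


origins = trace_sources("revenue_report", LINEAGE)
print(sorted(origins))
```

If a figure in the hypothetical `revenue_report` looks wrong, a walk like this narrows the search to the datasets that actually feed it.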
Data lineage also gives data and analytics professionals a visual representation of the overall flow of data from its source to its destination via data integration middleware. This provides a single view of the changes that are applied to data as it makes its way through an organization’s data architecture. For data integration specifically, data lineage shows how data is manipulated during the ETL (extract, transform, load) process so that data quality assessments can be made before data is loaded into an analytics tool.
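As a rough illustration of lineage captured during ETL, the sketch below records each transformation step alongside the data and runs a simple quality check before any load. The `TrackedData` class, the step names, the sample rows, and the `max_loss` threshold are all assumptions made for this example, not part of any particular integration product.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedData:
    """Rows plus a log of every step that has touched them (hypothetical structure)."""
    rows: list
    lineage: list = field(default_factory=list)

    def apply(self, step_name, func):
        # Run the transformation and append a lineage entry describing it.
        new_rows = func(self.rows)
        entry = {"step": step_name, "rows_in": len(self.rows), "rows_out": len(new_rows)}
        return TrackedData(new_rows, self.lineage + [entry])


def extract():
    # Stand-in for reading from a source system.
    rows = [
        {"id": 1, "amount": "10.5"},
        {"id": 2, "amount": None},
        {"id": 3, "amount": "7"},
    ]
    first_entry = {"step": "extract:orders_source", "rows_in": 0, "rows_out": len(rows)}
    return TrackedData(rows, [first_entry])


def drop_missing(rows):
    return [r for r in rows if r["amount"] is not None]


def cast_amount(rows):
    return [{**r, "amount": float(r["amount"])} for r in rows]


def quality_ok(tracked, max_loss=0.5):
    """Pass only if the pipeline dropped at most `max_loss` of the extracted rows."""
    extracted = tracked.lineage[0]["rows_out"]
    remaining = tracked.lineage[-1]["rows_out"]
    return extracted > 0 and (extracted - remaining) / extracted <= max_loss


data = (
    extract()
    .apply("drop_missing_amount", drop_missing)
    .apply("cast_amount_to_float", cast_amount)
)

for entry in data.lineage:
    print(entry)
print("safe to load:", quality_ok(data))
```

Because every step leaves an entry in the log, the quality check can reason about what happened to the data, not just what it looks like at the end.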
Data is never static, which means that data lineage becomes more important as data moves with increased velocity. Historical data stores undoubtedly hold a great deal of knowledge. However, in a world increasingly flooded with self-service technology, fast data is what drives real-time decision making. When we think about the types of data that users want to analyze on a continual basis, 21st-century data sources come to mind.
The current state of the enterprise plays into this: as new data sources emerge on a daily basis, data lineage affords organizations the ability to keep track of data as it moves across disparate systems. Data lineage is also being used more creatively in the enterprise than it was in the past, when it was simply a matter-of-fact way to track data through the data migration and data integration processes.
The scope of data lineage is broadening among enterprise companies. In large part, this is due to expanding governance expectations in companies that operate under stricter compliance guidelines. These companies work with sensitive data, so more stringent care needs to be taken to safeguard it against the outside world. Data lineage can also be shaped by a company’s data management strategy, reporting capabilities, and any data elements unique to the organization. Data lineage does much more than provide a continuous view of data as it moves about. With the ability to monitor data on an ongoing basis, organizations can catch issues before they have the chance to turn into major problems.