Solutions Review has compiled the most complete Data Integration glossary of terms available on the web. With over 50 terms defined and growing daily, this resource is sure to keep you hip to the latest and greatest lingo in enterprise integration and ETL. Organizations are increasingly looking for solutions that provide Data Virtualization capabilities and the ability to combine Data Lakes with their existing platforms, since the prevailing expectation is that Data Integration will become cloud- and on-premises-agnostic.
This evolution makes terminology and vocabulary an integral part of keeping up to date with all the change. Be sure to bookmark this page and check back regularly, as it will see ongoing updates.
Access Path: The track chosen by a database management system to collect data requested by the end-user.
Analytics: The discovery of meaningful patterns in data, usually revealed by an analytics software solution.
Application Integration: The sharing of processes and/or data among different applications within an enterprise using real-time communication. This is typically done to increase efficiency and enhance scalability between business applications.
Big Data: Extremely large data sets that may be analyzed to reveal patterns and trends and that are typically too complex to be dealt with using traditional processing techniques.
Business Intelligence: A process for analyzing data and presenting actionable insights to stakeholders in order to help them make more informed business decisions.
Change Data Capture: Capturing the changes made to a production data source, typically performed by reading the source DBMS log.
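A minimal diff-based sketch of the idea: production CDC tools read the DBMS transaction log, but comparing two primary-keyed snapshots of a table (invented data here) yields the same kinds of change events.

```python
# Diff-based change capture sketch: compare two snapshots of a keyed table
# and emit insert/update/delete events. Log-based CDC reads the DBMS log
# instead of diffing, but produces a similar stream of events.
def capture_changes(before, after):
    """before/after: dicts mapping primary key -> row dict."""
    changes = []
    for key, row in after.items():
        if key not in before:
            changes.append(("insert", key, row))
        elif before[key] != row:
            changes.append(("update", key, row))
    for key, row in before.items():
        if key not in after:
            changes.append(("delete", key, row))
    return changes
```

Feeding the event stream to a target keeps it in sync with the source without reloading unchanged rows.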
Citizen Data Scientist: Business analysts and other personnel that may have experience working within an organization’s data architecture and using software tools to derive valuable business insights from stored data.
Cluster: A means of storing data together from multiple tables when the data contains common information that is needed for analysis.
Connector: Software used to create a data connection. Sometimes used interchangeably with middleware.
Customer Data Integration: A process composed of solutions for recognizing a customer at any touchpoint, providing up-to-date knowledge about the customer and delivering it in an actionable form.
Database: A collection of data that is purposefully arranged for fast and convenient search and retrieval by business applications and Business Intelligence software.
Data Blending: Provides a fast and straightforward way to extract value from multiple data sources to find patterns without the deployment of a traditional data warehouse architecture.
Data Cleansing: Transforming data in its native state to a pre-defined standardized format using vendor software.
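A toy illustration of cleansing rules, with invented fields and formats: trim whitespace, normalize capitalization, and standardize phone numbers into one pre-defined format.

```python
import re

# Hypothetical cleansing rules for a contact record: trim and title-case
# the name, and normalize ten-digit phone numbers to "(NNN) NNN-NNNN".
def cleanse(record):
    name = record["name"].strip().title()
    digits = re.sub(r"\D", "", record["phone"])  # keep digits only
    if len(digits) == 10:
        phone = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    else:
        phone = record["phone"]  # leave unrecognized values untouched
    return {"name": name, "phone": phone}
```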
Data Federation: Process where data is collected from distinct databases without ever copying or transforming the original data.
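A sketch of the pattern, with in-memory lists standing in for independent databases: a single lookup draws from both sources at query time without copying or transforming either one.

```python
# Federation sketch: two independent "databases" (invented sample data)
# are queried together; neither store is copied or transformed.
crm = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
billing = [{"id": 2, "balance": 40.0}, {"id": 3, "balance": 12.5}]

def federated_lookup(customer_id):
    name = next((r["name"] for r in crm if r["id"] == customer_id), None)
    balance = next((r["balance"] for r in billing if r["id"] == customer_id), None)
    return {"id": customer_id, "name": name, "balance": balance}
```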
Data Governance: The management of the availability, usability, integrity and security of the data stored within an enterprise.
Data Integration: The combination of technical and business processes used to combine data from disparate sources into meaningful insights.
Data Lake: A storage repository that holds a large amount of raw data in its native format until it is needed.
Data Lineage: Often referred to as the data life-cycle: the origins of the data and where it moves over time, describing what happens to data as it passes through diverse processes.
Data Management: The development and execution of architectures, policies and practices to manage the data life-cycle needs of an enterprise.
Data Mapping: Data mapping is the process of creating data element mappings between two different data models and is used as a first step for a wide array of data integration tasks, including data transformation between a data source and a destination.
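A field-level mapping can be as simple as a lookup table from source element names to target element names (the names below are invented), applied row by row before any further transformation.

```python
# Hypothetical field mapping between a source extract and a target model.
FIELD_MAP = {
    "cust_nm": "customer_name",
    "cust_eml": "email",
    "sgnup_dt": "signup_date",
}

def map_record(source_row):
    """Rename source fields to their target names, dropping unmapped ones."""
    return {
        target: source_row[source]
        for source, target in FIELD_MAP.items()
        if source in source_row
    }
```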
Data Mart: A simple data repository that houses data of a specific discipline.
Data Migration: The process of moving data between two or more storage systems, data formats, warehouses or servers.
Data Mining: Extracting previously unknown information from databases and using that data for important business decisions, in many cases helping to create new insights.
Data Modeling: A method used to define and analyze the data requirements needed to support an entity’s business processes, defining the relationship between data elements and structures.
Data Quality: Refers to the level of “quality” in data. If a particular data store is seen as holding highly relevant data for a project, that data is seen as quality to the users.
Data Replication: The frequent copying of data from one database to another so that all users may share the same level of information, resulting in a distributed database that allows users to access data relevant to their own specific tasks.
Data Science: A field of study involving the processes and systems used to extract insights from data in all of its forms. The profession is seen as a continuation of the other data analysis fields, such as statistics.
Data Virtualization: A Data Integration approach that allows applications to retrieve and manipulate data without requiring technical details about the data. Virtualization is seen as an alternative to the traditional ETL process.
Data Warehouse: A system used for data analytics. It serves as a central repository of integrated data from disparate sources, storing both current (real-time) and historical data that can then be used to create trend reports.
Decision Support System (DSS): A computer-based system that supports organizational decision making activities. Oftentimes, this type of system is used when data is changing rapidly or is not easy to extrapolate.
Enterprise Data Warehouse (EDW): A database environment dedicated to providing a single comprehensive view of an enterprise.
Enterprise Service Bus (ESB): A software architecture model used for creating and facilitating communication between mutually interacting software solutions inside a service-oriented architecture.
Extract, Transform, Load (ETL): In managing databases, extract, transform, load (ETL) refers to three separate functions combined into a single programming tool.
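The three functions can be sketched end to end in a few lines; this toy run (invented field names, SQLite as a stand-in warehouse) extracts rows from CSV text, transforms them by filtering and casting, and loads them into a table.

```python
import csv
import io
import sqlite3

def etl(csv_text, conn):
    """Toy ETL run: CSV text in, rows loaded into a SQLite table."""
    rows = csv.DictReader(io.StringIO(csv_text))              # extract
    cleaned = [(r["sku"], float(r["price"]))                  # transform:
               for r in rows if r["price"]]                   # drop missing prices
    conn.execute("CREATE TABLE IF NOT EXISTS products (sku TEXT, price REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", cleaned)  # load
    return len(cleaned)

conn = sqlite3.connect(":memory:")
loaded = etl("sku,price\nA1,9.99\nB2,\nC3,4.50\n", conn)
```

Real ETL tools add scheduling, error handling and restartability around this same three-step core.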
Federated Database: A system in which multiple databases appear to function as a single entity. However, the databases typically involved in this kind of system exist independently of one another. Once the different databases are “combined”, one federated database is formed.
Hadoop: A programming framework that supports the processing of large data sets in a distributed computing environment.
Integration Platform as a Service (iPaaS): A suite of cloud services enabling the execution and governance of Data Integration flows connecting to on-premises and cloud-based processes.
Legacy Solution: An old or outdated software tool.
Machine Learning: A type of artificial intelligence that provides computers with the ability to learn without being specifically programmed to do so, focusing on the development of computer applications that can teach themselves to change when exposed to new data.
Master Data Management: An umbrella term that incorporates processes, policies, standards, tools and governance that define and manage all of an organization’s critical data in order to formulate one point of reference.
Metadata: Metadata describes other data within a database and is responsible for organization while a business or organization sifts through data sets.
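As a concrete (and entirely invented) example, the metadata for a table records facts about the data rather than the data itself, which is what makes data sets searchable and organizable.

```python
# Illustrative metadata for one table: descriptive facts about the data,
# not the data itself. All values are made up for the example.
table_metadata = {
    "table": "orders",
    "owner": "sales_team",
    "row_count": 1_204_553,
    "last_refreshed": "2023-02-06T04:00:00Z",
    "columns": {
        "order_id": {"type": "INTEGER", "nullable": False},
        "total": {"type": "REAL", "nullable": True},
    },
}

def nullable_columns(meta):
    """One way metadata gets used: answer questions without scanning data."""
    return [name for name, col in meta["columns"].items() if col["nullable"]]
```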
Operational Data Store (ODS): An integrated database environment designed to support operational monitoring.
Real-Time Analytics: The ability to use all available enterprise data as needed; it usually involves streaming data and allows users to make business decisions on the fly.
Relational Database Management System (RDBMS): A system used to store data managed in relational tables, typically organized according to the relationship between different data values.
Scalability: The ability of a data warehouse to handle growing data volumes and numbers of users, which is critical for the data and technical architectures of the enterprise.
Schema: The structure that defines how data inside a database is organized.
Service Level Agreement (SLA): A contract between a service provider or vendor and the customer that defines the level of service expected. SLAs are service-based and specifically define what the customer can expect to receive.
Snapshot: View of a data set at a particular instance in time.
Software as a Service (SaaS): A software delivery model in which software is licensed on a subscription basis, centrally hosted and typically accessed by end users via a web browser.
Source System: The database, application or other store from which data is extracted for integration; the counterpart of the target system.
Structured Query Language (SQL): The accepted standard for relational database systems, covering query, data definition, data manipulation, security and additional aspects of data integrity.
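The language covers those aspects with a handful of statement types; a short session against an in-memory SQLite database (sample data invented) shows data definition, data manipulation and query side by side.

```python
import sqlite3

# Standard SQL against an in-memory SQLite database:
# CREATE (definition), INSERT (manipulation), SELECT (query).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (region, total) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 45.5)],
)
rows = conn.execute(
    "SELECT region, SUM(total) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

Each vendor's dialect extends the standard, but the core statements above run largely unchanged across relational systems.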
Target System: A database, application or other storage medium into which transformed data is loaded, such as a data warehouse.
Transformation: The transformation step of the ETL acronym. A set of operations that manipulate source data to prepare it for loading into a target such as a data warehouse.