Solutions Review has compiled the most complete Big Data glossary of terms available on the web. With over 50 terms defined and growing daily, this resource is sure to help keep you hip to all the latest and greatest lingo in enterprise Big Data. One constant in this software sector is disruption. Companies are now requiring Data Management tools for use with analytics that can handle the management and processing of diverse data formats, both internally and externally.
This evolution makes terminology and vocabulary an integral part of keeping up with the pace of change. Be sure to bookmark this page and check back regularly, as it will see ongoing updates.
Access Path: The track chosen by a database management system to collect data requested by the end-user.
Advanced Analytics: The examination of data using sophisticated tools, typically beyond those of traditional Business Intelligence, allowing for deeper insights or predictions to be made.
Administrative Data: Data that helps a data warehouse administrator manage a data warehouse. This data typically includes user profiles and warehouse history.
Aggregate Data: Data that is the end result of applying a process to combine data elements, usually taken collectively or in the form of a summary.
Analytics: The discovery of meaningful patterns in data, usually revealed by an analytics software solution.
Behavioral Analytics: A subset of Business Intelligence that focuses specifically on how and why users behave the way they do, using the data that is collected for analysis.
Big Data: Extremely large data sets that may be analyzed to reveal patterns and trends and that are typically too complex to be dealt with using traditional processing techniques.
Bulk Data Transfer: A mechanism, usually software-based, which is designed to move large data files, supporting compression, blocking and buffering in order to cut down on wait times.
Business Intelligence: A process for analyzing data and presenting actionable insights to stakeholders in order to help them make more informed business decisions.
Citizen Data Scientist: Business analysts and other personnel who have experience working within an organization’s data architecture and who use software tools to derive valuable business insights from stored data.
Cluster: A means of storing data together from multiple tables when the data contains common information that is needed for analysis.
Compliance: Conforming to a set of rules, usually established by a governing body. In terms of Data Management, compliance refers to the following of collection and usage techniques which safeguard private data, and is often used in highly-regulated industries.
Dashboard: A tool that is used to create, deploy and analyze information. Typically, a dashboard will consist of a single screen and show various reports and other metrics that the organization is studying.
Database: A collection of data that is purposefully arranged for fast and convenient search and retrieval by business applications and Business Intelligence software.
Data Blending: Provides a fast and straightforward way to extract value from multiple data sources to find patterns without the deployment of a traditional data warehouse architecture.
Data Cleansing: Transforming data in its native state to a pre-defined standardized format using vendor software.
Data Cube: A database structure with multiple dimensions which can be stacked, combined and manipulated to enable browsing.
Data Democratization: Provides users across an enterprise with access to data, allowing them to run analysis at any time to answer any question.
Data Discovery: User-driven process of searching for patterns in a data set, providing self-service and data democratization. Data Discovery has been labeled by Gartner as “modern Business Intelligence.”
Data Governance: The management of the availability, usability, integrity and security of the data stored within an enterprise.
Data Integration: The combination of technical and business processes used to combine data from disparate sources into meaningful insights.
Data Lake: A storage repository that holds a large amount of raw data in its native format until it is needed.
Data Lineage: Referred to as the data life-cycle, which includes the origins of the data and where it moves over time, describing what happens to data as it goes through diverse processes.
Data Management: The development and execution of architectures, policies and practices to manage the data life-cycle needs of an enterprise.
Data Mart: A collection of reports, metrics and other stored data on a specific subject matter. Think of this as an organization of like information, making for easier discovery.
Data Migration: The process of moving data between two or more storage systems, data formats, warehouses or servers.
Data Mining: Extracting previously unknown data from databases and using that data for important business decisions, in many cases helping to create new insights.
Data Protection: Safeguarding vital business data from corruption or loss.
Data Quality: Refers to the contextual quality of an organization’s collection of data. The more relevant, available, complete and accurate the information, the better the chance that profitable business insights will be created.
Data Replication: The frequent copying of data from a database to another so that all users may share the same level of information, resulting in a distributed database that allows users to access data relevant to their own specific tasks.
Data Science: A field of study involving the processes and systems used to extract insights from data in all of its forms. The profession is seen as a continuation of the other data analysis fields, such as statistics.
Data Staging: A temporary location where all data from outside resources are copied.
Data Warehouse: A system used for Data Analytics. It is a central location of integrated data from other, more disparate sources, storing both current (real-time) and historical data which can then be used to create trend reports.
Data Visualization: Transforming numerical data into a visual or pictorial context in order to assist users in better understanding what the data is telling them.
Drilling: The process of navigating among levels of data in multidimensional data sets, ranging from the most summarized (drilling up) to the most detailed (drilling down).
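To make the idea of drilling concrete, here is a minimal sketch in Python. It assumes a hypothetical year, quarter, month hierarchy and made-up sales figures; the names FACTS, HIERARCHY and summarize are illustrative only and do not come from any particular BI tool.

```python
from collections import defaultdict

# Hypothetical facts with a year > quarter > month hierarchy (illustrative data).
FACTS = [
    {"year": 2021, "quarter": "Q1", "month": "Jan", "sales": 10},
    {"year": 2021, "quarter": "Q1", "month": "Feb", "sales": 20},
    {"year": 2021, "quarter": "Q2", "month": "Apr", "sales": 30},
    {"year": 2022, "quarter": "Q1", "month": "Jan", "sales": 40},
]
HIERARCHY = ["year", "quarter", "month"]


def summarize(rows, depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[level] for level in HIERARCHY[:depth])
        totals[key] += row["sales"]
    return dict(totals)


by_year = summarize(FACTS, 1)      # most summarized view
by_quarter = summarize(FACTS, 2)   # drill down one level
by_month = summarize(FACTS, 3)     # most detailed view
```

Moving from by_year to by_quarter is a drill down; going back the other way is a drill up. Real OLAP tools typically precompute or index these levels rather than rescanning the rows each time.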
Embedded Analytics: The integration of external Business Intelligence tools and capabilities into existing business software.
Enterprise Data Warehouse (EDW): A database environment created to provide a single view of an enterprise. It is considered to be a reliable source of controlled information for strategic planning and decision making.
Enterprise Information System (EIS): Applications that are used for presenting and analyzing corporate data, typically used by high-level management.
Enterprise Resource Planning (ERP): This type of software allows a business or organization to manage a suite of integrated applications which are used to collect, manage and store data on a variety of business activities.
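Extract, Transform, Load (ETL): A data warehousing process that moves data from one location to another: data is extracted from its source, transformed into the format the target requires, and loaded into the target system. These three functions are combined into one to allow faster migration.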
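The three ETL steps can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production pipeline: the CSV text, the orders table and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,BOB,5.00
3,alice,
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert types, drop incomplete rows."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        customer = row["customer"].strip().lower()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run all three steps and return the number of rows now in the target."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling run_etl(RAW_CSV, sqlite3.connect(":memory:")) loads the two complete orders and skips the one with a missing amount.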
Hadoop: A programming framework that supports the processing of large data sets in a distributed computing environment.
Legacy Solution: An old or outdated software tool.
Location Intelligence: A BI feature that relates geographic contexts to business data and is designed to turn data into insights for a host of business purposes.
Machine Learning: A type of artificial intelligence that provides computers with the ability to learn without being specifically programmed to do so, focusing on the development of computer applications that can teach themselves to change when exposed to new data.
Master Data Management: Incorporates processes, policies, standards, and tools that define and manage all of an organization’s critical data in order to formulate one point of reference.
Metadata: Describes other data within a database and is responsible for organization while an end-user sifts through collected data.
Online Analytical Processing (OLAP): A technology solution that is used to organize the databases of large businesses for multidimensional analysis, supporting Business Intelligence.
Operational Analytics: Data Analytics that are focused on improving the internal operations of the enterprise.
Operational Data Store (ODS): A current and relevant store of data used to support tactical decision making within an organization.
Predictive Analytics: BI solutions that help the user discover patterns in large data sets in order to predict future behavior.
Prescriptive Analytics: The area of Business Intelligence dedicated to finding the best course of action for a given situation.
Real-Time Analytics: The ability to use all available enterprise data as needed. It usually involves streaming data that allows users to make business decisions on the fly.
Relational Database Management System (RDBMS): A system used to store data managed in relational tables, typically organized according to the relationship between different data values.
Reporting: The collection of data from various sources and software tools for presentation to end-users in a way that is understandable and easy to analyze.
Repository: A mechanism for storing data defining a system at any point in its life-cycle.
Scalability: The ability to increase volumes of data and the number of users to the data warehouse, which is critical for the data and technical architectures of the enterprise.
Schema: The structure that defines how data inside a database is organized.
Self-Service: A BI practice that enables business users to access and work with corporate data without a background in statistical analysis.
Service Level Agreement (SLA): A contract between a service provider or vendor and the customer that defines the level of service expected. SLAs are service-based and specifically define what the customer can expect to receive.
Slice And Dice: The breaking down of large data sets into smaller portions so that they can be analyzed in different perspectives.
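As a rough illustration of slicing and dicing, the Python sketch below works on a small, made-up set of sales records. The dimension names (region, product, year), the revenue figures and the helper functions are assumptions for the example, not part of any specific product.

```python
# Hypothetical sales facts: three dimensions (region, product, year), one measure.
SALES = [
    {"region": "East", "product": "Widget", "year": 2020, "revenue": 100},
    {"region": "East", "product": "Gadget", "year": 2021, "revenue": 150},
    {"region": "West", "product": "Widget", "year": 2021, "revenue": 200},
    {"region": "West", "product": "Gadget", "year": 2020, "revenue": 50},
]


def slice_rows(rows, dimension, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dimension] == value]


def dice_rows(rows, **allowed):
    """Dice: keep rows whose value for each named dimension is in the allowed set."""
    return [
        r for r in rows
        if all(r[dim] in values for dim, values in allowed.items())
    ]


def total(rows, measure="revenue"):
    """Sum a measure over the selected rows."""
    return sum(r[measure] for r in rows)


revenue_2021 = total(slice_rows(SALES, "year", 2021))
west_all_years = total(dice_rows(SALES, region={"West"}, year={2020, 2021}))
```

A slice pins one dimension to a single value, while a dice restricts several dimensions at once, which is why dice_rows accepts a set of allowed values per dimension.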
Software as a Service (SaaS): A software delivery model in which software is licensed on a subscription basis and is centrally hosted and typically accessed by end-users using a client via web browser.
Snapshot: A view of a data set at a particular point in time.
Structured Query Language (SQL): The accepted standard language for relational database systems, covering query, data definition, data manipulation, security and additional aspects of data integrity.
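The short Python sketch below runs a few SQL statements against an in-memory SQLite database to show data definition, data manipulation, querying and an integrity constraint in action. The customers table and its rows are hypothetical. SQLite is used only because it ships with Python; it does not implement the security statements (such as GRANT) found in larger database systems, so that aspect is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table with integrity constraints.
conn.execute(
    """
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT UNIQUE
    )
    """
)

# Data manipulation: insert rows, then update one of them.
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [("Ada", "ada@example.com"), ("Grace", "grace@example.com")],
)
conn.execute(
    "UPDATE customers SET name = ? WHERE email = ?",
    ("Ada L.", "ada@example.com"),
)

# Query: read the names back in insertion order.
names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]

# Integrity: the UNIQUE constraint rejects a duplicate email address.
try:
    conn.execute(
        "INSERT INTO customers (name, email) VALUES (?, ?)",
        ("Imposter", "ada@example.com"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The same statements would look broadly similar on other relational databases, although data types and constraint syntax vary between vendors.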