Top Books on Data Warehousing, Mining, Quality & Blending

A wide variety of books on data warehousing, data mining, data quality, and data blending is available around the web. Selecting the one that is right for your data-driven organization can be a tough, even overwhelming, task. Solutions Review has done the research for you. After reviewing a multitude of books on the subject, we’ve carefully selected the following 10 books based on relevance, popularity, online ratings, and their ability to add value to your business.

This list runs through everything you need to accomplish before the analytics phase. First, you need somewhere to store all of your data, traditionally a warehouse. Once your data is stored, you need to pull, or mine, it from the warehouse. After that, you’re ready for the quality check. You need to make sure you have the correct data, right? Then you can blend it all together, add in an analytics solution, and you’re on your way to productive insights.

Reading one or several of the following books could help you evolve from a BI no to a BI pro. Bad joke? Well, what do you expect? We’re a technology news site.

Note: these titles are not industry-specific; they should have applications in a variety of fields.

Data Blending for Dummies (Alteryx Special Edition) by Michael Wessler, OCP & CISSP

“Today’s analysts need to pull information from many places. But working with multiple sources and preparing data for analysis can be time-consuming and difficult to implement using standard tools like Excel or Access. Get Data Blending for Dummies to learn how to access, cleanse, and join data in any format from your hard drive, data warehouses, social media, prepare data for reports, presentations, visualization, or export to feed downstream processes, or create an intuitive workflow to document and automate data manipulation tasks.”

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball

“The first edition of Ralph Kimball’s The Data Warehouse Toolkit introduced the industry to dimensional modeling, and now his books are considered the most authoritative guides in this space. This new third edition is a complete library of updated dimensional modeling techniques, the most comprehensive collection ever. It covers new and enhanced star schema dimensional modeling patterns, adds two new chapters on ETL techniques, includes new and expanded business matrices for 12 case studies, and more.”

Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema by Lawrence Corr

“Agile Data Warehouse Design is a step-by-step guide for capturing data warehousing/business intelligence (DW/BI) requirements and turning them into high-performance dimensional models in the most direct way: by modelstorming (data modeling + brainstorming) with BI stakeholders. It describes BEAM✲, an agile approach to dimensional modeling, for improving communication between data warehouse designers, BI stakeholders and the whole DW/BI development team. BEAM✲ provides tools and techniques that will encourage DW/BI designers and developers to move away from their keyboards and entity relationship based tools and model interactively with their colleagues.”

Data Mining: Practical Machine Learning Tools and Techniques, Third Edition (The Morgan Kaufmann Series in Data Management) by Ian H. Witten

“Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.”

Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and more by Matthew A. Russell

“How can you tap into the wealth of social web data to discover who’s making connections with whom, what they’re talking about, and where they’re located? With this expanded and thoroughly revised edition, you’ll learn how to acquire, analyze, and summarize data from all corners of the social web, including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.”

Data Science for Business: What you need to know about data mining and data-analytic thinking by Foster Provost

“Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.”

Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information by Danette McGilvray

“Information is currency. In today’s world of instant global communication and rapidly changing trends, up-to-date and reliable information is essential to effective competition. Recent studies show that data quality problems are costing businesses billions of dollars each year, with poor data linked to waste and inefficiency, damaged credibility among customers and suppliers, and an organizational inability to make sound decisions.”

Competing with High Quality Data: Concepts, Tools, and Techniques for Building a Successful Approach to Data Quality by Rajesh Jugulum

“Data is rapidly becoming the powerhouse of industry, but low-quality data can actually put a company at a disadvantage. To be used effectively, data must accurately reflect the real-world scenario it represents, and it must be in a form that is usable and accessible. Quality data involves asking the right questions, targeting the correct parameters, and having an effective internal management, organization, and access system. It must be relevant, complete, and correct, while falling in line with pervasive regulatory oversight programs.”

The Practitioner’s Guide to Data Quality Improvement (The Morgan Kaufmann Series on Business Intelligence) by David Loshin

“The Practitioner’s Guide to Data Quality Improvement shares the fundamentals for understanding the impacts of poor data quality, and guides practitioners and managers alike in socializing, gaining sponsorship for, planning, and establishing a data quality program. This book shares templates and processes for business impact analysis, defining data quality metrics, inspection and monitoring, remediation, and using data quality tools. Never shying away from the difficult topics or subjects, this is the seminal book that offers advice on how to actually get the job done.”

Measuring Data Quality for Ongoing Improvement: A Data Quality Assessment Framework (The Morgan Kaufmann Series on Business Intelligence) by Laura Sebastian-Coleman

“The Data Quality Assessment Framework shows you how to measure and monitor data quality, ensuring quality over time. You’ll start with general concepts of measurement and work your way through a detailed framework of more than three dozen measurement types related to five objective dimensions of quality: completeness, timeliness, consistency, validity, and integrity. Ongoing measurement, rather than one-time activities will help your organization reach a new level of data quality.”

Happy reading!

Timothy King

Senior Editor at Solutions Review
Timothy is Solutions Review's Senior Editor. He is a recognized thought leader and influencer in enterprise BI and data analytics. Timothy has been named a top global business journalist by Richtopia. Scoop? First initial, last name at solutionsreview dot com.