The 10 Best Data Preparation Books on Our Reading List
Our editors have compiled this directory of the best data preparation books based on Amazon user reviews, ratings, and ability to add business value.
There are loads of free resources available online (such as Solutions Review’s Data Integration Software Buyer’s Guide, vendor comparison map, and best practices section) and those are great, but sometimes it’s best to do things the old-fashioned way. Few resources can match the in-depth, comprehensive detail of one of the best data preparation books.
The editors at Solutions Review have done much of the work for you, curating this directory of the best data preparation books on Amazon. Titles have been selected based on the total number and quality of reader reviews and their ability to add business value. Each of the books listed in this compilation meets the minimum criteria of 5 reviews and a 4-star-or-better rating.
Below you will find a library of titles from recognized industry analysts, experienced practitioners, and subject matter experts spanning the depths of predictive analytics all the way to data science. This compilation includes publications for practitioners of all skill levels.
Note: Titles with recently published new editions will be included if the previous edition met our review and ranking criteria.
Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps
“The design patterns in this book capture best practices and solutions to recurring problems in machine learning. The authors, three Google engineers, catalog proven methods to help data scientists tackle common problems throughout the ML process. These design patterns codify the experience of hundreds of experts into straightforward, approachable advice. In this book, you will find detailed explanations of 30 patterns for data and problem representation, operationalization, repeatability, reproducibility, flexibility, explainability, and fairness. Each pattern includes a description of the problem, a variety of potential solutions, and recommendations for choosing the best technique for your situation.”
“This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, “What are you trying to do and why?” Written by key executives at Trifacta, this book walks you through the wrangling process by exploring several factors—time, granularity, scope, and structure—that you need to consider as you begin to work with data. You’ll learn a shared language and a comprehensive understanding of data wrangling, with an emphasis on recent agile analytic processes used by many of today’s data-driven organizations.”
“Written for anyone involved in the data preparation process for analytics, Gerhard Svolba’s Data Preparation for Analytics Using SAS offers practical advice in the form of SAS coding tips and tricks, and provides the reader with a conceptual background on data structures and considerations from a business point of view. The tasks addressed include viewing analytic data preparation in the context of its business environment, identifying the specifics of predictive modeling for data mart creation, understanding the concepts and considerations of data preparation for time series analysis, using various SAS procedures and SAS Enterprise Miner for scoring, creating meaningful derived variables for all data mart types, using powerful SAS macros to make changes among the various data mart structures, and more!”
“Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.”
“The book starts with the absolute basics of Python, focusing mainly on data structures. It then delves into the fundamental tools of data wrangling like NumPy and Pandas libraries. You’ll explore useful insights into why you should stay away from traditional ways of data cleaning, as done in other languages, and take advantage of the specialized pre-built routines in Python. This combination of Python tips and tricks will also demonstrate how to use the same Python backend and extract/transform data from an array of sources including the Internet, large database vaults, and Excel financial tables.”
“This guide for practicing statisticians, data scientists, and R users and programmers will teach the essentials of preprocessing data, leveraging the R programming language to easily and quickly turn noisy data into usable pieces of information. This book will guide the user through the data wrangling process via a step-by-step tutorial approach and provide a solid foundation for working with data in R. The author’s goal is to teach the user how to easily wrangle data in order to spend more time on understanding the content of the data.”
“This hands-on guide shows non-programmers like you how to process information that’s initially too messy or difficult to access. You don’t need to know a thing about the Python programming language to get started. Through various step-by-step exercises, you’ll learn how to acquire, clean, analyze, and present data efficiently. You’ll also discover how to automate your data process, schedule file-editing and clean-up tasks, process larger datasets, and create compelling stories with data you obtain.”
“What is bad data? Some people consider it a technical phenomenon, like missing values or malformed records, but bad data includes a lot more. In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they’ve recovered from nasty data problems. From cranky storage to poor representation to misguided policy, there are many paths to bad data. Bottom line? Bad data is data that gets in the way. This book explains effective ways to get around it.”
Python Data Cleaning Cookbook: Modern techniques and Python tools to detect and remove dirty data and extract key insights
“This book shows you tools and techniques that you can apply to clean and handle data with Python. You’ll begin by getting familiar with the shape of data by using practices that can be deployed routinely with most data sources. Then, the book teaches you how to manipulate data to get it into a useful form. You’ll also learn how to filter and summarize data to gain insights and better understand what makes sense and what does not, along with discovering how to operate on data to address the issues you’ve identified.”
Recent Releases Worth Checking Out
“Carl Allchin, from The Information Lab in London, gets you up to speed on Tableau Prep through a series of practical lessons that include methods for preparing, cleaning, automating, organizing, and outputting your datasets. Based on Allchin’s popular blog, Preppin’ Data, this practical guide takes you step-by-step through Tableau Prep’s fundamentals. Self-service data preparation reduces the time it takes to complete data projects and improves the quality of your analyses. Discover how Tableau Prep helps you access your data and turn it into valuable information.”