14 Common Data Project Pitfalls to Avoid Like the Plague

This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Cherre SVP of Client Solutions Raj Bhatti offers more than a dozen data project pitfalls to avoid, some common and some not-so-common.

Twenty-five years ago was the first time I heard a senior business leader say, “Our business is running off spreadsheets and this has to stop.” I had just started my career as a programmer at Lehman Brothers. Lehman does not exist today. But multibillion-dollar global businesses running off spreadsheets still exist.

I have been lucky enough to have held multiple C-level roles in technology, been a part of two very successful IPOs, and led multiple teams developing software that supported trillions of dollars in notional trading and risk. I have had a front-row seat in the journey from DOS-based Paradox databases to SQL relational databases, to client-server, to world-wide-web, OLTP and OLAP, business intelligence to data warehouses, ODS to data marts to data lakes to data fabric, etc.

After all this, I can confidently say where I have seen the most data project failure: data and analytics projects attempting to move business-critical operations and insights from spreadsheets to corporate systems.

If companies are still driving critical business decisions using spreadsheets after rolling out data programs, the project has failed.

Data Project Failure: The Usual Suspects

Project failure does not always mean a dramatic crash and burn. It means that the data program has not produced the outputs that were defined as the success criteria.

Data projects fail for the same reasons any kind of project fails:

  • Not correctly defining business outcomes.
  • Incorrect cost and time estimates.
  • Lack of needed executive sponsorship.
  • Technical issues: Data cannot be processed in a cost-effective and timely manner.
  • Talent issues: Subpar project management, lack of experience resulting in not enough emphasis on QA/DevOps, Agile-wrapper on Waterfall and the resultant spaghetti code, etc.
  • Product managers believe they know what the users want better than the users themselves (more of an issue in B2B than B2C).
  • Lack of communication between IT teams and business users: Inexperienced technical teams not building multiple well-defined feedback loops with end-users.

Data Project Failure: Some Radical Observations from the Trenches

So far, these are reasons the industry likely already knew. But here are some startling reasons for failure that occur time and time again:

Over-Engineering

Getting buy-in from all stakeholders results in the engineering goals being more focused on how to keep multiple balls in the air as opposed to designing the data architecture that most efficiently takes you from point A to point B. And everyone is a stakeholder in a data project.

Wrong Tech Priorities

The engineers’ desire to work on the latest and coolest technology plays a surprisingly oversized role in selecting technology. For many engineers, gaining technical expertise in a marketable skill for their resume carries more weight than determining the most optimal way of solving the problem at hand.

Lack of Focus on Data Management

Although data management is where firms need to spend the majority of their time, it is usually not given the importance it deserves. This is the biggest reason users don’t trust the efficacy of the data, and this results in a lack of adoption – which is a failure.

Not Having the Data You Thought You Had

Collecting the data can be challenging: creating or purchasing it may not be possible, the data may not be clean, and there may be no way to process it in a timely, cost-effective manner. Skimping on research into client confidentiality and data privacy law implications is a mistake many rookies make.

Poor Team Composition

Every data project has a unique set of talent requirements. Building the right team is like Danny Ocean assembling his crew for the heist in “Ocean’s Eleven.” A cookie-cutter approach to team composition happens all too often, resulting in not having the specific talent needed for the specific data problems you need to solve.

Organizations Go to Analytics Without Having Their Data Ready

Much work needs to be done to collect, normalize, and aggregate data. First-timers often fail to realize that the bulk of the heavy lifting in a “cool” data science project is getting the data lined up.
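As a minimal sketch of what that heavy lifting looks like in practice, consider two hypothetical source systems exporting the same kind of records with inconsistent column names, types, and categorical values (all names and data here are illustrative, not from the article):

```python
import pandas as pd

# Hypothetical raw exports from two source systems -- the kind of
# cleanup that dominates a data project before any analytics start.
east = pd.DataFrame({"Deal ID": [1, 2],
                     "amt_usd": ["1,000", "2,500"],
                     "region": ["east", "east"]})
west = pd.DataFrame({"deal_id": [3],
                     "Amount": [4000],
                     "Region": ["West"]})

# Normalize: align column names, numeric types, and category spellings.
east = east.rename(columns={"Deal ID": "deal_id", "amt_usd": "amount"})
east["amount"] = east["amount"].str.replace(",", "").astype(float)
west = west.rename(columns={"Amount": "amount", "Region": "region"})
west["region"] = west["region"].str.lower()

# Aggregate: one combined, analysis-ready table.
combined = pd.concat([east, west], ignore_index=True)
totals = combined.groupby("region")["amount"].sum()
print(totals.to_dict())  # {'east': 3500.0, 'west': 4000.0}
```

None of this is the interesting analytics work, yet it is where most of the time goes.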

Lack of Appreciation for the Power of the Spreadsheet

I have seen many IT professionals tell business users to give up their spreadsheets and start using clunky dashboards that only provide half the data needed for the business user to do their job. The goal should never be to stop the business from being able to use their spreadsheets for what spreadsheets were built for.

The goal should be for the data storage and heavy business-logic computations to happen on a back-end data platform. The business user should have the option to use new dashboard visualizations and/or export data from the data platform into their spreadsheet.
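This division of labor can be sketched in a few lines. In the hypothetical example below, SQLite stands in for the back-end data platform: the heavy aggregation runs on the platform side, and the business user receives a clean CSV they can open in their spreadsheet (the table and column names are illustrative):

```python
import csv
import sqlite3

# Hypothetical back-end data platform (SQLite stands in for the real
# warehouse). The heavy business-logic computation happens here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (desk TEXT, notional REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?)",
                 [("rates", 1e6), ("rates", 2e6), ("fx", 5e5)])

# Aggregation runs on the platform, not in the spreadsheet...
rows = conn.execute(
    "SELECT desk, SUM(notional) FROM trades GROUP BY desk ORDER BY desk"
).fetchall()

# ...and the business user gets an export to use however they like.
with open("desk_totals.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["desk", "total_notional"])
    writer.writerows(rows)
```

The spreadsheet stays what it is good at being: a flexible front end, not the system of record.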

Why This Time Will Truly Be Different

The work being done in data science, coupled with the ability to store, aggregate, and compute over gazillions of petabytes of data, will result in “haves” and “have nots” in the business world. Machine learning and AI models will be used in every element of every workflow. And only the firms that have lined up their data will be able to stay competitive (think DVDs vs. streaming).

The “brute force” solution of just hiring more people for data entry will not hide your data program failure any longer. Working the weekend compiling data in spreadsheets will no longer be enough to win.

Raj Bhatti