10 Worst Practices for Big Data

When a favorite team loses a championship game or a company misses its revenue goals, one of its leaders will often say in an interview years later that they “learned more from losing than they ever did winning.” Some might think that this is a losing philosophy, but I wholeheartedly buy into it because, let’s face it, none of us is 100% perfect.

Those who lose but learn from their mistakes will prosper in the long run. With that in mind, I’d like to highlight an article by Andrew C. Oliver, President of Open Software Integrators, called “The 10 worst big data practices.”

Below is a list of Andrew C. Oliver’s 10 worst big data practices.

1. Choosing MongoDB as your big data platform.

MongoDB is a good operational database, but not an analytics system.

2. Using RDBMS schema as files.  

There are better ways to create an extract that’s more denormalized.
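As a rough illustration of what a more denormalized extract can look like, here is a minimal Python sketch. It assumes a toy schema that is not from Oliver’s article: the `customers` and `orders` tables, their columns, and the helper names are all hypothetical, and SQLite stands in for whatever RDBMS you actually use. Instead of dumping each table to its own file, it joins them into one flat row per order before writing the extract.

```python
import csv
import io
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
        INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 1, 15.5);
    """)
    return conn


def build_flat_extract(conn):
    """Join the normalized tables into one wide row per order."""
    query = """
        SELECT o.order_id AS order_id,
               c.name     AS customer_name,
               c.region   AS region,
               o.amount   AS amount
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
    """
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    return header, cursor.fetchall()


def to_csv(header, rows):
    """Serialize the flat extract as CSV text, ready to land in a file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    header, rows = build_flat_extract(demo_connection())
    print(to_csv(header, rows))
```

Each row of the extract already carries the customer name and region, so a downstream job does not have to rebuild the join from separate per-table files.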

3. Creating data ponds.

If you create a bunch of data ponds for each business group, you will end up with different views of the data.

4. Failing to develop plausible use cases.

Come up with use cases before the project starts. You may find that there are certain things you really don’t need, even if the vendor recommends them. Ask yourself the right questions.

5. Thinking Hive is the be-all, end-all.

Don’t get locked in on SQL just because you are familiar with it. Reach outside your knowledge comfort zone and learn something new.

6. Treating HBase like an RDBMS.

You can do things with HBase that would make your RDBMS’s head spin, but the reverse is also true. HBase is good for what HBase is good for, and it is terrible at nearly everything else.

7. Installing 100 nodes by hand.

Sounds good until someone loses a node during this cumbersome activity.

8. RAID/LVM/SAN/VMing your data nodes.

“Hadoop stripes blocks of data across multiple nodes, and RAID stripes it across multiple disks. Put them together, what do you have? A roaring, low-performing, latent mess.”

9. Treating HDFS as just a file system.

If you dump stuff onto HDFS, you haven’t necessarily accomplished anything.

10. Whoo, shiny!

“As with any technology — or anything in life — find that moderate path that prevents you from being the last gazelle in the pack or the first lemming off the cliff.”

Again, no one is 100% perfect, and these are just 10 ways you could go wrong with big data. I hope that highlighting Andrew C. Oliver’s analysis of these bad practices helps you avoid them and saves you the time and stress of correcting them later.

Do you agree that these are the 10 worst big data practices? Would you add anything to this list or remove anything?

Read Andrew C. Oliver’s entire article on InfoWorld to learn more about the bad practices on this list.

Doug Atkinson