10 Worst Practices for Big Data

So many times, when your favorite team loses a championship game or a company fails to meet its revenue goals, you will see an interview years later in which one of the leaders says they "learned more from losing than they ever did from winning." Some might think this is a losing philosophy, but I wholeheartedly buy into it, because let's face it: none of us is 100% perfect.

Those who lose but learn from their mistakes will prosper in the long run. With that in mind, I'd like to point to an article written by Andrew C. Oliver, President of Open Software Integrators, called "The 10 worst big data practices."

Below is a list of Andrew C. Oliver’s 10 worst big data practices.

1. Choosing MongoDB as your big data platform.

MongoDB is a good operational database, but not an analytics system.

2. Using RDBMS schema as files.  

Don't just dump each normalized table to its own file and re-create the joins downstream; there are better ways to create an extract that's more denormalized.
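For instance, the joins can be done once at extract time so the output is one wide row per fact. Here's a minimal sketch in pure Python; the table and column names are made up for illustration:

```python
# Sketch: denormalize two "RDBMS tables" into one flat extract,
# instead of exporting each table to its own file and re-joining later.
# Table and column names here are hypothetical.

customers = [
    {"customer_id": 1, "name": "Acme Corp", "region": "EMEA"},
    {"customer_id": 2, "name": "Globex", "region": "APAC"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 2, "amount": 99.5},
    {"order_id": 12, "customer_id": 1, "amount": 40.0},
]

def denormalize(orders, customers):
    """Join orders to customers in memory, yielding one wide row per order."""
    by_id = {c["customer_id"]: c for c in customers}
    for o in orders:
        c = by_id[o["customer_id"]]
        yield {**o, "customer_name": c["name"], "region": c["region"]}

flat = list(denormalize(orders, customers))
```

Each row of `flat` now carries everything an analytics job needs, with no joins required at read time.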

3. Creating data ponds.

If you create a separate data pond for each business group instead of one shared data lake, you will end up with inconsistent views of the same data.

4. Failing to develop plausible use cases.

Come up with use cases before the project starts.  You may find that there are certain things you really don't need, even if the vendor recommends them.  Ask yourself the right questions.

5. Thinking Hive is the be-all, end-all.

Don’t get locked in on SQL just because you are familiar with it. Reach outside your knowledge comfort zone and learn something new.

6. Treating HBase like an RDBMS.

You can do things with HBase that would make your RDBMS’s head spin, but the reverse is also true. HBase is good for what HBase is good for, and it is terrible at nearly everything else.
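One concrete difference: HBase stores rows sorted by row key, and that key is effectively your only index, so access patterns have to be designed into the key up front rather than queried ad hoc as in an RDBMS. Below is a hedged sketch of one common pattern, a salted, reverse-timestamp row key for time-series data; the exact layout is illustrative, not from Oliver's article:

```python
# Sketch: composing an HBase-style row key for time-series reads.
# Rows in HBase sort lexicographically by key, so the key encodes the
# access pattern: a salt to spread writes across regions, the entity id
# to group its events, and a reversed timestamp so the newest event
# sorts first. The specific layout below is an assumption for illustration.
import zlib

MAX_TS = 10**13  # timestamp ceiling in milliseconds (illustrative constant)

def row_key(entity_id: str, ts_millis: int, buckets: int = 16) -> str:
    salt = zlib.crc32(entity_id.encode()) % buckets  # deterministic bucket
    reversed_ts = MAX_TS - ts_millis                 # newest-first ordering
    return f"{salt:02d}|{entity_id}|{reversed_ts:013d}"
```

With this key, a scan starting at a given entity's prefix returns its most recent events first, something a generic RDBMS schema ported straight into HBase won't give you.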

7. Installing 100 nodes by hand.

It may sound feasible, but hand-installing that many nodes is cumbersome and error-prone, and configurations inevitably drift; automate your provisioning instead.

8. RAID/LVM/SAN/VMing your data nodes.

“Hadoop stripes blocks of data across multiple nodes, and RAID stripes it across multiple disks. Put them together, what do you have? A roaring, low-performing, latent mess.”

9. Treating HDFS as just a file system.

If you dump stuff onto HDFS, you haven’t necessarily accomplished anything.
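Getting value out of HDFS usually means imposing some structure on it, for example a Hive-style partitioned directory layout that downstream tools can prune by date. Here's a small sketch of that idea, using the local filesystem as a stand-in for HDFS; the dataset name and partition column are made up:

```python
# Sketch: a Hive-style partitioned layout (dt=YYYY-MM-DD directories)
# instead of dumping everything into one flat directory. The local
# filesystem stands in for HDFS here; all names are illustrative.
import os
import tempfile

def write_partitioned(base_dir, dataset, records):
    """Append records as CSV lines under <base>/<dataset>/dt=<date>/part-0."""
    for rec in records:
        part_dir = os.path.join(base_dir, dataset, f"dt={rec['date']}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0"), "a") as f:
            f.write(f"{rec['user']},{rec['event']}\n")

base = tempfile.mkdtemp()
write_partitioned(base, "clicks", [
    {"date": "2014-07-21", "user": "u1", "event": "view"},
    {"date": "2014-07-21", "user": "u2", "event": "click"},
    {"date": "2014-07-22", "user": "u1", "event": "view"},
])

# A date-bounded job now only has to read the matching partition dirs.
partitions = sorted(os.listdir(os.path.join(base, "clicks")))
```

A raw dump of the same records into one directory would force every job to scan everything; the partition layout is what turns "stuff on HDFS" into something queryable.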

10. Whoo, shiny!

“As with any technology — or anything in life — find that moderate path that prevents you from being the last gazelle in the pack or the first lemming off the cliff.”

Do you agree with this list of the 10 worst big data practices? Would you add anything to it or remove anything?

Click here to read Andrew C. Oliver’s entire article on InfoWorld and learn more about the bad practices on this list.