On the Brink of Disaster: IT’s Biggest Problem is an Easy Fix


By Alan D. Crowetz

It is easily the most critical IT task. It’s even easier to monitor. So why is it that we regularly find so many people on the brink of a nightmare?

I’m talking about backup and disaster recovery. After twenty-plus years in the business, I’m still shocked at what we find. It keeps me awake at night, yet our clients often sleep blissfully unaware of the precipice they are teetering on. While virtually every business has a backup and disaster recovery system in place, many fail to monitor it as they should. They trust that it is working and assume they will be alerted in some fashion if something goes wrong. Hopefully they dig deeper before there is a real problem and it’s simply too late.

The good news is that this is easy, and often free, to fix. Surprisingly, the biggest weakness in any disaster recovery system is people. Between 60 and 70 percent of all the business-disrupting problems we see are due to internal hardware malfunctions or human error.

A common occurrence at our firm is finding that users have an excellent backup system but don’t check its status daily. Perhaps there was an error months ago and no backups have run since. It often takes less than a minute to check a backup, yet many users assume it’s working or that it will notify them of an issue. But if a system fails badly enough, it may be unable to send an error code or alert at all.
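One way to stop trusting silence is to check for the presence of a recent success rather than waiting for an error to arrive. The Python sketch below is a hypothetical illustration, not any backup product’s feature: it assumes you can read the timestamp of the last successful run from your backup software’s report or log (how you get it varies by product and is not shown), and the 26-hour threshold for a nightly job is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold: a nightly job should have succeeded within about
# a day; 26 hours leaves some slack for a slow or late-starting run.
MAX_AGE = timedelta(hours=26)


def backup_is_stale(last_success: Optional[datetime], now: datetime,
                    max_age: timedelta = MAX_AGE) -> bool:
    """Return True if there is no sufficiently recent successful backup.

    A missing record counts as stale: a system that has failed badly may
    never send an error, so silence is treated as a problem, not as good news.
    """
    if last_success is None:
        return True
    return now - last_success > max_age


if __name__ == "__main__":
    now = datetime(2024, 5, 8, 9, 0)
    print(backup_is_stale(datetime(2024, 5, 7, 23, 30), now))  # False: ran last night
    print(backup_is_stale(datetime(2024, 2, 1, 23, 30), now))  # True: months old
    print(backup_is_stale(None, now))                          # True: no record at all
```

The design choice is the inversion: the alarm fires when the expected good news is missing. For the same reason, a check like this is best run from somewhere other than the backup server itself, since the case it exists to catch is a server too broken to report anything.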

We’ve seen many firms where we diligently trained staff and emphasized the importance of this task, only to follow up later and discover a problem, or find that they haven’t been maintaining their backups. What’s worse (if you can call it that) is that a backup system can often work flawlessly for years, allowing IT teams to let their guard down.

Acronis found that 60 percent of system downtime was caused by human error.

We have seen a surprising number of firms that diligently rotate tapes or backup drives offsite, yet when we check, the backups haven’t been running and the equipment they rotate daily is empty!

How important is it? Ninety-three percent of companies that lost data for more than 10 days filed for bankruptcy within one year. Fifty percent of businesses filed for bankruptcy immediately. Does this happen much? We regularly get “that” call: a business dials us in a panic after a disaster, days into the crisis and still not making progress. People have given up or walked away, and now it’s gone critical. Every week, 140,000 hard drives crash in the United States. According to one study, the catastrophe most businesses experience is not fire, flood or earthquake but malware, and that threat is rapidly getting worse, not better.

We had a firm call us one day out of the blue. Their IT systems were mission-critical and had gone down, costing them an exorbitant amount of money every hour. Their existing IT department had been confident they were covered. They were not. By the time they called us, they were on day three without operations. Everyone was standing around panicking and unable to work, and the company was hemorrhaging cash. Despite working around the clock, the IT department was in a panic and looked unlikely to recover. They covertly called us for advice, emergency help, and feedback. When we looked, it was indeed a very bad situation.

We jumped in and worked around the clock to see if we could help, despite our instinct to run in the opposite direction. With a team of very talented experts and more than our share of miracles, we were able to recover them. But I have to say, it just as easily could have gone the other way, and most of the time, it does.

People were fired, others threatened with lawsuits, and enormous amounts of money were lost, but I would still consider them lucky.

So what can you do? The easiest, quickest, and cheapest thing is to simply check your backups daily. And a “success” email is not enough. How long did it run? How much did it back up? Is anything in the log or summary different from what you normally see? Were there any errors in the detail? I’m so OCD about this that I recommend someone in the firm keep a mini log daily. It’s easy, and even a lower-level staff member can do it. Then I ask that someone a bit higher up “randomly” audit that person to ensure they are on top of checking the backups and that the backups are current to the day.
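Part of that daily review can be supported by a small script. The Python sketch below is hypothetical: it assumes you can read each night’s run time, amount backed up, and error count from your backup product’s report (obtaining them is product-specific and not shown), and the 50 percent tolerance and the CSV file name are arbitrary assumptions. It compares tonight’s run with the median of recent runs, lists anything unusual, and appends a row to a CSV “mini log” that a supervisor can spot-check.

```python
import csv
from dataclasses import dataclass
from datetime import date
from pathlib import Path
from statistics import median
from typing import List


@dataclass
class BackupRun:
    day: date
    minutes: float    # how long the job ran
    gigabytes: float  # how much data it backed up
    errors: int       # errors/warnings found in the detailed log


def review(today: BackupRun, history: List[BackupRun],
           tolerance: float = 0.5) -> List[str]:
    """Return a list of concerns; an empty list means nothing looked unusual."""
    concerns = []
    if today.errors > 0:
        concerns.append(f"{today.errors} error(s) in the detailed log")
    if history:
        typical_minutes = median(r.minutes for r in history)
        typical_gb = median(r.gigabytes for r in history)
        # Flag runs that are much shorter/longer or smaller/larger than usual.
        if abs(today.minutes - typical_minutes) > tolerance * typical_minutes:
            concerns.append(
                f"ran {today.minutes:.0f} min vs. typical {typical_minutes:.0f} min")
        if abs(today.gigabytes - typical_gb) > tolerance * typical_gb:
            concerns.append(
                f"backed up {today.gigabytes:.1f} GB vs. typical {typical_gb:.1f} GB")
    return concerns


def append_to_mini_log(path: Path, run: BackupRun, checked_by: str,
                       concerns: List[str]) -> None:
    """Append one row per day so someone higher up can audit it later."""
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["date", "minutes", "gigabytes", "errors",
                             "checked_by", "concerns"])
        writer.writerow([run.day.isoformat(), run.minutes, run.gigabytes,
                         run.errors, checked_by, "; ".join(concerns) or "OK"])


if __name__ == "__main__":
    history = [BackupRun(date(2024, 5, d), 42, 118.0, 0) for d in range(1, 8)]
    today = BackupRun(date(2024, 5, 8), 3, 0.4, 0)  # "success", but suspiciously small
    concerns = review(today, history)
    for c in concerns:
        print("CHECK:", c)
    # CHECK: ran 3 min vs. typical 42 min
    # CHECK: backed up 0.4 GB vs. typical 118.0 GB
    append_to_mini_log(Path("backup_mini_log.csv"), today, "front desk", concerns)
```

Note that the example run reports zero errors, exactly the kind of “success” email that hides a problem; only the comparison with normal run time and size exposes it. A script like this supports the person checking the backups rather than replacing them.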

Also, at least once a year, review your backup systems. Are you backing up everything? Is it flawless or painful? Is there a better option? Hughes Marketing Group found that over the course of a month, 90 percent of small businesses spend less than 8 hours planning or managing their business continuity plans. Only 51 percent of small businesses have an IT business continuity plan in place.

Where are your backups stored? Many of the firms we see back up their data to the same building, or even the same room. If someone broke in and stole equipment, they’d likely grab the backup as well. If there were a fire, the backups would be toast too. You will want to make sure you have copies locally as well as offsite. Offsite may mean another location or the cloud; both have pros and cons. Another location has the advantage of being cheap or free and quick to recover from when needed. On the downside, because it involves people (remember, the weakest part of the system?), it is sometimes skipped, and a nearby site might not be far enough away. If a natural disaster takes out your site, a location 20 miles away may be taken out as well.

Cloud backup is good because it can be automated and uses distant or multiple locations. Thirty-four percent of SMB executives said disaster recovery had a moderate or very large effect on their decision to adopt public cloud computing. The downside is that a full restore from the cloud can take days or weeks. We also find that many firms are not replicating everything to the cloud; they are selectively choosing data, which is dangerous. Get everything offsite: operating system, programs, and data.
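A simple way to catch selective backups is to keep an inventory of what is covered and where the copies land, then look for gaps. In the hypothetical Python sketch below, the required items (operating system, programs, data) and locations (local, offsite) come from the advice above; the inventory format and the function name are illustrative assumptions, and in practice you would fill the inventory in from your own backup job definitions.

```python
from typing import Dict, List, Set

# Everything should be backed up, both locally and offsite.
REQUIRED_ITEMS = ["operating system", "programs", "data"]
REQUIRED_LOCATIONS = {"local", "offsite"}


def coverage_gaps(inventory: Dict[str, Set[str]]) -> List[str]:
    """List every required item that lacks a copy in a required location.

    `inventory` maps each item to the set of locations holding a copy of it;
    an item missing from the inventory is treated as not backed up at all.
    """
    gaps = []
    for item in REQUIRED_ITEMS:
        have = inventory.get(item, set())
        for location in sorted(REQUIRED_LOCATIONS - have):
            gaps.append(f"{item}: no {location} copy")
    return gaps


if __name__ == "__main__":
    # A common pattern: everything local, but only data replicated offsite.
    inventory = {
        "operating system": {"local"},
        "programs": {"local"},
        "data": {"local", "offsite"},
    }
    for gap in coverage_gaps(inventory):
        print("GAP:", gap)
    # GAP: operating system: no offsite copy
    # GAP: programs: no offsite copy
```

This only checks what the inventory claims; it cannot tell whether those copies are complete or restorable, which is why the annual restore test still matters.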

Finally, nothing beats testing backups. We recommend doing this at least once a year as well. Specifically, test the hardest thing. That doesn’t mean your local backup or just raw data. It means using your offsite backup (you do have multiple, full offsite backups, don’t you?) to do a full “bare metal” restore of a server. Remember to simulate a real disaster: you can’t go back into the server room to grab a special CD, a license key, or anything else. Can you still fully recover the server without any help from existing resources?

Could your business survive if you irretrievably lost all your data? Could it be recreated from nothing? Is there anything more “nuclear” for a firm than this? Do yourself a favor: at least ask the questions, check things out, and question the status quo.

Statistics courtesy of Pivotal IT’s 10 Backup and Disaster Recovery Statistics You Must Know.

Alan Crowetz is the President and CEO of InfoStream, Inc., a business consulting firm based in West Palm Beach, Florida. InfoStream serves a variety of businesses, charities, schools, and government agencies both in South Florida and around the globe. InfoStream has been featured on Microsoft’s Pinpoint.com and has a perfect 5-star rating on the world’s largest independent computer engineer listing service. In addition, InfoStream has been listed in the Top 100 Global Small Business IT firms by MSPMentor. Alan has a Master’s in Business Administration and in his free time enjoys the outdoors. Connect with Alan on LinkedIn.
