This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, DataCore Software Field CTO Brian Bashaw offers four data health best practices to know and key techniques for success.
As IT departments gear up with data protection tools and tactics to secure data and adhere to compliance regulations, the role of data health in storage is often overlooked. Storage architectures do more than protect data and mitigate security risks. Data storage systems must deliver fast recovery from a data loss incident – and the processes that are in place to enable such rapid response are critical to data health.
The backup task has become even more challenging with the plethora of sources that organizations must back up – including on-premises and cloud-based resources, and any of the thousands of remote devices that clients or employees use to interact with IT resources. This elevates seemingly in-the-weeds capabilities, such as continuous checking of data integrity in storage systems, far above the check-box features that may have been weighed at purchase time and then largely ignored.
Backup frequency, thoroughness, and efficiency often have significant and direct impacts on overall data center security and performance. Best practices across a wide variety of industries reveal four techniques – approaches not always considered by IT teams – that can help assure data health by simplifying backups and improving system resiliency.
Data Health Best Practices to Know
Enabling Proactivity, Not Just Rapid Incident Response
The first step is making sure that what was received by the data storage platform is in fact what was sent by the client. Many backup applications have done this by running a verify routine on the backup stream. Modern storage systems are smart enough to also calculate a hash or checksum of the data when it is written, store that value as additional metadata, and then use it on subsequent reads to validate that the data has not suffered some form of corruption. These reads should not be limited to client interaction, though. Other operations – such as rebalancing capacity as the storage system expands or contracts, and replicating to other systems – are also opportunities to validate the integrity of the stored data.
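The write-then-verify pattern described above can be illustrated with a minimal Python sketch. The `ChecksummedStore` class and its in-memory block map are hypothetical simplifications, not any vendor's implementation: a checksum is computed and stored as metadata on write, then recomputed on every read to catch silent corruption.

```python
import hashlib

# Hypothetical sketch of checksum-on-write, verify-on-read.
# Real storage systems persist the checksum as block metadata;
# here an in-memory dict stands in for the media.
class ChecksummedStore:
    def __init__(self):
        self._blocks = {}  # block_id -> (data, sha256 hex digest)

    def write(self, block_id: str, data: bytes) -> None:
        # Compute the checksum once, at ingest, and keep it with the data.
        checksum = hashlib.sha256(data).hexdigest()
        self._blocks[block_id] = (data, checksum)

    def read(self, block_id: str) -> bytes:
        # Recompute on every read; a mismatch means silent corruption.
        data, stored = self._blocks[block_id]
        if hashlib.sha256(data).hexdigest() != stored:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data
```

The same `read`-side check can be reused during rebalancing or replication, since those operations also touch every block they move.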
Making More Intelligent Use of Resources
Storage efficiency conversations tend to focus narrowly on the portions of media used to protect data. What rarely gets discussed are the (in)efficiencies that surround and complicate data protection. Storage systems can learn a lesson from optimizations found elsewhere, such as scale-out compute clusters or hypervisors. For example, scale-out storage systems should not waste capacity on each node for a boot image. Instead, a smaller set of resources in the cluster can serve a boot image that lives in RAM on the protection nodes. Not only does this approach reduce waste, but it also improves performance, simplifies the upgrade experience, and reduces risk by ensuring that vulnerabilities are patched in unison.
System-Wide Detection, Repair, and Immutability
Taking the earlier-cited checksum process a little further: storage systems are not always busy churning out responses to clients. Like people, they can put their idle time to good use. Modern storage systems should use this time to explore the data they host, looking for silent corruption and acting accordingly when a discrepancy is found.
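This idle-time pass is commonly called a scrub. A minimal Python sketch follows, assuming the same hypothetical layout of blocks stored alongside their checksums; the repair step is left as a comment since it depends on the protection scheme (replica, parity, or erasure code).

```python
import hashlib

# Hypothetical background "scrub" sketch: walk the stored blocks during
# idle time, recompute each checksum, and flag silent corruption.
def scrub(blocks: dict) -> list:
    """blocks maps block_id -> (data, expected sha256 hex digest).
    Returns the ids of blocks whose contents no longer match."""
    corrupted = []
    for block_id, (data, expected) in blocks.items():
        if hashlib.sha256(data).hexdigest() != expected:
            # In a real system, queue this block for repair from a
            # replica or parity reconstruction.
            corrupted.append(block_id)
    return corrupted
```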
Storage systems should also be able to flexibly enforce policies that lock data in place, making it immutable. It’s not enough to rely on snapshots or previous versions as the sole recovery path from data loss – and more specifically, from ransomware. These elements are programmatically created and can be programmatically destroyed. While once limited to regulatory compliance situations, data immutability is now part of any complete ransomware protection strategy.
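The distinction between a snapshot and a retention lock can be sketched in a few lines of Python. The `ImmutableObject` class here is hypothetical: the point is that deletion is refused by policy until a retention window expires, so a compromised administrative account cannot programmatically destroy the copy.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical write-once retention lock. Unlike a snapshot, the object
# cannot be deleted by any caller while the retention window is active.
class ImmutableObject:
    def __init__(self, data: bytes, retain_for: timedelta):
        self.data = data
        self.locked_until = datetime.now(timezone.utc) + retain_for

    def delete(self) -> bool:
        # Refuse deletion while the retention lock is in force.
        if datetime.now(timezone.utc) < self.locked_until:
            return False
        return True
```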
Flexible data protection policies also let teams strike a balance between storage efficiency – keeping protection overhead low enough not to diminish response time – and making sure data withstands failures. Here, “durability” is defined by:
- How many failures can be endured while still presenting an accurate representation of the data stored? Note that this isn’t just a matter of hard disk failures. Other issues can obscure a view of data health: a disk could be busy and simply fail to respond, or it could respond with a sector error, among other problems.
- What is the likelihood of these failures? This is sometimes quantified by mean time between failures (MTBF) values for HDDs.
- How quickly can these failures be recovered from?
For example, most storage admins are familiar with the need to build out RAID groups or protection policies. Usually, this is done with some storage efficiency metric in mind, such as “with 8 data drives and 2 parity drives, parity consumes 20 percent of raw capacity, and I can live with that.” It’s a sensible thought for efficiency, but in making it, the admin has also fixed the number of failures the system can endure – which may or may not be acceptable.
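The trade-off in that 8+2 example can be made concrete with a short Python sketch. The `ec_profile` helper is hypothetical; for a stripe of `k` data drives and `m` parity drives, it reports the parity overhead and the number of simultaneous drive failures the stripe survives.

```python
# Hypothetical helper: for a k-data / m-parity protection scheme,
# report the parity overhead and the failure tolerance it implies.
def ec_profile(k: int, m: int) -> dict:
    return {
        # Share of raw capacity spent on parity rather than data.
        "overhead_pct": 100 * m / (k + m),
        # Any m simultaneous drive losses can be reconstructed.
        "failures_tolerated": m,
    }
```

For the 8+2 case above, 20 percent of raw capacity goes to parity, and exactly two simultaneous drive failures can be tolerated – choosing the efficiency number has also chosen the durability number.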
In addition, many storage systems under IT teams’ management don’t allow the admin to make this efficiency-versus-durability decision with any granularity. Once the admin makes a single decision, all data has the same durability. However, the accounting team analyzing end-of-year results, the business owner, and the legal department may not agree that their applications warrant the same durability.
Features such as policy-based erasure coding and mirroring enable storage systems to adapt their data protection schemes to the data being stored, striking this balance of efficiency and durability per workload.
Data health is a top IT priority as ransomware attacks become commonplace, as data sets grow from hundreds of terabytes to petabytes, and as more data is created and consumed in distributed locations. Teams that rigorously review data health processes can keep backup and recovery times at a minimum, and boost efficiency with a significant impact on the bottom line.
- Four Data Health Best Practices to Know and Key Techniques to Deploy - February 24, 2022