Laz Vekiarides is the Chief Technology Officer and Co-Founder of ClearSky Data, an enterprise storage and data protection solution provider. Solutions Review had the opportunity to speak with Vekiarides about the challenges of storing hot data on premises. During our conversation, Vekiarides shed light on the best way to store hot data, as well as what the future of hot data storage holds. Drawing on five years of experience at ClearSky Data, Vekiarides offers insight on the subject.
What is the difference between hot and cold data?
For us, hot data is all the data a customer would access over the course of a week. It’s a small percentage of the total data stored, usually anywhere from 7 to 12 percent of what’s provisioned for primary storage for a particular workload. For example, when we performed an analysis for one of our first customers, they had half a petabyte of storage and, on average, were accessing just 5 to 7 percent of it.
Cold data is either never accessed or accessed very, very infrequently. Often, cold data must be stored for compliance reasons, but it can also include backups, archives, and old files, all of which are very unlikely to ever be accessed.
Use cases for cold data storage are almost always driven by cost because people are usually willing to wait for cold data, especially since the law gives you a few days before required reporting. That said, we have one customer who has a radiology archive, and that data can quickly go cold. But in the event a radiology lab needs to access an old image, they can’t wait several days for it. Cold data doesn’t always mean the same thing for every customer.
Finally, we have a third category of data, which we call “warm data.” It’s data that’s not likely to be accessed within a week, but isn’t quite at the “will almost certainly never be accessed again” level required to classify it as cold.
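The three-way split described above can be sketched as a simple classifier keyed on last-access age. This is an illustrative sketch, not ClearSky's actual policy: the one-week hot window comes from the interview, while the 90-day cold cutoff is an assumption chosen for the example.

```python
from datetime import datetime, timedelta

# One-week "hot" window follows the definition in the interview;
# the 90-day "cold" cutoff is an illustrative assumption.
HOT_WINDOW = timedelta(weeks=1)
COLD_WINDOW = timedelta(days=90)

def classify(last_access: datetime, now: datetime) -> str:
    """Classify a piece of data as hot, warm, or cold by access recency."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"
    if age <= COLD_WINDOW:
        return "warm"
    return "cold"
```

In practice, of course, the thresholds would vary per workload, and as the radiology example below shows, age alone doesn't determine how quickly cold data must be retrievable.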
Why is storing hot data on premises a risk? What are the challenges of doing so?
Hot data needs to be stored close to the end-user to provide performance, but if it permanently lives on-premises, management becomes an expensive, complicated headache. For starters, that data needs to be backed up and also replicated to the disaster recovery site — both of these functions require the installation and management of a separate backup and DR system. Then, if the enterprise has a multi-tiered storage architecture, the organization has to set and follow policies to move data into cold storage once it’s no longer hot, and that’s not a simple task. It’s very likely to be an inefficient process, which means the organization will end up storing far more data in the high-performance “hot” data storage than is necessary.
Finally, access becomes a massive pain in the neck. If the data also needs to be accessed by a cloud application or users in other locations, IT will need to set up some kind of replication scheme, which can quickly become unmanageable and often provides poor performance.
How can a hybrid cloud approach to storing hot data be beneficial to users?
It simplifies management, cuts costs and, in most cases, improves performance. For example, in our service, all data is stored in the cloud, where it is automatically protected. As a result, IT only needs to manage a single, durable copy of its data. But the cloud can’t provide the performance required for frequent day-to-day use. No matter how fat your pipes are, you can’t transfer data faster than the speed of light. The only way to overcome this latency is to cache hot data at the edge, and that’s what we do. ClearSky runs a network that intelligently and automatically caches hot data on-premises, caches hot and warm data in a point of presence (PoP) that’s no more than 120 miles away, and stores all data, including cold data, in the cloud.
As a result, whenever someone requests data that’s not in the local cache, which happens about 5 percent of the time, the system retrieves the data from the nearby PoP so that latency still isn’t detectable by the user. Even better, since there’s such a small amount of data cached locally — about 7 to 12 percent of the total data — all of that data can be affordably cached on flash, improving local performance.
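The read path described above can be sketched as a cascade across three tiers. Everything here — the class name, the dictionary-backed tiers, the promotion-on-miss behavior — is an illustrative assumption about how such a hierarchy works in general, not ClearSky's implementation:

```python
# Minimal sketch of a three-tier read path:
# local flash cache -> metro PoP -> cloud (authoritative copy).

class TieredStore:
    def __init__(self):
        self.local = {}  # hot data cached on-prem flash (~7-12% of total)
        self.pop = {}    # hot + warm data in a nearby point of presence
        self.cloud = {}  # durable copy of all data, including cold

    def write(self, key, value):
        # All data lands in the cloud; hot data is also cached near the edge.
        self.cloud[key] = value
        self.pop[key] = value
        self.local[key] = value

    def read(self, key):
        # Serve from the closest tier holding the data; on a local miss,
        # promote the value back into the local cache so repeat reads stay fast.
        for tier in (self.local, self.pop, self.cloud):
            if key in tier:
                value = tier[key]
                self.local[key] = value
                return value
        raise KeyError(key)
```

A real system would also need cache eviction (to keep the local tier at that 7 to 12 percent footprint) and asynchronous tier demotion, which this sketch omits.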
Do you believe that the challenges of storing this data on premises will change in the future? Why or why not?
Absolutely. Hybrid cloud technologies have matured to the point that there’s absolutely no reason for any enterprise to invest in big iron to store and manage their data locally. Hybrid cloud cuts costs, simplifies management, and improves both performance and access. I believe that, within five years, most organizations will be using some form of hybrid cloud to store and manage their data.