An Unstructured Data Migration Plan Template to Consider
This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Komprise Chief Customer Success Architect offers a template for unstructured data migration planning, as well as tools to consider.
Data migrations have never been easy. But the need to do them intelligently and painlessly is now pressing, because enterprises simply have too much unstructured data sitting on their highest-performing storage technologies in legacy environments. Storage prices have been declining in recent years, but data growth has been exponential over the same period. It’s imperative to continually assess which data is stored on your top-performing tiers and whether it can be migrated to a solution with a better price point and/or one that meets organizational needs, such as feeding cloud data lakes or complying with ever-evolving regulations.
There are many options these days for unstructured data storage, from Storage as a Service (STaaS) to object storage, cloud network attached storage (NAS), and deep archives such as Amazon S3 Glacier and Azure Archive Storage. These choices mean IT teams responsible for unstructured data need a detailed understanding of their data and the ability to pivot at a moment’s notice to accommodate change. And let’s face the obvious: pivoting workloads of any size and scale, whether on-premises or to the cloud, can be time-consuming and disruptive without a plan.
Planning what you need to know before you migrate helps you avoid errors, delays, and cost overruns while ensuring you meet your overall unstructured data management goals. For most organizations, this means moving to the cloud faster and maintaining an agile, hybrid cloud environment.
Unstructured Data Migration Plan: Steps to Take During Development
Map Out Sources and Targets
You’ll first want to get the lay of the land by defining your sources and targets. When you build your plan, make sure it details the point A and point B locations and that you have a process to identify and resolve complicating factors in your source and target storage.
Rules and Regulations
Rules are typically established and governed within your organization, such as retention policy, legal hold, delete policy, and disaster recovery. Regulations are typically established by a governing body that can impose fines for non-compliance, such as HIPAA, SOX, GDPR, and GxP. It’s critical to partner with your HR, security, legal, and compliance teams to ensure everyone does their part to meet or exceed applicable rules and regulations.
In addition, consider collaborating with data owners or data subject matter experts who can shed light on potential obstacles and provide feedback while establishing the best unstructured data management strategy.
When you do proper data discovery, you’ll understand your workloads and potential speed bumps. Are you migrating one share or thousands? Is it millions of small files, terabytes of large files, or a mixture of everything? Can you identify orphaned data and move it to an archive or confine it for deletion? Tools can create a central index that supports better decisions through holistic visibility. That visibility, by the way, typically gains the undivided attention of legal and/or compliance teams and opens opportunities to partner for funding or demonstrate cost avoidance.
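As a rough illustration of the discovery questions above, the following Python sketch walks one directory tree and reports how many files it holds, how large they are, and how much of that looks cold. It is a minimal sketch, not a replacement for an indexing tool: the function name `summarize_tree`, the one-year threshold, and the choice of last-modified time as the signal for "cold" are all assumptions made for the example.

```python
import os
import time
from collections import Counter


def summarize_tree(root, cold_after_days=365, now=None):
    """Walk root and return file counts, byte totals, and cold-data totals.

    "Cold" here is an assumption: a file not modified in cold_after_days.
    Access times are often unreliable, so modification time is used instead.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400
    summary = {"files": 0, "bytes": 0, "cold_files": 0, "cold_bytes": 0}
    bytes_by_extension = Counter()

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # unreadable or vanished; a real tool would log this
            summary["files"] += 1
            summary["bytes"] += info.st_size
            bytes_by_extension[os.path.splitext(name)[1].lower()] += info.st_size
            if info.st_mtime < cutoff:
                summary["cold_files"] += 1
                summary["cold_bytes"] += info.st_size

    summary["bytes_by_extension"] = dict(bytes_by_extension)
    return summary
```

Running `summarize_tree` against a mounted share gives a first answer to "one share or thousands, small files or large" for that share; repeating it per share and collecting the results is one simple way to approximate a central index.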
Simplify and Standardize
Just because you’ve been doing things a certain way in the past doesn’t mean it’s the right way tomorrow. Legacy standards that have not evolved over time, or those adopted through mergers, can wreak havoc on migrations, including cloud adoption strategies. You will need to determine whether to carry old permissions forward or standardize in the new target, for instance. Another example is deciding how to handle shares accessed over both SMB and NFS: either one protocol takes precedence, or you run a mixed-protocol architecture. In the latter, both protocols can set permissions and overwrite each other, which typically presents supportability issues.
A keen advantage of data visibility is that it allows you to make layered decisions about your data rather than taking a one-size-fits-all approach. Instead of directly shifting 2 PB of unstructured data to another platform, you might consider archiving or tiering cold data to budget-friendly object storage, which can provide substantial savings year over year. Organizations implementing data visibility strategies can identify 60 percent to 80 percent of their data as cold. Reducing hot storage capacity also directly reduces data protection and replication costs, which can be a significant percentage of your overall storage budget.
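The tiering argument above comes down to simple arithmetic, sketched here in Python. The function name `tiering_savings` and every price passed to it are hypothetical placeholders chosen only to show the calculation; substitute your own per-terabyte costs, and remember that protection and replication costs for the hot tier are not modeled.

```python
def tiering_savings(total_tb, cold_fraction, hot_cost_per_tb, cold_cost_per_tb):
    """Compare annual cost of keeping everything hot versus tiering cold data.

    All costs are per terabyte per year and are supplied by the caller;
    nothing here reflects real vendor pricing.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    cold_tb = total_tb * cold_fraction
    hot_tb = total_tb - cold_tb
    cost_all_hot = total_tb * hot_cost_per_tb
    cost_tiered = hot_tb * hot_cost_per_tb + cold_tb * cold_cost_per_tb
    return {
        "cost_all_hot": cost_all_hot,
        "cost_tiered": cost_tiered,
        "annual_savings": cost_all_hot - cost_tiered,
    }


# Hypothetical inputs: 2 PB (2,000 TB), 70 percent cold, placeholder prices.
estimate = tiering_savings(2000, 0.70, hot_cost_per_tb=300.0, cold_cost_per_tb=60.0)
```

With these placeholder prices, the tiered layout costs well under half of the all-hot layout; the point of the sketch is the shape of the calculation, not the specific figures.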
Network and security configurations can have enormous consequences for migrations. Are you moving data between sites or regions, cloud to cloud, or even back from the cloud? Define your path and understand your round-trip latency, total versus consumable bandwidth, and security requirements. Security technologies, particularly antivirus and IDS/IPS, have been known to negatively impact migrations when not configured to compensate for the increased workloads. The purpose of understanding topology is to identify bottlenecks ahead of time, before they slow down or even stop the migration altogether.
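To make "total versus consumable bandwidth" and round-trip latency concrete, here is a back-of-the-envelope Python sketch. The function names are invented for this example, terabytes are treated as decimal (10^12 bytes), and the usable fraction of the link is something you would have to measure or agree on with your network team. The second function applies the standard rule that a single TCP stream cannot exceed its window size divided by the round-trip time.

```python
def estimate_transfer_hours(data_tb, link_gbps, usable_fraction):
    """Estimate hours to move data_tb terabytes over a link.

    usable_fraction is the share of the link the migration may consume
    (an assumption you supply, for example 0.5 for half the link).
    """
    if not 0.0 < usable_fraction <= 1.0:
        raise ValueError("usable_fraction must be in (0, 1]")
    total_bits = data_tb * 8e12                      # decimal terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * usable_fraction
    return total_bits / usable_bits_per_second / 3600


def single_stream_limit_mbps(window_bytes, rtt_ms):
    """Upper bound on one TCP stream's throughput: window size / round-trip time."""
    return window_bytes * 8 / (rtt_ms / 1000.0) / 1e6


hours = estimate_transfer_hours(data_tb=100, link_gbps=10, usable_fraction=0.5)
per_stream = single_stream_limit_mbps(window_bytes=65536, rtt_ms=50)
```

For these example inputs, 100 TB over half of a 10 Gbps link takes roughly 44 hours of sustained transfer, while a single stream with a 64 KiB window across a 50 ms path tops out at about 10.5 Mbps. The gap between the two numbers is why latency and parallelism matter as much as raw bandwidth.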
Test, Test, Test
Pre-migration testing is just as critical. Some of the most common issues include overutilized nodes or clusters, misconfigurations, and vendor-specific technology limitations involving shares that contain one million or more files, short file names (8.3 enforcement), long path names, and Unicode versus non-Unicode names (which affects how data is stored because of differences in character standards). Oversubscribed or saturated networks, asymmetric routing, and security systems can also cause problems: packet drops, out-of-order packets, and retransmits frequently trace back to one of these.
Starting with basic tools is a smart idea. There’s nothing more basic than ping, traceroute, and nslookup, which are included in most operating systems and test network connectivity, network path, and DNS configuration. iPerf can be used to measure bandwidth, while Wireshark is excellent at showing blocked, dropped, out-of-order, or retransmitted packets.
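Some of the file-level issues listed above can be caught with a simple scan before any data moves. The Python sketch below is one way to do that, under stated assumptions: the function name `find_risky_paths` is invented for this example, the default path limit of 260 characters is the traditional Windows MAX_PATH value and may not match your target platform, and only file names (not directory names) are checked for non-ASCII characters.

```python
import os


def find_risky_paths(root, max_path_len=260, max_files=1_000_000):
    """Scan root for long paths, non-ASCII file names, and very large file counts.

    The thresholds are illustrative defaults; check your target vendor's
    documented limits before relying on them.
    """
    report = {"long_paths": [], "non_ascii_names": [], "file_count": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            report["file_count"] += 1
            if len(path) > max_path_len:
                report["long_paths"].append(path)
            if not name.isascii():
                report["non_ascii_names"].append(path)
    # Flag shares at or above the file-count threshold.
    report["exceeds_file_limit"] = report["file_count"] >= max_files
    return report
```

A report with empty lists and `exceeds_file_limit` set to `False` does not prove the migration will be clean, but a non-empty report tells you where to test first.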
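When the same basic checks need to be repeated across many sources and targets, they can be scripted. The Python sketch below is a rough stand-in for a manual nslookup followed by a connection attempt: it resolves a host name and tries to open a TCP connection to one port. The function name `check_endpoint` is hypothetical, and the sketch does not replace ping, traceroute, iPerf, or Wireshark; it only confirms name resolution and that a port answers.

```python
import socket
import time


def check_endpoint(host, port, timeout=3.0):
    """Resolve host and attempt a TCP connection to port.

    Returns the resolved addresses, whether the connection succeeded,
    and how long the connection took in milliseconds.
    """
    result = {"host": host, "port": port, "addresses": [],
              "reachable": False, "connect_ms": None}
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return result  # DNS resolution failed
    result["addresses"] = sorted({info[4][0] for info in infos})

    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["connect_ms"] = (time.perf_counter() - start) * 1000
            result["reachable"] = True
    except OSError:
        pass  # refused, timed out, or blocked along the path
    return result
```

For file migrations, the ports of interest are usually 445 for SMB and 2049 for NFS, so a call such as `check_endpoint("filer.example.com", 445)` (with a placeholder host name) would show whether the source or target answers on SMB from the machine running the migration.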
Free Copy Tools vs. Enterprise Migration Software
Robocopy and rsync are common free tools designed only to copy data; they lack the features of enterprise migration software. Look for a solution that can efficiently run, monitor, and manage hundreds of data migrations across hybrid cloud storage; identify the right files to migrate to maximize efficiency and reduce spend; minimize network usage; automatically retry if the network or storage is unavailable; migrate with or without all file permissions and access controls; and keep data integrity intact by running MD5 checksums on all files, not just portions of them.
The most effective way to keep people apprised of the migration plan and milestones is to send updates on a single email thread to all stakeholders at a predetermined interval during the migration. Keep the subject and summary brief and include relevant details toward the end. Less is sometimes more. Consider applying color-coded status updates (red, yellow, and green) and using the subject line strategically, such as "Green Status." People want to hear the positives. Avoid the blame game and celebrate the wins.
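The integrity check mentioned above can be illustrated with a short Python sketch that hashes an entire file with MD5 on each side and compares the results. This is a minimal sketch of the idea, not how any particular migration product implements it; the function names `file_md5` and `verify_copy` are invented for the example.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the whole file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, target_path):
    """True when source and target have identical content by MD5."""
    return file_md5(source_path) == file_md5(target_path)
```

Calling `verify_copy` on every migrated file, rather than on a sample, is what distinguishes a full integrity check from a spot check. It also means reading every byte twice, which is worth factoring into the migration window.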
The Unstructured Data Migration Plan Should Embrace Key Issues:
- Which tier? Which cloud?
- What about rules and regulations?
- What are our common data types and workloads?
- What topology requirements do we have?
- Do I really need to test?
- Free tools or enterprise solution?
- How do I write communications that people will read?
Cloud migrations are a team sport. While there are plenty of tools and metrics that help, bringing teams together across IT and business lines promotes shared accountability, which is imperative to a successful outcome that meets organizational and end-user objectives. The more planning and testing you do up front, the less potential there is for issues later that would erode trust in your cloud data management strategy.