ETL vs. ELT and the Benefits of Data Transformation in the Cloud
The prizefight between ETL vs. ELT rages on. This post highlights key differences in the two data transformation processes and provides three reasons or benefits to working in the cloud. What’s the difference between ETL and ELT? Read on to find out.
Enterprises are embracing digital transformation and moving as quickly as their strategies allow. As they continue toward the digital promised land, they are realizing that legacy, on-prem infrastructures are not built to withstand the journey. While 89 percent of enterprises have plans to adopt or have already adopted a digital-first business strategy, a third of organizations say that the need to replace legacy systems is one of the biggest obstacles.
Digital transformation – the use of technology to create new value in business models, customer experiences and operations – is driven by data and insights. When incorporating insights into business strategy and planning, organizations are seeing impressive “return on insights” and growing at an average of more than 30 percent annually. But legacy systems and on-prem data warehouses are inhibiting businesses from gleaning insights fast enough to stay competitive.
Moving Data to the Cloud is Only Half the Job
Enterprises are great at collecting data, but many struggle to join together siloed data from different sources; to add business logic, and embellish metrics to take raw “captured” data and turn it into something useful. Organizations need to transform data in order to make it useful and this needs to happen at scale, inside of an already complex IT environment. Merely ingesting data into a cloud data warehouse – or rather a cloud data warehouse engine – does not necessarily make data useful or usable. It is the transformation of that data; taking it from a raw, normalized state to data that is denormalized and ready for analysis. Transforming data is challenging but also important. Once it’s done, companies can benefit from an actual data warehouse model running on top of the cloud data warehouse engine.
As it stands, many enterprises do not have the right technologies to transform data into an analytics-ready format. The data management strategy of large enterprises varies wildly. Some remain completely on-prem, some are in the cloud (or multiple clouds), and many fall somewhere in between, with a hybrid of both. In fact, 77 percent of enterprises have at least one application or a portion of their enterprise computing infrastructure in the cloud. For companies looking to be insights-driven, cloud data warehouses like Snowflake, Amazon Redshift, Google BigQuery and Azure Synapse are the practical choice over their on-prem counterparts.
The Rise of ELT
As companies transition from on-prem to the cloud, they can also move toward a better data transformation architecture using ELT rather than ETL. ETL is the process by which you extract data from a source or multiple sources, transform it with an ETL engine, and then load it into its permanent home, usually a data warehouse. In ELT, you extract data from the source, load it unchanged into a target platform (database, data warehouse, data store), and then transform the data inside the warehouse itself – something only practical in the cloud.
With ELT, businesses gain better performance and cost-savings because they can leverage the cloud data warehouse to transform data. With ELT, data professionals work directly and natively on data inside the warehouse for higher fidelity, faster productivity, increased scalability, and fewer errors. Longer batch processes and slow development are essentially eliminated. The infrastructure and architecture are far simpler and can be scaled up and down as needed. The ELT process reduces waste, improves speed and consigns bottlenecks to the history books.
End-users see significant savings on infrastructure, better performing workloads, and shorter development cycles. Data is quickly integrated and immediately available for transformations and analysis as well. ELT is also more aligned with current software development best practices. If developers aren’t sure how they want to use data, they have the freedom to apply transformations at a later date. Given the switch to different development strategies and the rise of iterative agile methodologies, this capability is very appealing.
Three Great Reasons For ELT instead of ETL
Development productivity
The resources needed to work in an ETL architecture have grown as the amount of data grows. For data transformation within ETL, developers need a fully coded job before they could even begin to debug and validate their transformation logic. This often results in an iterative process of making changes, rerunning ETL jobs, and re-validating the results.
Using ELT, developers can perform quick, real-time data validation before running jobs and debugging. They see quicker speed to development and fewer errors. Tests no longer take hours to run – they take minutes. Gone is the burden of incomplete or failed jobs and debugging. Developers can dedicate their time to data discovery and innovation.
Infrastructure complexity
Before the cloud data warehouse, data infrastructure with on-prem, legacy systems was complex, and only got more so as business demands changed and data volumes increased. In the cloud, you don’t need to add more servers just because your data needs have grown. For example, a retail business is seasonal. It may need more compute power and storage, but only during the fourth quarter. Before, that retailer would need to keep data servers and overhead costs high all year in anticipation of the reporting and analytics needs of the holiday season. This was wasteful for their IT budget and bottom line. ELT allows for scalability and simplicity inside the data infrastructure. That retailer can now scale resources up and down, managing the cost of its cloud data warehouse, and avoid being locked into a larger size than it needs for three quarters of the year.
Faster performance of common jobs
With an ETL architecture, teams would run jobs nightly and experience long batch windows because there was only so much compute power available in the organization. To solve this problem, businesses would invest money in more servers, which would take months to see value (think ripping and replacing technology, implementing new processes, training teams on the new hardware) and make data transformation a bottleneck.
With ELT, your cloud data warehouse is your compute engine. This means as your data volume grows, you can easily allocate more compute power and speed up your workflows for shorter batch windows that require no additional hardware.
Is ELT right for your business?
The advantages of ELT are clear – it is an enhanced technology that puts the hard work of data transformation in the hands of the powerful data warehouses and an underlying public cloud. But is it the right approach for your organization?
As ELT does require script writing, which when done manually, can be a time and labor-intensive process. That much hand-coding might not be a viable option for some organizations. And, the likelihood of human error can increase, especially when you account for the amount of data, data sources, and outputs involved.
The death of ETL?
So is ETL dead? The process of extracting, transforming, and loading data is still necessary but as the technology has changed, and so has the method. Gartner predicts that by 2023, 75 percent of all databases will be on a cloud platform. ELT will be an important part of the cloud data management strategy for businesses looking to grow their business with analytics-ready insights.