Apache Hadoop celebrated its 10th birthday on Friday, a decade after the first cluster went into production in 2006 inside the labs of Yahoo. Hadoop as we know it today began as an experiment in distributed computing for Yahoo’s internet search, but has since evolved into the open source Big Data framework of choice in some of the world’s largest organizations, driving billions in enterprise spending.
Hadoop was co-created by Doug Cutting and Mike Cafarella, who split the distributed file system and MapReduce facility out of Nutch, their open source web crawler project. The undertaking, which began in 2003, grew out of their desire to create a more scalable search engine. What really allowed Hadoop to take off was the inspiration the two technologists drew from a series of Google whitepapers. These described a distributed file system and execution engine that let Google’s engineers write an entire computation with very little code and run it in parallel across thousands of machines.
Cutting and Cafarella realized that this approach could transform Nutch into something much more than a project: if the system could process a seemingly infinite number of web pages, it could have other applications too. Cutting, who is now Cloudera’s Chief Architect, began working at Yahoo in January of 2006. Upon his hire, he was given the resources he needed to work on Hadoop full-time, and in short order, the technology took off. The rest is history.
Hadoop has been one of the major engines for the digital transformation of business in its decade of use. Companies are now largely built around their data systems, a sharp departure from how the majority of organizations were run just ten years ago. Now a mature but still-growing technology, Hadoop, with its bright yellow elephant, has become the poster image for the Big Data landscape and a main driver of the revolution in data analytics.