Back in December, Talend’s Chief Marketing Officer Ashley Stirrup made things interesting when he wrote a blog post outlining the importance of speed when working with Big Data. In the post, he stressed that the speed advantages Talend offers over some of the other vendors in the space were nothing short of fact, and he wanted to prove it. So Talend went ahead and used MCG Global Services to run some benchmark tests, pitting Talend Big Data Integration against the big kid on the block, Informatica Big Data Edition.
Stirrup believes that MCG did a “really nice job” on the benchmark, defining a common set of use cases and questions that would be relevant to the majority of data-driven organizations. You can access that benchmark report here, but according to the graphs, Talend not only defeated Informatica, but cruised right past them. Talend explains that their victory in this contest is a result of their tool enabling users to leverage Spark’s in-memory capabilities to integrate datasets at faster rates than their competitor, adding: “With Informatica Big Data Edition, which doesn’t support Spark directly, how Hive-on-Spark behaves and performs is up to the Hadoop engine and how it is configured.”
As the current king of the Data Integration market, Informatica issued a response the only way we would expect: by writing a blog post highlighting the reasons why their platform was best while completely omitting the name ‘Talend.’ Informatica’s Principal Big Data Product Manager Sumeet Agrawal says: “Informatica has always embraced open source innovation for its products and will continue to leverage and extend open source technologies. Informatica already uses Spark for graph processing. However, our latest Big Data Management product shows 2-3 times faster performance over Spark for batch ETL processing by using Informatica Blaze on YARN.”
Agrawal goes on to caution solution-seekers to be wary of benchmarks that do not consider a variety of critical criteria, referring to Talend simply as “a software vendor” and reprimanding them for comparing against a two-year-old version of Informatica that supported only MapReduce. He then explains why Talend’s benchmark was misleading, citing Talend’s selection of a custom benchmark over an industry-standard TPC test, which would provide a vendor-neutral evaluation of performance. In addition, Agrawal argues that the benchmark was run using only 12 million records on a cluster with 4 CPUs, 20.5 GB of memory and 200 GB of storage, something that is not representative of a real-world Big Data environment.
Phew! Glad that’s over. The enterprise Big Data Integration market is no place for this kind of drama, right? Wrong!
It later came out, according to a January Talend blog post, that Informatica’s lawyers sent letters to MCG asking it to retract the benchmark. However, Talend did admit that it had used an outdated version of Informatica. Talend’s Chief Technical Officer Laurent Bride went on to scold Informatica regarding the frequency with which they update their tools, saying: “In general, Informatica releases their products every 2-3 years, while we release twice per year, so it’s not surprising to see their product out of date relative to ours and the rest of the Big Data ecosystem.”
Bride concludes the rather aggressive post with: “If you’re so committed to the Informatica stack that you are willing to put a legacy runtime on every Hadoop node, suffer the performance hit, toggle back and forth between their incompatible traditional and big data ETL products, and rule out a simple migration to the cloud then Informatica has a good solution for you.” Yowza, those are fightin’ words!
Informatica has yet to respond to these claims, so it will be interesting to see if and how they do. Talend’s CEO Mike Tuchen did come out recently and try to clear the air a bit. However, Tuchen upheld Talend’s challenge to Informatica, saying that they would still love to benchmark against the Big Data behemoth, even if it meant going head-to-head with their new solution, and concluding: “If you choose to hide behind your lawyers and publish misleading marketing fluff instead, then we know you secretly agree with me too. If you really believe in your product, then let’s have some fun together.”
Friendly(ish) competition, it’s what this is all about! If you’re a data scientist, you probably love your job, but for the rest of us, it’s always nice to see a little playful banter between vendors to keep things lively. Do you have experience using either Talend or Informatica? What have your experiences been like? Is Informatica really behind the 8-ball like Talend says they are? Let us know in the comments!