Original article:
http://www.quantivo.com/blog/top-5-reasons-not-use-hadoop-analytics
As a former diehard fan of Hadoop, I LOVED the fact that you could work on up to petabytes of data. I loved the ability to scale to thousands of nodes to process a large computation job. I loved the ability to store and load data in a very flexible format. In many ways, I loved Hadoop, until I tried to deploy it for analytics. That’s when I became disillusioned with Hadoop (it just "ain't all that").
At Quantivo, we’ve explored many ways to deploy Hadoop to answer analytical queries (trust me – I made every attempt to include it in my day job). At the end of the day, it became an exercise much like trying to build a house with just a hammer: conceivably it’s possible, but it’s unnecessarily painful and ridiculously cost-inefficient.
Let me share with you my top five reasons why Hadoop should not be used for analytics.
1 - Hadoop is a framework, not a solution – For many reasons, people expect Hadoop to answer Big Data analytics questions right out of the box. For simple queries, this works. For harder analytics problems, Hadoop quickly falls flat and requires you to develop Map/Reduce code directly. For that reason, Hadoop is more like a J2EE programming environment than a business analytics solution.
2 - Hive and Pig are good, but do not overcome architectural limitations – Both Hive and Pig are very well thought-out tools that enable the lay engineer to quickly become productive with Hadoop. After all, Hive and Pig are two tools that translate analytics queries written in common SQL or text into Java Map/Reduce jobs that can be deployed in a Hadoop environment. However, there are limitations in the Map/Reduce framework of Hadoop that prohibit efficient operation, especially when you require inter-node communication (as is the case with sorts and joins).
3 - Deployment is easy, fast and free, but very costly to maintain and develop – Hadoop is very popular because within an hour, an engineer can download, install, and issue a simple query. It’s also an open source project, so there are no software costs, which makes it a very attractive alternative to Oracle and Teradata. The true costs of Hadoop become obvious when you enter the maintenance and development phase. Since Hadoop is mostly a development framework, Hadoop-proficient engineers are required to develop an application as well as optimize it to execute efficiently in a Hadoop cluster. Again, it’s possible but very hard to do.
4 - Great for data pipelining and summarization, horrible for ad hoc analysis – Hadoop is great at analyzing large amounts of data and summarizing or “data pipelining” to transform the raw data into something more useful for another application (like search or text mining) – that’s what it’s built for. However, if you don’t know the analytics question you want to ask, or if you want to explore the data for patterns, Hadoop becomes unmanageable very quickly. Hadoop is very flexible at answering many types of questions, as long as you spend the cycles to program and execute MapReduce code.
5 - Performance is great, except when it’s not – By all measures, if you want speed and need to analyze large quantities of data, Hadoop allows you to parallelize your computation across thousands of nodes. The potential is definitely there. But not all analytics jobs can easily be parallelized, especially when user interaction drives the analytics. So, unless the Hadoop application is designed and optimized for the question that you want to ask, performance can quickly become very slow, as each map/reduce job has to wait until the previous jobs are completed. A Hadoop pipeline is only as fast as its slowest MapReduce job.
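To make the point in reason 2 about joins concrete, here is a minimal sketch of a reduce-side join written in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `reduce_side_join`) and the sample `users` and `orders` data are assumptions invented for illustration. The thing to notice is the shuffle step: every record sharing a join key has to be gathered in one place before any joining can happen, and on a real cluster that gathering is network traffic between nodes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample data, invented for this illustration.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp"), (3, "desk")]


def map_phase(users, orders):
    """Tag each record with its source table and emit (join_key, tagged_value)."""
    for user_id, name in users:
        yield user_id, ("user", name)
    for user_id, item in orders:
        yield user_id, ("order", item)


def shuffle_phase(pairs):
    """Group values by key.

    In this sketch it is a dictionary in one process. On a cluster, this is
    the step that moves records between nodes so that a single reducer sees
    every value for a given key.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Inner join: pair every user name with every order item under the same key."""
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "user"]
        items = [value for tag, value in groups[key] if tag == "order"]
        for name, item in product(names, items):
            joined.append((key, name, item))
    return joined


def reduce_side_join(users, orders):
    """Run the three phases in order, as a single simulated job."""
    return reduce_phase(shuffle_phase(map_phase(users, orders)))


if __name__ == "__main__":
    for row in reduce_side_join(users, orders):
        print(row)
```

The order for user 3 is dropped because no matching user record reaches its reducer, which is the expected behavior of an inner join. A query with several joins or sorts would typically chain several such jobs, each with its own shuffle.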
That said, Hadoop is a phenomenal framework for doing some very sophisticated data analysis. Ironically, it’s also a framework that requires a lot of programming effort to get those questions answered.