Original article:
http://www.quantivo.com/blog/top-5-reasons-not-use-hadoop-analytics
As a former diehard fan of Hadoop, I LOVED the fact that you can work with petabytes of data. I loved the ability to scale to thousands of nodes to process a large computation job. I loved the ability to store and load data in a very flexible format. In many ways, I loved Hadoop, until I tried to deploy it for analytics. That’s when I became disillusioned with Hadoop (it just "ain't all that").
At Quantivo, we’ve explored many ways to deploy Hadoop to answer analytical queries (trust me – I made every attempt to include it in my day job). At the end of the day, it became an exercise much like trying to build a house with just a hammer: conceivably possible, but unnecessarily painful and ridiculously cost-inefficient.
Let me share with you my top reasons why Hadoop should not be used for analytics.
1 - Hadoop is a framework, not a solution – For many reasons, people expect Hadoop to answer Big Data analytics questions right out of the box. For simple queries, this works. For harder analytics problems, Hadoop quickly falls flat and requires you to develop Map/Reduce code directly. For that reason, Hadoop is more like a J2EE programming environment than a business analytics solution.
2 - Hive and Pig are good, but do not overcome architectural limitations – Both Hive and Pig are very well thought-out tools that enable the lay engineer to quickly become productive with Hadoop. After all, Hive and Pig are two tools that translate analytics queries written in SQL or a text-based script into Java Map/Reduce jobs that can be deployed in a Hadoop environment. However, there are limitations in the Map/Reduce framework of Hadoop that prevent efficient operation, especially when you require inter-node communication (as is the case with sorts and joins).
3 - Deployment is easy, fast and free, but very costly to maintain and develop – Hadoop is very popular because within an hour, an engineer can download, install, and issue a simple query. It’s also an open source project, so there are no software costs, which makes it a very attractive alternative to Oracle and Teradata. The true costs of Hadoop become obvious when you enter the maintenance and development phase. Since Hadoop is mostly a development framework, Hadoop-proficient engineers are required to develop an application as well as optimize it to execute efficiently in a Hadoop cluster. Again, it’s possible but very hard to do.
4 - Great for data pipelining and summarization, horrible for ad hoc analysis – Hadoop is great at analyzing large amounts of data and summarizing or “data pipelining” to transform the raw data into something more useful for another application (like search or text mining) – that’s what it’s built for. However, if you don’t know the analytics question you want to ask or if you want to explore the data for patterns, Hadoop becomes unmanageable very quickly. Hadoop is very flexible at answering many types of questions, but only as long as you spend the cycles to program and execute MapReduce code for each one.
5 - Performance is great, except when it’s not – By all measures, if you want speed and need to analyze large quantities of data, Hadoop allows you to parallelize your computation across thousands of nodes. The potential is definitely there. But not all analytics jobs can easily be parallelized, especially when user interaction drives the analytics. So, unless the Hadoop application is designed and optimized for the question that you want to ask, performance can quickly become very slow, as each Map/Reduce job has to wait until the previous jobs are completed. A Hadoop workflow is only as fast as its slowest MapReduce job.
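The point about jobs waiting on each other can be made concrete with a back-of-the-envelope model. The sketch below is plain Python, not Hadoop code, and the task durations are invented for illustration. It assumes every task in a job runs in parallel (enough slots for all of them) and that chained jobs run strictly one after another, so a job takes as long as its slowest task and a workflow takes the sum of its jobs.

```python
from typing import List


def stage_time(task_seconds: List[float]) -> float:
    """A parallel job finishes only when its slowest task finishes."""
    return max(task_seconds)


def pipeline_time(stages: List[List[float]]) -> float:
    """Chained jobs run one after another, so their durations add up."""
    return sum(stage_time(tasks) for tasks in stages)


# Hypothetical task durations (seconds) for three chained jobs.
stages = [
    [10.0, 11.0, 9.0, 10.0],   # well balanced
    [10.0, 10.0, 10.0, 95.0],  # one straggler, e.g. a heavily skewed key
    [5.0, 6.0, 5.0, 5.0],      # cannot start until the job above is done
]

total = pipeline_time(stages)
print(total)  # 11 + 95 + 6 = 112.0
```

In this toy example a single 95-second task accounts for most of the 112-second total, even though every other task finishes in about ten seconds.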
That said, Hadoop is a phenomenal framework for doing some very sophisticated data analysis. Ironically, it’s also a framework that requires a lot of programming effort to get those questions answered.
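To give a rough sense of that programming effort, and of the inter-node communication mentioned in point 2, here is a minimal sketch of a reduce-side join written in the map/shuffle/reduce style. It is plain Python running in one process, not Hadoop or Hive output; the function names and sample records are hypothetical. On a real cluster the `shuffle` step is where records with the same key have to be moved across the network to the same reducer.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Tagged = Tuple[str, str]  # (source tag, payload)


def map_users(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    """Emit (join key, tagged value) for each user record."""
    for user_id, name in rows:
        yield user_id, ("U", name)


def map_orders(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tagged]]:
    """Emit (join key, tagged value) for each order record."""
    for user_id, item in rows:
        yield user_id, ("O", item)


def shuffle(pairs: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    """Group values by key; on a cluster this is the network-heavy step."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(key: str, values: List[Tagged]) -> List[Tuple[str, str, str]]:
    """Pair every user name with every order item that shares the key."""
    names = [payload for tag, payload in values if tag == "U"]
    items = [payload for tag, payload in values if tag == "O"]
    return [(key, name, item) for name in names for item in items]


# Hypothetical sample data.
users = [("u1", "Ann"), ("u2", "Bob")]
orders = [("u1", "book"), ("u2", "pen"), ("u1", "lamp")]

grouped = shuffle(chain(map_users(users), map_orders(orders)))
joined = [row for key in sorted(grouped) for row in reduce_join(key, grouped[key])]
for row in joined:
    print(row)
```

A SQL engine expresses the same result as a one-line join; here the tagging, grouping, and pairing all have to be written and tuned by hand.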