I'm sharing my experience with CDH. (This is purely a personal recommendation.)
The CDH source code basically comes from the Apache svn itself, but it does not mirror the
Apache releases. A CDH release corresponds to a certain (usually the latest) Apache release
with a good number of patches on top. The majority of these patches are available in the
Hadoop svn but may not be part of the current Apache Hadoop release.
The major advantages I saw with CDH are:
- Cloudera provides a tool, SCM, that more or less automatically sets up a Hadoop cluster
for you.
- Cloudera bundles the Hadoop-related projects, which are pretty easy to install on any
standard Linux box.
- Cloudera ensures that the CDH release and the Hadoop projects available for that release
are compatible. (For example, you don't have to go through the hassle of finding the HBase
release that is compatible with your Hadoop release, integrating the related projects, etc.)
- A good number of large enterprises use CDH with Cloudera support. (Cloudera provides
various support packages.)
- Since large enterprises depend on CDH, that in turn says something about how well CDH is
tested, given how large the impact would be if a bug arose. (In short, CDH is well tested.)
- Under Cloudera support you get help and suggestions from Cloudera's expert Hadoop engineers
on fine-tuning your Hadoop platform, tools, applications, etc.
- When you go in for an end-to-end enterprise solution with Hadoop, you can even get advice
from them on best practices at the code level. (You get the same from the Hadoop user
groups as well, but here there is a dedicated, timeline-based commitment when you are a
customer of Cloudera.)
- If you don't have the best Hadoop resources in house, you may have a tough time handling
failures on your cluster, fine-tuning your cluster, updating your cluster, optimizing your
applications, etc. The Cloudera people can shed light on almost all critical issues and
help get them resolved under stringent SLAs.
None of these points means the Apache releases are not great. They are definitely the best
and are the backbone of Hadoop. They are well tested as well. But when expert Hadoop
resources are not available in house, you can face a lot of unexpected hurdles that you need
to handle in a time-bound manner, and that is where you need Hadoop consultants.
You'd definitely get more valid points directly from the Cloudera engineers (some official
comments).
Hope it helps!
Source: http://mail-archives.apache.org/mod_mbox/hive-user/201111.mbox/%3C1320337854.22398.YahooMailNeo@web121217.mail.ne1.yahoo.com%3E