Friday, December 17, 2010
Hadoop cluster at eBay
I am always curious to know how other companies install Hadoop clusters and how they use the surrounding ecosystem. Since Hadoop is still relatively new, there are no established best practices yet. Every company implements what it thinks is the best infrastructure for its Hadoop cluster.
At the Hadoop NYC 2010 conference, eBay showcased its implementation of a production Hadoop cluster. Here are some tidbits on eBay's implementation of Hadoop.
- JobTracker, NameNode, ZooKeeper, and HBase Master are all enterprise nodes running on Sun 64-bit architecture. They run Red Hat Linux with 72 GB of RAM and 4 TB of disk.
- There are 4000 datanodes, each running CentOS with 48 GB of RAM and 10 TB of disk space.
- Ganglia and Nagios are used for monitoring and alerting. eBay is also building a custom solution to augment them.
- ETL is done mostly with Java MapReduce programs.
- Pig is used to build data pipelines.
- Hive is used for ad hoc queries.
- Mahout is used for data mining.
They are toying with the idea of using Oozie to manage workflows but haven't decided whether to adopt it yet.
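To put the datanode figures in perspective, here is a rough back-of-the-envelope sketch of the aggregate capacity they imply. The node count, disk, and RAM numbers come from the list above. The replication factor of 3 is my assumption (it is the HDFS default; the talk did not state what eBay uses), and the sketch uses decimal units and ignores space reserved for the OS, logs, and intermediate MapReduce output.

```python
# Figures quoted in the post for the datanodes
DATANODES = 4000
DISK_TB_PER_NODE = 10
RAM_GB_PER_NODE = 48

# Assumption: HDFS default replication factor, not stated in the talk
REPLICATION_FACTOR = 3


def cluster_totals(nodes, disk_tb, ram_gb, replication):
    """Return rough aggregate capacity, using decimal units (1 PB = 1000 TB)."""
    raw_tb = nodes * disk_tb
    return {
        "raw_pb": raw_tb / 1000,                   # total disk across all datanodes
        "usable_pb": raw_tb / replication / 1000,  # after storing every block `replication` times
        "ram_tb": nodes * ram_gb / 1000,           # total memory across all datanodes
    }


totals = cluster_totals(DATANODES, DISK_TB_PER_NODE, RAM_GB_PER_NODE, REPLICATION_FACTOR)
print(f"Raw disk:    {totals['raw_pb']:.1f} PB")
print(f"Usable disk: {totals['usable_pb']:.1f} PB (at replication {REPLICATION_FACTOR})")
print(f"Total RAM:   {totals['ram_tb']:.1f} TB")
```

Under those assumptions the cluster holds about 40 PB of raw disk, or roughly 13 PB of unique data once replication is accounted for.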
It looks like they are doing all the right things.