Setting up Hadoop on Ubuntu

 
1. Environment
ubuntu-11.04-desktop-i386
Hadoop-0.20.2-cdh3u1
Hbase-0.90.3-cdh3u1
Zookeeper-3.3.3-cdh3u1
jdk-6u26-linux-i586.bin

2. Install a Chinese input method
System -> Administration -> Language Support -> Keyboard input method
System -> Preferences -> Keyboard Input Method -> Input Method tab -> Add

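3. Install IP Messenger (iptux)
Download the source from http://code.google.com/p/iptux/. Install the build dependencies before running ./configure (build-dep gedit is presumably a shortcut for pulling in the GTK+ development packages), then build and install:
tar -xvf iptux*.tar.gz
cd iptux*
sudo apt-get build-dep gedit
./configure
make
sudo make install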

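4. Install and configure SSH

Before the sshd service is configured, connections are refused. (The transcripts in this section and the next were captured under Cygwin on Windows.)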
Administrator@pc ~
$ ssh localhost
ssh: connect to host localhost port 22: Connection refused

Run ssh-host-config, answer yes to every query, and enter netsec tty when asked for the value of CYGWIN:
Administrator@pc ~
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Generating /etc/ssh_host_ecdsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges.  Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services


*** Warning: The following functions require administrator privileges!

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] netsec tty

*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'.  Otherwise, it
*** Info: will start automatically after the next reboot.

*** Info: Host configuration finished. Have fun!


Start the service:
Administrator@pc ~
$ net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.

Connecting to the local machine and to another host now both succeed:
Administrator@pc ~
$ ssh localhost
Administrator@localhost's password:

Administrator@pc ~
$ ssh 130.51.38.219
The authenticity of host '130.51.38.219 (130.51.38.219)' can't be established.
ECDSA key fingerprint is c4:d0:48:2a:56:c5:22:26:7e:28:61:2b:6c:e3:c1:7a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '130.51.38.219' (ECDSA) to the list of known hosts.
Administrator@130.51.38.219's password:
Last login: Mon Aug  8 14:58:31 2011 from localhost

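5. Passwordless SSH
For example, for the master to ssh to a slave without a password, generate a key pair on the master and place the public key on the slave: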
Administrator@pc ~
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/Administrator/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/Administrator/.ssh/id_rsa.
Your public key has been saved in /home/Administrator/.ssh/id_rsa.pub.
The key fingerprint is:
e6:15:b6:91:05:ce:6a:31:41:ad:00:14:cd:86:1b:b6 Administrator@pc
The key's randomart image is:
+--[ RSA 2048]----+
|   .+* .o....    |
|    + =  +.o     |
|   . = .o.B      |
|    E   .= +     |
|        S o      |
|       + .       |
|        .        |
|                 |
|                 |
+-----------------+

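The public key is generated under D:\cygwin\home\Administrator\.ssh. Either build the authorized_keys file yourself and hand it to the slave, or send the slave the key string so that it can be pasted into the slave's own authorized_keys file.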
Administrator@pc ~
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
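
The append step above can be made repeatable. The following is a minimal sketch, not the author's procedure: install_pubkey is a hypothetical helper name, and the restrictive permissions it sets are an assumption based on sshd's usual default behaviour of rejecting loosely protected key files.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present, then tighten permissions.
install_pubkey() {
    pubkey_file=$1    # public key copied over from the master
    auth_file=$2      # authorized_keys file of the account on the slave
    auth_dir=$(dirname "$auth_file")

    mkdir -p "$auth_dir" || return 1
    chmod 700 "$auth_dir"
    touch "$auth_file" || return 1

    key=$(cat "$pubkey_file") || return 1
    # -F fixed string, -x whole-line match, -q no output
    if ! grep -Fxq -- "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}
```

On the slave, after copying the master's id_rsa.pub across, this could be called as install_pubkey /path/to/master_id_rsa.pub ~/.ssh/authorized_keys, where the first path is a placeholder. Running it twice leaves a single entry.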

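On the master, edit C:\WINDOWS\system32\drivers\etc\hosts and add a hostname-to-IP mapping for each slave.
After changing the SSH configuration, remember to restart the service: net stop sshd, then net start sshd.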
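
As an illustration of the hosts mapping, here is a small sketch that adds an entry only when the hostname is not already mapped. The function name add_host_entry, the hostname slave1 and the address 192.0.2.11 (a documentation-only address) are all assumptions, not values from the original setup.

```shell
#!/bin/sh
# Hypothetical helper: add "IP<TAB>hostname" to a hosts file unless the
# hostname already appears as a name on some non-comment line.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3

    [ -f "$hosts_file" ] || return 1
    if awk -v n="$name" '
        { sub(/#.*/, "")                      # ignore comments
          for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"
    then
        return 0                              # already mapped, nothing to do
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

On Ubuntu the file would be /etc/hosts (edited as root); under Cygwin the Windows file named above is normally reachable through /cygdrive/c/. A call would look like add_host_entry /etc/hosts 192.0.2.11 slave1.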

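6. Install Java
sudo apt-get install openjdk-6-jdk did not work, so install the Sun JDK from the .bin installer instead. The installer unpacks into jdk1.6.0_26; run it in /opt so that the jdk link matches the JAVA_HOME set below:
chmod u+x jdk-6u26-linux-i586.bin
./jdk-6u26-linux-i586.bin
ln -s jdk1.6.0_26 jdk

Add to /etc/profile: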
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export HBASE_HOME=/opt/hbase
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$JAVA_HOME/bin:$JAVA_HOME/lib:$JAVA_HOME/jre/bin:$PATH:$HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
java -version takes effect without a reboot; jps only takes effect after a reboot.
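
A quick way to confirm that the variables from /etc/profile are visible in the current shell is sketched below. check_env_dirs is a hypothetical helper, and it only checks that each variable is set and points to an existing directory.

```shell
#!/bin/sh
# Hypothetical helper: for each variable name given, report whether it is
# set and whether the directory it names exists. Returns non-zero if any
# check fails.
check_env_dirs() {
    status=0
    for var in "$@"; do
        eval "dir=\${$var:-}"        # indirect lookup of the variable's value
        if [ -z "$dir" ]; then
            echo "$var is not set"
            status=1
        elif [ ! -d "$dir" ]; then
            echo "$var points to a missing directory: $dir"
            status=1
        else
            echo "$var=$dir"
        fi
    done
    return $status
}
```

After re-reading the profile with . /etc/profile, a call such as check_env_dirs JAVA_HOME HADOOP_HOME HBASE_HOME shows which of the three settings is still wrong.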

7. Install Hadoop
Put the downloaded hadoop-0.20.2.tar.gz, hbase-0.90.3.tar.gz and zookeeper-3.3.2.tar.gz in /opt (or /usr/local; these two locations are the conventional choices), then extract them.
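
The extraction step can be sketched as follows. The helper name unpack_and_link is hypothetical, and the version-independent symlink is an assumption made so that paths such as /opt/hadoop and /opt/hbase from /etc/profile resolve; the sketch also assumes each archive contains a single top-level directory.

```shell
#!/bin/sh
# Hypothetical helper: extract a .tar.gz into a destination directory and
# create a version-independent symlink to its top-level directory.
unpack_and_link() {
    tarball=$1      # e.g. hadoop-0.20.2.tar.gz
    dest=$2         # e.g. /opt
    link_name=$3    # e.g. hadoop

    # Top-level directory name as stored in the archive.
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$top" ] || return 1

    tar -xzf "$tarball" -C "$dest" || return 1
    # Relative link inside $dest, replaced if it already exists.
    ln -sfn "$top" "$dest/$link_name"
}
```

For the files named above, the calls would be along the lines of unpack_and_link hadoop-0.20.2.tar.gz /opt hadoop and unpack_and_link hbase-0.90.3.tar.gz /opt hbase, run with write access to /opt.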


8. Configure Hadoop
Configure the following files:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
hbase-env.sh
hbase-site.xml
zoo.cfg

core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://c1:54310</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/data</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>2047</value>
</property>

mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>c1:54311</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.max.tracker.blacklists</name>
  <value>200</value>
</property>
<property>
  <name>mapred.max.tracker.failures</name>
  <value>20</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>
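
The property groups above are listed without their enclosing element; in the actual files each group sits inside a single configuration root element. The sketch below writes a minimal core-site.xml to show the shape. write_core_site is a hypothetical helper, and only the fs.default.name value is taken from this article.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal core-site.xml into a conf directory.
# The remaining properties from the article would be added as further
# <property> elements inside <configuration>.
write_core_site() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://c1:54310</value>
  </property>
</configuration>
EOF
}
```

With the environment used here, the target would be the directory named by HADOOP_CONF_DIR, for example write_core_site /opt/hadoop/conf. hdfs-site.xml and mapred-site.xml follow the same layout.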

hadoop-env.sh
export JAVA_HOME=/opt/jdk
export HADOOP_OPTS=-server
export HADOOP_HEAPSIZE=2048


Run:
sudo update-alternatives --install /usr/bin/java java /opt/jdk/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /opt/jdk/bin/javac 300

hbase-env.sh
export JAVA_HOME=/opt/jdk
export HBASE_CLASSPATH=/opt/hadoop/conf
export HBASE_MANAGES_ZK=true
export HBASE_OPTS=-server
export HBASE_HEAPSIZE=2048

hbase-site.xml
<property>
<name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://c1:54310/hbase</value>
  <description>The directory shared by region servers.
  </description>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.
  </description>
</property>

<property>
  <name>hbase.regionserver.hlog.replication</name>
  <value>2</value>
  <description>For HBase to offer good data durability, we roll logs if
  filesystem replication falls below a certain amount.  In pseudo-distributed
  mode, you normally only have the local filesystem or 1 HDFS DataNode, so you
  don't want to roll logs constantly.</description>
</property>

<property>
  <name>hbase.client.retries.number</name>
  <value>777</value>
  <description>Maximum retries.  Used as maximum for all retryable
  operations such as fetching of the root region from root region
  server, getting a cell's value, starting a row update, etc.
  Default: 10.
  </description>
</property>

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>c1</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
</property>

<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
</property>




Startup errors:
When starting the NameNode:
java.io.IOException: File /tmp/hadoop-hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
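Fix: change the replication factor (dfs.replication) from 1 to 2. This error generally means the NameNode had no live DataNode to write to, so it is also worth confirming that the DataNodes actually started.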

java.io.IOException: Call to hadoop5/10.20.151.9:9000 failed on local exception: java.io.EOFException
Cause: the Hadoop jar in HBase's lib directory is the wrong version for the Hadoop installation it is talking to.
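
One way to line the versions up is to replace the jar bundled with HBase by the one from the Hadoop installation. This is a sketch under assumptions: sync_hadoop_jar is a hypothetical helper, the glob hadoop*core*.jar is a guess at the jar naming and should be checked against the actual file names, and the old jar is moved aside rather than deleted.

```shell
#!/bin/sh
# Hypothetical helper: replace the Hadoop core jar bundled in HBase's lib
# directory with the one found in the Hadoop home directory.
sync_hadoop_jar() {
    hadoop_home=$1
    hbase_home=$2

    # Assumed naming pattern; the first match is used.
    set -- "$hadoop_home"/hadoop*core*.jar
    if [ ! -f "$1" ]; then
        echo "no Hadoop core jar found in $hadoop_home" >&2
        return 1
    fi

    # Keep the old jar outside lib/ so it is off the classpath but recoverable.
    mkdir -p "$hbase_home/lib.replaced" || return 1
    for old in "$hbase_home"/lib/hadoop*core*.jar; do
        [ -f "$old" ] && mv "$old" "$hbase_home/lib.replaced/"
    done
    cp "$1" "$hbase_home/lib/"
}
```

With the layout used in this article, the call would be sync_hadoop_jar /opt/hadoop /opt/hbase, followed by a restart of HBase.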

java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
Fix:
Copy the Apache commons-configuration jar into HBase's lib directory.



