
Setting up Hadoop on Ubuntu

 
1. Environment preparation
ubuntu-11.04-desktop-i386
Hadoop-0.20.2-cdh3u1
Hbase-0.90.3-cdh3u1
Zookeeper-3.3.3-cdh3u1
Jdk-6u26-linux-i586.bin

2. Install a Chinese input method
System -> Administration -> Language Support -> Keyboard input method
System -> Preferences -> Keyboard Input Method -> Input Method tab -> Add new

3. Install iptux (a Feige/IP Messenger clone)
http://code.google.com/p/iptux/
sudo apt-get build-dep gedit   # pulls in the GTK+ build dependencies that iptux also needs
tar -xvf iptux*.tar.gz
cd iptux*
./configure
make
sudo make install

4. Install and configure SSH

Before the SSH service is configured, connections are refused:
Administrator@pc ~
$ ssh localhost
ssh: connect to host localhost port 22: Connection refused

Run ssh-host-config, answer yes to every prompt, and enter netsec tty for the CYGWIN value (the transcript below is from Cygwin's sshd on Windows):
Administrator@pc ~
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Generating /etc/ssh_host_ecdsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges.  Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services


*** Warning: The following functions require administrator privileges!

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] netsec tty

*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'.  Otherwise, it
*** Info: will start automatically after the next reboot.

*** Info: Host configuration finished. Have fun!


Start the service:
Administrator@pc ~
$ net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.

Connecting both to localhost and to other machines now succeeds:
Administrator@pc ~
$ ssh localhost
Administrator@localhost's password:

Administrator@pc ~
$ ssh 130.51.38.219
The authenticity of host '130.51.38.219 (130.51.38.219)' can't be established.
ECDSA key fingerprint is c4:d0:48:2a:56:c5:22:26:7e:28:61:2b:6c:e3:c1:7a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '130.51.38.219' (ECDSA) to the list of known hosts.
Administrator@130.51.38.219's password:
Last login: Mon Aug  8 14:58:31 2011 from localhost

5. Passwordless SSH configuration
For example, for the master to SSH to a slave without a password, generate a key pair on the master and put the public key on the slave:
Administrator@pc ~
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/Administrator/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/Administrator/.ssh/id_rsa.
Your public key has been saved in /home/Administrator/.ssh/id_rsa.pub.
The key fingerprint is:
e6:15:b6:91:05:ce:6a:31:41:ad:00:14:cd:86:1b:b6 Administrator@pc
The key's randomart image is:
+--[ RSA 2048]----+
|   .+* .o....    |
|    + =  +.o     |
|   . = .o.B      |
|    E   .= +     |
|        S o      |
|       + .       |
|        .        |
|                 |
|                 |
+-----------------+

The public key is generated under D:\cygwin\home\Administrator\.ssh. You can either copy an authorized_keys file to the slave, or give the slave the key string and have it appended to the slave's own authorized_keys. To authorize the key locally:
Administrator@pc ~
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
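
To push the key to a slave in one step, a minimal sketch (the host name slave1 is an assumption):

cat ~/.ssh/id_rsa.pub | ssh Administrator@slave1 \
  'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'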

On the master, edit C:\WINDOWS\system32\drivers\etc\hosts and add a hostname-to-IP mapping for each slave.
After changing the SSH configuration, remember to restart the service: net stop sshd, then net start sshd.
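
Example hosts entries (names and IPs are illustrative):
130.51.38.219  master
130.51.38.220  slave1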

6. Install Java
sudo apt-get install openjdk-6-jdk did not work, so install the Sun JDK manually instead:
chmod u+x jdk-6u26-linux-i586.bin
./jdk-6u26-linux-i586.bin
ln -s jdk1.6.0_26 jdk   # the installer unpacks to jdk1.6.0_26; link it so JAVA_HOME can stay /opt/jdk

Add the following to /etc/profile:
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export HBASE_HOME=/opt/hbase
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$JAVA_HOME/bin:$JAVA_HOME/lib:$JAVA_HOME/jre/bin:$PATH:$HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
After editing /etc/profile, java -version takes effect without a reboot; jps only takes effect after a reboot (re-login).
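
A quick way to verify in the current shell (a sketch, assuming the /etc/profile entries above):

source /etc/profile   # apply the new environment without rebooting
java -version         # should report Java 1.6.0_26
jps                   # lists running JVM processes once $JAVA_HOME/bin is on PATH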

7. Install Hadoop
Put the downloaded hadoop-0.20.2.tar.gz, hbase-0.90.3.tar.gz, and zookeeper-3.3.2.tar.gz into /opt (or /usr/local; both are conventional locations), then extract them.
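A minimal sketch of the extraction, with symlinks so the paths in /etc/profile stay stable (the symlinks are an assumption, matching HADOOP_HOME=/opt/hadoop and HBASE_HOME=/opt/hbase above):

cd /opt
sudo tar -xzf hadoop-0.20.2.tar.gz
sudo tar -xzf hbase-0.90.3.tar.gz
sudo tar -xzf zookeeper-3.3.2.tar.gz
sudo ln -s hadoop-0.20.2 hadoop
sudo ln -s hbase-0.90.3 hbase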


8. Configure Hadoop
Edit the following files (under /opt/hadoop/conf, /opt/hbase/conf, and /opt/zookeeper/conf, given the /opt layout above):
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
hbase-env.sh
hbase-site.xml
zoo.cfg

core-site.xml (the LZO codec classes below assume the separate hadoop-lzo library is installed):
<property>
<name>fs.default.name</name>
<value>hdfs://c1:54310</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/data</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>2047</value>
</property>
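
dfs.name.dir and dfs.data.dir must exist and be writable by the user running Hadoop before the namenode is formatted; a sketch (the hadoop user name is an assumption):

sudo mkdir -p /data/hadoop/name /data/hadoop/data
sudo chown -R hadoop:hadoop /data/hadoop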

mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>c1:54311</value>
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024M</value>
</property>
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.max.tracker.blacklists</name>
  <value>200</value>
</property>
<property>
  <name>mapred.max.tracker.failures</name>
  <value>20</value>
</property>
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>

hadoop-env.sh:
export JAVA_HOME=/opt/jdk
export HADOOP_OPTS=-server
export HADOOP_HEAPSIZE=2048


Run:
sudo update-alternatives --install /usr/bin/java java /opt/jdk/bin/java 300 
sudo update-alternatives --install /usr/bin/javac javac /opt/jdk/bin/javac 300
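
To confirm which JDK the system now resolves (a quick check, not part of the original steps):

update-alternatives --display java   # should show /opt/jdk/bin/java at priority 300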

hbase-env.sh:
export JAVA_HOME=/opt/jdk
export HBASE_CLASSPATH=/opt/hadoop/conf
export HBASE_MANAGES_ZK=true
# HBase reads HBASE_OPTS/HBASE_HEAPSIZE, not the HADOOP_* variables
export HBASE_OPTS=-server
export HBASE_HEAPSIZE=2048

hbase-site.xml:
<property>
<name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://c1:54310/hbase</value>
  <description>The directory shared by region servers.
  </description>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.
  </description>
</property>

<property>
  <name>hbase.regionserver.hlog.replication</name>
  <value>2</value>
  <description>For HBase to offer good data durability, we roll logs if
  filesystem replication falls below a certain amount.  In pseudo-distributed
  mode, you normally only have the local filesystem or 1 HDFS DataNode, so you
  don't want to roll logs constantly.</description>
</property>

<property>
  <name>hbase.client.retries.number</name>
  <value>777</value>
  <description>Maximum retries.  Used as maximum for all retryable
  operations such as fetching of the root region from root region
  server, getting a cell's value, starting a row update, etc.
  Default: 10.
  </description>
</property>

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>c1</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
</property>

<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/zookeeper</value>
    <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
    </description>
</property>
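
zoo.cfg is listed among the files to edit but its contents are not shown. Since HBASE_MANAGES_ZK=true, HBase starts ZooKeeper itself and zoo.cfg only matters for a standalone ZooKeeper; a minimal sketch (the ports are assumptions, dataDir matches hbase.zookeeper.property.dataDir above):

tickTime=2000
dataDir=/zookeeper
clientPort=2181
server.1=c1:2888:3888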




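For reference, a minimal sketch of the startup sequence for this Hadoop/HBase generation, given the layout above (these commands are standard but not part of the original notes):

hadoop namenode -format   # first run only: initializes dfs.name.dir
start-all.sh              # starts HDFS and MapReduce daemons
start-hbase.sh            # starts HBase (and ZooKeeper, since HBASE_MANAGES_ZK=true)
jps                       # NameNode, DataNode, JobTracker, TaskTracker, HMaster should appear
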
Startup errors:
Starting the namenode:
java.io.IOException: File /tmp/hadoop-hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
Fix: change dfs.replication from 1 to 2.

java.io.IOException: Call to hadoop5/10.20.151.9:9000 failed on local exception: java.io.EOFException
Cause: the Hadoop jar version under HBase's lib directory does not match the running Hadoop.

java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
Fix: copy the Apache commons-configuration jar into HBase's lib directory.
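
A sketch of both jar fixes (the exact jar file names are assumptions for these CDH3u1 builds):

cd /opt/hbase/lib
rm hadoop-core-*.jar                                # drop the Hadoop jar bundled with HBase
cp /opt/hadoop/hadoop-core-0.20.2-cdh3u1.jar .      # replace it with the cluster's own Hadoop jar
cp /opt/hadoop/lib/commons-configuration-1.6.jar .  # provides org.apache.commons.configuration.Configuration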



