Setting up Hadoop on Ubuntu
1 Environment preparation
ubuntu-11.04-desktop-i386
Hadoop-0.20.2-cdh3u1
Hbase-0.90.3-cdh3u1
Zookeeper-3.3.3-cdh3u1
Jdk-6u26-linux-i586.bin
2 Install a Chinese input method
System->Administration->Language Support->Keyboard input method system
System->Preferences->Keyboard Input Method->Tab Input Method->Add new
3 Install IP Messenger (iptux)
http://code.google.com/p/iptux/
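tar -xvf iptux*.tar.gz
cd iptux*
sudo apt-get build-dep gedit
./configure
make
sudo make install
(apt-get build-dep gedit installs gedit's build dependencies, including the GTK development packages, so run it before ./configure.)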
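4 Install and configure ssh
Before the ssh service is configured, the connection is refused. The transcripts in this section were captured under Cygwin on Windows, as the CYGWIN service name and the C:\WINDOWS paths show: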
Administrator@pc ~
$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
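On Ubuntu itself, "Connection refused" on port 22 usually means that no sshd is installed or running. Ubuntu ships the server in the openssh-server package. The sketch below only checks whether anything accepts TCP connections on the port. It assumes bash, because it uses bash's /dev/tcp pseudo-device. SSH_HOST and SSH_PORT are illustrative variable names, not part of any tool.

```shell
#!/usr/bin/env bash
# Check whether anything accepts TCP connections on the ssh port.
# /dev/tcp/HOST/PORT is a bash feature, not POSIX sh.
SSH_HOST=${SSH_HOST:-localhost}   # hypothetical override variables
SSH_PORT=${SSH_PORT:-22}

# Open fd 3 in a subshell so the connection is closed again immediately.
if (exec 3<>"/dev/tcp/${SSH_HOST}/${SSH_PORT}") 2>/dev/null; then
  SSHD_STATUS=up
  echo "something is listening on ${SSH_HOST}:${SSH_PORT}"
else
  SSHD_STATUS=down
  echo "cannot connect to ${SSH_HOST}:${SSH_PORT}; on Ubuntu install the server with:"
  echo "  sudo apt-get install openssh-server"
fi
```

This check only shows that the port is open. It does not prove that the listener is sshd; running ssh localhost afterwards confirms that.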
Run ssh-host-config, answer yes to every question, and enter netsec tty as the value of CYGWIN:
Administrator@pc ~
$ ssh-host-config
*** Info: Generating /etc/ssh_host_key
*** Info: Generating /etc/ssh_host_rsa_key
*** Info: Generating /etc/ssh_host_dsa_key
*** Info: Generating /etc/ssh_host_ecdsa_key
*** Info: Creating default /etc/ssh_config file
*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.
*** Query: Should privilege separation be used? (yes/no) yes
*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges. Should this script attempt to create a
*** Query: new local account 'sshd'? (yes/no) yes
*** Info: Updating /etc/sshd_config file
*** Info: Added ssh to C:\WINDOWS\system32\drivers\etc\services
*** Warning: The following functions require administrator privileges!
*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes
*** Query: Enter the value of CYGWIN for the daemon: [] netsec tty
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!
Start the service:
Administrator@pc ~
$ net start sshd
The CYGWIN sshd service is starting.
The CYGWIN sshd service was started successfully.
Connecting to both the local machine and another host now works:
Administrator@pc ~
$ ssh localhost
Administrator@localhost's password:
Administrator@pc ~
$ ssh 130.51.38.219
The authenticity of host '130.51.38.219 (130.51.38.219)' can't be established.
ECDSA key fingerprint is c4:d0:48:2a:56:c5:22:26:7e:28:61:2b:6c:e3:c1:7a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '130.51.38.219' (ECDSA) to the list of known hosts.
Administrator@130.51.38.219's password:
Last login: Mon Aug 8 14:58:31 2011 from localhost
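5 Passwordless ssh configuration
For example, for the master to ssh to a slave without a password, generate a key pair on the master and put its public key on the slave: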
Administrator@pc ~
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/Administrator/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/Administrator/.ssh/id_rsa.
Your public key has been saved in /home/Administrator/.ssh/id_rsa.pub.
The key fingerprint is:
e6:15:b6:91:05:ce:6a:31:41:ad:00:14:cd:86:1b:b6 Administrator@pc
The key's randomart image is:
+--[ RSA 2048]----+
| .+* .o.... |
| + = +.o |
| . = .o.B |
| E .= + |
| S o |
| + . |
| . |
| |
| |
+-----------------+
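The key pair is written to D:\cygwin\home\Administrator\.ssh (that is, ~/.ssh). You can either give the slave a complete authorized_keys file, or give it the public key string and let the slave append it to its own authorized_keys file. Appending your own key locally lets you ssh to localhost without a password: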
Administrator@pc ~
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
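When a key is appended by hand on the slave, two things commonly go wrong. The same key gets added twice, or the permissions on ~/.ssh are too loose for sshd's StrictModes check. The helper below, add_authorized_key, is hypothetical and handles both. It is a minimal sketch, assuming the slave runs OpenSSH with its default StrictModes setting.

```shell
#!/usr/bin/env bash
# add_authorized_key HOME_DIR KEY_LINE  (hypothetical helper)
# Appends KEY_LINE to HOME_DIR/.ssh/authorized_keys unless it is already
# there, then tightens permissions so sshd does not ignore the file.
add_authorized_key() {
  local home_dir=$1 key=$2
  local ssh_dir="$home_dir/.ssh"
  local auth="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  touch "$auth"
  # -F: fixed string (keys contain '+' and '/'), -x: match whole lines only
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
  chmod 700 "$ssh_dir"
  chmod 600 "$auth"
}
```

Run it on the slave as add_authorized_key "$HOME" "<contents of the master's id_rsa.pub>". On Linux masters, OpenSSH's ssh-copy-id user@slave does the same remote append in one step. Running ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa creates the key pair without the interactive prompts shown above.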
On the master, edit C:\WINDOWS\system32\drivers\etc\hosts (/etc/hosts on Ubuntu) and add a hostname-to-IP mapping for each slave.
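Editing the hosts file by hand on every node is easy to get wrong when the step is repeated. The sketch below appends a mapping only if the hostname is not already present. add_host_mapping is a hypothetical helper. The hostname c2 and the IP address in the usage line are made-up examples; c1 is the master name used in the configuration files below.

```shell
#!/usr/bin/env bash
# add_host_mapping HOSTS_FILE IP HOSTNAME  (hypothetical helper)
# Appends "IP<TAB>HOSTNAME" unless HOSTNAME already appears as a name
# (field 2 or later) on a non-comment line of HOSTS_FILE.
add_host_mapping() {
  local hosts_file=$1 ip=$2 name=$3
  if awk -v n="$name" '
      $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit (found ? 0 : 1) }' "$hosts_file"; then
    echo "$name is already mapped in $hosts_file"
  else
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
  fi
}
```

On Ubuntu, run this as root against the real file, for example add_host_mapping /etc/hosts 192.168.1.12 c2 (example values).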
After changing the ssh configuration, remember to restart the service: net stop sshd, then net start sshd.
6 Install Java
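sudo apt-get install openjdk-6-jdk did not succeed, so install the JDK from the .bin package instead. Run these steps in /opt, with the .bin copied there, so that /opt/jdk matches JAVA_HOME below:
cd /opt
chmod u+x jdk-6u26-linux-i586.bin
./jdk-6u26-linux-i586.bin
ln -s jdk1.6.0_26 jdk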
Add the following to /etc/profile:
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export HBASE_HOME=/opt/hbase
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export PATH=.:$JAVA_HOME/bin:$JAVA_HOME/lib:$JAVA_HOME/jre/bin:$PATH:$HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin
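java -version worked without a restart, but jps only worked after a restart. Running source /etc/profile loads the new settings into the current shell.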
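After reloading /etc/profile, a quick check shows whether the exported paths point at something real before Hadoop is started. verify_hadoop_env is a hypothetical helper. It checks only the variables set above and assumes that jps lives in $JAVA_HOME/bin, as it does in Sun JDK 6.

```shell
#!/usr/bin/env bash
# verify_hadoop_env  (hypothetical helper)
# Reports which of the /etc/profile settings do not point at a real
# file or directory; returns non-zero if any check fails.
verify_hadoop_env() {
  local rc=0
  [ -x "${JAVA_HOME:-}/bin/java" ] || { echo "no executable java under JAVA_HOME='${JAVA_HOME:-}'"; rc=1; }
  [ -x "${JAVA_HOME:-}/bin/jps" ] || { echo "no executable jps under JAVA_HOME='${JAVA_HOME:-}'"; rc=1; }
  [ -d "${HADOOP_CONF_DIR:-}" ] || { echo "HADOOP_CONF_DIR is not a directory: '${HADOOP_CONF_DIR:-}'"; rc=1; }
  [ -d "${HBASE_HOME:-}" ] || { echo "HBASE_HOME is not a directory: '${HBASE_HOME:-}'"; rc=1; }
  return "$rc"
}
```

Typical use is source /etc/profile && verify_hadoop_env. An empty result means all four paths resolved.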
7 Install Hadoop
Put the downloaded hadoop-0.20.2.tar.gz, hbase-0.90.3.tar.gz and zookeeper-3.3.2.tar.gz in /opt (or /usr/local; both are conventional locations), then extract them.
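/etc/profile above refers to /opt/hadoop and /opt/hbase, but the tarballs unpack into versioned directories. A symlink bridges the two, just as /opt/jdk does for the JDK. The helper below, install_tarball, is hypothetical. It is a minimal sketch, assuming each tarball contains a single top-level directory.

```shell
#!/usr/bin/env bash
# install_tarball TARBALL PREFIX LINK_NAME  (hypothetical helper)
# Unpacks TARBALL under PREFIX and points PREFIX/LINK_NAME at the
# archive's top-level directory, e.g. /opt/hadoop -> hadoop-0.20.2.
install_tarball() {
  local tarball=$1 prefix=$2 link_name=$3 top
  # First entry of the listing, cut at the first '/', is the top-level dir.
  top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top" ] || { echo "cannot list $tarball" >&2; return 1; }
  tar -xzf "$tarball" -C "$prefix" || return 1
  # -n: replace an existing link instead of creating a new link inside it
  ln -sfn "$top" "$prefix/$link_name"
}
```

As root, install_tarball /opt/hadoop-0.20.2.tar.gz /opt hadoop and install_tarball /opt/hbase-0.90.3.tar.gz /opt hbase produce the paths used in /etc/profile. A later upgrade then only has to repoint the link.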
8 Configure Hadoop
Configure the following files:
hadoop-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
hbase-env.sh
hbase-site.xml
zoo.cfg
core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://c1:54310</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/data/hadoop/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/data/hadoop/data</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>30</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>2047</value>
</property>
mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>c1:54311</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>7</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>7</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapred.map.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.reduce.tasks.speculative.execution</name>
<value>false</value>
</property>
<property>
<name>mapred.max.tracker.blacklists</name>
<value>200</value>
</property>
<property>
<name>mapred.max.tracker.failures</name>
<value>20</value>
</property>
<property>
<name>mapred.task.timeout</name>
<value>1800000</value>
</property>
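The property blocks above are fragments. In core-site.xml, hdfs-site.xml and mapred-site.xml they must sit inside a single <configuration> root element. The sketch below writes a complete file from name=value pairs. write_site_xml is a hypothetical helper, and it assumes the values contain no XML special characters (&, <, >).

```shell
#!/usr/bin/env bash
# write_site_xml FILE name=value [name=value ...]  (hypothetical helper)
# Writes a complete Hadoop-style site file: XML declaration, one
# <configuration> root, and one <property> per name=value argument.
write_site_xml() {
  local file=$1 kv
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    for kv in "$@"; do
      # name = text before the first '=', value = everything after it
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "${kv%%=*}" "${kv#*=}"
    done
    echo '</configuration>'
  } > "$file"
}
```

For example, write_site_xml "$HADOOP_CONF_DIR/hdfs-site.xml" dfs.replication=2 dfs.name.dir=/data/hadoop/name dfs.data.dir=/data/hadoop/data writes the HDFS part of the settings above. Values are written verbatim, so a long value such as the io.compression.codecs list is passed as one argument.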
hadoop-env.sh
export JAVA_HOME=/opt/jdk
export HADOOP_OPTS=-server
export HADOOP_HEAPSIZE=2048
Then register the JDK with update-alternatives:
sudo update-alternatives --install /usr/bin/java java /opt/jdk/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /opt/jdk/bin/javac 300
hbase-env.sh
export JAVA_HOME=/opt/jdk
export HBASE_CLASSPATH=/opt/hadoop/conf
export HBASE_MANAGES_ZK=true
export HBASE_OPTS=-server
export HBASE_HEAPSIZE=2048
hbase-site.xml:
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://c1:54310/hbase</value>
<description>The directory shared by region servers.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.
</description>
</property>
<property>
<name>hbase.regionserver.hlog.replication</name>
<value>2</value>
<description>For HBase to offer good data durability, we roll logs if
filesystem replication falls below a certain amount. In pseudo-distributed
mode, you normally only have the local filesystem or 1 HDFS DataNode, so you
don't want to roll logs constantly.</description>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>777</value>
<description>Maximum retries. Used as maximum for all retryable
operations such as fetching of the root region from root region
server, getting a cell's value, starting a row update, etc.
Default: 10.
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>c1</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/zookeeper</value>
<description>Property from ZooKeeper's config zoo.cfg.
The directory where the snapshot is stored.
</description>
</property>
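Because HBASE_MANAGES_ZK=true, HBase starts ZooKeeper on the hosts listed in hbase.zookeeper.quorum (c1 here). ZooKeeper answers the four-letter command ruok with imok when the server is running. The sketch below sends that command over bash's /dev/tcp. zk_ruok is a hypothetical helper, and it assumes the default client port 2181.

```shell
#!/usr/bin/env bash
# zk_ruok HOST [PORT]  (hypothetical helper)
# Sends ZooKeeper's "ruok" command; succeeds only if the reply is "imok".
# The function body is a subshell, so fd 3 and any failure stay contained.
zk_ruok() (
  host=$1
  port=${2:-2181}
  # Open a bidirectional TCP connection on fd 3 (bash-only /dev/tcp).
  exec 3<>"/dev/tcp/$host/$port" || exit 1
  printf 'ruok' >&3
  # The server replies and then closes the connection, so cat returns.
  reply=$(cat <&3)
  [ "$reply" = imok ]
)
```

After start-hbase.sh, zk_ruok c1 && echo "ZooKeeper on c1 is up" shows whether the managed quorum actually started.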
Startup errors:
When starting the namenode:
java.io.IOException: File /tmp/hadoop-hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
Fix: change dfs.replication from 1 to 2.
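"Could only be replicated to 0 nodes" means the namenode found no datanode to write the block to. hadoop dfsadmin -report lists the datanodes that are actually live. A related static check follows the hbase-site.xml note above that replication should not exceed the datanode count: compare dfs.replication with the hosts in conf/slaves. check_replication is a hypothetical helper. It assumes the one-tag-per-line layout used in this article and falls back to Hadoop's default replication of 3 when the property is absent.

```shell
#!/usr/bin/env bash
# check_replication CONF_DIR  (hypothetical helper)
# Compares dfs.replication in CONF_DIR/hdfs-site.xml with the number of
# non-comment hosts in CONF_DIR/slaves; fails if replication is larger.
check_replication() {
  local conf=$1 repl nodes
  [ -f "$conf/slaves" ] || { echo "no $conf/slaves file" >&2; return 1; }
  # Value on the line following <name>dfs.replication</name>
  repl=$(grep -A1 '<name>dfs.replication</name>' "$conf/hdfs-site.xml" \
         | sed -n 's:.*<value>\([0-9]*\)</value>.*:\1:p')
  repl=${repl:-3}   # Hadoop's default when the property is not set
  # Count lines that are neither blank nor comments
  nodes=$(grep -cvE '^[[:space:]]*(#|$)' "$conf/slaves" || true)
  echo "dfs.replication=$repl, hosts in slaves=$nodes"
  [ "$nodes" -ge "$repl" ]
}
```

Run check_replication "$HADOOP_CONF_DIR" on the master. A pass only means the configuration is consistent; the datanodes still have to be running, which dfsadmin -report shows.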
java.io.IOException: Call to hadoop5/10.20.151.9:9000 failed on local exception: java.io.EOFException
Cause: the Hadoop jar in HBase's lib directory is the wrong version and does not match the cluster's Hadoop.
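HBase 0.90 bundles its own Hadoop jar in lib/. If that jar differs from the one the cluster runs, RPC calls can fail with EOFException as shown above. The usual remedy is to replace HBase's copy with the cluster's jar. The sketch below compares the two byte for byte. compare_hadoop_jars is a hypothetical helper, and it assumes both jars match the name pattern hadoop-*core*.jar.

```shell
#!/usr/bin/env bash
# compare_hadoop_jars HADOOP_HOME HBASE_HOME  (hypothetical helper)
# Prints the Hadoop core jar found in each location and succeeds only if
# HBase's copy is byte-identical to Hadoop's.
compare_hadoop_jars() {
  local hadoop_home=$1 hbase_home=$2
  local -a h b
  h=( "$hadoop_home"/hadoop-*core*.jar )
  b=( "$hbase_home"/lib/hadoop-*core*.jar )
  # An unmatched glob stays literal, so -e detects "no jar found".
  [ -e "${h[0]}" ] || { echo "no hadoop core jar in $hadoop_home" >&2; return 2; }
  [ -e "${b[0]}" ] || { echo "no hadoop core jar in $hbase_home/lib" >&2; return 2; }
  if [ "${#b[@]}" -gt 1 ]; then
    echo "warning: ${#b[@]} hadoop core jars in $hbase_home/lib"
  fi
  echo "hadoop: ${h[0]##*/}"
  echo "hbase:  ${b[0]##*/}"
  cmp -s "${h[0]}" "${b[0]}"
}
```

If compare_hadoop_jars /opt/hadoop /opt/hbase fails, remove the bundled jar from /opt/hbase/lib and copy in the one from /opt/hadoop, then restart HBase.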
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
Fix:
copy the Apache commons-configuration jar into HBase's lib directory.
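The copy step can be scripted so it is safe to repeat on every node. The sketch below assumes that Hadoop's own lib directory already contains a commons-configuration-*.jar; if it does not, the jar has to be downloaded from Apache Commons first. copy_commons_configuration is a hypothetical helper.

```shell
#!/usr/bin/env bash
# copy_commons_configuration HADOOP_HOME HBASE_HOME  (hypothetical helper)
# Copies commons-configuration-*.jar from HADOOP_HOME/lib into
# HBASE_HOME/lib unless HBase already has one.
copy_commons_configuration() {
  local hadoop_lib=$1/lib hbase_lib=$2/lib
  local -a have src
  have=( "$hbase_lib"/commons-configuration-*.jar )
  if [ -e "${have[0]}" ]; then
    echo "already present: ${have[0]##*/}"
    return 0
  fi
  src=( "$hadoop_lib"/commons-configuration-*.jar )
  if [ ! -e "${src[0]}" ]; then
    echo "no commons-configuration jar under $hadoop_lib" >&2
    return 1
  fi
  cp "${src[0]}" "$hbase_lib"/ && echo "copied ${src[0]##*/} to $hbase_lib"
}
```

With the layout used here the call is copy_commons_configuration /opt/hadoop /opt/hbase, repeated on every node that runs HBase.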