
[2] Hadoop Configuration

   Posted: 2009-09-26   Last modified: 2010-11-10
Hadoop Configuration
Add the hadoopuser user
[root@noc rou]# adduser
bash: adduser: command not found
[root@noc rou]# cd /usr/bin/
[root@noc bin]# ln -s /usr/sbin/adduser adduser
[root@noc bin]# adduser hadoopuser

passwd hadoopuser
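Raise the limit on open files
A program sometimes needs to open many files at once for analysis. The default limit is usually 1024 (check with ulimit -n). That is enough for normal use but too low for such a program.
To change it, edit two files, then log in again:
1) /etc/security/limits.conf
vi /etc/security/limits.conf
Add: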
* soft nofile 8192
* hard nofile 20480

2) /etc/pam.d/login
Add:
session    required     /lib/security/pam_limits.so
Note: the new limits apply only to a new login session (that is, close PuTTY and connect again).
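After logging in again, it helps to confirm that the new limits were actually picked up. The following is a minimal sketch, not part of the original setup; the function names nofile_at_least and show_nofile are made up for illustration.

```shell
#!/bin/sh
# Succeed if the soft open-file limit of the current session is at least $1.
nofile_at_least() {
    want=$1
    have=$(ulimit -n)                 # soft limit of this shell
    # "unlimited" always satisfies the requirement.
    [ "$have" = "unlimited" ] && return 0
    [ "$have" -ge "$want" ]
}

# Print the soft and hard limits so they can be compared with limits.conf.
show_nofile() {
    echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}
```

For example, run show_nofile in the new session, then nofile_at_least 8192 && echo "limit is high enough".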
Create the MySQL user kwps with password kwps
grant all privileges on *.* to 'kwps'@'%' identified by 'kwps' ;
flush privileges ;
Shortcut for logging in
sudo -s                            # switch to root
vi /usr/bin/wpsop                  # create this file with the content below
#!/bin/bash
ssh s$1-opdev-wps.rdev.kingsoft.net -l hadoopuser            # -l sets the login user
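The wrapper above passes its argument straight into the host name, so a missing or mistyped argument produces a confusing ssh error. A slightly more defensive variant might look like the sketch below. The function names node_host and wpsop_login are hypothetical; the host pattern and user name are taken from the script above.

```shell
#!/bin/sh
# Build the host name for node number $1, following the s<N>-... pattern
# used by the wrapper. Anything that is not a plain number is rejected.
node_host() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: wpsop <node-number>" >&2
            return 1
            ;;
    esac
    echo "s$1-opdev-wps.rdev.kingsoft.net"
}

# Log in to node $1 as hadoopuser.
wpsop_login() {
    host=$(node_host "$1") || return 1
    ssh -l hadoopuser "$host"
}
```

With these functions loaded, wpsop_login 5 connects to s5, while wpsop_login abc prints the usage line and does nothing.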
Change hosts and hostname
1) sudo vi /etc/hosts
2) sudo vi /etc/sysconfig/network
3) hostname -v newhostname
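Passwordless SSH with public-key authentication
1) mkdir .ssh
2) cd .ssh
sudo chmod 700 . # this step is important
3) ssh-keygen -t rsa
4) cat id_rsa.pub >> authorized_keys
Alternatively: cp id_rsa.pub authorized_keys
Use scp to send the key to the other servers, and take care not to overwrite an existing file there!
5) chmod 644 authorized_keys # this step is important
Note: every node must be able to reach every other node, and itself, over SSH without a password.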
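Two parts of the steps above are easy to get wrong: overwriting an existing authorized_keys file, and missing a node when checking the connections. The sketch below shows one possible way to handle both. The names append_key and check_nodes are invented here, and SSH_CMD is a hypothetical override that exists only so the loop can be tried without a real cluster.

```shell
#!/bin/sh
# Append the public key in file $1 to $2/authorized_keys unless it is
# already there, and set the permissions used in the steps above.
append_key() {
    pub=$1
    dir=$2
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    touch "$dir/authorized_keys"
    # -x: match the whole line, -F: treat the key as a fixed string.
    if ! grep -qxF -- "$(cat "$pub")" "$dir/authorized_keys"; then
        cat "$pub" >> "$dir/authorized_keys"
    fi
    chmod 644 "$dir/authorized_keys"
}

# Try a non-interactive login to every host given as an argument.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_nodes() {
    rc=0
    for h in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$h" true; then
            echo "ok   $h"
        else
            echo "FAIL $h"
            rc=1
        fi
    done
    return $rc
}
```

On each node, something like append_key id_rsa.pub ~/.ssh followed by check_nodes s2 s5 s6 s7 s8 s9 would cover the whole cluster, assuming those short host names resolve.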

Unpack Hadoop-0.19.1
tar -xvf  Hadoop-0.19.1
Hadoop configuration
Hadoop download locations
http://apache.etoak.com/hadoop/core/
http://hadoop.apache.org/common/releases.html
Environment:
Version: Hadoop-0.19.1
Operating system: CentOS
Six servers:
S2 (namenode)
S5 (secondarynamenode datanode)
S6 (datanode)
S7 (datanode)
S8 (datanode)
S9 (datanode)

***/home/wps/hadoop-0.19.1/conf***
Edit masters:
s5
Edit slaves:
s5
s6
s7
s8
s9
Edit log4j.properties
hadoop.log.dir=/data/hadoop-0.19.1/logs
Edit hadoop-env.sh
export JAVA_HOME=/opt/JDK-1.6.0.14
export HADOOP_HEAPSIZE=4000
Edit hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
<name>fs.default.name</name>
<value>hdfs://s2-opdev-wps.rdev.kingsoft.net:9000/</value>
<description>The name of the default file system. Either the literal string "local" or a host:port for DFS.</description>
</property>

<property>
<name>mapred.job.tracker</name>
<value>s2-opdev-wps.rdev.kingsoft.net:9001</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>

<property>
<name>dfs.name.dir</name>
<value>/data/hadoop-0.19.1/name</value>
<description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. </description>
</property>

<property>
<name>dfs.data.dir</name>
<value>/data/hadoop-0.19.1/dfsdata</value>
<description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop-0.19.1/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>



<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/hadoop-0.19.1/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary
      name node should store the temporary images to merge.
      If this is a comma-delimited list of directories then the image is
      replicated in all of the directories for redundancy.
  </description>
</property>

<property>
  <name>dfs.http.address</name>
  <value>s2-opdev-wps.rdev.kingsoft.net:50070</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
    If the port is 0 then the server will start on a free port.
  </description>
</property>



<property>
  <name>mapred.map.tasks</name>
  <value>50</value>
  <description>The default number of map tasks per job.  Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local". 
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>7</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>

</configuration>
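Since the same hadoop-site.xml has to be present on every node, a quick way to see what a given copy actually sets can be handy. The sketch below prints each property as name=value. It is line-oriented and assumes that each name and each value sits on a single line, as in the listing above; it is not a general XML parser, and the function name list_props is made up for this example.

```shell
#!/bin/sh
# Print "name=value" for every property in the hadoop-site.xml file $1.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
list_props() {
    awk '
        /<name>/  { gsub(/.*<name>|<\/name>.*/, "");   name = $0 }
        /<value>/ { gsub(/.*<value>|<\/value>.*/, ""); print name "=" $0 }
    ' "$1"
}
```

Running list_props conf/hadoop-site.xml on two nodes and comparing the output is one simple way to spot a copy that is out of date.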
Start Hadoop
bin/hadoop namenode -format
Warning: do not format a running Hadoop namenode. Doing so erases all data in the HDFS filesystem.
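To make the warning harder to ignore, the format command can be wrapped in a small guard. This is only a sketch under one assumption: that a name directory which has already been formatted contains a current/VERSION file. The function names are hypothetical, and format_if_new expects to be run from the Hadoop install directory.

```shell
#!/bin/sh
# Succeed only if the name directory $1 does not look formatted yet.
# Assumption: a formatted dfs.name.dir contains current/VERSION.
name_dir_is_unformatted() {
    [ ! -e "$1/current/VERSION" ]
}

# Format the namenode only when no existing image would be lost.
format_if_new() {
    if name_dir_is_unformatted "$1"; then
        bin/hadoop namenode -format
    else
        echo "refusing to format: $1 already holds a filesystem image" >&2
        return 1
    fi
}
```

With the configuration above, the call would be format_if_new /data/hadoop-0.19.1/name.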
bin/start-all.sh
bin/stop-all.sh
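After bin/start-all.sh it is worth checking that the expected daemons really came up on each node. The JDK's jps tool lists running Java processes as "pid ClassName", and the sketch below reads that output and reports what is missing. The function name missing_daemons is invented here. Which daemons to expect on which host is an assumption based on the layout above: NameNode and JobTracker on s2, SecondaryNameNode on s5, DataNode and TaskTracker on each slave.

```shell
#!/bin/sh
# Read `jps` output on stdin and print every daemon named in the
# arguments that is not running. Exit status is 1 if any is missing.
missing_daemons() {
    running=$(awk '{ print $2 }')      # second column is the class name
    rc=0
    for d in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qx -- "$d"; then
            echo "$d"
            rc=1
        fi
    done
    return $rc
}
```

On s2 this would be used as jps | missing_daemons NameNode JobTracker, and on a slave as jps | missing_daemons DataNode TaskTracker.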
List the file directory:
bin/hadoop fs -ls /

Inspect the data blocks:
/home/wpsop/hadoop-0.19.1/running/dfsdata/current
bin/hadoop fs -ls /data/user/hiveware
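When looking at a datanode's block directory, a count of the block files gives a rough idea of how much data the node holds. The sketch below assumes that blocks are stored as blk_<id> files next to blk_<id>_<n>.meta checksum files; the function name count_blocks is hypothetical.

```shell
#!/bin/sh
# Count HDFS block files under the data directory $1.
# Assumption: block files are named blk_<id>, with a matching
# blk_<id>_<n>.meta file that should not be counted.
count_blocks() {
    find "$1" -type f -name 'blk_*' ! -name '*.meta' | wc -l | tr -d ' '
}
```

For the directory shown above this would be count_blocks /home/wpsop/hadoop-0.19.1/running/dfsdata/current.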