
Hadoop, HBase, and Hive Pseudo-Distributed Environment Setup


Hadoop HBase Hive

Startup:

$HADOOP_HOME/bin/start-all.sh

$HBASE_HOME/bin/start-hbase.sh

$HIVE_HOME/bin/hive

Environment Setup

1. Install the JDK

2. Configure SSH

3. Set environment variables

Append the following to /etc/profile:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65

export JRE_HOME=${JAVA_HOME}/jre

export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:$JAVA_HOME/lib/tools.jar:${JRE_HOME}/lib/rt.jar

export HADOOP_HOME=/usr/local/hadoop

export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/lib

export HBASE_HOME=/usr/local/hbase

export HIVE_HOME=/usr/local/hive

export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib

export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:${HADOOP_HOME}/bin:${HBASE_HOME}/bin:$HIVE_HOME/bin:$PATH

Hadoop   

Hadoop version: 1.1.2

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Required Software

Required software for Linux and Windows includes:

1.    Java 1.6.x or later, preferably from Sun/Oracle, must be installed (this walkthrough uses JDK 1.7.0_65).

2.    ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

Additional requirements for Windows include:

1.   Cygwin - Required for shell support in addition to the required software above.

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

$ sudo apt-get install ssh 
$ sudo apt-get install rsync

On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:

·  openssh - the Net category

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_65

Try the following command:
$ bin/hadoop 
This will display the usage documentation for the hadoop script.

 

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

First set the machine's hostname so the configuration below can refer to it as myhadoop:

hostname myhadoop

vi /etc/hostname    # contents: myhadoop

vi /etc/hosts       # add a line: <ip-address> myhadoop

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://myhadoop:9000</value>
     </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/hadoop/hadoopdata/dfsname</value>
    </property>
    <property>
        <name>dfs.data.dir</name>                                    
        <value>/usr/local/hadoop/hadoopdata/dfsdata</value>
    </property>
</configuration>

conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>myhadoop:9001</value>
     </property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

· NameNode - http://localhost:50070/

· JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesytem and examine them:
$ bin/hadoop fs -get output output 
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

 

HBase

HBase version: 0.94.7

hbase-site.xml

  <property>

    <name>hbase.rootdir</name>

    <value>hdfs://myhadoop:9000/hbase</value>

  </property>

  <property>

    <name>hbase.cluster.distributed</name>

    <value>true</value>

  </property>

  <property>

    <name>hbase.zookeeper.quorum</name>

    <value>myhadoop</value>

  </property>

bin/start-hbase.sh

 

Hive

1. Hive configuration files

 cp hive-default.xml.template hive-site.xml

 cp hive-log4j.properties.template hive-log4j.properties

 cp hive-env.sh.template hive-env.sh

 

2. Edit hive-env.sh

Set the HADOOP_HOME path in it.

3. Edit hive-site.xml to store the Hive metastore in MySQL

<property>

<name>javax.jdo.option.ConnectionURL</name>

<value>jdbc:mysql://myhadoop:3306/hive_metadata?createDatabaseIfNotExist=true</value>

</property>

<property>

<name>javax.jdo.option.ConnectionDriverName</name>

<value>com.mysql.jdbc.Driver</value>

</property>

<property>

<name>javax.jdo.option.ConnectionUserName</name>

<value>root</value>

</property>

<property>

<name>javax.jdo.option.ConnectionPassword</name>

<value>root</value>

</property>

<property>

<name>hive.metastore.warehouse.dir</name>

<value>/user/hive/warehouse</value>

</property>

 

<property>

    <name>hive.aux.jars.path</name>

<value>file:///usr/local/hive/lib/hive-hbase-handler-0.9.0.jar,file:///usr/local/hive/lib/hbase-0.94.7-security.jar,file:///usr/local/hive/lib/protobuf-java-2.4.0a.jar,file:///usr/local/hive/lib/zookeeper-3.4.5.jar</value>

</property>

Delete hbase-0.92.0.jar, hbase-0.92.0-tests.jar, and zookeeper-3.4.3.jar from /usr/local/hive/lib.

 

hbase拷贝hbase-0.94.7-security.jarzookeeper-3.4.5.jarprotobuf-java-2.4.0a.jarhive/lib下。

4. Edit hive-log4j.properties

#log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter

log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

 

5. Create directories on HDFS

$HADOOP_HOME/bin/hadoop fs -mkdir /tmp

$HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse

$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp

$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse

 

6. Manually copy the MySQL JDBC driver into hive/lib

~ ls /home/cos/toolkit/hive-0.9.0/lib

mysql-connector-java-5.1.22-bin.jar

7. Start Hive

Option 1:

bin/hive

Option 2:

# Start the metastore service

~ bin/hive --service metastore &

Starting Hive Metastore Server

 

# Start the HiveServer (Thrift) service

~ bin/hive --service hiveserver &

Starting Hive Thrift Server

# Start the Hive CLI

~ bin/hive shell

Logging initialized using configuration in file:/root/hive-0.9.0/conf/hive-log4j.properties

Hive history file=/tmp/root/hive_job_log_root_201211141845_1864939641.txt

 

hive> show tables;

OK

 

 

 

Hive Functions and Complex-Type Access

Hive provides composite (complex) data types:

Structs: fields inside a struct are accessed with dot notation. For example, if a table column c has type STRUCT{a INT; b INT}, field a is read as c.a.

Maps (key-value pairs): a value is read by subscripting with its key, ['key-name']. For example, if map M holds a 'group' -> gid pair, the gid is read as M['group'].

Arrays: all elements of an array share one type and are indexed from 0. For example, if array A holds ['a','b','c'], then A[1] is 'b'.
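These access patterns behave like ordinary container lookups. A small Python sketch of the three cases above (illustrative only; this is not Hive syntax):

```python
# Hive complex types behave like native Python containers (sketch only):
c = {"a": 1, "b": 2}      # STRUCT{a INT; b INT}; Hive: c.a
M = {"group": 100}        # MAP<STRING,INT>;      Hive: M['group']
A = ["a", "b", "c"]       # ARRAY<STRING>;        Hive: A[1] (0-based)

print(c["a"], M["group"], A[1])   # struct field, map value, array element
```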

 

Array

Create the table:

create table class_test(name string, student_id_list array<INT>)  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':';

Prepare and load the data:

vim test.txt

034,1:2:3:4

035,5:6

036,7:8:9:10

 

LOAD DATA LOCAL INPATH '/opt/test.txt' INTO TABLE class_test ;

Query:

select student_id_list[3] from class_test;
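To make the delimiters concrete, here is a Python sketch (illustrative only, not how Hive is implemented) of how each line of test.txt above is split: fields on ',' and collection items on ':':

```python
# Lines from test.txt above; fields split on ',', collection items on ':'
lines = ["034,1:2:3:4", "035,5:6", "036,7:8:9:10"]
rows = []
for line in lines:
    name, ids = line.split(",")
    rows.append((name, [int(x) for x in ids.split(":")]))

# select student_id_list[3] ...: out-of-range indexes yield NULL in Hive
for name, id_list in rows:
    print(name, id_list[3] if len(id_list) > 3 else None)
```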

 

Map

Create the table:

create table employee(id string, perf map<string, int>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ',' MAP KEYS TERMINATED BY ':';

Prepare and load the data:

vim test2.txt  

1   job:80,team:60,person:70

2   job:60,team:80

3   job:90,team:70,person:100

 

LOAD DATA LOCAL INPATH '/opt/test2.txt' INTO TABLE employee;

Query:

select perf['person'] from employee; 

select perf['person'] from employee where perf['person'] is not null;
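The same idea for the map table: fields split on '\t', items on ',', keys on ':'. A Python sketch (illustrative only) that also shows the NULL behavior the second query relies on:

```python
# Lines from test2.txt above; '\t' between fields, ',' between items, ':' between key and value
lines = ["1\tjob:80,team:60,person:70",
         "2\tjob:60,team:80",
         "3\tjob:90,team:70,person:100"]
rows = []
for line in lines:
    emp_id, perf_raw = line.split("\t")
    perf = {}
    for item in perf_raw.split(","):
        k, v = item.split(":")
        perf[k] = int(v)
    rows.append((emp_id, perf))

# perf['person'] is NULL (None here) where the key is absent, as in row 2
for emp_id, perf in rows:
    print(emp_id, perf.get("person"))
```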

 

Using Struct

Create the table:

create table student_test(id INT, info struct<name:STRING, age:INT>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':'; 

'FIELDS TERMINATED BY': the delimiter between fields

'COLLECTION ITEMS TERMINATED BY': the delimiter between the items inside a single field

Load the data:

cat test3.txt  

1,zhou:30 

2,yan:30 

3,chen:20 

4,li:80 

LOAD DATA LOCAL INPATH '/opt/test3.txt' INTO TABLE student_test; 

Query:

select info.age from student_test;
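A Python sketch (illustrative only) of the struct parsing above, using a namedtuple to stand in for struct<name:STRING, age:INT>:

```python
from collections import namedtuple

Info = namedtuple("Info", ["name", "age"])   # stands in for struct<name:STRING, age:INT>

# Lines from test3.txt above; ',' between fields, ':' between struct members
lines = ["1,zhou:30", "2,yan:30", "3,chen:20", "4,li:80"]
rows = []
for line in lines:
    sid, info_raw = line.split(",")
    name, age = info_raw.split(":")
    rows.append((int(sid), Info(name, int(age))))

# select info.age from student_test;
for sid, info in rows:
    print(info.age)
```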

 

 

Query the ID card numbers (sfzmhm) seen during the National Day holiday (Sept 30 to Oct 7) of each year:

select substr(rzsj,1,4) as year, sfzmhm from jnlk where substr(rzsj,6,5)>='09-30' and substr(rzsj,6,5)<='10-07' and rzsj is not null order by year,sfzmhm;
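Note that Hive's substr(s, pos, len) is 1-based, which the comparisons above depend on. A Python sketch of the same filter on a single rzsj value (the timestamp shown is invented for illustration):

```python
# Hive's substr(s, pos, len) is 1-based; a Python equivalent (sketch):
def hive_substr(s, pos, length):
    return s[pos - 1 : pos - 1 + length]

rzsj = "2013-10-02 08:15:00"       # hypothetical check-in timestamp
year = hive_substr(rzsj, 1, 4)     # substr(rzsj,1,4)
mmdd = hive_substr(rzsj, 6, 5)     # substr(rzsj,6,5), e.g. '10-02'

# String comparison works because the 'MM-DD' format is zero-padded
print(year, "09-30" <= mmdd <= "10-07")
```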

Exercise

Create a table in HBase:

create 'blog','article','author'

 

Insert data into HBase:

put 'blog','1','article:title','Head First HBase'

put 'blog','1','article:content','HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data.'

put 'blog','1','article:tags','Hadoop,HBase,NoSQL'

put 'blog','1','author:name','hujinjun'

put 'blog','1','author:nickname','一叶渡江'

 

put 'blog','10','article:tags','Hadoop'

put 'blog','10','author:nickname','heyun'

 

 

put 'blog','100','article:tags','hbase,nosql'

put 'blog','100','author:nickname','shenxiu'

 

Create a matching external table in Hive:

CREATE EXTERNAL TABLE blog(key int,title string,content string,tags string,name string,nickname string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,article:title,article:content,article:tags,author:name,author:nickname") TBLPROPERTIES("hbase.table.name" = "blog");
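The entries in hbase.columns.mapping pair up positionally with the Hive columns, and the row key takes the special entry ":key"; if the counts do not match, the DDL fails. A Python sketch (illustrative only) of that alignment:

```python
# Hive columns from the DDL, and the hbase.columns.mapping entries:
hive_cols = ["key", "title", "content", "tags", "name", "nickname"]
mapping = [":key", "article:title", "article:content",
           "article:tags", "author:name", "author:nickname"]

# The two lists must be the same length; entries pair positionally.
assert len(mapping) == len(hive_cols)
for col, hbase_col in zip(hive_cols, mapping):
    print(col, "<-", hbase_col)
```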

 

 

hive> create table wyp (id int, name string, age int, tel string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

 

vim /opt/wyp.txt

1   wyp 25  13188888888888

2   test   30  13888888888888

3   zs  34  899314121

 

hive> load data local inpath '/opt/wyp.txt' into table wyp;

 

vim /opt/add.txt

5   wyp1   23  131212121212

6   wyp2   24  134535353535

7   wyp3   25  132453535353

8   wyp4   26  154243434355

 

$HADOOP_HOME/bin/hadoop fs -mkdir /wyp

$HADOOP_HOME/bin/hadoop fs -copyFromLocal /opt/add.txt /wyp/add.txt

 

hive> load data inpath '/wyp/add.txt' into table wyp;

 

hive> select * from wyp;

 

 

FAQ

 

Hive HWI startup error

Error log:

INFO hwi.HWIServer: HWI is starting up

WARN conf.HiveConf: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /usr/local/hive/conf/hive-default.xml

FATAL hwi.HWIServer: HWI WAR file not found at /usr/local/hive/usr/local/hive/lib/hive-hwi-0.9.0.war

Solution:

The fix is simple: add the following to hive-site.xml:

<property>

  <name>hive.hwi.war.file</name>

  <value>lib/hive-hwi-0.9.0.war</value>

  <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}. </description>

</property>

Otherwise the WAR path is resolved incorrectly (note the doubled /usr/local/hive prefix in the error above).
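A plausible reading of the doubled prefix in the FATAL line above: Hive resolves hive.hwi.war.file by concatenating ${HIVE_HOME} with the configured value, so an absolute value produces a duplicated path. A Python sketch of that assumption:

```python
# Assumed resolution: HIVE_HOME is prepended to the configured value.
HIVE_HOME = "/usr/local/hive"
misconfigured = "/usr/local/hive/lib/hive-hwi-0.9.0.war"   # absolute: wrong
relative = "lib/hive-hwi-0.9.0.war"                        # relative: right

print(HIVE_HOME + "/" + relative)   # the intended WAR path
print(HIVE_HOME + misconfigured)    # reproduces the doubled path in the log
```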

 

 
