Remote access to Hive 0.13 from Eclipse via JDBC

qindongliang1922

Category: Hive

Tags: jdbc, eclipse, mapreduce, hive

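In earlier posts I covered how to install Hive on Linux, how to integrate it with Hadoop, and how to store Hive's metadata in MySQL. Today let's look at how to operate Hive from Eclipse through JDBC.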

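As we all know, Hive is a SQL-like framework: it is queried with HiveQL (HQL), and internally Hive translates queries into MapReduce jobs that do the actual computation. We can issue commands directly in the Hive shell, but that is fairly restrictive. If we can drive Hive from program code, it becomes much more flexible.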

Hive supports JDBC, so we can work with it from Java code much as we would with MySQL, and run our statistics there.

Let's go through the steps in detail.

Software environment

No.  Description                             System
1    Hadoop 2.2.0 installed on CentOS 6.5    Linux
2    Hive 0.13 installed on CentOS 6.5       Linux
3    Eclipse 4.2                             Windows 7

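No.  Step                                                          Notes
1    Install and start Hadoop 2.2.0                                Hive depends on the Hadoop environment
2    Install Hive                                                  SQL-like way of running MapReduce
3    Start hiveserver2                                             Server-side process for remote access to Hive
4    Create a Java project on Windows and import the Hive jars     Required for remote access
5    Write the code in Eclipse and test it                         Checks whether the connection to Hive succeeds
6    Watch the hiveserver2 console                                 Confirms the connection and shows the job log
7    Open Hadoop's web UI on port 8088                             Shows how the MR job is scheduled and executed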
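
Step 5 can only work once step 3 is done, so a quick TCP check from the Windows side can save some debugging. The following is a minimal sketch, not part of the original setup: it assumes hiveserver2 listens on port 10000 (the usual default; the post itself never states a port), and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether something is listening on host:port. */
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable
            return false;
        }
    }

    public static void main(String[] args) {
        // Host is the one used in this post; 10000 is an assumed default port
        String host = args.length > 0 ? args[0] : "192.168.46.32";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        boolean up = isListening(host, port, 3000);
        System.out.println(host + ":" + port + (up ? " is reachable" : " is NOT reachable"));
    }
}
```

A successful check only shows that the port is open; it does not prove that the process behind it is hiveserver2 or that the login will be accepted.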

Some Hive statements:

Load data into a table:
LOAD DATA LOCAL INPATH '/home/search/abc1.txt' OVERWRITE INTO TABLE info;

show tables;      -- list all tables in the current database
desc tableName;   -- show the column structure of a table
show databases;   -- list all existing databases

Create a table:
create table mytt (name string ,count int) row format delimited fields terminated by '#' stored as textfile ;
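
The same statements can be sent from Java once the JDBC connection is available. The sketch below is only an illustration under assumptions: it reuses the host, user, and password from the client code in this post, and `HiveDdlSketch` and `createTableSql` are hypothetical names. The trailing semicolon is a shell convention and is left out here, since statements sent over JDBC are normally passed without it.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Hypothetical helper: runs the DDL from this post over JDBC. */
class HiveDdlSketch {

    /** Builds a CREATE TABLE statement for a delimited text table. */
    static String createTableSql(String table, String columns, char delimiter) {
        return "create table " + table + " (" + columns + ")"
                + " row format delimited fields terminated by '" + delimiter + "'"
                + " stored as textfile";
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Address and credentials are the ones used in this post; adjust to your cluster
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://192.168.46.32/default", "search", "dongliang");
             Statement st = conn.createStatement()) {
            // DDL returns no rows, so execute() is used rather than executeQuery()
            st.execute(createTableSql("mytt", "name string, count int", '#'));
            // "show tables" does return rows, one table name per row
            try (ResultSet rs = st.executeQuery("show tables")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```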

[Screenshot: the jar files added to the project]

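Hive depends on Hadoop, so it is best to add the Hadoop jars to the client project as well. The calling code is below. Before running it, make sure hiveserver2 is already running on the server.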

package com.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Operates Hive over JDBC from Windows 7.
 * @author qindongliang
 *
 * Big data discussion group: 376932160
 */
public class HiveJDBClient {

	/** Class name of the Hive JDBC driver */
	private static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

	public static void main(String[] args) throws Exception {
		// Load the Hive driver
		Class.forName(DRIVER);
		String tableName = "mytt"; // table name
		// Average of a column; Hive runs this as a MapReduce job
		String sql = "select avg(count) from " + tableName;
		// String sql = "select * from " + tableName; // select everything; runs directly
		// Open a hive2 JDBC connection; note that the default database is "default".
		// try-with-resources closes the result set, statement and connection even on failure.
		try (Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.46.32/default", "search", "dongliang");
				Statement st = conn.createStatement();
				ResultSet rs = st.executeQuery(sql)) {
			while (rs.next()) {
				System.out.println(rs.getString(1) + "   ");
			}
		}
		System.out.println("成功!"); // "Success!"
	}

}

The result:

48.6   
成功!

Log output on the hiveserver2 side:

[search@h1 bin]$ ./hiveserver2 
Starting HiveServer2
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/08/05 04:00:02 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
14/08/05 04:00:02 WARN conf.HiveConf: DEPRECATED: Configuration property hive.metastore.local no longer has any effect. Make sure to provide a valid value for hive.metastore.uris if you are connecting to a remote metastore.
OK
OK
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1407179651448_0001, Tracking URL = http://h1:8088/proxy/application_1407179651448_0001/
Kill Command = /home/search/hadoop/bin/hadoop job  -kill job_1407179651448_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-08-05 04:03:49,951 Stage-1 map = 0%,  reduce = 0%
2014-08-05 04:04:19,118 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.74 sec
2014-08-05 04:04:30,860 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.7 sec
MapReduce Total cumulative CPU time: 3 seconds 700 msec
Ended Job = job_1407179651448_0001
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.7 sec   HDFS Read: 253 HDFS Write: 5 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 700 msec
OK
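
The "Starting Job" line in this log carries both the job ID and the tracking URL served on port 8088. As a rough sketch, a regular expression can pull the two values out. The pattern is derived only from the single log line shown above, so other Hive versions may format the line differently; `JobLineParser` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts job ID and tracking URL from a hiveserver2 log line. */
class JobLineParser {

    private static final Pattern STARTING_JOB =
            Pattern.compile("Starting Job = (job_\\d+_\\d+), Tracking URL = (\\S+)");

    /** Returns {jobId, trackingUrl}, or null if the line is not a "Starting Job" line. */
    static String[] parse(String line) {
        Matcher m = STARTING_JOB.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String line = "Starting Job = job_1407179651448_0001, "
                + "Tracking URL = http://h1:8088/proxy/application_1407179651448_0001/";
        String[] parsed = parse(line);
        System.out.println("job id:       " + parsed[0]);
        System.out.println("tracking URL: " + parsed[1]);
    }
}
```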

[Screenshot: the job shown in Hadoop's web UI on port 8088]

The following SQL statement is not converted into a MapReduce job: select * from mytt limit 3;
The result:

 
中国   
美国   
中国   
成功!
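
Only the name column appears above because the client prints rs.getString(1). To see every column of a select * result, the column count can be read from ResultSetMetaData. This is a sketch under the same connection assumptions as the client code; `PrintAllColumns` and `formatRow` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

/** Hypothetical variant of the client that prints all columns of each row. */
class PrintAllColumns {

    /** Joins one row's values with tabs; a SQL NULL is shown as "NULL". */
    static String formatRow(Object[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(values[i] == null ? "NULL" : values[i].toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://192.168.46.32/default", "search", "dongliang");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select * from mytt limit 3")) {
            ResultSetMetaData meta = rs.getMetaData();
            int columnCount = meta.getColumnCount();
            while (rs.next()) {
                Object[] row = new Object[columnCount];
                for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
                    row[i - 1] = rs.getObject(i);
                }
                System.out.println(formatRow(row));
            }
        }
    }
}
```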

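At this point our JDBC call to Hive runs successfully. From the client we can create tables and databases, run queries, and so on. One thing to watch: if you load data into a Hive table from the Windows side, the data file must already be on the Linux machine and the load path must be a Linux path. You cannot load a file that sits on the Windows machine directly into a Hive table on Linux.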
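
One way to catch this mistake early is to check the path on the client before building the LOAD DATA statement. The sketch below is a simple heuristic, not a full validation: it only rejects paths that do not start with "/" or that contain a backslash, and it cannot tell whether the file really exists on the Linux machine. `LoadDataSketch` and its methods are hypothetical names.

```java
/** Hypothetical helper: builds a LOAD DATA statement and rejects Windows-style paths. */
class LoadDataSketch {

    /** True for an absolute Unix-style path such as /home/search/abc1.txt. */
    static boolean looksLikeLinuxPath(String path) {
        return path.startsWith("/") && !path.contains("\\");
    }

    /** Builds the statement, or throws if the path does not look like a Linux path. */
    static String loadDataSql(String linuxPath, String table) {
        if (!looksLikeLinuxPath(linuxPath)) {
            throw new IllegalArgumentException("Expected an absolute Linux path, got: " + linuxPath);
        }
        return "LOAD DATA LOCAL INPATH '" + linuxPath + "' OVERWRITE INTO TABLE " + table;
    }

    public static void main(String[] args) {
        // Path and table name are the ones from the example statement in this post
        System.out.println(loadDataSql("/home/search/abc1.txt", "info"));
        try {
            loadDataSql("C:\\data\\abc1.txt", "info");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The resulting string can then be passed to Statement.execute() on a connection opened as in the client code.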

2014-08-04 20:45