
Connecting to Hive Remotely from Eclipse

 

Base environment:
namenode 192.168.1.187  kafka3
datanode 192.168.1.188  kafka4
datanode 192.168.1.189  kafka1

This cluster was installed by hand, service by service, from the hadoop-*.tar.gz tarball, so every configuration file has to be edited manually; it is somewhat more involved than a Cloudera Manager deployment.

hadoop 2.6.2
hive 2.0.1   -- installed only on 187

1. Start Hadoop
./start-all.sh

2. Configure Hive
[root@kafka3 conf]# cat hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/hadoop/hive/log</value>
    <description>Location of Hive run time structured log file</description>
</property>
  <property> 
   <name>mapred.job.tracker</name> 
   <value>http://192.168.1.187:9001</value> 
  </property> 
  <property> 
   <name>mapreduce.framework.name</name> 
   <value>yarn</value> 
  </property> 
  <property>
   <name>hive.server2.thrift.port</name>
   <value>10000</value>
  </property>
  <property>
   <name>hive.server2.thrift.bind.host</name>
   <value>192.168.1.187</value>
  </property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
<property>
   <name>hive.hwi.listen.port</name>
   <value>9999</value>
   <description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
   <name>datanucleus.autoCreateSchema</name>
   <value>false</value>
</property>
<property>
   <name>datanucleus.fixedDatastore</name>
   <value>true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.1.189:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>kafka1,kafka4,kafka3</value>
</property>
</configuration>
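
Since the metastore in this configuration lives in MySQL on 192.168.1.189, a quick standalone probe of that database can rule out metastore connectivity problems if Hive misbehaves later. A minimal sketch (hypothetical class name; assumes mysql-connector-java is on the classpath and reuses the javax.jdo.option.* values above):

import java.sql.Connection;
import java.sql.DriverManager;

public class MetastoreProbe {
    public static void main(String[] args) throws Exception {
        // Same driver, URL and credentials as the javax.jdo.option.* properties above
        Class.forName("com.mysql.jdbc.Driver");
        try (Connection c = DriverManager.getConnection(
                "jdbc:mysql://192.168.1.189:3306/hive", "root", "root")) {
            System.out.println("Metastore MySQL reachable: " + c.isValid(5));
        }
    }
}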

3. Start the HiveServer2 service
[root@kafka3 bin]# ./hiveserver2
Command-line mode:
hive --service hiveserver2

Service mode:
./hiveserver2

4. Test the connection:
No need to write a JDBC program; just run bin/beeline.

[root@kafka3 bin]# ./beeline
ls: cannot access /opt/apache-hive-2.0.1-bin//lib/hive-jdbc-*-standalone.jar: No such file or directory
Beeline version 2.0.1 by Apache Hive
beeline> !connect jdbc:hive://192.168.1.187:10000 root root         
scan complete in 1ms
scan complete in 7577ms
No known driver to handle "jdbc:hive://192.168.1.187:10000"  -- don't use the hive: scheme; use hive2: instead
beeline>

The jar was eventually found under the jdbc directory instead of lib.
Note that all of the jar files under Hive's lib directory need to be added to Eclipse as well.

[root@kafka3 bin]# cp /opt/apache-hive-2.0.1-bin/jdbc/hive-jdbc-2.0.1-standalone.jar  /opt/apache-hive-2.0.1-bin/lib/

beeline> !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000: root

beeline>  !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000:
Error: Failed to open new session: java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
User: root is not allowed to impersonate root (state=,code=0)        

Restarting Hadoop did not help either, though the error message changed.
The following had been added to Hadoop's core-site.xml:
<property>
        <name>hadoop.proxyuser.hadoop.hosts</name>       -- this is where I went wrong without realizing it for a long time: it should not be the hadoop user, since I was connecting as root
        <value>*</value>
    </property>
    <property>
            <name>hadoop.proxyuser.hadoop.groups</name>
            <value>root</value>
    </property>

The correct entries to add to Hadoop's core-site.xml are:
<property>
        <name>hadoop.proxyuser.root.hosts</name>                                    
        <value>*</value>
    </property>
    <property>
            <name>hadoop.proxyuser.root.groups</name>
            <value>root</value>
    </property>

beeline>  !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000:                                                   
16/06/02 11:22:00 [main]: INFO jdbc.HiveConnection: Transport Used for JDBC connection: null
Error: Could not open client transport with JDBC Uri: jdbc:hive2://192.168.1.187:10000: java.net.ConnectException: Connection refused (state=08S01,code=0)
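
A "Connection refused" at this point usually just means nothing is listening on the Thrift port yet. A quick way to check, independent of Hive, is a plain socket probe against the host and port from hive-site.xml (hypothetical class name):

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) {
        try (Socket s = new Socket()) {
            // hive.server2.thrift.bind.host / hive.server2.thrift.port from hive-site.xml
            s.connect(new InetSocketAddress("192.168.1.187", 10000), 3000);
            System.out.println("HiveServer2 port is open");
        } catch (java.io.IOException e) {
            System.out.println("Connection refused or timed out: " + e.getMessage());
        }
    }
}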


Copy over all of the jars under $HIVE_HOME/lib whose names start with hive:
[root@kafka3 bin]# ./beeline
Beeline version 2.0.1 by Apache Hive   -- the earlier error is gone
beeline>

Enable Hive logging:
cd /opt/apache-hive-2.0.1-bin/conf
cp hive-log4j2.properties.template hive-log4j2.properties
vi hive-log4j2.properties
property.hive.log.dir = /hadoop/hive/log
property.hive.log.file = hive.log

[root@kafka3 log]# more hive.log

2016-06-03T10:20:16,883 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:OperationManager is inited.
2016-06-03T10:20:16,884 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:SessionManager is inited.
2016-06-03T10:20:16,884 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:CLIService is inited.
2016-06-03T10:20:16,884 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:ThriftBinaryCLIService is inited.
2016-06-03T10:20:16,884 INFO  [main]: service.AbstractService (AbstractService.java:init(89)) - Service:HiveServer2 is inited.
2016-06-03T10:20:17,022 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:OperationManager is started.
2016-06-03T10:20:17,022 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:SessionManager is started.
2016-06-03T10:20:17,023 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:CLIService is started.
2016-06-03T10:20:17,023 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:ThriftBinaryCLIService is started.
2016-06-03T10:20:17,023 INFO  [main]: service.AbstractService (AbstractService.java:start(104)) - Service:HiveServer2 is started.
2016-06-03T10:20:17,038 INFO  [main]: server.Server (Server.java:doStart(252)) - jetty-7.6.0.v20120127
2016-06-03T10:20:17,064 INFO  [main]: webapp.WebInfConfiguration (WebInfConfiguration.java:unpack(455)) - Extract jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/hive-webapps/hiveserver2/ to /tmp/jetty-0.0.0.0-10002-hiveserver2-_-any-/webapp
2016-06-03T10:20:17,582 INFO  [Thread-10]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(100)) - Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
2016-06-03T10:20:17,781 INFO  [main]: handler.ContextHandler (ContextHandler.java:startContext(737)) - started o.e.j.w.WebAppContext{/,file:/tmp/jetty-0.0.0.0-10002-hiveserver2-_-any-/webapp/},jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/hive-webapps/hiveserver2
2016-06-03T10:20:17,827 INFO  [main]: handler.ContextHandler (ContextHandler.java:startContext(737)) - started o.e.j.s.ServletContextHandler{/static,jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/hive-webapps/static}
2016-06-03T10:20:17,827 INFO  [main]: handler.ContextHandler (ContextHandler.java:startContext(737)) - started o.e.j.s.ServletContextHandler{/logs,file:/hadoop/hive/log/}
2016-06-03T10:20:17,841 INFO  [main]: server.AbstractConnector (AbstractConnector.java:doStart(333)) - Started SelectChannelConnector@0.0.0.0:10002
2016-06-03T10:20:17,841 INFO  [main]: server.HiveServer2 (HiveServer2.java:start(438)) - Web UI has started on port 10002


The web page opens and shows the HiveServer2 UI:
http://192.168.1.187:10002/hiveserver2.jsp


1. The log shows that HiveServer2 started normally, yet connections kept failing with: User: root is not allowed to impersonate root

Check the settings in Hadoop's core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.187:9000</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/name</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>                                              
  <value>192.168.1.187</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>root</value>
</property>
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>
<property>
     <name>fs.checkpoint.dir</name>
     <value>/hadoop/namesecondary</value>
</property>
</configuration>
It took a long time to discover that the changes made to core-site.xml on 187 had never been copied to the other two nodes. Once the file was synced to every node and Hadoop restarted, the impersonation error went away. (In Hadoop 2.x, hdfs dfsadmin -refreshSuperUserGroupsConfiguration can also reload the proxy-user settings without a full restart.)


2. Enable impersonation, so that HiveServer2 executes statements as the user who submits them; if set to false, statements run as the admin user that started the HiveServer2 daemon:
<property> 
  <name>hive.server2.enable.doAs</name> 
  <value>true</value> 
</property> 

3. JDBC access
The HiveServer1 driver class name is org.apache.hadoop.hive.jdbc.HiveDriver, while HiveServer2's is org.apache.hive.jdbc.HiveDriver; the two are easy to confuse.
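
A small sketch to check which of the two driver classes is actually on the classpath (hypothetical class name; the class names are the ones given above):

public class DriverCheck {
    public static void main(String[] args) {
        // HiveServer2 driver (use with jdbc:hive2:// URLs)
        probe("org.apache.hive.jdbc.HiveDriver");
        // Legacy HiveServer1 driver (jdbc:hive:// URLs)
        probe("org.apache.hadoop.hive.jdbc.HiveDriver");
    }

    private static void probe(String className) {
        try {
            Class.forName(className);
            System.out.println("found:   " + className);
        } catch (ClassNotFoundException e) {
            System.out.println("missing: " + className);
        }
    }
}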

[root@kafka3 bin]# hiveserver2   -- success at last!
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
OK


[root@kafka3 hadoop]# cd /opt/apache-hive-2.0.1-bin/bin
[root@kafka3 bin]# ./beeline
Beeline version 2.0.1 by Apache Hive
beeline>  !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000:                                                   
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connected to: Apache Hive (version 2.0.1)
Driver: Hive JDBC (version 2.0.1)
16/06/03 15:44:19 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.1.187:10000>
0: jdbc:hive2://192.168.1.187:10000> show tables;
INFO  : Compiling command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007): show tables
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007); Time taken: 0.291 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007): show tables
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007); Time taken: 0.199 seconds
INFO  : OK
+---------------------------+--+
|         tab_name          |
+---------------------------+--+
| c2                        |
| hbase_runningrecord_temp  |
| rc_file                   |
| rc_file1                  |
| runningrecord_old         |
| sequence_file             |
| studentinfo               |
| t2                        |
| test_table                |
| test_table1               |
| tina                      |
+---------------------------+--+
11 rows selected (1.194 seconds)
0: jdbc:hive2://192.168.1.187:10000>



Create a project: hivecon
Create a package: hivecon
Create a class: testhive
package hivecon;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class testhive {
    public static void main(String[] args) throws Exception {
        // Register the HiveServer2 JDBC driver
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.1.187:10000", "root", "");
        System.out.println("Connection: " + conn);
        Statement stmt = conn.createStatement();
        String query_sql = "select systemno from runningrecord_old limit 1";
        ResultSet rs = stmt.executeQuery(query_sql);
        System.out.println("Has rows: " + rs.next());
    }
}
Run it directly:
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern. -- ignore these logging errors for now
Connection: org.apache.hive.jdbc.HiveConnection@64485a47
Has rows: false
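
One caveat: the example above never closes its ResultSet, Statement or Connection. A variant of the same test using try-with-resources (hypothetical class name; same URL, user and query) releases them automatically:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class testhive2 {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // All three resources are closed automatically, even if the query throws
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://192.168.1.187:10000", "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select systemno from runningrecord_old limit 1")) {
            System.out.println("Has rows: " + rs.next());
        }
    }
}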


--- Now add a few more operations:
package hivecon;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class testhive {
    private static String sql = "";
    private static ResultSet res;

    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.1.187:10000", "root", "");
        System.out.println("Connection: " + conn);
        Statement stmt = conn.createStatement();
        String query_sql = "select systemno from runningrecord_old limit 1";
        ResultSet rs = stmt.executeQuery(query_sql);
        System.out.println("Has rows: " + rs.next());

        // Name of the table to create
        String tableName = "tinatest";

        /** Step 1: drop the table if it already exists **/
        sql = "drop table " + tableName;
        stmt.execute(sql);

        /** Step 2: create the table **/
        sql = "create table " + tableName + " (key int, value string) row format delimited fields terminated by ','";
        stmt.execute(sql);

        // Run "show tables"
        sql = "show tables '" + tableName + "'";
        System.out.println("Running:" + sql);
        res = stmt.executeQuery(sql);
        System.out.println("\"show tables\" result:");
        if (res.next()) {
            System.out.println(res.getString(1));
        }

        // Run "describe table"
        sql = "describe " + tableName;
        System.out.println("Running:" + sql);
        res = stmt.executeQuery(sql);
        System.out.println("\"describe table\" result:");
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // Run "load data into table"
        String filepath = "/tmp/test2.txt";
        sql = "load data local inpath '" + filepath + "' into table " + tableName;
        System.out.println("Running:" + sql);
        stmt.executeUpdate(sql);

        // Run "select * query"
        sql = "select * from " + tableName;
        System.out.println("Running:" + sql);
        res = stmt.executeQuery(sql);
        System.out.println("\"select * query\" result:");
        while (res.next()) {
            System.out.println(res.getInt(1) + "\t" + res.getString(2));
        }

        conn.close();
        conn = null;
    }
}

-- Execution result:
Connection: org.apache.hive.jdbc.HiveConnection@64485a47
Has rows: true
Running:show tables 'tinatest'
"show tables" result:
tinatest
Running:describe tinatest
"describe table" result:
key int
value string
Running:load data local inpath '/tmp/test2.txt' into table tinatest
Running:select * from tinatest
"select * query" result:
1 a
2 b
3 tina

Verify in the Hive CLI:
hive> show tables;
OK
c2
hbase_runningrecord_temp
rc_file
rc_file1
runningrecord_old
sequence_file
studentinfo
t2
test_table
test_table1
tina
tinatest
Time taken: 0.065 seconds, Fetched: 12 row(s)
hive> select * from tinatest;
OK
1 a
2 b
3 tina
Time taken: 3.065 seconds, Fetched: 3 row(s)
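
The same check can also be done over JDBC instead of the Hive CLI, via the standard DatabaseMetaData API; a sketch (hypothetical class name; assumes the tables live in the default database and that the driver supports getTables):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListTables {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://192.168.1.187:10000", "root", "")) {
            DatabaseMetaData md = conn.getMetaData();
            // Table name pattern "%" matches every table in the default schema
            try (ResultSet rs = md.getTables(null, "default", "%", null)) {
                while (rs.next()) {
                    System.out.println(rs.getString("TABLE_NAME"));
                }
            }
        }
    }
}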
