Base environment:
namenode 192.168.1.187 kafka3
datanode 192.168.1.188 kafka4
datanode 192.168.1.189 kafka1
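I built this cluster myself. I downloaded the hadoop-*.tar.gz packages and installed each service one at a time, so every configuration file has to be edited by hand. That makes it somewhat more involved than a Cloudera Manager installation.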
hadoop 2.6.2
hive 2.0.1 -- installed only on 187
1. Start Hadoop
./start-all.sh
2. Configure Hive
[root@kafka3 conf]# cat hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/hadoop/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.1.187:9001</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>192.168.1.187</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.hwi.listen.port</name>
<value>9999</value>
<description>This is the port the Hive Web Interface will listen on</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.189:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>kafka1,kafka4,kafka3</value>
</property>
</configuration>
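A hand-edited file like this is prone to two mistakes that produce no error message. The first is a <property> block pasted inside another <property>. The second is stray whitespace around a <name>. As far as I can tell, Hadoop's configuration loader reads only the name/value/final children of each top-level <property>, so it probably ignores a nested block without warning. The sketch below uses only the JDK's DOM parser and flags both problems. The class name HiveSiteLint is hypothetical, and the default path is an assumption based on the install directory used in this post.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class HiveSiteLint {
    // Returns human-readable problems found in a Hadoop-style <configuration> document.
    public static List<String> lint(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> problems = new ArrayList<>();
        NodeList top = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < top.getLength(); i++) {
            Node n = top.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace and comments
            if (!n.getNodeName().equals("property")) {
                problems.add("unexpected top-level <" + n.getNodeName() + ">");
                continue;
            }
            String name = null;
            NodeList fields = n.getChildNodes();
            for (int j = 0; j < fields.getLength(); j++) {
                Node f = fields.item(j);
                if (f.getNodeType() != Node.ELEMENT_NODE) continue;
                if (f.getNodeName().equals("property")) {
                    problems.add("nested <property> inside another <property>");
                } else if (f.getNodeName().equals("name")) {
                    name = f.getTextContent();
                }
            }
            if (name == null) {
                problems.add("<property> without <name>");
            } else if (!name.equals(name.trim())) {
                problems.add("whitespace around name '" + name.trim() + "'");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Assumed location; pass your own hive-site.xml path as the first argument
        String path = args.length > 0 ? args[0] : "/opt/apache-hive-2.0.1-bin/conf/hive-site.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        lint(xml).forEach(System.out::println);
    }
}
```

Run it before restarting HiveServer2. Any line it prints points to a property that Hive may not see the way you intended.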
3. Start the HiveServer2 service
[root@kafka3 bin]# ./hiveserver2
Command-line mode:
hive --service hiveserver2
Service mode:
./hiveserver2
4. Test the connection:
You don't need to write a JDBC program to test this. Run bin/beeline instead:
[root@kafka3 bin]# ./beeline
ls: cannot access /opt/apache-hive-2.0.1-bin//lib/hive-jdbc-*-standalone.jar: No such file or directory
Beeline version 2.0.1 by Apache Hive
beeline> !connect jdbc:hive://192.168.1.187:10000 root root
scan complete in 1ms
scan complete in 7577ms
No known driver to handle "jdbc:hive://192.168.1.187:10000" -- use hive2 instead of hive
beeline>
Found the jar. It ships in the jdbc directory, not in lib.
Note: add all the jars under Hive's lib directory to the Eclipse project.
[root@kafka3 bin]# cp /opt/apache-hive-2.0.1-bin/jdbc/hive-jdbc-2.0.1-standalone.jar /opt/apache-hive-2.0.1-bin/lib/
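After you add the Hive jars to an Eclipse project by hand, check that the classes are actually visible at runtime before you start debugging connection errors. The sketch below tries to load each class by name without initializing it. The list of classes is an assumption: the Hive JDBC driver plus one Hadoop class and one Thrift class it relies on. It is not a complete dependency list.

```java
public class ClasspathCheck {
    // True if the named class can be loaded (without running static initializers).
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample of classes a Hive JDBC client needs; extend as required
        String[] required = {
            "org.apache.hive.jdbc.HiveDriver",      // hive-jdbc
            "org.apache.hadoop.conf.Configuration", // hadoop-common
            "org.apache.thrift.TException"          // libthrift
        };
        for (String cls : required) {
            System.out.println(cls + " -> " + (isOnClasspath(cls) ? "found" : "MISSING"));
        }
    }
}
```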
beeline> !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000: root
beeline> !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000:
Error: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root (state=,code=0)
After restarting Hadoop it still failed, but the error message changed.
Add the following to Hadoop's core-site.xml:
<property>
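<name>hadoop.proxyuser.hadoop.hosts</name> -- I got this wrong at first and didn't notice for a long time! The middle segment is the user name, and I was running as root, not as a hadoop user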
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>root</value>
</property>
The correct version is:
Add the following to Hadoop's core-site.xml:
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>root</value>
</property>
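The proxyuser property names contain the name of the user that runs HiveServer2: hadoop.proxyuser.<user>.hosts and hadoop.proxyuser.<user>.groups. That is why writing hadoop instead of root had no effect. The small helper below generates the pair for a given user, which avoids that kind of slip. ProxyUserProps is a hypothetical name. The main method assumes that HiveServer2 is started by the same OS user that runs this program, and it mirrors the values used here: any host, group root.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProxyUserProps {
    // The two core-site.xml properties that allow `superUser` to impersonate other users.
    public static Map<String, String> forUser(String superUser, String hosts, String groups) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hadoop.proxyuser." + superUser + ".hosts", hosts);
        props.put("hadoop.proxyuser." + superUser + ".groups", groups);
        return props;
    }

    // Renders the properties as <property> blocks to paste inside <configuration>.
    public static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this program runs as the same OS user that starts HiveServer2
        String user = System.getProperty("user.name");
        System.out.print(toXml(forUser(user, "*", "root")));
    }
}
```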
beeline> !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000:
16/06/02 11:22:00 [main]: INFO jdbc.HiveConnection: Transport Used for JDBC connection: null
Error: Could not open client transport with JDBC Uri: jdbc:hive2://192.168.1.187:10000: java.net.ConnectException: Connection refused (state=08S01,code=0)
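"Connection refused" means that nothing accepted a TCP connection at 192.168.1.187:10000. For example, HiveServer2 may not be running yet, or it may be bound to a different address. That is a different problem from the impersonation error earlier. A plain socket probe from the development machine separates network problems from Hive problems. The class name PortCheck and the 3-second timeout are arbitrary choices for this sketch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    // True if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "192.168.1.187";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        boolean open = isListening(host, port, 3000);
        System.out.println(host + ":" + port + (open ? " is accepting connections" : " refused or unreachable"));
    }
}
```

If the port is open but beeline still fails, the cause is on the Hive or Hadoop side rather than the network.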
Also copy over all the jars under $HIVE_HOME/lib whose names start with hive:
[root@kafka3 bin]# ./beeline
Beeline version 2.0.1 by Apache Hive -- the error is gone
beeline>
Enable Hive logging:
cd /opt/apache-hive-2.0.1-bin/conf
cp hive-log4j2.properties.template hive-log4j2.properties
vi hive-log4j2.properties
property.hive.log.dir = /hadoop/hive/log
property.hive.log.file = hive.log
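With these two settings the log file is written to property.hive.log.dir/property.hive.log.file. The sketch below resolves that path from the properties text with java.util.Properties. If the value still contains a ${...} variable, the sketch rejects it instead of trying to expand it. HiveLogPath is a hypothetical name, and the default config path is assumed from the install directory used in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class HiveLogPath {
    // Resolves <property.hive.log.dir>/<property.hive.log.file> from properties text.
    public static Path resolve(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
        String dir = p.getProperty("property.hive.log.dir");
        String file = p.getProperty("property.hive.log.file", "hive.log");
        if (dir == null || dir.contains("${")) {
            // Variables are expanded by log4j2 at runtime; this sketch does not try to
            throw new IllegalArgumentException("property.hive.log.dir unset or uses a variable: " + dir);
        }
        return Paths.get(dir, file);
    }

    public static void main(String[] args) throws IOException {
        Path conf = Paths.get("/opt/apache-hive-2.0.1-bin/conf/hive-log4j2.properties");
        System.out.println(resolve(new String(Files.readAllBytes(conf), StandardCharsets.UTF_8)));
    }
}
```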
[root@kafka3 log]# more hive.log
2016-06-03T10:20:16,883 INFO [main]: service.AbstractService (AbstractService.java:init(89)) - Service:OperationManager is inited.
2016-06-03T10:20:16,884 INFO [main]: service.AbstractService (AbstractService.java:init(89)) - Service:SessionManager is inited.
2016-06-03T10:20:16,884 INFO [main]: service.AbstractService (AbstractService.java:init(89)) - Service:CLIService is inited.
2016-06-03T10:20:16,884 INFO [main]: service.AbstractService (AbstractService.java:init(89)) - Service:ThriftBinaryCLIService is inited.
2016-06-03T10:20:16,884 INFO [main]: service.AbstractService (AbstractService.java:init(89)) - Service:HiveServer2 is inited.
2016-06-03T10:20:17,022 INFO [main]: service.AbstractService (AbstractService.java:start(104)) - Service:OperationManager is started.
2016-06-03T10:20:17,022 INFO [main]: service.AbstractService (AbstractService.java:start(104)) - Service:SessionManager is started.
2016-06-03T10:20:17,023 INFO [main]: service.AbstractService (AbstractService.java:start(104)) - Service:CLIService is started.
2016-06-03T10:20:17,023 INFO [main]: service.AbstractService (AbstractService.java:start(104)) - Service:ThriftBinaryCLIService is started.
2016-06-03T10:20:17,023 INFO [main]: service.AbstractService (AbstractService.java:start(104)) - Service:HiveServer2 is started.
2016-06-03T10:20:17,038 INFO [main]: server.Server (Server.java:doStart(252)) - jetty-7.6.0.v20120127
2016-06-03T10:20:17,064 INFO [main]: webapp.WebInfConfiguration (WebInfConfiguration.java:unpack(455)) - Extract jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/hive-webapps/hiveserver2/ to /tmp/jetty-0.0.0.0-10002-hiveserver2-_-any-/webapp
2016-06-03T10:20:17,582 INFO [Thread-10]: thrift.ThriftCLIService (ThriftBinaryCLIService.java:run(100)) - Starting ThriftBinaryCLIService on port 10000 with 5...500 worker threads
2016-06-03T10:20:17,781 INFO [main]: handler.ContextHandler (ContextHandler.java:startContext(737)) - started o.e.j.w.WebAppContext{/,file:/tmp/jetty-0.0.0.0-10002-hiveserver2-_-any-/webapp/},jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/hive-webapps/hiveserver2
2016-06-03T10:20:17,827 INFO [main]: handler.ContextHandler (ContextHandler.java:startContext(737)) - started o.e.j.s.ServletContextHandler{/static,jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/hive-webapps/static}
2016-06-03T10:20:17,827 INFO [main]: handler.ContextHandler (ContextHandler.java:startContext(737)) - started o.e.j.s.ServletContextHandler{/logs,file:/hadoop/hive/log/}
2016-06-03T10:20:17,841 INFO [main]: server.AbstractConnector (AbstractConnector.java:doStart(333)) - Started SelectChannelConnector@0.0.0.0:10002
2016-06-03T10:20:17,841 INFO [main]: server.HiveServer2 (HiveServer2.java:start(438)) - Web UI has started on port 10002
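Three lines in this log matter: "Service:HiveServer2 is started", the Thrift port (10000), and the web UI port (10002). Together they show that the server is up, so any remaining connection failure comes from authorization, not startup. The sketch below extracts those facts from hive.log with regular expressions. The patterns are based only on the lines shown above and may need adjusting for other Hive versions. HiveLogScan is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiveLogScan {
    private static final Pattern STARTED = Pattern.compile("Service:(\\w+) is started\\.");
    private static final Pattern THRIFT_PORT = Pattern.compile("Starting ThriftBinaryCLIService on port (\\d+)");
    private static final Pattern WEB_UI_PORT = Pattern.compile("Web UI has started on port (\\d+)");

    // Names of services reported as started, in log order.
    public static List<String> startedServices(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = STARTED.matcher(line);
            if (m.find()) out.add(m.group(1));
        }
        return out;
    }

    // First port captured by the pattern, or -1 if no line matches.
    private static int firstPort(List<String> lines, Pattern p) {
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (m.find()) return Integer.parseInt(m.group(1));
        }
        return -1;
    }

    public static int thriftPort(List<String> lines) { return firstPort(lines, THRIFT_PORT); }

    public static int webUiPort(List<String> lines) { return firstPort(lines, WEB_UI_PORT); }

    public static void main(String[] args) throws IOException {
        // Location configured in hive-log4j2.properties in this post
        List<String> lines = Files.readAllLines(Paths.get("/hadoop/hive/log/hive.log"), StandardCharsets.UTF_8);
        System.out.println("started: " + startedServices(lines));
        System.out.println("thrift port: " + thriftPort(lines) + ", web UI port: " + webUiPort(lines));
    }
}
```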
The web UI opens and shows HiveServer2:
http://192.168.1.187:10002/hiveserver2.jsp
1. The log shows that HiveServer2 started normally, yet connections kept failing with: User: root is not allowed to impersonate root
Configure Hadoop's core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.187:9000</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/name</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>192.168.1.187</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>root</value>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
<description>The number of seconds between two periodic checkpoints.</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
</property>
<property>
<name>fs.checkpoint.dir</name>
<value>/hadoop/namesecondary</value>
</property>
</configuration>
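It took me a long time to discover that the core-site.xml changes I made on 187 had never been copied to the other two nodes. Copy the updated file to every node. Then restart Hadoop, or run hdfs dfsadmin -refreshSuperUserGroupsConfiguration and yarn rmadmin -refreshSuperUserGroupsConfiguration, so that the proxyuser settings take effect.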
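When files are copied by hand, it is easy to miss that the nodes no longer have the same configuration. Suppose you have fetched each node's core-site.xml into a local directory; the conf-<host> layout below is hypothetical. A SHA-256 comparison then tells you which copies differ from the reference node. ConfigDrift is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConfigDrift {
    // Hex-encoded SHA-256 of a file's contents.
    public static String sha256(Path file) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Labels whose file differs from the first entry, which serves as the reference copy.
    public static List<String> differing(Map<String, Path> copies) {
        List<String> out = new ArrayList<>();
        String reference = null;
        for (Map.Entry<String, Path> e : copies.entrySet()) {
            String hash = sha256(e.getValue());
            if (reference == null) {
                reference = hash;
            } else if (!reference.equals(hash)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical layout: each node's core-site.xml fetched into conf-<host>/
        Map<String, Path> copies = new LinkedHashMap<>();
        copies.put("kafka3", Paths.get("conf-kafka3/core-site.xml"));
        copies.put("kafka4", Paths.get("conf-kafka4/core-site.xml"));
        copies.put("kafka1", Paths.get("conf-kafka1/core-site.xml"));
        System.out.println("Differs from kafka3: " + differing(copies));
    }
}
```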
2. Enable impersonation. With it on, HiveServer2 runs each statement as the user who submitted it. If it is set to false, statements run as the admin user that started the HiveServer2 daemon.
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
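3. JDBC
The HiveServer1 driver class is org.apache.hadoop.hive.jdbc.HiveDriver. The HiveServer2 driver class is org.apache.hive.jdbc.HiveDriver. The two are easy to mix up. The URL schemes differ in the same way: jdbc:hive:// for HiveServer1 and jdbc:hive2:// for HiveServer2.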
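Mixing up the two driver classes or URL schemes produces the "No known driver to handle" error seen earlier. A small helper can keep each scheme paired with its driver class. It only manipulates strings and does not connect. The host:port/database form follows the HiveServer2 URL format, with session variables and other options left out. HiveJdbcUrls and its methods are hypothetical names.

```java
public class HiveJdbcUrls {
    public static final String HS1_DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";
    public static final String HS2_DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Driver class that matches the URL scheme.
    public static String driverFor(String url) {
        if (url.startsWith("jdbc:hive2://")) return HS2_DRIVER;
        if (url.startsWith("jdbc:hive://")) return HS1_DRIVER;
        throw new IllegalArgumentException("not a Hive JDBC URL: " + url);
    }

    // HiveServer2 URL of the form jdbc:hive2://host:port/database (no extra options).
    public static String hs2Url(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        String url = hs2Url("192.168.1.187", 10000, "default");
        System.out.println(url + " -> " + driverFor(url));
    }
}
```

In a client, pass driverFor(url) to Class.forName(...) before DriverManager.getConnection(url, ...). That keeps the scheme and the driver from drifting apart.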
[root@kafka3 bin]# hiveserver2 -- it finally works!!
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
OK
[root@kafka3 hadoop]# cd /opt/apache-hive-2.0.1-bin/bin
[root@kafka3 bin]# ./beeline
Beeline version 2.0.1 by Apache Hive
beeline> !connect jdbc:hive2://192.168.1.187:10000
Connecting to jdbc:hive2://192.168.1.187:10000
Enter username for jdbc:hive2://192.168.1.187:10000: root
Enter password for jdbc:hive2://192.168.1.187:10000:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.0.1-bin/lib/hive-jdbc-2.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connected to: Apache Hive (version 2.0.1)
Driver: Hive JDBC (version 2.0.1)
16/06/03 15:44:19 [main]: WARN jdbc.HiveConnection: Request to set autoCommit to false; Hive does not support autoCommit=false.
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.1.187:10000>
0: jdbc:hive2://192.168.1.187:10000> show tables;
INFO : Compiling command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007): show tables
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007); Time taken: 0.291 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007): show tables
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=root_20160603154642_dd611020-8d3f-4abe-9bd5-7f2fda519007); Time taken: 0.199 seconds
INFO : OK
+---------------------------+--+
| tab_name |
+---------------------------+--+
| c2 |
| hbase_runningrecord_temp |
| rc_file |
| rc_file1 |
| runningrecord_old |
| sequence_file |
| studentinfo |
| t2 |
| test_table |
| test_table1 |
| tina |
+---------------------------+--+
11 rows selected (1.194 seconds)
0: jdbc:hive2://192.168.1.187:10000>
Create a project: hivecon
Create a package: hivecon
Create a class: testhive
package hivecon;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class testhive {
    public static void main(String[] args) throws Exception {
        // HiveServer2 driver class (HiveServer1 used org.apache.hadoop.hive.jdbc.HiveDriver)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // jdbc:hive2:// scheme, user root, empty password; try-with-resources closes everything
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.1.187:10000", "root", "");
             Statement stmt = conn.createStatement()) {
            System.out.println("Connection: " + conn);
            String querySql = "select systemno from runningrecord_old limit 1";
            try (ResultSet rs = stmt.executeQuery(querySql)) {
                // next() is true only if the query returned at least one row
                System.out.println("Has data: " + rs.next());
            }
        }
    }
}
Run it directly:
ERROR StatusLogger Unrecognized format specifier [msg]
ERROR StatusLogger Unrecognized conversion specifier [msg] starting at position 54 in conversion pattern.
ERROR StatusLogger Unrecognized format specifier [n]
ERROR StatusLogger Unrecognized conversion specifier [n] starting at position 56 in conversion pattern. -- ignore these logging errors for now
Connection: org.apache.hive.jdbc.HiveConnection@64485a47
Has data: false
--- Adding some more operations:
package hivecon;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class testhive {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://192.168.1.187:10000", "root", "");
             Statement stmt = conn.createStatement()) {
            System.out.println("Connection: " + conn);
            try (ResultSet rs = stmt.executeQuery("select systemno from runningrecord_old limit 1")) {
                System.out.println("Has data: " + rs.next());
            }
            // Name of the table to create
            String tableName = "tinatest";
            // Step 1: drop the table if it exists (a plain "drop table" fails when it does not)
            stmt.execute("drop table if exists " + tableName);
            // Step 2: create it as a comma-delimited text table
            stmt.execute("create table " + tableName
                    + " (key int, value string) row format delimited fields terminated by ','");
            // Run "show tables"
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running:" + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                System.out.println("Result of \"show tables\":");
                if (res.next()) {
                    System.out.println(res.getString(1));
                }
            }
            // Run "describe table"
            sql = "describe " + tableName;
            System.out.println("Running:" + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                System.out.println("Result of \"describe table\":");
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
            }
            // Run "load data": through HiveServer2, LOCAL refers to the filesystem of the
            // HiveServer2 host (192.168.1.187), not the machine running Eclipse
            String filepath = "/tmp/test2.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running:" + sql);
            stmt.execute(sql);
            // Run "select *"
            sql = "select * from " + tableName;
            System.out.println("Running:" + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                System.out.println("Result of \"select * query\":");
                while (res.next()) {
                    System.out.println(res.getInt(1) + "\t" + res.getString(2));
                }
            }
        }
    }
}
-- Output:
Connection: org.apache.hive.jdbc.HiveConnection@64485a47
Has data: true
Running:show tables 'tinatest'
Result of "show tables":
tinatest
Running:describe tinatest
Result of "describe table":
key int
value string
Running:load data local inpath '/tmp/test2.txt' into table tinatest
Running:select * from tinatest
Result of "select * query":
1 a
2 b
3 tina
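The post does not show the contents of /tmp/test2.txt. The three rows below are inferred from the query output and the fields terminated by ',' clause, so treat them as an assumption. LOAD DATA LOCAL through HiveServer2 reads from the HiveServer2 host, so the file has to exist on 192.168.1.187. The sketch writes such a file and shows how one line splits into (key int, value string). Returning null for a non-numeric key only roughly imitates Hive's text SerDe. TestDataFile is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TestDataFile {
    // Rows inferred from the query output above; the real file is not shown in the post.
    static final List<String> ROWS = Arrays.asList("1,a", "2,b", "3,tina");

    public static void write(Path target) {
        try {
            Files.write(target, ROWS, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits a line on ',' into {Integer key, String value}; a bad key becomes null.
    public static Object[] parse(String line) {
        String[] parts = line.split(",", -1);
        Integer key;
        try {
            key = Integer.valueOf(parts[0]);
        } catch (NumberFormatException e) {
            key = null;
        }
        String value = parts.length > 1 ? parts[1] : null;
        return new Object[] {key, value};
    }

    public static void main(String[] args) {
        // Run this on the HiveServer2 host; defaults to the path used in the post
        Path target = Paths.get(args.length > 0 ? args[0] : "/tmp/test2.txt");
        write(target);
        System.out.println("wrote " + ROWS.size() + " rows to " + target);
    }
}
```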
Verify in the Hive CLI:
hive> show tables;
OK
c2
hbase_runningrecord_temp
rc_file
rc_file1
runningrecord_old
sequence_file
studentinfo
t2
test_table
test_table1
tina
tinatest
Time taken: 0.065 seconds, Fetched: 12 row(s)
hive> select * from tinatest;
OK
1 a
2 b
3 tina
Time taken: 3.065 seconds, Fetched: 3 row(s)