With the Hadoop 2.6.0 cluster set up, this post describes how to develop Hadoop programs (that is, MapReduce programs) in Eclipse.
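1. Installing the JDK and setting up the Hadoop cluster are not covered here; see http://kevin12.iteye.com/blog/2273556.
First, run the wordcount example that ships with Hadoop:
2. Put the README.txt and LICENSE.txt files from the hadoop-2.6.0 directory into /library/hadoop/data on the cluster, creating the directory first if it does not exist (add -p to hdfs dfs -mkdir if its parent directories are missing as well):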
root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -mkdir /library/hadoop/data
root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -put ./LICENSE.txt /library/hadoop/data
root@master1:/usr/local/hadoop/hadoop-2.6.0# hdfs dfs -put ./README.txt /library/hadoop/data
root@master1:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /library/hadoop/data /library/hadoop/wordcount_output1
16/02/12 12:49:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 12:49:07 INFO client.RMProxy: Connecting to ResourceManager at master1/192.168.112.130:8032
16/02/12 12:49:08 INFO input.FileInputFormat: Total input paths to process : 2
16/02/12 12:49:08 INFO mapreduce.JobSubmitter: number of splits:2
16/02/12 12:49:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455236431298_0002
16/02/12 12:49:09 INFO impl.YarnClientImpl: Submitted application application_1455236431298_0002
16/02/12 12:49:09 INFO mapreduce.Job: The url to track the job: http://master1:8088/proxy/application_1455236431298_0002/
16/02/12 12:49:09 INFO mapreduce.Job: Running job: job_1455236431298_0002
16/02/12 12:49:20 INFO mapreduce.Job: Job job_1455236431298_0002 running in uber mode : false
16/02/12 12:49:20 INFO mapreduce.Job: map 0% reduce 0%
16/02/12 12:49:30 INFO mapreduce.Job: map 100% reduce 0%
16/02/12 12:49:42 INFO mapreduce.Job: map 100% reduce 100%
16/02/12 12:49:42 INFO mapreduce.Job: Job job_1455236431298_0002 completed successfully
16/02/12 12:49:43 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=12822
        FILE: Number of bytes written=342551
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=17026
        HDFS: Number of bytes written=8943
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=17371
        Total time spent by all reduces in occupied slots (ms)=7667
        Total time spent by all map tasks (ms)=17371
        Total time spent by all reduce tasks (ms)=7667
        Total vcore-seconds taken by all map tasks=17371
        Total vcore-seconds taken by all reduce tasks=7667
        Total megabyte-seconds taken by all map tasks=17787904
        Total megabyte-seconds taken by all reduce tasks=7851008
    Map-Reduce Framework
        Map input records=320
        Map output records=2336
        Map output bytes=24790
        Map output materialized bytes=12828
        Input split bytes=231
        Combine input records=2336
        Combine output records=886
        Reduce input groups=838
        Reduce shuffle bytes=12828
        Reduce input records=886
        Reduce output records=838
        Spilled Records=1772
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=526
        CPU time spent (ms)=2190
        Physical memory (bytes) snapshot=476704768
        Virtual memory (bytes) snapshot=5660733440
        Total committed heap usage (bytes)=260173824
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=16795
    File Output Format Counters
        Bytes Written=8943
# View the result; only part of the output is shown here:
root@master1:/usr/local/hadoop/hadoop-2.6.0/share/hadoop/mapreduce# hdfs dfs -cat /library/hadoop/wordcount_output1/part-r-00000
16/02/12 12:51:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
"AS	4
"Contribution"	1
"Contributor"	1
"Derivative	1
"Legal	1
"License"	1
"License");	1
"Licensor"	1
"NOTICE"	1
"Not	1
"Object"	1
"Source"	1
"Work"	1
"You"	1
"Your")	1
"[]"	1
"control"	1
"printed	1
"submitted"	1
The result can also be viewed in a browser: download the part-r-00000 file from the /library/hadoop/wordcount_output1 directory and open it.
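Once part-r-00000 has been downloaded, it can also be inspected programmatically. The following is a minimal sketch, assuming each line has the form word, tab, count (the default text output layout of the wordcount job). The class name PartFileReader and the local file path passed on the command line are hypothetical and not part of the original walkthrough.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads a locally downloaded part-r-00000 file.
final class PartFileReader {

    /** Parses "word<TAB>count" lines into a map, keeping the file order. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank or malformed lines
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    /** Returns up to n entries, highest count first (ties keep file order). */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: PartFileReader <local path of part-r-00000>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (Map.Entry<String, Integer> e : top(parse(lines), 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Splitting on the last tab rather than on whitespace keeps words that contain quotes or punctuation, such as "License");, intact.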
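3. Download the 64-bit Linux build of Eclipse from the official Eclipse website; the details are not covered here.
4. Extract the downloaded Eclipse archive into /usr/local/, enter the eclipse directory, and launch eclipse. The default workspace (/root/workspace) is fine.
5. Developing and debugging Hadoop programs locally requires hadoop-eclipse-plugin-2.6.0.jar. Copy this jar into the plugins directory under /usr/local/eclipse, then restart Eclipse with File->Restart. After the restart, choose Window->Show View->Other, select Map/Reduce Locations under MapReduce Tools, and click OK.
6. Create a Hadoop location. In the Map/Reduce Locations view at the bottom, create a new Hadoop location, set the Host and Port, and save. Switch to the Java EE perspective; DFS Locations then appears in the Project Explorer and shows the files in the cluster's root directory.
7. Specify the Hadoop installation directory. Choose Window->Preferences->Data Management, select Hadoop Map/Reduce, click the "Browse..." button in the right-hand pane, choose the Hadoop installation directory, then click Apply and save.
8. Create a project. Choose File->New->Other, select Map/Reduce Project in the dialog, click Next, enter the project name (HadoopApps is the name used in the later steps), and click Finish; click OK in the dialog that pops up. Eclipse automatically adds the jars under the Hadoop installation directory to the project, so MapReduce programs can now be developed.
9. Attach the source code.
First copy hadoop-2.6.0-src.tar.gz to the virtual machine and extract it with tar under /usr/local/hadoop.
Then press Shift+Ctrl+T, type NameNode in the dialog, and select Hadoop's NameNode class. On the class tab click Attach Source, choose External location in the dialog, click the External Folder button, select the hadoop-2.6.0-src directory extracted above, and click OK. The source is now attached.
10. Run the WordCount example.
In the extracted source, find WordCount.java under hadoop-2.6.0-src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples and copy it into the com.imf.hadoop package of the HadoopApps project (create the package first if it does not exist).
The WordCount source is as follows: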
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.imf.hadoop;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line on whitespace and emits (word, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Every argument except the last is an input path; the last is the output path.
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
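To see what the mapper and reducer compute without a cluster, the same logic can be sketched in plain Java. This is an illustration only, not how Hadoop executes the job: the class name LocalWordCount and the sample lines are made up, and a single in-memory TreeMap stands in for the shuffle, sort, and reduce phases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, cluster-free illustration of the WordCount logic.
final class LocalWordCount {

    /** Tokenises like TokenizerMapper (whitespace split) and sums like IntSumReducer. */
    static Map<String, Integer> count(List<String> lines) {
        // TreeMap keeps the words sorted, similar to the sorted keys a reducer receives.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Each token contributes 1; merge() adds it to the running sum.
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        List<String> lines = Arrays.asList("hello hadoop", "hello eclipse");
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Because the tokenizer splits on whitespace only, punctuation stays attached to words, which is why entries such as "License"); show up in the real job output.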
Then right-click WordCount and choose Run As->Run Configurations. Under Java Application select WordCount, and on the Arguments tab in the right-hand pane set Program arguments to: hdfs://master1:9000/library/hadoop/data hdfs://master1:9000/library/hadoop/wordcount_output1. Click Apply, then click Run.
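Each program argument is a full HDFS URI, so the NameNode host and port are carried in the argument itself. As a small sketch using only java.net.URI (the class name HdfsArgInspector is hypothetical), the parts of such an argument can be printed to check that they match the Host and Port configured for the Hadoop location:

```java
import java.net.URI;

// Hypothetical helper: shows how an hdfs:// program argument decomposes.
final class HdfsArgInspector {

    /** Returns the scheme, host, port, and path of a URI-style argument. */
    static String describe(String arg) {
        URI uri = URI.create(arg);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // The two program arguments used in this walkthrough.
        System.out.println(describe("hdfs://master1:9000/library/hadoop/data"));
        System.out.println(describe("hdfs://master1:9000/library/hadoop/wordcount_output1"));
    }
}
```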
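11. Check the result. Browsing the output directory in a browser shows a new part-r-00000 file, which holds the word counts. It can also be viewed with the command hdfs dfs -cat /library/hadoop/wordcount_output1/part-r-00000.
Note: the job fails if the output directory already exists. Either choose a different output directory or delete the existing one (for example with hdfs dfs -rm -r) and run again.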
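12. Next, package WordCount as a jar file and run it on the cluster.
Right-click HadoopApps and choose Export, select JAR file in the dialog, and click Next. Then choose where to write the jar; here it is written to /usr/local/tools with the file name HadoopApps.jar. Click Next through the remaining pages.
Note: do not specify a main class in the last step. Specify it when running the jar instead, which makes it easier to test other programs.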
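When the jar carries no main class, the class name is supplied as the first argument at run time and the remaining arguments are passed on to it. The sketch below illustrates that general idea with plain reflection; it is not Hadoop's actual launcher code, and the names MainClassLauncher and Demo are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical illustration of "choose the main class at run time".
final class MainClassLauncher {

    /** Stand-in for a job class such as com.imf.hadoop.WordCount. */
    public static final class Demo {
        public static String lastArgs = "";

        public static void main(String[] args) {
            lastArgs = String.join(",", args);
            System.out.println("Demo started with: " + lastArgs);
        }
    }

    /** Treats args[0] as the class name and passes the remaining args to its main method. */
    static void launch(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("usage: <main-class> [args...]");
        }
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        try {
            Method main = Class.forName(args[0]).getMethod("main", String[].class);
            // Cast to Object so the array is passed as one argument, not expanded.
            main.invoke(null, (Object) rest);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot run " + args[0], e);
        }
    }

    public static void main(String[] args) {
        launch(new String[] { Demo.class.getName(), "in", "out" });
    }
}
```

Leaving the choice to the command line means one jar can hold several job classes, and each can be run by name without repackaging.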
13. Run the jar file.
root@master1:/usr/local/tools# hadoop jar HadoopApps.jar com.imf.hadoop.WordCount hdfs://master1:9000/library/hadoop/data/ hdfs://master1:9000/library/hadoop/wordcount_output2
16/02/13 14:36:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/13 14:36:59 INFO client.RMProxy: Connecting to ResourceManager at master1/192.168.112.130:8032
16/02/13 14:37:00 INFO input.FileInputFormat: Total input paths to process : 2
16/02/13 14:37:00 INFO mapreduce.JobSubmitter: number of splits:2
16/02/13 14:37:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1455315784285_0004
16/02/13 14:37:01 INFO impl.YarnClientImpl: Submitted application application_1455315784285_0004
16/02/13 14:37:01 INFO mapreduce.Job: The url to track the job: http://master1:8088/proxy/application_1455315784285_0004/
16/02/13 14:37:01 INFO mapreduce.Job: Running job: job_1455315784285_0004
16/02/13 14:37:08 INFO mapreduce.Job: Job job_1455315784285_0004 running in uber mode : false
16/02/13 14:37:08 INFO mapreduce.Job: map 0% reduce 0%
16/02/13 14:37:19 INFO mapreduce.Job: map 100% reduce 0%
16/02/13 14:37:26 INFO mapreduce.Job: map 100% reduce 100%
16/02/13 14:37:27 INFO mapreduce.Job: Job job_1455315784285_0004 completed successfully
16/02/13 14:37:28 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=12822
        FILE: Number of bytes written=342443
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=17026
        HDFS: Number of bytes written=8943
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=17479
        Total time spent by all reduces in occupied slots (ms)=4161
        Total time spent by all map tasks (ms)=17479
        Total time spent by all reduce tasks (ms)=4161
        Total vcore-seconds taken by all map tasks=17479
        Total vcore-seconds taken by all reduce tasks=4161
        Total megabyte-seconds taken by all map tasks=17898496
        Total megabyte-seconds taken by all reduce tasks=4260864
    Map-Reduce Framework
        Map input records=320
        Map output records=2336
        Map output bytes=24790
        Map output materialized bytes=12828
        Input split bytes=231
        Combine input records=2336
        Combine output records=886
        Reduce input groups=838
        Reduce shuffle bytes=12828
        Reduce input records=886
        Reduce output records=838
        Spilled Records=1772
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=387
        CPU time spent (ms)=2030
        Physical memory (bytes) snapshot=483635200
        Virtual memory (bytes) snapshot=5660704768
        Total committed heap usage (bytes)=259067904
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=16795
    File Output Format Counters
        Bytes Written=8943
Open a browser, click part-r-00000 to download it, and view the contents.
Comparing the results of the two runs shows that they are identical.
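The comparison can also be done mechanically once both part-r-00000 files are downloaded. A minimal sketch, assuming tab-separated word and count on each line; the class name ResultComparer and the two local file paths are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares two downloaded wordcount outputs.
final class ResultComparer {

    /** Turns "word<TAB>count" lines into a word -> count map. */
    static Map<String, String> toMap(List<String> lines) {
        Map<String, String> counts = new HashMap<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab >= 0) {
                counts.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return counts;
    }

    /** True when both outputs contain the same words with the same counts, in any order. */
    static boolean sameCounts(List<String> first, List<String> second) {
        return toMap(first).equals(toMap(second));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: ResultComparer <first part file> <second part file>");
            return;
        }
        List<String> first = Files.readAllLines(Paths.get(args[0]));
        List<String> second = Files.readAllLines(Paths.get(args[1]));
        System.out.println(sameCounts(first, second) ? "identical" : "different");
    }
}
```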