How to debug a Hadoop 2.2.0 distributed cluster from Eclipse on Windows


Author: qindongliang1922, posted 2014-06-11

In the previous article I showed how to debug Hadoop 2.2 programs from Eclipse in local mode. This article covers how to submit a job from Eclipse to YARN for real and debug it in distributed mode. Debugging MapReduce programs from Eclipse makes learning Hadoop easier and clearer. If you have not read the local-mode article, read it first to get familiar with how the basic problems are solved.

Since local-mode debugging already works, switching to distributed mode is not too hard. To use Eclipse as a client that submits jobs to a YARN cluster, the whole project has to be packaged into a single jar. I use an Ant script for this, which is attached at the end of the article. The biggest problem I ran into was the following exception:

<pre name="code" class="java">2014-06-11 17:32:19,761 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1401177251807_0034_01_000001 and exit code: 1
org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
    at org.apache.hadoop.util.Shell.run(Shell.java:418)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
</pre>

A fix for this problem is already available online, but it requires downloading two patches and applying them, which is tedious. I followed this post instead: http://blog.csdn.net/fansy1990/article/details/27526167
The approach described there is simpler and more convenient. The exception is caused by Linux and Windows using different syntax for environment variable references: Windows uses %VAR% while Linux uses $VAR. The classpath built on the Windows client also uses ; and \ where Linux expects : and /. The problem does not occur when Eclipse runs on Linux; it only appears with Eclipse on Windows. The fix is to change a few methods in the org.apache.hadoop.mapred.YARNRunner class.

Concretely, override parts of the YARNRunner source (YARNRunner.java lives in the org.apache.hadoop.mapred package of the hadoop-mapreduce-client-jobclient Maven project). Create a class with the same package name and class name under src in your own project, so that it shadows the class shipped in the jar.

At line 390 of YARNRunner.java (Apache Hadoop 2.2 source), change

<pre name="code" class="java">// Setup the command to run the AM
List&lt;String&gt; vargs = new ArrayList&lt;String&gt;(8);
vargs.add(Environment.JAVA_HOME.$() + "/bin/java");
</pre>

to

<pre name="code" class="java">vargs.add("$JAVA_HOME/bin/java");
</pre>

Add a path conversion method to the YARNRunner class:

<pre name="code" class="java">private void replaceEnvironment(Map&lt;String, String&gt; environment) {
    String tmpClassPath = environment.get("CLASSPATH");
    // Windows classpath separator -> Linux classpath separator
    tmpClassPath = tmpClassPath.replaceAll(";", ":");
    // Windows %VAR% references -> Linux $VAR references
    tmpClassPath = tmpClassPath.replaceAll("%PWD%", "\\$PWD");
    tmpClassPath = tmpClassPath.replaceAll("%HADOOP_MAPRED_HOME%", "\\$HADOOP_MAPRED_HOME");
    // Backslashes -> forward slashes
    tmpClassPath = tmpClassPath.replaceAll("\\\\", "/");
    environment.put("CLASSPATH", tmpClassPath);
}
</pre>

At line 466 of YARNRunner.java, add:

<pre name="code" class="java">replaceEnvironment(environment);
</pre>

With these changes the original exception goes away. The example I use for the distributed test is still a hello-world-style word count. The source is:

<pre name="code" class="java">package com.qin.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.YARNRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Hadoop 2.2.0 fully distributed test.
 * A WordCount example.
 *
 * @author qindongliang
 *
 * Hadoop discussion group: 376932160
 */
public class MyWordCount {

    /**
     * Mapper: each input line has the form word#count.
     */
    private static class WMapper extends Mapper&lt;LongWritable, Text, Text, IntWritable&gt; {

        private IntWritable count = new IntWritable(1);
        private Text text = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] values = value.toString().split("#");
            // System.out.println(values[0] + "========" + values[1]);
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text, count);
        }
    }

    /**
     * Reducer: sums the counts for each word.
     */
    private static class WReducer extends Reducer&lt;Text, IntWritable, Text, Text&gt; {

        private Text t = new Text();

        @Override
        protected void reduce(Text key, Iterable&lt;IntWritable&gt; value, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable i : value) {
                count += i.get();
            }
            t.set(count + "");
            context.write(key, t);
        }
    }

    /**
     * Change 1:
     * (1) add the checkHadoopHome path in the Shell source
     * (2) line 974, in FileUtils
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.jar", "myjob.jar");
        conf.set("fs.defaultFS", "hdfs://192.168.46.28:9000");
        conf.set("mapreduce.framework.name", "yarn");
        conf.set("yarn.resourcemanager.address", "192.168.46.28:8032");

        /** The job **/
        // Job job = new Job(conf, "testwordcount"); // this API is deprecated
        Job job = Job.getInstance(conf, "new api");
        job.setJarByClass(MyWordCount.class);
        System.out.println("Mode: " + conf.get("mapreduce.jobtracker.address"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // set to 3
        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Delete the output directory if it already exists, otherwise the job fails.
        String path = "hdfs://192.168.46.28:9000/qin/output";
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p)) {
            fs.delete(p, true);
            System.out.println("Output path exists; deleted it!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://192.168.46.28:9000/qin/input");
        FileOutputFormat.setOutputPath(job, p);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</pre>

Before running, copy the cluster's configuration files core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml into the root of src. It also helps to add a log4j.xml so that the logs are easy to read. Then add the following property to mapred-site.xml:

<pre name="code" class="xml">&lt;property&gt;
    &lt;name&gt;mapred.remote.os&lt;/name&gt;
    &lt;value&gt;Linux&lt;/value&gt;
    &lt;description&gt;Remote MapReduce framework's OS, can be either Linux or Windows&lt;/description&gt;
&lt;/property&gt;</pre>

Then package the project into a jar and run it to submit the job. My console printed the following:

<pre name="code" class="java">Mode: hp1:8021
Output path exists; deleted it!
INFO - RMProxy.createRMProxy(56) | Connecting to ResourceManager at /192.168.46.28:8032
WARN - JobSubmitter.copyAndConfigureFiles(149) | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO - FileInputFormat.listStatus(287) | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal(394) | number of splits:1
INFO - Configuration.warnOnceIfDeprecated(840) | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.jar is deprecated. Instead, use mapreduce.job.jar
INFO - Configuration.warnOnceIfDeprecated(840) | fs.default.name is deprecated. Instead, use fs.defaultFS
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.value.class is deprecated.
Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.job.name is deprecated. Instead, use mapreduce.job.name
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
INFO - Configuration.warnOnceIfDeprecated(840) | mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.mapoutput.key.class is deprecated. Instead, use mapreduce.map.output.key.class
INFO - Configuration.warnOnceIfDeprecated(840) | mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
INFO - JobSubmitter.printTokens(477) | Submitting tokens for job: job_1402492118962_0004
INFO - YarnClientImpl.submitApplication(174) | Submitted application application_1402492118962_0004 to ResourceManager at /192.168.46.28:8032
INFO - Job.submit(1272) | The url to track the job: http://hp1:8088/proxy/application_1402492118962_0004/
INFO - Job.monitorAndPrintJob(1317) | Running job: job_1402492118962_0004
INFO - Job.monitorAndPrintJob(1338) | Job job_1402492118962_0004 running in uber mode : false
INFO - Job.monitorAndPrintJob(1345) | map 0% reduce 0%
INFO - Job.monitorAndPrintJob(1345) | map 100% reduce 0%
INFO - Job.monitorAndPrintJob(1345) | map 100% reduce 100%
INFO - Job.monitorAndPrintJob(1356) | Job job_1402492118962_0004 completed successfully
INFO - Job.monitorAndPrintJob(1363) | Counters: 43
    File System Counters
        FILE: Number of bytes read=58
        FILE: Number of bytes written=159667
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=147
        HDFS: Number of bytes written=27
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6155
        Total time spent by all reduces in occupied slots (ms)=4929
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Map output bytes=44
        Map output materialized bytes=58
        Input split bytes=109
        Combine input records=0
        Combine output records=0
        Reduce input groups=3
        Reduce shuffle bytes=58
        Reduce input records=4
        Reduce output records=3
        Spilled Records=8
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=99
        CPU time spent (ms)=1060
        Physical memory (bytes) snapshot=309071872
        Virtual memory (bytes) snapshot=1680531456
        Total committed heap usage (bytes)=136450048
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=38
    File Output Format Counters
        Bytes Written=27
</pre>

The job also shows up in the web UI on port 8088 (screenshot attached to the original post). The wordcount result is correct as well. With that, debugging a Hadoop 2.2 distributed cluster from Eclipse works, so go ahead and try it.

Attachment: build.zip (924 Bytes)
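To see what the replaceEnvironment method added to YARNRunner does without submitting a job, the same string replacements can be run in a small standalone program. This is only a sketch: the class name ClasspathFixDemo, the helper toLinuxClasspath, and the sample CLASSPATH value are made up for illustration, and the real value that YARNRunner builds on a Windows client will contain different entries.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the replacements performed by replaceEnvironment.
// ClasspathFixDemo and toLinuxClasspath are hypothetical names, not Hadoop APIs.
class ClasspathFixDemo {

    // Rewrites a Windows-style CLASSPATH value into the form a Linux node expects.
    static String toLinuxClasspath(String windowsClasspath) {
        String cp = windowsClasspath;
        // Classpath entry separator: ';' on Windows, ':' on Linux.
        cp = cp.replaceAll(";", ":");
        // %VAR% references become $VAR. The "\\$" escapes '$' in the
        // replacement string, where '$' would otherwise mean a group reference.
        cp = cp.replaceAll("%PWD%", "\\$PWD");
        cp = cp.replaceAll("%HADOOP_MAPRED_HOME%", "\\$HADOOP_MAPRED_HOME");
        // The regex "\\\\" matches a single backslash, which becomes '/'.
        cp = cp.replaceAll("\\\\", "/");
        return cp;
    }

    public static void main(String[] args) {
        // Made-up sample value, only to exercise every replacement.
        Map<String, String> environment = new HashMap<>();
        environment.put("CLASSPATH",
                "%PWD%;%HADOOP_MAPRED_HOME%\\share\\hadoop\\mapreduce\\*;job.jar\\classes");

        environment.put("CLASSPATH", toLinuxClasspath(environment.get("CLASSPATH")));
        System.out.println(environment.get("CLASSPATH"));
        // prints: $PWD:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:job.jar/classes
    }
}
```

Only %PWD% and %HADOOP_MAPRED_HOME% are rewritten here, exactly as in the method shown earlier. Any other %VAR% reference in the classpath would pass through unchanged and would need its own replacement.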