Hadoop workshop homework.
I am an IntelliJ IDEA user now (I switched from Eclipse to IntelliJ IDEA several months ago because I find it much better), and IntelliJ currently doesn't have any Hadoop plugins, so I package the build output into a jar file (cdh4-examples.jar) and copy it to the Hadoop cluster with scp.
[root@n1 hadoop-examples]# scp gsun@192.168.1.102:~/prog/hadoop/cdh4-examples/cdh4-examples.jar .
Password:
cdh4-examples.jar
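The input file also has to be available in HDFS before submitting the job. For reference, a typical upload step looks like this (the target path is assumed to be the user's HDFS home directory, which is where the jobs below read Shakespeare.txt from):

[root@n1 hadoop-examples]# hadoop fs -put Shakespeare.txt .
[root@n1 hadoop-examples]# hadoop fs -ls Shakespeare.txt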
The first example: WordCount01. The code is as follows:
package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount01 extends Configured implements Tool {

    // Mapper: splits each input line into tokens and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word. It is also registered as the combiner below.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount01.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount01(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);

        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");
        System.exit(res);
    }
}
Output:
[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount01 Shakespeare.txt output
13/07/12 23:02:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:02:57 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:02:59 INFO mapred.JobClient: Running job: job_201307122107_0006
13/07/12 23:03:00 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:03:32 INFO mapred.JobClient:  map 26% reduce 0%
13/07/12 23:03:36 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:03:49 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:03:56 INFO mapred.JobClient: Job complete: job_201307122107_0006
13/07/12 23:03:56 INFO mapred.JobClient: Counters: 32
13/07/12 23:03:56 INFO mapred.JobClient:   File System Counters
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes read=2151353
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of bytes written=2933308
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:03:56 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:03:56 INFO mapred.JobClient:   Job Counters
13/07/12 23:03:56 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=32808
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=9469
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:03:56 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:03:56 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:03:56 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:03:56 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:03:56 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:03:56 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:03:56 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce shuffle bytes=470288
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:03:56 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:03:56 INFO mapred.JobClient:     CPU time spent (ms)=9970
13/07/12 23:03:56 INFO mapred.JobClient:     Physical memory (bytes) snapshot=288149504
13/07/12 23:03:56 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3245924352
13/07/12 23:03:56 INFO mapred.JobClient:     Total committed heap usage (bytes)=142344192
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:02:53
Task end: 2013-07-12 23:03:56
Time elapsed: 1.046 minutes.
Note that in this example we used a combiner class, and it happens to be the same class as the reducer. That works for word count because the reduce operation (summing counts) is commutative and associative, so it can safely be applied to partial data; in general the combiner does not have to be the same class as the reducer. The combiner acts as a local aggregator on the map side and reduces the amount of data copied from the mappers to the reducers. We can see the difference by comparing WordCount01 with WordCount02 below, in which the combiner is removed.
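To make that concrete, the only requirement on the combiner is that it accepts the mapper's output types (Text, IntWritable) and emits the same types. A minimal sketch of a standalone combiner, with a hypothetical class name, that could be registered instead of reusing IntSumReducer:

    // Hypothetical standalone combiner: performs the same local summation as
    // IntSumReducer, but shows that the combiner can be a separate class as long
    // as its input and output types match the map output types.
    public static class LocalSumCombiner
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private final IntWritable partial = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();  // partial sums are safe: addition is commutative and associative
            }
            partial.set(sum);
            context.write(key, partial);
        }
    }

    // Registration in run(): job.setCombinerClass(LocalSumCombiner.class);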
The second example: WordCount02. The code is as follows:
package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount02 extends Configured implements Tool {

    // Mapper: splits each input line into tokens and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount02.class);
        job.setMapperClass(TokenizerMapper.class);
        // No combiner this time: every (word, 1) pair is shuffled to the reducer.
        // job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount02(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);

        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");
        System.exit(res);
    }
}
Output:
[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount02 Shakespeare.txt output
13/07/12 23:16:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:16:20 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:16:24 INFO mapred.JobClient: Running job: job_201307122107_0007
13/07/12 23:16:25 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:16:42 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:17:03 INFO mapred.JobClient:  map 100% reduce 67%
13/07/12 23:17:06 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:17:12 INFO mapred.JobClient: Job complete: job_201307122107_0007
13/07/12 23:17:12 INFO mapred.JobClient: Counters: 32
13/07/12 23:17:12 INFO mapred.JobClient:   File System Counters
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes read=3871328
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of bytes written=5564882
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:17:12 INFO mapred.JobClient:     HDFS: Number of write operations=1
13/07/12 23:17:12 INFO mapred.JobClient:   Job Counters
13/07/12 23:17:12 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Launched reduce tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=20788
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=20552
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:17:12 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:17:12 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:17:12 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:17:12 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:17:12 INFO mapred.JobClient:     Combine input records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Combine output records=0
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce shuffle bytes=1382689
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce input records=1612019
13/07/12 23:17:12 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:17:12 INFO mapred.JobClient:     Spilled Records=4836057
13/07/12 23:17:12 INFO mapred.JobClient:     CPU time spent (ms)=11730
13/07/12 23:17:12 INFO mapred.JobClient:     Physical memory (bytes) snapshot=275365888
13/07/12 23:17:12 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2282356736
13/07/12 23:17:12 INFO mapred.JobClient:     Total committed heap usage (bytes)=91426816
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:16:18
Task end: 2013-07-12 23:17:12
Time elapsed: 0.8969 minutes.
Note the difference between WordCount02 and WordCount01: in WordCount02 the combiner was removed, so the number of reduce input records increased, as the job output shows:
Reduce input records=1612019
In WordCount01, which was run with the combiner:
Reduce input records=65448
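These counters can also be read in the driver after the job finishes, instead of scraping the console log. A small sketch, assuming the CDH4 (Hadoop 2.0) mapreduce API, that would go right after job.waitForCompletion(true) in run():

    // Additional imports needed:
    // import org.apache.hadoop.mapreduce.Counters;
    // import org.apache.hadoop.mapreduce.TaskCounter;

    Counters counters = job.getCounters();
    long combineIn = counters.findCounter(TaskCounter.COMBINE_INPUT_RECORDS).getValue();
    long reduceIn = counters.findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
    System.out.println("Combine input records: " + combineIn);
    System.out.println("Reduce input records: " + reduceIn);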
The third example: WordCount03. The code is as follows:
package wc;

import java.io.IOException;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount03 extends Configured implements Tool {

    // Mapper: splits each input line into tokens and emits (word, 1) for every token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word. It is also registered as the combiner below.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        //conf.set("mapred.job.tracker", "192.168.1.201:9001");

        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount03.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);   // use two reduce tasks instead of the default of one
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

        System.out.println("Task name: " + job.getJobName());
        System.out.println("Task success? " + (job.isSuccessful() ? "Yes" : "No"));
        return job.isSuccessful() ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }

        DateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date start = new Date();

        int res = ToolRunner.run(new Configuration(), new WordCount03(), args);

        Date end = new Date();
        float time = (float) ((end.getTime() - start.getTime()) / 60000.0);

        System.out.println("Task start: " + formatter.format(start));
        System.out.println("Task end: " + formatter.format(end));
        System.out.println("Time elapsed: " + String.valueOf(time) + " minutes.");
        System.exit(res);
    }
}
Output:
[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount03 Shakespeare.txt output
13/07/12 23:18:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/12 23:18:53 INFO input.FileInputFormat: Total input paths to process : 1
13/07/12 23:18:55 INFO mapred.JobClient: Running job: job_201307122107_0008
13/07/12 23:18:56 INFO mapred.JobClient:  map 0% reduce 0%
13/07/12 23:19:16 INFO mapred.JobClient:  map 70% reduce 0%
13/07/12 23:19:19 INFO mapred.JobClient:  map 100% reduce 0%
13/07/12 23:19:33 INFO mapred.JobClient:  map 100% reduce 50%
13/07/12 23:19:34 INFO mapred.JobClient:  map 100% reduce 100%
13/07/12 23:19:39 INFO mapred.JobClient: Job complete: job_201307122107_0008
13/07/12 23:19:39 INFO mapred.JobClient: Counters: 32
13/07/12 23:19:39 INFO mapred.JobClient:   File System Counters
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes read=3042226
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of bytes written=3277040
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     FILE: Number of write operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes read=10185958
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of bytes written=707043
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of read operations=2
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of large read operations=0
13/07/12 23:19:39 INFO mapred.JobClient:     HDFS: Number of write operations=2
13/07/12 23:19:39 INFO mapred.JobClient:   Job Counters
13/07/12 23:19:39 INFO mapred.JobClient:     Launched map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Launched reduce tasks=2
13/07/12 23:19:39 INFO mapred.JobClient:     Data-local map tasks=1
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=25064
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=21033
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/12 23:19:39 INFO mapred.JobClient:   Map-Reduce Framework
13/07/12 23:19:39 INFO mapred.JobClient:     Map input records=417884
13/07/12 23:19:39 INFO mapred.JobClient:     Map output records=1612019
13/07/12 23:19:39 INFO mapred.JobClient:     Map output bytes=15218645
13/07/12 23:19:39 INFO mapred.JobClient:     Input split bytes=117
13/07/12 23:19:39 INFO mapred.JobClient:     Combine input records=1852684
13/07/12 23:19:39 INFO mapred.JobClient:     Combine output records=306113
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input groups=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce shuffle bytes=503734
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce input records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Reduce output records=65448
13/07/12 23:19:39 INFO mapred.JobClient:     Spilled Records=371561
13/07/12 23:19:39 INFO mapred.JobClient:     CPU time spent (ms)=13840
13/07/12 23:19:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=395960320
13/07/12 23:19:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3918856192
13/07/12 23:19:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=162201600
Task name: word count
Task success? Yes
Task start: 2013-07-12 23:18:50
Task end: 2013-07-12 23:19:39
Time elapsed: 0.80191666 minutes.
In the third example, the number of reduce tasks was set to 2, so two reduce output files were generated:
-rw-r--r--   3 root supergroup     353110 2013-07-12 23:19 output/part-r-00000
-rw-r--r--   3 root supergroup     353933 2013-07-12 23:19 output/part-r-00001
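Each reduce task writes its own part-r-NNNNN file. To inspect or combine them, the standard HDFS shell commands can be used (the local output file name here is just an example):

[root@n1 hadoop-examples]# hadoop fs -cat output/part-r-00000 | head
[root@n1 hadoop-examples]# hadoop fs -getmerge output wordcount-merged.txt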
In WordCount01 and WordCount02, only one reduce output file was generated. The reason is that the per-job default for the number of reduce tasks (mapred.reduce.tasks) is 1 on my cluster. The property mapred.tasktracker.reduce.tasks.maximum, which is also set to 1 here, only limits how many reduce tasks a single TaskTracker can run simultaneously; it does not decide how many reducers a job gets. If you specify the number of reduce tasks explicitly, with job.setNumReduceTasks() as in WordCount03 or by setting the property through org.apache.hadoop.conf.Configuration, the default is overridden by the number you specify.
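Since the drivers implement Tool and are launched through ToolRunner, the reducer count could also be supplied at submission time as a generic option instead of being hard-coded. A sketch, assuming the MRv1 property name mapred.reduce.tasks used by the CDH4 JobTracker; note that the args.length check in main() would reject the extra arguments as written, so that check would need to move into run(), which only sees the remaining application arguments:

[root@n1 hadoop-examples]# hadoop jar cdh4-examples.jar wc.WordCount01 -D mapred.reduce.tasks=2 Shakespeare.txt output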