wget http://mirrors.cnnic.cn/apache/pig/pig-0.13.0/pig-0.13.0-src.tar.gz
tar -xvf pig-0.13.0-src.tar.gz

hadoop@hadoop:~$ vim .bashrc
export ANT_OPTS="-Dhttp.proxyHost=proxy.com -Dhttp.proxyPort=8080"

hadoop@hadoop:~$ sudo vim /etc/profile
#pig
export PIG_HOME=/usr/local/pig
export PATH=$PIG_HOME/bin/:$PATH

Configure the Hadoop installation directory.

cd pig-0.13.0-src
ant clean jar-withouthadoop -Dhadoopversion=23

Buildfile: /opt/hn/hadoop_family/pig/pig-0.13.0-src/build.xml
clean:
clean:
clean:
ivy-download:
[get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
......
copyHadoop1:
copyHadoop2:
[copy] Copying 1 file to /opt/hn/hadoop_family/pig/pig-0.13.0-src
BUILD SUCCESSFUL
Total time: 3 minutes 24 seconds

Pig has two execution modes:

1. Local mode. Pig runs in a single JVM and reads from the local file system. This mode is only suitable for small datasets and is generally used to try Pig out.
pig -x local

2. Hadoop mode. Only in this mode does Pig actually translate queries into the corresponding MapReduce jobs and submit them to the Hadoop cluster to run.
pig -x mapreduce

Common statements:
LOAD: specifies how the data is loaded
FOREACH: scans the data row by row and applies some processing to each row
FILTER: filters rows
DUMP: prints the result to the screen
STORE: saves the result to a file

Typical order when writing a script: LOAD -> FOREACH -> STORE

$ pig hdfs://nn.mydomain.com:9020/myscripts/script.pig

hadoop@hadoopMaster:~$ hdfs dfs -cat hdfs://hadoopMaster:9000/input/sample.txt
1950 0 1
1950 22 1
1950 -11 1
1949 111 1
1949 78 1

records = LOAD 'hdfs://hadoopMaster:9000/input/sample.txt' AS (year:chararray, temperature:int, quality:int);
filtered_records = FILTER records BY temperature != 9999 AND
    (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grouped_records = GROUP filtered_records BY year;
max_temp = FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature);
DUMP max_temp;
ILLUSTRATE max_temp;
....
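As a cross-check on what the max_temp script computes, the same filter, group, and max steps can be sketched in plain Python over the five sample rows. This is only an illustration of the logic, not how Pig executes it; the function name max_temperature_by_year and the constant names are hypothetical and do not come from Pig.

```python
from collections import defaultdict

# Rows copied from sample.txt in this post: (year, temperature, quality).
SAMPLE_ROWS = [
    ("1950", 0, 1),
    ("1950", 22, 1),
    ("1950", -11, 1),
    ("1949", 111, 1),
    ("1949", 78, 1),
]

# Values taken from the FILTER statement; the names are illustrative only.
EXCLUDED_TEMPERATURE = 9999
ACCEPTED_QUALITY = {0, 1, 4, 5, 9}


def max_temperature_by_year(rows):
    """Mimic FILTER -> GROUP BY year -> MAX(temperature) in plain Python."""
    grouped = defaultdict(list)
    for year, temperature, quality in rows:
        # Same condition as the FILTER statement in the Pig script.
        if temperature != EXCLUDED_TEMPERATURE and quality in ACCEPTED_QUALITY:
            grouped[year].append(temperature)
    # One (year, max) pair per group, sorted by year for stable output.
    return sorted((year, max(temps)) for year, temps in grouped.items())


if __name__ == "__main__":
    for year, temperature in max_temperature_by_year(SAMPLE_ROWS):
        print(f"({year},{temperature})")
```

For the sample rows this prints (1949,111) and (1950,22), the same two tuples that DUMP max_temp returns in the log output.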
Output(s):
Successfully stored 2 records (26 bytes) in: "hdfs://hadoopMaster:9000/tmp/temp247096677/tmp640867181"

Counters:
Total records written : 2
Total bytes written : 26
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1405586574373_0003

2014-07-17 17:32:05,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2014-07-17 17:32:05,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-17 17:32:05,703 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-07-17 17:32:05,704 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-07-17 17:32:05,705 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2014-07-17 17:32:05,753 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-07-17 17:32:05,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1949,111)
(1950,22)
grunt> ILLUSTRATE max_temp;
2014-07-17 17:32:19,155 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-07-17 17:32:19,155 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapreduce.job.counters.limit is deprecated. Instead, use mapreduce.job.counters.max
2014-07-17 17:32:19,156 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated.
Instead, use dfs.bytes-per-checksum 2014-07-17 17:32:19,159 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoopMaster:9000 2014-07-17 17:32:19,175 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[LoadTypeCastInserter, StreamTypeCastInserter], RULES_DISABLED=[AddForEach, ColumnMapKeyPrune, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, LimitOptimizer, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter]} 2014-07-17 17:32:19,192 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-07-17 17:32:19,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:19,195 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:19,198 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:19,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:19,219 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code. 2014-07-17 17:32:19,219 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche 2014-07-17 17:32:19,219 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize [] 2014-07-17 17:32:19,344 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code. 
2014-07-17 17:32:19,351 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10] C: R: 2014-07-17 17:32:19,361 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2014-07-17 17:32:19,361 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 2014-07-17 17:32:19,414 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-07-17 17:32:19,444 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:19,444 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:19,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:19,449 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:19,454 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:19,456 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 
2014-07-17 17:32:19,456 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:19,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:19,467 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:19,643 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:19,698 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:19,769 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:19,796 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:19,819 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false 2014-07-17 17:32:19,828 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:19,828 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:19,830 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:19,832 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:19,837 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:19,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 2014-07-17 17:32:19,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:19,845 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:19,845 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:19,991 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,024 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,062 [main] WARN 
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,088 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,107 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-07-17 17:32:20,126 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:20,126 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:20,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:20,131 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:20,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:20,136 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 
2014-07-17 17:32:20,136 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:20,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:20,144 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:20,293 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,317 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,330 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,340 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false 2014-07-17 17:32:20,347 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:20,348 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:20,349 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:20,351 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:20,357 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:20,359 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 2014-07-17 17:32:20,360 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:20,368 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:20,368 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:20,465 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,492 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,504 [main] WARN 
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,511 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,515 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-07-17 17:32:20,530 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:20,533 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:20,535 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:20,540 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:20,543 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:20,545 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 
2014-07-17 17:32:20,547 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:20,554 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:20,554 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:20,602 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,618 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,650 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,655 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,666 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false 2014-07-17 17:32:20,674 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:20,674 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:20,676 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:20,681 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:20,684 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:20,689 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 2014-07-17 17:32:20,689 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:20,694 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:20,694 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:20,751 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,770 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,783 [main] WARN 
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,789 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,793 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-07-17 17:32:20,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:20,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:20,801 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:20,805 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:20,808 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:20,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 
2014-07-17 17:32:20,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:20,814 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:20,814 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:20,843 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,862 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:20,868 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:20,872 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false 2014-07-17 17:32:20,889 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:20,889 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:20,892 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:20,896 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:20,901 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:20,906 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 2014-07-17 17:32:20,906 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:20,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:20,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:21,028 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:21,061 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:21,089 [main] WARN 
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:21,110 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] (1949,78,1) 2014-07-17 17:32:21,125 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2014-07-17 17:32:21,131 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:21,131 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:21,132 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:21,133 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:21,135 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:21,136 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 
2014-07-17 17:32:21,136 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:21,141 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:21,141 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:21,172 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:21,179 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:21,189 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:21,196 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:21,201 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? 
false 2014-07-17 17:32:21,214 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2014-07-17 17:32:21,214 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2014-07-17 17:32:21,215 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AccumulatorOptimizer - Reducer is to run in accumulative mode. 2014-07-17 17:32:21,216 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2014-07-17 17:32:21,218 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2014-07-17 17:32:21,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers. 2014-07-17 17:32:21,233 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator 2014-07-17 17:32:21,241 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=51 2014-07-17 17:32:21,241 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1 2014-07-17 17:32:21,293 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2014-07-17 17:32:21,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11] 2014-07-17 17:32:21,331 [main] WARN 
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized ...... 2014-07-17 17:32:22,076 [main] WARN
org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2014-07-17 17:32:22,080 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce - Aliases being processed per job phase (AliasName[line,offset]): M: records[1,10],records[-1,-1],filtered_records[2,19],grouped_records[3,18] C: R: max_temp[4,11]
------------------------------------------------------------
| records | year:chararray | temperature:int | quality:int |
------------------------------------------------------------
|         | 1949           | 78              | 1           |
|         | 1949           | 111             | 1           |
|         | 1949           | 9999            | 1           |
------------------------------------------------------------
---------------------------------------------------------------------
| filtered_records | year:chararray | temperature:int | quality:int |
---------------------------------------------------------------------
|                  | 1949           | 78              | 1           |
|                  | 1949           | 111             | 1           |
---------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
| grouped_records | group:chararray | filtered_records:bag{:tuple(year:chararray,temperature:int,quality:int)} |
----------------------------------------------------------------------------------------------------------------
|                 | 1949            | {(1949, 78, 1), (1949, 111, 1)}                                          |
|                 | 1949            | {(1949, 78, 1), (1949, 111, 1)}                                          |
----------------------------------------------------------------------------------------------------------------
-------------------------------------
| max_temp | group:chararray | :int |
-------------------------------------
|          | 1949            | 111  |
-------------------------------------
grunt>
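The ILLUSTRATE tables above trace a few sample rows through each step of the script: records, then filtered_records, then grouped_records, then max_temp. As a rough cross-check, the same data flow can be imitated in plain Python. This is only an illustrative sketch, not how Pig executes the script; the function name `max_temp_by_year` and the constant `GOOD_QUALITY` are hypothetical names chosen for this example, and the rows are copied from the "records" table.

```python
from collections import defaultdict

# Sample rows copied from the ILLUSTRATE "records" table:
# (year, temperature, quality)
records = [("1949", 78, 1), ("1949", 111, 1), ("1949", 9999, 1)]

# Quality codes accepted by the FILTER statement in the Pig script.
GOOD_QUALITY = {0, 1, 4, 5, 9}


def max_temp_by_year(rows):
    """Imitate FILTER -> GROUP -> FOREACH ... GENERATE group, MAX(...)."""
    # FILTER records BY temperature != 9999 AND quality is an accepted code
    filtered = [r for r in rows if r[1] != 9999 and r[2] in GOOD_QUALITY]

    # GROUP filtered_records BY year: one bag of tuples per year
    grouped = defaultdict(list)
    for year, temperature, quality in filtered:
        grouped[year].append((year, temperature, quality))

    # FOREACH grouped_records GENERATE group, MAX(filtered_records.temperature)
    return sorted(
        (year, max(t for _, t, _ in bag)) for year, bag in grouped.items()
    )


print(max_temp_by_year(records))  # [('1949', 111)]
```

The 9999 row is dropped by the filter step, which is why it appears in the records table but not in filtered_records. Feeding in the five rows of sample.txt instead gives one result per year, matching the DUMP output (1949,111) and (1950,22).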