0. Preparation: Hadoop servers
10.156.50.35 yanfabu2-35.base.app.dev.yf zk1 hadoop1 master1 master
10.156.50.36 yanfabu2-36.base.app.dev.yf zk2 hadoop2 master2
10.156.50.37 yanfabu2-37.base.app.dev.yf zk3 hadoop3 slaver1
2. Unpack Pig
tar xf pig-0.17.0.tar.gz
mv pig-0.17.0 pig
vim ~/.bash_profile
# add these two lines to ~/.bash_profile:
export PIG_HOME=/home/zkkafka/pig
export PATH=$PATH:$PIG_HOME/bin
source ~/.bash_profile
scp ~/.bash_profile zkkafka@10.156.50.36:/home/zkkafka/
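3. Edit the configuration file
In $PIG_HOME/conf:
vim pig.properties
# HDFS URI, taken from core-site.xml
fs.default.name=hdfs://master
# JobHistory server address, taken from mapred-site.xml
mapred.job.tracker=master1:10020

Keep the comments on their own lines: in a .properties file, # starts a comment only at the beginning of a line, so a trailing "# ..." would become part of the value.

Then copy the conf directory to the other two nodes. The target is the pig directory, because copying into pig/conf/ would create a nested pig/conf/conf:
scp -r ../conf zkkafka@10.156.50.36:/home/zkkafka/pig/
scp -r ../conf zkkafka@10.156.50.37:/home/zkkafka/pig/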
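On Hadoop 2.x both keys are old names: the startup logs in steps 4 and 6 print "fs.default.name is deprecated. Instead, use fs.defaultFS" and "mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address". The Python sketch below scans a properties file for those two keys. It is not a Pig or Hadoop tool; find_deprecated and DEPRECATED are hypothetical names, and only simple key=value lines are handled.

```python
# Hypothetical helper: flag Hadoop 1.x keys in a pig.properties-style file.
# The replacement names are the ones printed in the Pig startup log.
DEPRECATED = {
    "fs.default.name": "fs.defaultFS",
    "mapred.job.tracker": "mapreduce.jobtracker.address",
}

def find_deprecated(text):
    """Return (old_key, new_key) pairs for deprecated keys set in `text`."""
    hits = []
    for raw in text.splitlines():
        line = raw.strip()
        # As in java.util.Properties, '#' or '!' only marks a comment at line start.
        if not line or line[0] in "#!":
            continue
        # Simplification: only '=' separators are recognised.
        key = line.split("=", 1)[0].strip()
        if key in DEPRECATED:
            hits.append((key, DEPRECATED[key]))
    return hits

sample = """\
# HDFS URI, taken from core-site.xml
fs.default.name=hdfs://master
mapred.job.tracker=master1:10020
"""
for old, new in find_deprecated(sample):
    print(f"{old} is deprecated; use {new} instead")
```

Pig 0.17 still accepts the old keys (the session above connects to hdfs://master), so the check is informational only.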
4. Check the Pig version
pig -version

[zkkafka@yanfabu2-35 pig]$ pig -version
19/06/05 19:58:19 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
Apache Pig version 0.17.0 (r1797386)
compiled Jun 02 2017, 15:41:58
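5. Prepare the data
Create tel.txt in /home/zkkafka/pig/data (the file uploaded to HDFS below). The fields are separated by tabs, matching PigStorage('\t') in step 7.1:
vim tel.txt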
1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
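Because the LOAD schema in step 7.1 declares 11 columns, it is worth checking how many tab-separated fields each line really has. A minimal Python sketch, assuming the sample line above with tabs between its fields; check_line is a hypothetical helper, not part of Pig:

```python
# Column names from the AS (...) clause of the LOAD statement in step 7.1.
SCHEMA = ["t0", "msisdn", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"]

def check_line(line, schema=SCHEMA):
    """Return (field_count, schema columns that receive no value) for one line."""
    fields = line.rstrip("\n").split("\t")
    return len(fields), schema[len(fields):]

# The sample record above, joined with tabs (the assumed separator).
sample = "\t".join(["1363157985066", "13726230503", "00-FD-07-A4-72-B8:CMCC",
                    "120.196.100.82", "i02.c.aliimg.com", "24", "27", "2481",
                    "24681", "200"])
count, unfilled = check_line(sample)
print(count, "fields; columns left null:", unfilled)
```

To check a whole file, call check_line on each line of open("tel.txt"). For this record the last schema column, t10, gets no value.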
hdfs dfs -mkdir -p /hdfs/pig/
hdfs dfs -put /home/zkkafka/pig/data/tel.txt /hdfs/pig/
hdfs dfs -ls -R /hdfs/pig

The older hdfs dfs -lsr still works, but prints a deprecation warning:
[zkkafka@yanfabu2-35 conf]$ hdfs dfs -lsr /hdfs/pig
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 2 zkkafka supergroup 2546 2019-06-05 21:03 /hdfs/pig/tel.txt
6. Start the Pig shell (Grunt)
[zkkafka@yanfabu2-37 ~]$ pig
19/06/06 16:44:27 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/06/06 16:44:27 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
19/06/06 16:44:27 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2019-06-06 16:44:27,558 [main] INFO org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2019-06-06 16:44:27,558 [main] INFO org.apache.pig.Main - Logging error messages to: /home/zkkafka/pig_1559810667556.log
2019-06-06 16:44:27,605 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/zkkafka/.pigbootup not found
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/zkkafka/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zkkafka/hbase/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-06-06 16:44:28,312 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-06-06 16:44:28,312 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master/
2019-06-06 16:44:28,859 [main] INFO org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-3d2427ca-7fdf-4252-ab78-cfb6ed2be36e
2019-06-06 16:44:28,859 [main] WARN org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false
7. Using Pig
7.1 Load the data into a Pig relation
t_wlan = LOAD '/hdfs/pig/tel.txt' USING PigStorage('\t') AS (t0:long, msisdn:chararray, t2:chararray, t3:chararray, t4:chararray, t5:chararray, t6:long, t7:long, t8:long, t9:long, t10:chararray);
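To make the LOAD semantics concrete, here is a rough Python model of what PigStorage('\t') plus the AS schema do to one line: split on tabs, cast each field to its declared type, and use null (None) for a missing column or a value that cannot be cast. This is an approximation written for illustration, not Pig's implementation; load_line is a hypothetical name.

```python
# (name, type) pairs from the AS clause: long -> int, chararray -> str.
SCHEMA = [("t0", int), ("msisdn", str), ("t2", str), ("t3", str), ("t4", str),
          ("t5", str), ("t6", int), ("t7", int), ("t8", int), ("t9", int),
          ("t10", str)]

def load_line(line, delimiter="\t"):
    """Rough model of PigStorage(delimiter) with a declared schema, for one line."""
    fields = line.rstrip("\n").split(delimiter)
    record = {}
    for i, (name, cast) in enumerate(SCHEMA):
        if i >= len(fields):
            record[name] = None        # column missing from the line -> null
            continue
        try:
            record[name] = cast(fields[i])
        except ValueError:
            record[name] = None        # value that cannot be cast -> null
    return record

sample = "\t".join(["1363157985066", "13726230503", "00-FD-07-A4-72-B8:CMCC",
                    "120.196.100.82", "i02.c.aliimg.com", "24", "27", "2481",
                    "24681", "200"])
t_wlan_row = load_line(sample)
print(t_wlan_row["msisdn"], t_wlan_row["t6"], t_wlan_row["t10"])
```

Note that LOAD itself runs nothing in Grunt; Pig only builds the plan, and the MapReduce job starts at the first dump.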
7.2 Dump relation t_wlan
dump t_wlan;

grunt> dump t_wlan;
2019-06-06 16:59:05,805 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2019-06-06 16:59:05,840 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-06-06 16:59:05,840 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 16:59:05,847 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 16:59:05,848 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 16:59:05,848 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 16:59:05,880 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 16:59:05,881 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 16:59:05,883 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 16:59:06,472 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp-489322267/pig-0.17.0-core-h2.jar
2019-06-06 16:59:06,598 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp1532488090/automaton-1.11-8.jar
2019-06-06 16:59:07,094 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp731737639/antlr-runtime-3.4.jar
2019-06-06 16:59:07,190 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp-2081706505/joda-time-2.9.3.jar
2019-06-06 16:59:07,192 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 16:59:07,192 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 16:59:07,193 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 16:59:07,193 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 16:59:07,202 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 16:59:07,264 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-06 16:59:07,286 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 16:59:07,289 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 16:59:07,289 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 16:59:07,291 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 16:59:07,487 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 16:59:07,590 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0014
2019-06-06 16:59:07,598 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 16:59:07,856 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0014
2019-06-06 16:59:07,862 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0014/
2019-06-06 16:59:07,862 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0014
2019-06-06 16:59:07,862 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan
2019-06-06 16:59:07,862 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan[-1,-1] C: R:
2019-06-06 16:59:07,872 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 16:59:07,873 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0014]
2019-06-06 16:59:20,161 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 16:59:20,161 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0014]
2019-06-06 16:59:23,200 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,409 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,505 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,573 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 16:59:23,574 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.5 0.17.0 zkkafka 2019-06-06 16:59:05 2019-06-06 16:59:23 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1559370613628_0014 1 0 4 4 4 4 0 0 0 0 t_wlan MAP_ONLY hdfs://master/tmp/temp-1906860032/tmp1645766804,
Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"
Output(s):
Successfully stored 1 records (106 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp1645766804"
Counters:
Total records written : 1
Total bytes written : 106
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1559370613628_0014
2019-06-06 16:59:23,582 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,639 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,698 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,753 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 1 time(s).
2019-06-06 16:59:23,753 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 16:59:23,755 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 16:59:23,764 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 16:59:23,764 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1363157985066,13726230503,00-FD-07-A4-72-B8:CMCC,120.196.100.82,i02.c.aliimg.com,24,27,2481,24681,200,)

The data line has 10 tab-separated fields while the schema declares 11, so t10 is null: it prints as the empty value after the last comma, and Pig reports the ACCESSING_NON_EXISTENT_FIELD warning seen above.
7.2 Extract columns from relation A into a new relation B
t_wlan_simple = FOREACH t_wlan GENERATE msisdn, t6, t7, t8, t9;
dump t_wlan_simple;
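FOREACH ... GENERATE is a projection: every input tuple yields one output tuple holding only the listed columns. Pig applies the same idea to the load itself; the log below reports "Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10". A small Python sketch of the projection, using the t_wlan values from the dump above; project is a hypothetical name:

```python
def project(records, columns):
    """FOREACH rel GENERATE c1, c2, ...: keep the named columns, in that order."""
    return [tuple(rec[c] for c in columns) for rec in records]

# t_wlan as a list of dicts, values copied from the dump of t_wlan.
t_wlan = [{"t0": 1363157985066, "msisdn": "13726230503",
           "t2": "00-FD-07-A4-72-B8:CMCC", "t3": "120.196.100.82",
           "t4": "i02.c.aliimg.com", "t5": "24", "t6": 27, "t7": 2481,
           "t8": 24681, "t9": 200, "t10": None}]
t_wlan_simple = project(t_wlan, ["msisdn", "t6", "t7", "t8", "t9"])
print(t_wlan_simple)  # [('13726230503', 27, 2481, 24681, 200)]
```

The result matches the tuple Pig prints for t_wlan_simple below.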
grunt> dump t_wlan_simple;
2019-06-06 17:03:42,827 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2019-06-06 17:03:42,869 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-06-06 17:03:42,870 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 17:03:42,884 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-06 17:03:42,891 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 17:03:42,893 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 17:03:42,893 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 17:03:42,923 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 17:03:42,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 17:03:42,924 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 17:03:43,081 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp1408006038/pig-0.17.0-core-h2.jar
2019-06-06 17:03:43,178 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp1149486211/automaton-1.11-8.jar
2019-06-06 17:03:43,281 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp1835019327/antlr-runtime-3.4.jar
2019-06-06 17:03:43,378 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp2065709292/joda-time-2.9.3.jar
2019-06-06 17:03:43,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 17:03:43,383 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 17:03:43,383 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 17:03:43,383 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 17:03:43,399 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 17:03:43,481 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-06 17:03:43,510 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 17:03:43,519 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:03:43,519 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 17:03:43,522 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 17:03:44,131 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 17:03:44,228 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0015
2019-06-06 17:03:44,232 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 17:03:44,471 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0015
2019-06-06 17:03:44,475 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0015/
2019-06-06 17:03:44,475 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0015
2019-06-06 17:03:44,475 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple
2019-06-06 17:03:44,475 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan_simple[-1,-1] C: R:
2019-06-06 17:03:44,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 17:03:44,480 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0015]
2019-06-06 17:03:58,648 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 17:03:58,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0015]
2019-06-06 17:04:04,679 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:04,910 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:04,977 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,043 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 17:04:05,044 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.5 0.17.0 zkkafka 2019-06-06 17:03:42 2019-06-06 17:04:05 UNKNOWN
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1559370613628_0015 1 0 4 4 4 4 0 0 0 0 t_wlan,t_wlan_simple MAP_ONLY hdfs://master/tmp/temp-1906860032/tmp1236017200,
Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"
Output(s):
Successfully stored 1 records (29 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp1236017200"
Counters:
Total records written : 1
Total bytes written : 29
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1559370613628_0015
2019-06-06 17:04:05,058 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,137 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,223 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 17:04:05,337 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:04:05,382 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:04:05,382 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,27,2481,24681,200)
7.3 Group the data by phone number (msisdn)
t_wlan_simple_group = GROUP t_wlan_simple BY msisdn;
dump t_wlan_simple_group;
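GROUP ... BY produces one tuple per distinct key: the key itself (in a field named group) plus a bag holding every input tuple with that key. Because it needs a reduce phase, the log below reports "Reduce phase detected". A Python sketch of the same shape, assuming a bag can be modelled as a list; group_by is a hypothetical name:

```python
def group_by(tuples, key_index):
    """GROUP rel BY field: one (group, bag) pair per distinct key value."""
    groups = {}
    for t in tuples:
        # Python dicts keep insertion order; Pig itself guarantees no output order.
        groups.setdefault(t[key_index], []).append(t)
    return list(groups.items())

t_wlan_simple = [("13726230503", 27, 2481, 24681, 200)]
t_wlan_simple_group = group_by(t_wlan_simple, 0)   # index 0 = msisdn
print(t_wlan_simple_group)
```

With only one record in tel.txt, each bag holds a single tuple, which is what the dump below shows.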
grunt> dump t_wlan_simple_group;
2019-06-06 17:06:28,589 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2019-06-06 17:06:28,640 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-06-06 17:06:28,641 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 17:06:28,646 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-06 17:06:28,661 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 17:06:28,674 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 17:06:28,674 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 17:06:28,715 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 17:06:28,716 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 17:06:28,717 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-06 17:06:28,723 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-06 17:06:28,729 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=102
2019-06-06 17:06:28,729 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-06 17:06:28,730 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 17:06:28,929 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp-412980928/pig-0.17.0-core-h2.jar
2019-06-06 17:06:29,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp-1182557529/automaton-1.11-8.jar
2019-06-06 17:06:29,543 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp-1112811524/antlr-runtime-3.4.jar
2019-06-06 17:06:30,043 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp432932811/joda-time-2.9.3.jar
2019-06-06 17:06:30,046 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 17:06:30,047 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 17:06:30,047 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 17:06:30,047 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 17:06:30,111 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 17:06:30,174 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-06 17:06:30,189 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 17:06:30,191 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:06:30,191 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 17:06:30,193 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 17:06:30,391 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 17:06:30,488 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0016
2019-06-06 17:06:30,492 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 17:06:30,734 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0016
2019-06-06 17:06:30,738 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0016/
2019-06-06 17:06:30,738 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0016
2019-06-06 17:06:30,738 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple,t_wlan_simple_group
2019-06-06 17:06:30,738 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan_simple[-1,-1],t_wlan_simple_group[6,22] C: R:
2019-06-06 17:06:30,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 17:06:30,745 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0016]
2019-06-06 17:06:44,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 17:06:44,943 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0016]
2019-06-06 17:06:50,964 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0016]
2019-06-06 17:06:55,983 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,181 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,283 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,335 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 17:06:56,335 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.5 0.17.0 zkkafka 2019-06-06 17:06:28 2019-06-06 17:06:56 GROUP_BY
Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1559370613628_0016 1 1 4 4 4 4 4 4 4 4 t_wlan,t_wlan_simple,t_wlan_simple_group GROUP_BY hdfs://master/tmp/temp-1906860032/tmp912427234,
Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"
Output(s):
Successfully stored 1 records (46 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp912427234"
Counters:
Total records written : 1
Total bytes written : 46
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1559370613628_0016
2019-06-06 17:06:56,345 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,403 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,474 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,554 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 17:06:56,556 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:06:56,568 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:06:56,568 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,{(13726230503,27,2481,24681,200)})
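The dumped values follow a simple text convention: tuples in parentheses, bags in braces, fields separated by commas, and null printed as nothing (which is why the t_wlan tuple ends in a bare comma). A Python sketch that reproduces that rendering for tuples and bags, under the assumption that a bag is modelled as a list; pig_repr is a hypothetical name and ignores maps and escaping:

```python
def pig_repr(value):
    """Render a value roughly the way Grunt's dump prints it (tuples and bags only)."""
    if value is None:
        return ""                                              # null prints as nothing
    if isinstance(value, tuple):
        return "(" + ",".join(pig_repr(v) for v in value) + ")"
    if isinstance(value, list):                                # a bag of tuples
        return "{" + ",".join(pig_repr(v) for v in value) + "}"
    return str(value)

grouped = ("13726230503", [("13726230503", 27, 2481, 24681, 200)])
print(pig_repr(grouped))
```

The printed string is identical to the grouped tuple shown above, which makes this handy for comparing a local simulation against Grunt output.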
7.4 Sum the traffic per phone number
t_wlan_simple_group_sum = FOREACH t_wlan_simple_group GENERATE group, SUM(t_wlan_simple.t6), SUM(t_wlan_simple.t7), SUM(t_wlan_simple.t8), SUM(t_wlan_simple.t9);
dump t_wlan_simple_group_sum;
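For each group, SUM adds up one column of the bag. SUM is algebraic, so Pig can pre-aggregate on the map side; the log below says "Choosing to move algebraic foreach to combiner". The run captured below stops before the result is printed, so here is a Python sketch of the computation instead. It assumes SUM skips nulls and yields null when no value is left, which is how Pig documents its aggregate functions; sum_field and summarize are hypothetical names.

```python
def sum_field(bag, index):
    """SUM(bag.field): add the non-null values; None if there are none (assumed)."""
    values = [t[index] for t in bag if t[index] is not None]
    return sum(values) if values else None

def summarize(groups):
    """FOREACH grouped GENERATE group, SUM(.t6), SUM(.t7), SUM(.t8), SUM(.t9)."""
    # Tuples in each bag are (msisdn, t6, t7, t8, t9), so the sums cover indexes 1..4.
    return [(key,) + tuple(sum_field(bag, i) for i in range(1, 5))
            for key, bag in groups]

t_wlan_simple_group = [("13726230503", [("13726230503", 27, 2481, 24681, 200)])]
print(summarize(t_wlan_simple_group))
```

With the single sample record, each sum is simply that record's own value; the totals only become interesting once tel.txt holds several lines for the same msisdn.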
grunt> dump t_wlan_simple_group_sum
2019-06-06 17:15:39,824 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2019-06-06 17:15:39,877 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:15:39,878 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 17:15:39,885 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-06 17:15:39,904 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 17:15:39,908 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2019-06-06 17:15:39,972 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 17:15:39,972 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 17:15:40,000 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 17:15:40,001 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 17:15:40,002 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-06 17:15:40,002 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-06 17:15:40,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=102
2019-06-06 17:15:40,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-06 17:15:40,005 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 17:15:40,602 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp-784677978/pig-0.17.0-core-h2.jar
2019-06-06 17:15:40,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp-1113714067/automaton-1.11-8.jar
2019-06-06 17:15:40,796 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp-1701171835/antlr-runtime-3.4.jar
2019-06-06 17:15:40,910 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp-725132195/joda-time-2.9.3.jar
2019-06-06 17:15:40,914 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 17:15:40,915 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 17:15:40,915 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 17:15:40,915 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 17:15:40,968 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 17:15:41,035 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-06 17:15:41,055 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 17:15:41,057 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:15:41,057 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 17:15:41,060 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 17:15:41,282 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 17:15:41,432 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0018
2019-06-06 17:15:41,438 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 17:15:41,686 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0018
2019-06-06 17:15:41,691 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0018/
2019-06-06 17:15:41,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0018
2019-06-06 17:15:41,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum
2019-06-06 17:15:41,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan_simple[-1,-1],t_wlan_simple_group_sum[7,26],t_wlan_simple_group[6,22] C: t_wlan_simple_group_sum[7,26],t_wlan_simple_group[6,22] R: t_wlan_simple_group_sum[7,26]
2019-06-06 17:15:41,698 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 17:15:41,698 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0018]
2019-06-06 17:15:55,903 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 17:15:55,903 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0018]
2019-06-06 17:16:00,962 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0018]
2019-06-06 17:16:06,981 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,185 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,257 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,332 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 17:16:07,333 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.5 0.17.0 zkkafka 2019-06-06 17:15:39 2019-06-06 17:16:07 GROUP_BY

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1559370613628_0018 1 1 3 3 3 3 3 3 3 3 t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum GROUP_BY,COMBINER hdfs://master/tmp/temp-1906860032/tmp2100428296,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (29 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp2100428296"

Counters:
Total records written : 1
Total bytes written : 29
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0018

2019-06-06 17:16:07,343 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,402 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,456 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,512 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 17:16:07,513 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:16:07,529 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:16:07,529 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,27,2481,24681,200)
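The Python emulation below is a minimal local model of what the GROUP / SUM step computes. It is not Pig code. It assumes tab-separated input and that fields are addressed positionally as t0 to t10, with t1 (the phone number) as the GROUP BY key. Both assumptions are inferred from the SUM(t6..t9) expressions and the `Columns pruned for t_wlan` log line, because the LOAD statement is not part of this section.

The split into `partial_sums` (per input split) and `combine` mirrors the log line `Choosing to move algebraic foreach to combiner`. SUM is algebraic, so partial sums can be merged later without changing the result.

```python
from collections import defaultdict

# Assumption: tab-separated records, fields addressed positionally as t0..t10
# (inferred from SUM(t6..t9) and the "Columns pruned for t_wlan" log line).
KEY_FIELD = 1               # t1: phone number, the GROUP BY key
SUM_FIELDS = (6, 7, 8, 9)   # t6..t9: the summed columns


def partial_sums(lines):
    """Map/combiner side: per-key partial sums for one input split."""
    acc = defaultdict(lambda: [0] * len(SUM_FIELDS))
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        totals = acc[fields[KEY_FIELD]]
        for i, col in enumerate(SUM_FIELDS):
            totals[i] += int(fields[col])
    return acc


def combine(*partials):
    """Reduce side: SUM is algebraic, so partial sums from any split just add up."""
    merged = defaultdict(lambda: [0] * len(SUM_FIELDS))
    for part in partials:
        for key, totals in part.items():
            for i, value in enumerate(totals):
                merged[key][i] += value
    # One (group, SUM(t6), SUM(t7), SUM(t8), SUM(t9)) tuple per key, like GENERATE.
    return [(key, *totals) for key, totals in sorted(merged.items())]


if __name__ == "__main__":
    row = "\t".join([
        "1363157985066", "13726230503", "00-FD-07-A4-72-B8:CMCC",
        "120.196.100.82", "i02.c.aliimg.com", "24", "27", "2481", "24681", "200",
    ])
    print(combine(partial_sums([row])))
    # prints: [('13726230503', 27, 2481, 24681, 200)]
```

Fed the single sample row from tel.txt, the emulation yields the same tuple as the `dump` output.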
7.5 Storing the result in HDFS
STORE t_wlan_simple_group_sum INTO '/hdfs/pig/wlan_result';
[zkkafka@yanfabu2-36 ~]$ hdfs dfs -text /hdfs/pig/wlan_result/part-r-00000
13726230503 27 2481 24681 200
[zkkafka@yanfabu2-36 ~]$
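With the default PigStorage, STORE writes one tab-delimited line per tuple into part files such as part-r-00000, which is what `hdfs dfs -text` prints. The Python sketch below reads such an output directory back for further checks.

It assumes the directory has first been copied to the local disk, for example with `hdfs dfs -get /hdfs/pig/wlan_result ./wlan_result`. The helper names are hypothetical.

```python
import glob
import os


def parse_result_line(line, delimiter="\t"):
    """Split one PigStorage output line into (group, sum_t6, sum_t7, sum_t8, sum_t9)."""
    group, *sums = line.rstrip("\n").split(delimiter)
    return (group, *(int(s) for s in sums))


def read_result_dir(path):
    """Read every part-* file of a locally copied STORE output directory.

    The part-* pattern skips Hadoop's _SUCCESS marker file.
    """
    rows = []
    for part in sorted(glob.glob(os.path.join(path, "part-*"))):
        with open(part, encoding="utf-8") as f:
            rows.extend(parse_result_line(line) for line in f if line.strip())
    return rows


if __name__ == "__main__":
    # Hypothetical local copy: hdfs dfs -get /hdfs/pig/wlan_result ./wlan_result
    for row in read_result_dir("wlan_result"):
        print(row)
```

When re-running the STORE, the job usually fails if /hdfs/pig/wlan_result already exists, as with other Hadoop output directories. Removing it first with `hdfs dfs -rm -r /hdfs/pig/wlan_result` is the usual workaround.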
7.6 Sorting
t_wlan_simple_group_sum_group = ORDER t_wlan_simple_group_sum BY group;
DUMP t_wlan_simple_group_sum_group;
grunt> DUMP t_wlan_simple_group_sum_group;
2019-06-12 15:35:33,188 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY
2019-06-12 15:35:33,235 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-12 15:35:33,236 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-12 15:35:33,242 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-12 15:35:33,255 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-12 15:35:33,280 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2019-06-12 15:35:33,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizerMR - Using Secondary Key Optimization for MapReduce node scope-283
2019-06-12 15:35:33,292 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2019-06-12 15:35:33,292 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2019-06-12 15:35:33,328 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-12 15:35:33,329 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-12 15:35:33,330 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-12 15:35:33,330 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-12 15:35:33,332 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=102
2019-06-12 15:35:33,333 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-12 15:35:33,333 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-12 15:35:33,510 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1544583298/tmp-955805369/pig-0.17.0-core-h2.jar
2019-06-12 15:35:33,595 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1544583298/tmp712002240/automaton-1.11-8.jar
2019-06-12 15:35:34,074 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1544583298/tmp1938988919/antlr-runtime-3.4.jar
2019-06-12 15:35:34,154 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1544583298/tmp1704097364/joda-time-2.9.3.jar
2019-06-12 15:35:34,157 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-12 15:35:34,158 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-12 15:35:34,158 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-12 15:35:34,158 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-12 15:35:34,193 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-12 15:35:34,256 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-12 15:35:34,277 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-12 15:35:34,288 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:35:34,289 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-12 15:35:34,291 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-12 15:35:34,450 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-12 15:35:34,952 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0024
2019-06-12 15:35:34,960 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-12 15:35:35,211 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0024
2019-06-12 15:35:35,216 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0024/
2019-06-12 15:35:35,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0024
2019-06-12 15:35:35,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum
2019-06-12 15:35:35,216 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[1,9],t_wlan_simple[-1,-1],t_wlan_simple_group_sum[4,26],t_wlan_simple_group[3,22] C: t_wlan_simple_group_sum[4,26],t_wlan_simple_group[3,22] R: t_wlan_simple_group_sum[4,26]
2019-06-12 15:35:35,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-12 15:35:35,231 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0024]
2019-06-12 15:35:47,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2019-06-12 15:35:47,386 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0024]
2019-06-12 15:35:54,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2019-06-12 15:35:54,902 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0024]
2019-06-12 15:36:00,424 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:00,596 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:00,651 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:00,688 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-12 15:36:00,688 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-12 15:36:00,689 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-12 15:36:00,689 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-12 15:36:00,698 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=29
2019-06-12 15:36:00,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-12 15:36:00,699 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-12 15:36:01,245 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1544583298/tmp-87045202/pig-0.17.0-core-h2.jar
2019-06-12 15:36:01,308 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1544583298/tmp568012746/automaton-1.11-8.jar
2019-06-12 15:36:01,405 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1544583298/tmp780878190/antlr-runtime-3.4.jar
2019-06-12 15:36:01,485 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1544583298/tmp772462384/joda-time-2.9.3.jar
2019-06-12 15:36:01,487 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-12 15:36:01,487 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-12 15:36:01,487 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-12 15:36:01,487 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-12 15:36:01,508 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-12 15:36:01,559 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-12 15:36:01,582 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:36:01,582 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-12 15:36:01,582 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-12 15:36:01,749 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-12 15:36:02,233 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0025
2019-06-12 15:36:02,237 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-12 15:36:02,472 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0025
2019-06-12 15:36:02,476 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0025/
2019-06-12 15:36:02,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0025
2019-06-12 15:36:02,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan_simple_group_sum_group
2019-06-12 15:36:02,476 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan_simple_group_sum_group[6,32] C: R:
2019-06-12 15:36:16,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-12 15:36:16,558 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0025]
2019-06-12 15:36:24,572 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2019-06-12 15:36:24,572 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0025]
2019-06-12 15:36:27,589 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:27,756 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:27,814 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:27,850 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-12 15:36:27,850 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-12 15:36:27,854 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-12 15:36:27,854 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-12 15:36:27,854 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-12 15:36:27,995 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1544583298/tmp-1238945561/pig-0.17.0-core-h2.jar
2019-06-12 15:36:28,103 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1544583298/tmp1385874378/automaton-1.11-8.jar
2019-06-12 15:36:28,223 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1544583298/tmp2107107107/antlr-runtime-3.4.jar
2019-06-12 15:36:28,297 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1544583298/tmp-637573401/joda-time-2.9.3.jar
2019-06-12 15:36:28,301 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-12 15:36:28,302 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-12 15:36:28,302 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-12 15:36:28,302 [main] INFO org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-12 15:36:28,374 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-12 15:36:28,445 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2019-06-12 15:36:28,465 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:36:28,465 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-12 15:36:28,465 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-12 15:36:28,599 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-12 15:36:28,675 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0026
2019-06-12 15:36:28,679 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-12 15:36:28,918 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0026
2019-06-12 15:36:28,921 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0026/
2019-06-12 15:36:28,921 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0026
2019-06-12 15:36:28,921 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan_simple_group_sum_group
2019-06-12 15:36:28,921 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan_simple_group_sum_group[6,32] C: R:
2019-06-12 15:36:44,145 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2019-06-12 15:36:44,146 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0026]
2019-06-12 15:36:51,164 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0026]
2019-06-12 15:36:54,180 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,330 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,369 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,401 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-12 15:36:54,527 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.6.5 0.17.0 zkkafka 2019-06-12 15:35:33 2019-06-12 15:36:54 GROUP_BY,ORDER_BY

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1559370613628_0024 1 1 3 3 3 3 4 4 4 4 t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum GROUP_BY,COMBINER
job_1559370613628_0025 1 1 5 5 5 5 5 5 5 5 t_wlan_simple_group_sum_group SAMPLER
job_1559370613628_0026 1 1 3 3 3 3 4 4 4 4 t_wlan_simple_group_sum_group ORDER_BY hdfs://master/tmp/temp1544583298/tmp-717585849,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (29 bytes) in: "hdfs://master/tmp/temp1544583298/tmp-717585849"

Counters:
Total records written : 1
Total bytes written : 29
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0024 -> job_1559370613628_0025,
job_1559370613628_0025 -> job_1559370613628_0026,
job_1559370613628_0026

2019-06-12 15:36:54,532 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,584 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,623 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,664 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,702 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,735 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,776 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,836 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,871 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,928 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-12 15:36:54,929 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-12 15:36:54,934 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:36:54,934 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,27,2481,24681,200)
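The job statistics show that adding ORDER BY turned one MapReduce job into a chain of three:

- job_1559370613628_0024 (GROUP_BY,COMBINER)
- job_1559370613628_0025 (SAMPLER)
- job_1559370613628_0026 (ORDER_BY)

The Python sketch below is a simplified illustration of that idea, not Pig's actual implementation. A sampling pass picks boundary keys. Each key is then routed to a range partition and sorted there, so concatenating the partitions gives a total order.

The reducer count follows the input-size heuristic suggested by the `BytesPerReducer=1000000000 maxReducers=999` log lines: roughly ceil(input bytes / bytes per reducer), clamped to [1, maxReducers]. In Pig these two limits are configured with `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max`. The sample size and seed below are arbitrary assumptions.

```python
import bisect
import math
import random

# Defaults reported by InputSizeReducerEstimator in the log.
BYTES_PER_REDUCER = 1_000_000_000   # pig.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 999                  # pig.exec.reducers.max


def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Input-size heuristic: ceil(bytes / bytes_per_reducer), clamped to [1, max]."""
    n = math.ceil(total_input_bytes / bytes_per_reducer)
    return min(max_reducers, max(1, n))


def cut_points(keys, num_partitions, sample_size=100, seed=0):
    """'Sampler' pass: choose num_partitions - 1 boundary keys from a sorted sample."""
    if not keys or num_partitions <= 1:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def range_partition_sort(keys, num_partitions):
    """'Order-by' pass: route keys to range partitions, then sort each one locally."""
    bounds = cut_points(keys, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        # bisect_right keeps every key in partition j below every key in j + 1.
        partitions[bisect.bisect_right(bounds, key)].append(key)
    return [sorted(p) for p in partitions]


if __name__ == "__main__":
    reducers = estimate_reducers(29)   # totalInputFileSize=29 in the log
    print(reducers, range_partition_sort(["13726230503"], reducers))
    # prints: 1 [['13726230503']]
```

For inputs of 102 and 29 bytes, `estimate_reducers` returns 1, which matches `Setting Parallelism to 1`. In this run, the whole sort therefore fell into a single partition.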
8. Running a Pig script
pig -x mapreduce t_wlan.pig
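`-x mapreduce` selects the execution type; the startup log in step 6 reports `Picked MAPREDUCE as the ExecType`. `-x local` runs a script against the local file system instead, which can be handy for quick tests once the input paths point at local files. The contents of t_wlan.pig are not shown here; presumably it collects the statements from section 7.

The Python wrapper below is a hypothetical convenience helper. Its `params` argument maps to Pig's `-param name=value` option and only matters if the script references `$name`.

```python
import shutil
import subprocess


def pig_argv(script, exec_type="mapreduce", params=None):
    """Build the pig command line; -x picks the execution type (e.g. local or mapreduce)."""
    argv = ["pig", "-x", exec_type]
    for name, value in (params or {}).items():
        # Each entry becomes -param name=value, substituted for $name in the script.
        argv += ["-param", f"{name}={value}"]
    argv.append(script)
    return argv


def run_pig(script, exec_type="mapreduce", params=None):
    """Run a Pig script; raises CalledProcessError if pig exits with a non-zero code."""
    if shutil.which("pig") is None:
        # PIG_HOME/bin must be on PATH (exported in ~/.bash_profile in step 2).
        raise FileNotFoundError("pig not found on PATH")
    return subprocess.run(pig_argv(script, exec_type, params), check=True)


if __name__ == "__main__":
    # Equivalent to the command above: pig -x mapreduce t_wlan.pig
    print(" ".join(pig_argv("t_wlan.pig")))
```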