
Linux Pig installation and usage


 

0. Preparation: Hadoop servers

10.156.50.35 yanfabu2-35.base.app.dev.yf zk1  hadoop1 master1 master
10.156.50.36 yanfabu2-36.base.app.dev.yf zk2  hadoop2 master2
10.156.50.37 yanfabu2-37.base.app.dev.yf zk3  hadoop3 slaver1
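
Pig in MapReduce mode only submits jobs to a Hadoop cluster that is already running, so it is worth confirming HDFS and YARN are up before installing Pig. A minimal check (a sketch; the expected process list assumes the ZooKeeper/Hadoop roles laid out above):

jps                       # on master1 expect NameNode, ResourceManager, QuorumPeerMain, ...
hdfs dfsadmin -report     # the DataNodes should be listed as live
yarn node -list           # the NodeManagers should be registered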

 

2. Unpack Pig and set environment variables

 tar xf pig-0.17.0.tar.gz 
 mv pig-0.17.0 pig

vim ~/.bash_profile

export PIG_HOME=/home/zkkafka/pig
export PATH=$PATH:$PIG_HOME/bin

source ~/.bash_profile

scp -r ~/.bash_profile  zkkafka@10.156.50.36:/home/zkkafka/
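
Pig is later started on the other nodes as well (step 6 runs it on yanfabu2-37), so the distribution and the profile need to be present there too. A sketch, assuming the same /home/zkkafka layout on 10.156.50.37:

scp -r ~/pig            zkkafka@10.156.50.37:/home/zkkafka/
scp    ~/.bash_profile  zkkafka@10.156.50.37:/home/zkkafka/
# PIG_HOME and PATH take effect on the next login, or after running `source ~/.bash_profile` on that node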

 

3. Modify the configuration file

vim pig.properties

# matches the core-site.xml setting
fs.default.name=hdfs://master
# matches the mapred-site.xml setting (JobHistory server address)
mapred.job.tracker=master1:10020

scp ../conf/*  zkkafka@10.156.50.36:/home/zkkafka/pig/conf/
scp ../conf/*  zkkafka@10.156.50.37:/home/zkkafka/pig/conf/
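
With fs.default.name pointing at the cluster, pig starts in MapReduce mode by default (the step-6 log shows ExecType MAPREDUCE being picked). The execution mode can also be chosen explicitly, which is handy for quick local tests; a sketch:

pig -x local        # run against the local filesystem, no cluster needed
pig -x mapreduce    # run against HDFS/YARN (the default when a Hadoop config is found)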

 

4. Check the Pig version

pig -version
[zkkafka@yanfabu2-35 pig]$ pig -version
19/06/05 19:58:19 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
Apache Pig version 0.17.0 (r1797386) 
compiled Jun 02 2017, 15:41:58

 

 

5. Prepare test data

vim tel.txt

1363157985066	13726230503	00-FD-07-A4-72-B8:CMCC	120.196.100.82	i02.c.aliimg.com	24	27	2481	24681	200

 

hdfs dfs -mkdir -p /hdfs/pig/
hdfs dfs -put /home/zkkafka/pig/data/tel.txt  /hdfs/pig/
hdfs dfs -lsr /hdfs/pig

 

[zkkafka@yanfabu2-35 conf]$ hdfs dfs -lsr /hdfs/pig
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r--   2 zkkafka supergroup       2546 2019-06-05 21:03 /hdfs/pig/tel.txt
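
The file content can also be checked directly from HDFS (a sketch):

hdfs dfs -cat /hdfs/pig/tel.txt | head -n 3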

 

6. Start the Pig Grunt shell

 

[zkkafka@yanfabu2-37 ~]$ pig
19/06/06 16:44:27 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
19/06/06 16:44:27 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
19/06/06 16:44:27 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2019-06-06 16:44:27,558 [main] INFO  org.apache.pig.Main - Apache Pig version 0.17.0 (r1797386) compiled Jun 02 2017, 15:41:58
2019-06-06 16:44:27,558 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/zkkafka/pig_1559810667556.log
2019-06-06 16:44:27,605 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /home/zkkafka/.pigbootup not found
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/zkkafka/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/zkkafka/hbase/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-06-06 16:44:28,312 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-06-06 16:44:28,312 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master/
2019-06-06 16:44:28,859 [main] INFO  org.apache.pig.PigServer - Pig Script ID for the session: PIG-default-3d2427ca-7fdf-4252-ab78-cfb6ed2be36e
2019-06-06 16:44:28,859 [main] WARN  org.apache.pig.PigServer - ATS is disabled since yarn.timeline-service.enabled set to false

 

7. Using Pig

7.1 Load the data into Pig

 t_wlan = LOAD '/hdfs/pig/tel.txt' USING PigStorage('\t')   AS (t0:long, msisdn:chararray, t2:chararray, t3:chararray, t4:chararray, t5:chararray, t6:long, t7:long, t8:long, t9:long, t10:chararray);
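
DESCRIBE prints the declared schema without launching a MapReduce job. Note that the LOAD statement declares eleven fields (t0 … t10) while the sample line only has ten columns; this is why the dump below ends with an empty trailing field and Pig logs an ACCESSING_NON_EXISTENT_FIELD warning. A sketch:

DESCRIBE t_wlan;
-- t_wlan: {t0: long,msisdn: chararray,t2: chararray,t3: chararray,t4: chararray,
--          t5: chararray,t6: long,t7: long,t8: long,t9: long,t10: chararray}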

 

7.2 Query the relation t_wlan

dump t_wlan;

grunt> dump t_wlan;
2019-06-06 16:59:05,805 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2019-06-06 16:59:05,840 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-06-06 16:59:05,840 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 16:59:05,847 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 16:59:05,848 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 16:59:05,848 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 16:59:05,880 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 16:59:05,881 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 16:59:05,883 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 16:59:06,472 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp-489322267/pig-0.17.0-core-h2.jar
2019-06-06 16:59:06,598 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp1532488090/automaton-1.11-8.jar
2019-06-06 16:59:07,094 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp731737639/antlr-runtime-3.4.jar
2019-06-06 16:59:07,190 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp-2081706505/joda-time-2.9.3.jar
2019-06-06 16:59:07,192 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 16:59:07,192 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 16:59:07,193 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 16:59:07,193 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 16:59:07,202 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 16:59:07,264 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-06 16:59:07,286 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 16:59:07,289 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 16:59:07,289 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 16:59:07,291 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 16:59:07,487 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 16:59:07,590 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0014
2019-06-06 16:59:07,598 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 16:59:07,856 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0014
2019-06-06 16:59:07,862 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0014/
2019-06-06 16:59:07,862 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0014
2019-06-06 16:59:07,862 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan
2019-06-06 16:59:07,862 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan[-1,-1] C:  R: 
2019-06-06 16:59:07,872 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 16:59:07,873 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0014]
2019-06-06 16:59:20,161 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 16:59:20,161 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0014]
2019-06-06 16:59:23,200 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,409 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,505 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,573 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 16:59:23,574 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.6.5	0.17.0	zkkafka	2019-06-06 16:59:05	2019-06-06 16:59:23	UNKNOWN

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1559370613628_0014	1	0	4	4	4	4	0	0	0	0	t_wlan	MAP_ONLY	hdfs://master/tmp/temp-1906860032/tmp1645766804,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (106 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp1645766804"

Counters:
Total records written : 1
Total bytes written : 106
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0014


2019-06-06 16:59:23,582 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,639 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,698 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 16:59:23,753 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 1 time(s).
2019-06-06 16:59:23,753 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 16:59:23,755 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 16:59:23,764 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 16:59:23,764 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1363157985066,13726230503,00-FD-07-A4-72-B8:CMCC,120.196.100.82,i02.c.aliimg.com,24,27,2481,24681,200,)

 

7.3 Extract columns from relation A into relation B

 

t_wlan_simple = FOREACH t_wlan GENERATE msisdn, t6, t7, t8, t9;
dump t_wlan_simple;

 

grunt> dump t_wlan_simple;
2019-06-06 17:03:42,827 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2019-06-06 17:03:42,869 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-06-06 17:03:42,870 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 17:03:42,884 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-06 17:03:42,891 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 17:03:42,893 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 17:03:42,893 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 17:03:42,923 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 17:03:42,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 17:03:42,924 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 17:03:43,081 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp1408006038/pig-0.17.0-core-h2.jar
2019-06-06 17:03:43,178 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp1149486211/automaton-1.11-8.jar
2019-06-06 17:03:43,281 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp1835019327/antlr-runtime-3.4.jar
2019-06-06 17:03:43,378 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp2065709292/joda-time-2.9.3.jar
2019-06-06 17:03:43,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 17:03:43,383 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 17:03:43,383 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 17:03:43,383 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 17:03:43,399 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 17:03:43,481 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-06 17:03:43,510 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 17:03:43,519 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:03:43,519 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 17:03:43,522 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 17:03:44,131 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 17:03:44,228 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0015
2019-06-06 17:03:44,232 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 17:03:44,471 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0015
2019-06-06 17:03:44,475 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0015/
2019-06-06 17:03:44,475 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0015
2019-06-06 17:03:44,475 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple
2019-06-06 17:03:44,475 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan_simple[-1,-1] C:  R: 
2019-06-06 17:03:44,480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 17:03:44,480 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0015]
2019-06-06 17:03:58,648 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 17:03:58,649 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0015]
2019-06-06 17:04:04,679 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:04,910 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:04,977 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,043 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 17:04:05,044 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.6.5	0.17.0	zkkafka	2019-06-06 17:03:42	2019-06-06 17:04:05	UNKNOWN

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1559370613628_0015	1	0	4	4	4	4	0	0	0	0	t_wlan,t_wlan_simple	MAP_ONLY	hdfs://master/tmp/temp-1906860032/tmp1236017200,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (29 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp1236017200"

Counters:
Total records written : 1
Total bytes written : 29
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0015


2019-06-06 17:04:05,058 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,137 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,223 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:04:05,335 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 17:04:05,337 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:04:05,382 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:04:05,382 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,27,2481,24681,200)
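
FOREACH … GENERATE projects the listed fields into a new relation. The same projection can be written with positional references, which also works when LOAD declared no schema; a sketch equivalent to the statement above:

-- $1 = msisdn, $6..$9 = the four numeric fields declared in LOAD
t_wlan_simple = FOREACH t_wlan GENERATE $1, $6, $7, $8, $9;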

 

7.4 Group the data

 

t_wlan_simple_group = GROUP t_wlan_simple BY msisdn;	
dump t_wlan_simple_group;

 

grunt> dump t_wlan_simple_group;
2019-06-06 17:06:28,589 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2019-06-06 17:06:28,640 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2019-06-06 17:06:28,641 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 17:06:28,646 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-06 17:06:28,661 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 17:06:28,674 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 17:06:28,674 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 17:06:28,715 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 17:06:28,716 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 17:06:28,717 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-06 17:06:28,723 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-06 17:06:28,729 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=102
2019-06-06 17:06:28,729 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-06 17:06:28,730 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 17:06:28,929 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp-412980928/pig-0.17.0-core-h2.jar
2019-06-06 17:06:29,039 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp-1182557529/automaton-1.11-8.jar
2019-06-06 17:06:29,543 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp-1112811524/antlr-runtime-3.4.jar
2019-06-06 17:06:30,043 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp432932811/joda-time-2.9.3.jar
2019-06-06 17:06:30,046 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 17:06:30,047 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 17:06:30,047 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 17:06:30,047 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 17:06:30,111 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 17:06:30,174 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-06 17:06:30,189 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 17:06:30,191 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:06:30,191 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 17:06:30,193 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 17:06:30,391 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 17:06:30,488 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0016
2019-06-06 17:06:30,492 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 17:06:30,734 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0016
2019-06-06 17:06:30,738 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0016/
2019-06-06 17:06:30,738 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0016
2019-06-06 17:06:30,738 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple,t_wlan_simple_group
2019-06-06 17:06:30,738 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan_simple[-1,-1],t_wlan_simple_group[6,22] C:  R: 
2019-06-06 17:06:30,745 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 17:06:30,745 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0016]
2019-06-06 17:06:44,943 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 17:06:44,943 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0016]
2019-06-06 17:06:50,964 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0016]
2019-06-06 17:06:55,983 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,181 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,283 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,335 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 17:06:56,335 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.6.5	0.17.0	zkkafka	2019-06-06 17:06:28	2019-06-06 17:06:56	GROUP_BY

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1559370613628_0016	1	1	4	4	4	4	4	4	4	4	t_wlan,t_wlan_simple,t_wlan_simple_group	GROUP_BY	hdfs://master/tmp/temp-1906860032/tmp912427234,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (46 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp912427234"

Counters:
Total records written : 1
Total bytes written : 46
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0016


2019-06-06 17:06:56,345 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,403 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,474 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:06:56,554 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 17:06:56,556 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:06:56,568 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:06:56,568 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,{(13726230503,27,2481,24681,200)})
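
The dump shows the shape GROUP produces: one tuple per key, with the key in a field named group and every matching t_wlan_simple tuple collected into a bag. DESCRIBE makes this explicit (a sketch):

DESCRIBE t_wlan_simple_group;
-- t_wlan_simple_group: {group: chararray,
--     t_wlan_simple: {(msisdn: chararray,t6: long,t7: long,t8: long,t9: long)}}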

 

7.5 Sum the traffic

 

t_wlan_simple_group_sum = FOREACH t_wlan_simple_group GENERATE group, SUM(t_wlan_simple.t6), SUM(t_wlan_simple.t7), SUM(t_wlan_simple.t8), SUM(t_wlan_simple.t9);
dump t_wlan_simple_group_sum;

 

grunt> dump t_wlan_simple_group_sum
2019-06-06 17:15:39,824 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
2019-06-06 17:15:39,877 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:15:39,878 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-06 17:15:39,885 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-06 17:15:39,904 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-06 17:15:39,908 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2019-06-06 17:15:39,972 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2019-06-06 17:15:39,972 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2019-06-06 17:15:40,000 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-06 17:15:40,001 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-06 17:15:40,002 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-06 17:15:40,002 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-06 17:15:40,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=102
2019-06-06 17:15:40,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-06 17:15:40,005 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-06 17:15:40,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1906860032/tmp-784677978/pig-0.17.0-core-h2.jar
2019-06-06 17:15:40,699 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1906860032/tmp-1113714067/automaton-1.11-8.jar
2019-06-06 17:15:40,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp-1906860032/tmp-1701171835/antlr-runtime-3.4.jar
2019-06-06 17:15:40,910 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp-1906860032/tmp-725132195/joda-time-2.9.3.jar
2019-06-06 17:15:40,914 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-06 17:15:40,915 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-06 17:15:40,915 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-06 17:15:40,915 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-06 17:15:40,968 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-06 17:15:41,035 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-06 17:15:41,055 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-06 17:15:41,057 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:15:41,057 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-06 17:15:41,060 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-06 17:15:41,282 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-06 17:15:41,432 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0018
2019-06-06 17:15:41,438 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-06 17:15:41,686 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0018
2019-06-06 17:15:41,691 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0018/
2019-06-06 17:15:41,692 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0018
2019-06-06 17:15:41,692 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum
2019-06-06 17:15:41,692 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[4,9],t_wlan_simple[-1,-1],t_wlan_simple_group_sum[7,26],t_wlan_simple_group[6,22] C: t_wlan_simple_group_sum[7,26],t_wlan_simple_group[6,22] R: t_wlan_simple_group_sum[7,26]
2019-06-06 17:15:41,698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-06 17:15:41,698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0018]
2019-06-06 17:15:55,903 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-06 17:15:55,903 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0018]
2019-06-06 17:16:00,962 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0018]
2019-06-06 17:16:06,981 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,185 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,257 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,332 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-06 17:16:07,333 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.6.5	0.17.0	zkkafka	2019-06-06 17:15:39	2019-06-06 17:16:07	GROUP_BY

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1559370613628_0018	1	1	3	3	3	3	3	3	3	3	t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum	GROUP_BY,COMBINER	hdfs://master/tmp/temp-1906860032/tmp2100428296,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (29 bytes) in: "hdfs://master/tmp/temp-1906860032/tmp2100428296"

Counters:
Total records written : 1
Total bytes written : 29
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0018


2019-06-06 17:16:07,343 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,402 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,456 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-06 17:16:07,512 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-06 17:16:07,513 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-06 17:16:07,529 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-06 17:16:07,529 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,27,2481,24681,200)
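
The summed columns come back unnamed; giving them aliases with AS makes later steps and the stored output easier to read. A sketch of the same aggregation (the up/down packet and byte interpretation of t6 … t9 is an assumption about this data set):

t_wlan_simple_group_sum = FOREACH t_wlan_simple_group GENERATE
    group AS msisdn,
    SUM(t_wlan_simple.t6) AS up_pkgs,
    SUM(t_wlan_simple.t7) AS down_pkgs,
    SUM(t_wlan_simple.t8) AS up_bytes,
    SUM(t_wlan_simple.t9) AS down_bytes;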

 

7.6 Store the result in HDFS

STORE t_wlan_simple_group_sum INTO '/hdfs/pig/wlan_result';

 

[zkkafka@yanfabu2-36 ~]$ hdfs dfs -text /hdfs/pig/wlan_result/part-r-00000
13726230503	27	2481	24681	200
[zkkafka@yanfabu2-36 ~]$ 
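
STORE used PigStorage's default tab delimiter, which is what the part-r-00000 file above shows. A different separator or target directory can be given explicitly; a sketch, keeping in mind that the output directory must not already exist:

STORE t_wlan_simple_group_sum INTO '/hdfs/pig/wlan_result_csv' USING PigStorage(',');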

 

7.7 Sort the result

t_wlan_simple_group_sum_group = ORDER t_wlan_simple_group_sum BY group;
DUMP t_wlan_simple_group_sum_group;
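
ORDER also sorts descending and pairs with LIMIT for top-N queries; a sketch (the column choice here is only illustrative):

t_wlan_desc  = ORDER t_wlan_simple_group_sum BY $3 DESC;   -- sort by the third summed value
t_wlan_top10 = LIMIT t_wlan_desc 10;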

 

grunt> DUMP t_wlan_simple_group_sum_group;
2019-06-12 15:35:33,188 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,ORDER_BY
2019-06-12 15:35:33,235 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-12 15:35:33,236 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2019-06-12 15:35:33,242 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for t_wlan: $0, $2, $3, $4, $5, $10
2019-06-12 15:35:33,255 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2019-06-12 15:35:33,280 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.CombinerOptimizerUtil - Choosing to move algebraic foreach to combiner
2019-06-12 15:35:33,291 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizerMR - Using Secondary Key Optimization for MapReduce node scope-283
2019-06-12 15:35:33,292 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2019-06-12 15:35:33,292 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 3
2019-06-12 15:35:33,328 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-12 15:35:33,329 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-12 15:35:33,330 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-12 15:35:33,330 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-12 15:35:33,332 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=102
2019-06-12 15:35:33,333 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-12 15:35:33,333 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-12 15:35:33,510 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1544583298/tmp-955805369/pig-0.17.0-core-h2.jar
2019-06-12 15:35:33,595 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1544583298/tmp712002240/automaton-1.11-8.jar
2019-06-12 15:35:34,074 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1544583298/tmp1938988919/antlr-runtime-3.4.jar
2019-06-12 15:35:34,154 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1544583298/tmp1704097364/joda-time-2.9.3.jar
2019-06-12 15:35:34,157 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-12 15:35:34,158 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-12 15:35:34,158 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-12 15:35:34,158 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-12 15:35:34,193 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-12 15:35:34,256 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-12 15:35:34,277 [JobControl] INFO  org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2019-06-12 15:35:34,288 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:35:34,289 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-12 15:35:34,291 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-12 15:35:34,450 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-12 15:35:34,952 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0024
2019-06-12 15:35:34,960 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-12 15:35:35,211 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0024
2019-06-12 15:35:35,216 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0024/
2019-06-12 15:35:35,216 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0024
2019-06-12 15:35:35,216 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum
2019-06-12 15:35:35,216 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan[1,9],t_wlan_simple[-1,-1],t_wlan_simple_group_sum[4,26],t_wlan_simple_group[3,22] C: t_wlan_simple_group_sum[4,26],t_wlan_simple_group[3,22] R: t_wlan_simple_group_sum[4,26]
2019-06-12 15:35:35,231 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-06-12 15:35:35,231 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0024]
2019-06-12 15:35:47,386 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 16% complete
2019-06-12 15:35:47,386 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0024]
2019-06-12 15:35:54,902 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
2019-06-12 15:35:54,902 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0024]
2019-06-12 15:36:00,424 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:00,596 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:00,651 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:00,688 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-12 15:36:00,688 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-12 15:36:00,689 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-12 15:36:00,689 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2019-06-12 15:36:00,698 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=29
2019-06-12 15:36:00,699 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-12 15:36:00,699 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-12 15:36:01,245 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1544583298/tmp-87045202/pig-0.17.0-core-h2.jar
2019-06-12 15:36:01,308 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1544583298/tmp568012746/automaton-1.11-8.jar
2019-06-12 15:36:01,405 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1544583298/tmp780878190/antlr-runtime-3.4.jar
2019-06-12 15:36:01,485 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1544583298/tmp772462384/joda-time-2.9.3.jar
2019-06-12 15:36:01,487 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-12 15:36:01,487 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-12 15:36:01,487 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-12 15:36:01,487 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-12 15:36:01,508 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-12 15:36:01,559 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-12 15:36:01,582 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:36:01,582 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-12 15:36:01,582 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-12 15:36:01,749 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-12 15:36:02,233 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0025
2019-06-12 15:36:02,237 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-12 15:36:02,472 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0025
2019-06-12 15:36:02,476 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0025/
2019-06-12 15:36:02,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0025
2019-06-12 15:36:02,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan_simple_group_sum_group
2019-06-12 15:36:02,476 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan_simple_group_sum_group[6,32] C:  R: 
2019-06-12 15:36:16,558 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2019-06-12 15:36:16,558 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0025]
2019-06-12 15:36:24,572 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 66% complete
2019-06-12 15:36:24,572 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0025]
2019-06-12 15:36:27,589 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:27,756 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:27,814 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:27,850 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2019-06-12 15:36:27,850 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2019-06-12 15:36:27,854 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2019-06-12 15:36:27,854 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2019-06-12 15:36:27,854 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2019-06-12 15:36:27,995 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp1544583298/tmp-1238945561/pig-0.17.0-core-h2.jar
2019-06-12 15:36:28,103 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp1544583298/tmp1385874378/automaton-1.11-8.jar
2019-06-12 15:36:28,223 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp1544583298/tmp2107107107/antlr-runtime-3.4.jar
2019-06-12 15:36:28,297 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/zkkafka/pig/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp1544583298/tmp-637573401/joda-time-2.9.3.jar
2019-06-12 15:36:28,301 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2019-06-12 15:36:28,302 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2019-06-12 15:36:28,302 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2019-06-12 15:36:28,302 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2019-06-12 15:36:28,374 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2019-06-12 15:36:28,445 [JobControl] WARN  org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2019-06-12 15:36:28,465 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:36:28,465 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2019-06-12 15:36:28,465 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2019-06-12 15:36:28,599 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2019-06-12 15:36:28,675 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1559370613628_0026
2019-06-12 15:36:28,679 [JobControl] INFO  org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2019-06-12 15:36:28,918 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1559370613628_0026
2019-06-12 15:36:28,921 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://master1:8088/proxy/application_1559370613628_0026/
2019-06-12 15:36:28,921 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1559370613628_0026
2019-06-12 15:36:28,921 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases t_wlan_simple_group_sum_group
2019-06-12 15:36:28,921 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: t_wlan_simple_group_sum_group[6,32] C:  R: 
2019-06-12 15:36:44,145 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 83% complete
2019-06-12 15:36:44,146 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0026]
2019-06-12 15:36:51,164 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1559370613628_0026]
2019-06-12 15:36:54,180 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,330 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,369 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,401 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2019-06-12 15:36:54,527 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: 

HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
2.6.5	0.17.0	zkkafka	2019-06-12 15:35:33	2019-06-12 15:36:54	GROUP_BY,ORDER_BY

Success!

Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTime	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1559370613628_0024	1	1	3	3	3	3	4	4	4	4	t_wlan,t_wlan_simple,t_wlan_simple_group,t_wlan_simple_group_sum	GROUP_BY,COMBINER	
job_1559370613628_0025	1	1	5	5	5	5	5	5	5	5	t_wlan_simple_group_sum_group	SAMPLER	
job_1559370613628_0026	1	1	3	3	3	3	4	4	4	4	t_wlan_simple_group_sum_group	ORDER_BY	hdfs://master/tmp/temp1544583298/tmp-717585849,

Input(s):
Successfully read 1 records (459 bytes) from: "/hdfs/pig/tel.txt"

Output(s):
Successfully stored 1 records (29 bytes) in: "hdfs://master/tmp/temp1544583298/tmp-717585849"

Counters:
Total records written : 1
Total bytes written : 29
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1559370613628_0024	->	job_1559370613628_0025,
job_1559370613628_0025	->	job_1559370613628_0026,
job_1559370613628_0026


2019-06-12 15:36:54,532 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,584 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,623 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,664 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,702 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,735 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,776 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,836 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,871 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2019-06-12 15:36:54,928 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2019-06-12 15:36:54,929 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2019-06-12 15:36:54,934 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2019-06-12 15:36:54,934 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(13726230503,27,2481,24681,200)
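The tuple above is the aggregated row for msisdn 13726230503. Because dump was used, the job only wrote its result to a temporary HDFS directory (hdfs://master/tmp/temp1544583298/tmp-717585849, as shown in the Output(s) part of the statistics). To keep the result, the final dump can be replaced by a STORE statement; a minimal sketch, where the output path /hdfs/pig/tel_result is an assumption, not from the original notes:

STORE t_wlan_simple_group_sum_group INTO '/hdfs/pig/tel_result' USING PigStorage(',');

The stored files can then be inspected from the shell:

hdfs dfs -cat /hdfs/pig/tel_result/part-*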

 

 

8. Running a Pig script

pig -x mapreduce t_wlan.pig
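The contents of t_wlan.pig are not reproduced in the original notes. Judging from the alias names in the job log above (t_wlan_simple, t_wlan_simple_group, t_wlan_simple_group_sum, t_wlan_simple_group_sum_group) and the dumped tuple, the script is roughly equivalent to the sketch below; the projected columns, the SUM fields and the sort key are assumptions.

-- t_wlan.pig (sketch): t_wlan is loaded from /hdfs/pig/tel.txt first, exactly as in step 7.1
t_wlan_simple = FOREACH t_wlan GENERATE msisdn, t6, t7, t8, t9;       -- keep the phone number and numeric columns (assumed projection)
t_wlan_simple_group = GROUP t_wlan_simple BY msisdn;                  -- one group per phone number
t_wlan_simple_group_sum = FOREACH t_wlan_simple_group GENERATE
    group AS msisdn,
    SUM(t_wlan_simple.t6) AS sum_t6,
    SUM(t_wlan_simple.t7) AS sum_t7,
    SUM(t_wlan_simple.t8) AS sum_t8,
    SUM(t_wlan_simple.t9) AS sum_t9;                                  -- the GROUP_BY + COMBINER job in the stats
t_wlan_simple_group_sum_group = ORDER t_wlan_simple_group_sum BY sum_t8 DESC;  -- the SAMPLER + ORDER_BY jobs (sort key assumed)
DUMP t_wlan_simple_group_sum_group;

-x mapreduce submits the script to the Hadoop cluster, the same mode Pig picked automatically in the interactive session above; pig -x local t_wlan.pig runs the same script against the local file system, which is convenient for debugging a script before submitting it to the cluster.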