Pig: computing the maximum temperature per year in the Grunt shell

 
grunt> ls
hdfs://hadoopmaster:9000/user/hadoop/ncdc_data.txt<r 2> 71
hdfs://hadoopmaster:9000/user/hadoop/user       <dir>
grunt> cat ncdc_data.txt
1953:122:5
1954:83:5
1955:44:5
1956:33:5
1957:50:5
1958:33:5
1959:55:5
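
Each input line holds a year, a temperature and a quality code separated by colons, which is why the LOAD statement below uses PigStorage(':') and declares the three fields as ints.
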
grunt> A = LOAD 'ncdc_data.txt' USING PigStorage(':') AS(year:int,temp:int,quality:int);
2015-08-22 00:28:39,239 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> B = FILTER A BY temp !=50 AND ((chararray)quality matches '[012345]');
grunt> C = GROUP B BY year;
grunt> describe C;
C: {group: int,B: {(year: int,temp: int,quality: int)}}
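
In this describe output, each record of C pairs the group key (the year) with a bag holding every filtered tuple for that year; with the sample data above a grouped record has the shape (1953,{(1953,122,5)}), so MAX(B.temp) in the next statement simply picks the largest temperature inside each bag.
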
grunt> D = FOREACH C GENERATE group,MAX(B.temp) AS max_temp;
grunt> describe D;
D: {group: int,max_temp: int}
grunt> DUMP D;
2015-08-22 00:33:01,366 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2015-08-22 00:33:01,422 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-08-22 00:33:01,538 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-08-22 00:33:01,552 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2015-08-22 00:33:01,659 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-08-22 00:33:01,660 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-08-22 00:33:01,708 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-22 00:33:01,753 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
2015-08-22 00:33:01,965 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-08-22 00:33:01,972 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2015-08-22 00:33:01,972 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-08-22 00:33:01,972 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2015-08-22 00:33:01,975 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2015-08-22 00:33:01,976 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2015-08-22 00:33:01,985 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=71
2015-08-22 00:33:01,985 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2015-08-22 00:33:01,985 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
2015-08-22 00:33:01,985 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-08-22 00:33:01,986 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job8689104904217919048.jar
2015-08-22 00:33:05,184 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job8689104904217919048.jar created
2015-08-22 00:33:05,184 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2015-08-22 00:33:05,208 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-08-22 00:33:05,214 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-08-22 00:33:05,214 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-08-22 00:33:05,215 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-08-22 00:33:05,372 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-08-22 00:33:05,372 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker.http.address is deprecated. Instead, use mapreduce.jobtracker.http.address
2015-08-22 00:33:05,401 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
2015-08-22 00:33:05,425 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-22 00:33:06,020 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-08-22 00:33:06,020 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-08-22 00:33:06,055 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-08-22 00:33:06,153 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-08-22 00:33:06,167 [JobControl] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-22 00:33:06,686 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1440227566287_0001
2015-08-22 00:33:07,117 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1440227566287_0001
2015-08-22 00:33:07,179 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://hadoopmaster:8088/proxy/application_1440227566287_0001/
2015-08-22 00:33:07,180 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1440227566287_0001
2015-08-22 00:33:07,180 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2015-08-22 00:33:07,180 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],A[-1,-1],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2015-08-22 00:33:07,189 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-08-22 00:33:07,189 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1440227566287_0001]
2015-08-22 00:33:24,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2015-08-22 00:33:24,290 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1440227566287_0001]
2015-08-22 00:33:31,308 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1440227566287_0001]
2015-08-22 00:33:32,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-08-22 00:33:32,508 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.6.0   0.13.0  hadoop  2015-08-22 00:33:01     2015-08-22 00:33:32     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_1440227566287_0001  1       1       6       6       6       6       3       3       3       3       A,B,C,D GROUP_BY,COMBINER       hdfs://hadoopmaster:9000/tmp/temp-321901511/tmp399055604,

Input(s):
Successfully read 7 records (440 bytes) from: "hdfs://hadoopmaster:9000/user/hadoop/ncdc_data.txt"

Output(s):
Successfully stored 6 records (54 bytes) in: "hdfs://hadoopmaster:9000/tmp/temp-321901511/tmp399055604"

Counters:
Total records written : 6
Total bytes written : 54
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1440227566287_0001


2015-08-22 00:33:32,541 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2015-08-22 00:33:32,543 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-22 00:33:32,544 [main] INFO  org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2015-08-22 00:33:32,557 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-08-22 00:33:32,557 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(1953,122)
(1954,83)
(1955,44)
(1956,33)
(1958,33)
(1959,55)
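
Note that 1957 is absent from the result: its only record has a temperature of 50, so the FILTER condition temp != 50 drops it before grouping. A quick interactive check (the alias E is illustrative and was not part of the original session) would be:

grunt> E = FILTER A BY temp == 50;
grunt> DUMP E;
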
grunt> STORE D INTO 'max_temp' USING PigStorage(':');
2015-08-22 00:35:08,446 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-22 00:35:08,475 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
2015-08-22 00:35:08,493 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY,FILTER
2015-08-22 00:35:08,494 [main] INFO  org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[FilterLogicExpressionSimplifier]}
2015-08-22 00:35:08,500 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2015-08-22 00:35:08,502 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2015-08-22 00:35:08,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2015-08-22 00:35:08,507 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2015-08-22 00:35:08,518 [main] INFO  org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-08-22 00:35:08,519 [main] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
2015-08-22 00:35:08,523 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2015-08-22 00:35:08,524 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2015-08-22 00:35:08,524 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Reduce phase detected, estimating # of required reducers.
2015-08-22 00:35:08,524 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2015-08-22 00:35:08,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=71
2015-08-22 00:35:08,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2015-08-22 00:35:08,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2015-08-22 00:35:08,527 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job8143857507507996250.jar
2015-08-22 00:35:11,406 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job8143857507507996250.jar created
2015-08-22 00:35:11,414 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2015-08-22 00:35:11,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2015-08-22 00:35:11,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2015-08-22 00:35:11,415 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2015-08-22 00:35:11,454 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-08-22 00:35:11,457 [JobControl] INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
2015-08-22 00:35:11,639 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2015-08-22 00:35:11,639 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2015-08-22 00:35:11,642 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2015-08-22 00:35:11,686 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2015-08-22 00:35:11,755 [JobControl] INFO  org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1440227566287_0002
2015-08-22 00:35:11,778 [JobControl] INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1440227566287_0002
2015-08-22 00:35:11,782 [JobControl] INFO  org.apache.hadoop.mapreduce.Job - The url to track the job: http://hadoopmaster:8088/proxy/application_1440227566287_0002/
2015-08-22 00:35:11,955 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1440227566287_0002
2015-08-22 00:35:11,955 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B,C,D
2015-08-22 00:35:11,955 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[1,4],A[-1,-1],B[2,4],D[4,4],C[3,4] C: D[4,4],C[3,4] R: D[4,4]
2015-08-22 00:35:11,963 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2015-08-22 00:35:11,963 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1440227566287_0002]
2015-08-22 00:35:23,989 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2015-08-22 00:35:23,990 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1440227566287_0002]
2015-08-22 00:35:29,002 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1440227566287_0002]
2015-08-22 00:35:32,106 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-08-22 00:35:32,107 [main] INFO  org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
2.6.0   0.13.0  hadoop  2015-08-22 00:35:08     2015-08-22 00:35:32     GROUP_BY,FILTER

Success!

Job Stats (time in seconds):
JobId   Maps    Reduces MaxMapTime      MinMapTIme      AvgMapTime      MedianMapTime   MaxReduceTime   MinReduceTime   AvgReduceTime   MedianReducetime        Alias   Feature Outputs
job_1440227566287_0002  1       1       3       3       3       3       3       3       3       3       A,B,C,D GROUP_BY,COMBINER       hdfs://hadoopmaster:9000/user/hadoop/max_temp,

Input(s):
Successfully read 7 records (440 bytes) from: "hdfs://hadoopmaster:9000/user/hadoop/ncdc_data.txt"

Output(s):
Successfully stored 6 records (49 bytes) in: "hdfs://hadoopmaster:9000/user/hadoop/max_temp"

Counters:
Total records written : 6
Total bytes written : 49
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_1440227566287_0002


2015-08-22 00:35:32,135 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
grunt> cat max_temp
1953:122
1954:83
1955:44
1956:33
1958:33
1959:55
grunt>
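
For reference, the same pipeline can be collected into a batch script and run outside the Grunt shell, for example with pig max_temp.pig (the file name is illustrative); a minimal sketch, assuming ncdc_data.txt sits in the user's HDFS home directory as above:

-- load the colon-delimited NCDC sample
A = LOAD 'ncdc_data.txt' USING PigStorage(':') AS (year:int, temp:int, quality:int);
-- keep only readings with a valid quality code, discarding temp 50 as in the session above
B = FILTER A BY temp != 50 AND ((chararray)quality matches '[012345]');
-- group by year and take the maximum temperature per group
C = GROUP B BY year;
D = FOREACH C GENERATE group, MAX(B.temp) AS max_temp;
-- write year:max_temp pairs to the max_temp directory in HDFS
STORE D INTO 'max_temp' USING PigStorage(':');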
