
Pig: Introduction to Latin - 3

  • flatten

players = load 'baseball' as (name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[]);
pos = foreach players generate name, flatten(position) as position;
bypos = group pos by position;

 

Jorge Posada,New York Yankees,{(Catcher),(Designated_hitter)},...

==>

Jorge Posada,Catcher
Jorge Posada,Designated_hitter

 

Note: A foreach with a flatten produces a cross product of every record in the bag with all of the other expressions in the generate statement. If more than one bag is flattened, the cross product is taken across the members of each bag as well as the other expressions in the generate statement.
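
The cross-product behaviour can be modelled outside Pig. The following is a minimal Python sketch, not Pig's implementation; the helper name flatten_bags is hypothetical, and bags are represented as lists of tuples.

```python
from itertools import product

def flatten_bags(fixed_fields, *bags):
    """Model `generate f1, ..., flatten(bag1), flatten(bag2), ...`.

    fixed_fields: tuple holding the non-bag expressions (e.g. the name).
    bags: each bag is a list of tuples.
    """
    # product() pairs every tuple of each bag with every tuple of the others.
    # If any bag is empty, product() yields nothing, so the record vanishes.
    return [fixed_fields + sum(combo, ()) for combo in product(*bags)]

positions = [("Catcher",), ("Designated_hitter",)]
print(flatten_bags(("Jorge Posada",), positions))
# [('Jorge Posada', 'Catcher'), ('Jorge Posada', 'Designated_hitter')]
```

Passing two bags of sizes m and n gives m * n output records for that input record, and an empty bag gives none.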

 

If the bag is empty, no records are produced. You can avoid this by substituting a placeholder bag:

noempty = foreach players generate name,
              ((position is null or IsEmpty(position)) ? {('unknown')} : position) as position;

 

Flatten can also be applied to a tuple. In this case, it does not produce a cross product; instead, it elevates each field in the tuple to a top-level field.
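
As a rough illustration of tuple flattening, here is a small Python sketch. The helper flatten_tuple and the sample record are made up for this example and are not part of Pig.

```python
def flatten_tuple(record, index):
    """Replace the nested tuple at `index` with its fields, keeping their order."""
    return record[:index] + tuple(record[index]) + record[index + 1:]

# Hypothetical record: a name followed by a (team, position) tuple.
row = ("Jorge Posada", ("New York Yankees", "Catcher"))
flat = flatten_tuple(row, 1)
print(flat)  # one output record with three top-level fields
```

One input record always yields exactly one output record here, which is the difference from flattening a bag.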

 

  •  Nested foreach

 

daily = load 'NYSE_daily' as (exchange, symbol); -- not interested in other fields
grpd = group daily by exchange;
uniqcnt = foreach grpd {
           sym = daily.symbol;
           uniq_sym = distinct sym;
           generate group, COUNT(uniq_sym);
};

 


divs = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
grpd = group divs by symbol;
top3 = foreach grpd {
           sorted = order divs by dividends desc;
           top = limit sorted 3;
           generate group, flatten(top);
};

 

Note: Only distinct, filter, limit, and order are supported inside a nested foreach.
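
To make the top3 script concrete, here is a Python sketch of what it computes: group by symbol, sort each group by dividends in descending order, keep three rows, then flatten. The function name and the sample rows are invented for illustration.

```python
from collections import defaultdict

def top_n_per_group(rows, n=3):
    """rows: (exchange, symbol, date, dividends) tuples."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)                  # group divs by symbol
    out = []
    for symbol, bag in groups.items():
        ranked = sorted(bag, key=lambda r: r[3], reverse=True)  # order ... desc
        for row in ranked[:n]:                      # limit sorted 3
            out.append((symbol,) + row)             # generate group, flatten(top)
    return out

# Made-up sample data, not taken from the NYSE_dividends file.
sample = [
    ("NYSE", "AAA", "2009-01-01", 0.10),
    ("NYSE", "AAA", "2009-04-01", 0.40),
    ("NYSE", "AAA", "2009-07-01", 0.20),
    ("NYSE", "AAA", "2009-10-01", 0.30),
    ("NYSE", "BBB", "2009-03-01", 0.05),
]
top = top_n_per_group(sample)
```

Symbols with fewer than three rows simply contribute all of their rows.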

 

  • fragment-replicate join

 

jnd = join daily by (exchange, symbol), divs by (exchange, symbol) using 'replicated';

 

Pig implements the fragment-replicate join by loading the replicated input into Hadoop’s distributed cache. All but the first relation will be loaded into memory.

Note: Fragment-replicate join supports only inner and left outer joins.
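
The idea of keeping the small input in memory and streaming the large one past it can be sketched in Python as below. This is only a model of an inner fragment-replicate join under the assumption that the small input fits in memory; the function name, key function, and rows are hypothetical.

```python
from collections import defaultdict

def replicated_join(big, small, big_key, small_key):
    """Inner join: hold `small` in an in-memory table, stream `big` past it."""
    table = defaultdict(list)
    for row in small:                        # the replicated input
        table[small_key(row)].append(row)
    for row in big:                          # the fragmented (first) input
        for match in table.get(big_key(row), []):
            yield row + match

by_exchange_symbol = lambda r: r[:2]         # join on (exchange, symbol)
daily_rows = [("NYSE", "CLI"), ("NYSE", "CUB"), ("NYSE", "XYZ")]
divs_rows = [("NYSE", "CLI", 0.1), ("NYSE", "CUB", 0.2), ("NYSE", "CLI", 0.3)]
joined = list(replicated_join(daily_rows, divs_rows,
                              by_exchange_symbol, by_exchange_symbol))
```

Because every lookup is local to the task reading the large input, no shuffle of the large input is needed, which is why the join requires the other inputs to fit in memory.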

 

  • skew join

In many data sets, there are a few keys that have three or more orders of magnitude more records than other keys. This results in one or two reducers that will take much longer than the rest.

 

Skew join works by first sampling one input for the join. In that input it identifies any keys that have so many records that skew join estimates it will not be able to fit them all into memory. Then, in a second MapReduce job, it does the join. For all records except those identified in the sample, it does a standard join, collecting records with the same key onto the same reducer. Those keys identified as too large are treated differently. Based on how many records were seen for a given key, those records are split across the appropriate number of reducers. The number of reducers is chosen based on Pig’s estimate of how wide the data must be split such that each reducer can fit its split into memory. For the input to the join that is not split, those keys that were split are then replicated to each reducer that contains that key.
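
The two phases described above can be approximated in Python. This is a simplified sketch rather than Pig's algorithm: it counts records instead of estimating bytes, treats each (key, part) pair as one reducer, and uses the whole input as the "sample". All names and the threshold are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def plan_splits(sample_keys, max_records_per_reducer):
    """Return {key: reducer_count} for keys too large for a single reducer."""
    counts = Counter(sample_keys)
    return {k: math.ceil(c / max_records_per_reducer)
            for k, c in counts.items() if c > max_records_per_reducer}

def skew_join(skewed, other, skewed_key, other_key, splits):
    """Inner join; `splits` comes from plan_splits() on the skewed input."""
    reducers = defaultdict(lambda: ([], []))  # (key, part) -> (skewed, other)
    seen = Counter()
    for row in skewed:
        k = skewed_key(row)
        part = seen[k] % splits.get(k, 1)     # spread hot keys round-robin
        seen[k] += 1
        reducers[(k, part)][0].append(row)
    for row in other:
        k = other_key(row)
        for part in range(splits.get(k, 1)):  # replicate to each reducer of k
            reducers[(k, part)][1].append(row)
    return [l + r for left, right in reducers.values()
            for l in left for r in right]

# Made-up data: five users in one city, one in another.
user_rows = [("u%d" % i, "NYC") for i in range(5)] + [("v", "Oslo")]
city_rows = [("NYC", 8000000), ("Oslo", 700000)]
splits = plan_splits([u[1] for u in user_rows], max_records_per_reducer=2)
result = skew_join(user_rows, city_rows,
                   lambda u: u[1], lambda c: c[0], splits)
```

With a limit of two records per reducer, the five NYC users are spread over three reducers and the NYC city record is copied to each of them, so the join output is the same as a standard join.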

 

users = load 'users' as (name:chararray, city:chararray);
cinfo = load 'cityinfo' as (city:chararray, population:int);
jnd = join cinfo by city, users by city using 'skewed';

 

Note: Skew join can be done on inner or outer joins. However, it can take only two join inputs. Pig looks at the record sizes in the sample and assumes it can use 30% of the JVM’s heap to materialize records that will be joined. This fraction can be controlled with the parameter pig.skewedjoin.reduce.memusage.

 

  • merge join

-- both inputs must already be sorted on the join key
daily = load 'NYSE_daily_sorted' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
divs = load 'NYSE_dividends_sorted' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
jnd = join daily by symbol, divs by symbol using 'merge';

 

 
