- flatten
players = load 'baseball' as (name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[]);
pos = foreach players generate name, flatten(position) as position;
bypos = group pos by position;
Jorge Posada,New York Yankees,{(Catcher),(Designated_hitter)},...
==>
Jorge Posada,Catcher
Jorge Posada,Designated_hitter
Note: A foreach with a flatten produces a cross product of every record in the bag with all of the other expressions in the generate statement. If there is more than one bag and both are flattened, this cross product will be done with members of each bag as well as other expressions in the generate statement.
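The cross-product behaviour in the note can be modelled in a few lines of plain Python. This is only a sketch of the semantics, not how Pig executes it: records are dicts, bags are lists of tuples, and the function name `flatten_generate` and the second bag `bats` are hypothetical names introduced for illustration.

```python
from itertools import product

def flatten_generate(record, plain_fields, bag_fields):
    """Model `foreach r generate plain..., flatten(bag1), flatten(bag2), ...`."""
    plain = tuple(record[f] for f in plain_fields)
    bags = [record[f] for f in bag_fields]
    rows = []
    # product() pairs every member of each bag with every member of the others;
    # it yields nothing when any bag is empty, so such a record produces no rows.
    for combo in product(*bags):
        flattened = tuple(value for member in combo for value in member)
        rows.append(plain + flattened)
    return rows

record = {
    "name": "Jorge Posada",
    "position": [("Catcher",), ("Designated_hitter",)],
    "bats": [("L",), ("R",)],  # hypothetical second bag, for illustration only
}

one_bag = flatten_generate(record, ["name"], ["position"])
two_bags = flatten_generate(record, ["name"], ["position", "bats"])
```

With one bag the result has two rows, one per position; with two bags it has four rows, one for every (position, bats) pair.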
If the bag is empty, no records are produced. To keep such records, substitute a placeholder bag for a null or empty one before flattening:
noempty = foreach players generate name,
    ((position is null or IsEmpty(position)) ? {('unknown')} : position) as position;
Flatten can also be applied to a tuple. In this case, it does not produce a cross product; instead, it elevates each field in the tuple to a top-level field.
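The difference between flattening a bag and flattening a tuple, and the effect of the placeholder substitution, can be sketched in plain Python. This models the semantics only; bags are lists of tuples, and the helper names below are hypothetical.

```python
def flatten_bag(name, bag):
    # One output row per bag member; an empty bag yields no rows at all.
    return [(name,) + member for member in bag]

def with_placeholder(bag):
    # Mirrors the bincond above: a null (None) or empty bag becomes {('unknown')}.
    return bag if bag else [("unknown",)]

def flatten_tuple(name, tup):
    # Always exactly one row: the tuple's fields are promoted to top-level fields.
    return (name,) + tup

dropped = flatten_bag("No Position", [])
kept = flatten_bag("No Position", with_placeholder([]))
promoted = flatten_tuple("Jorge Posada", ("New York Yankees", "Catcher"))
```

Here `dropped` is empty, `kept` holds a single row with the position `unknown`, and `promoted` is one three-field row.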
- Nested foreach
daily = load 'NYSE_daily' as (exchange, symbol); -- not interested in other fields
grpd = group daily by exchange;
uniqcnt = foreach grpd {
    sym = daily.symbol;
    uniq_sym = distinct sym;
    generate group, COUNT(uniq_sym);
};
divs = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
grpd = group divs by symbol;
top3 = foreach grpd {
    sorted = order divs by dividends desc;
    top = limit sorted 3;
    generate group, flatten(top);
};
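As a rough guide to what the two nested foreach examples compute, here is an equivalent in plain Python. It is a sketch of the results only, with null handling omitted; the sample records and helper names are made up for illustration.

```python
from collections import defaultdict

def group_by(records, field):
    """Model `group rel by field`: map each key to the bag of its records."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec)
    return groups

def unique_symbol_counts(daily):
    # distinct + COUNT inside the nested block: unique symbols per exchange.
    return {exchange: len({rec["symbol"] for rec in bag})
            for exchange, bag in group_by(daily, "exchange").items()}

def top3_dividends(divs):
    rows = []
    for symbol, bag in group_by(divs, "symbol").items():
        # order ... desc, then limit 3, then flatten the surviving records.
        top = sorted(bag, key=lambda rec: rec["dividends"], reverse=True)[:3]
        rows.extend((symbol, rec["date"], rec["dividends"]) for rec in top)
    return rows

# Made-up sample data.
daily = [{"exchange": "NYSE", "symbol": s} for s in ["CAT", "CAT", "IBM"]]
divs = [{"symbol": "CAT", "date": d, "dividends": v}
        for d, v in [("d1", 0.1), ("d2", 0.4), ("d3", 0.3), ("d4", 0.2)]]

counts = unique_symbol_counts(daily)
top3 = top3_dividends(divs)
```

The point of the nested form in Pig is that these per-group steps run inside the group's bag, without a separate group and join for each one.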
Note: Only distinct, filter, limit, and order are supported inside a nested foreach.
- fragment-replicate join
jnd = join daily by (exchange, symbol), divs by (exchange, symbol) using 'replicated';
Pig implements the fragment-replicate join by loading the replicated input into Hadoop’s distributed cache. All but the first relation will be loaded into memory.
Note: Fragment-replicate join supports only inner and left outer joins.
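The idea can be sketched in plain Python as a hash join in which the small input is held in memory and the large one is streamed past it. This is a model of the technique, not Pig's implementation; the function name, the key callables, and the sample rows are assumptions, and null-key handling is omitted.

```python
from collections import defaultdict

def replicated_join(large, small, large_key, small_key, left_outer=False):
    """Stream `large` past an in-memory hash table built from `small`."""
    # Build the lookup table once; in Pig this copy is shipped to every map task.
    table = defaultdict(list)
    for row in small:
        table[small_key(row)].append(row)
    small_width = len(small[0]) if small else 0

    for row in large:  # the large input is never held in memory as a whole
        matches = table.get(large_key(row))
        if matches:
            for match in matches:
                yield row + match
        elif left_outer:
            # Unmatched left rows are padded with nulls.
            yield row + (None,) * small_width

# Made-up sample rows: (exchange, symbol) and (exchange, symbol, dividend).
daily = [("NYSE", "CAT"), ("NYSE", "IBM"), ("NYSE", "XYZ")]
divs = [("NYSE", "CAT", 0.42), ("NYSE", "IBM", 0.55), ("NYSE", "IBM", 0.50)]
by_exchange_symbol = lambda row: (row[0], row[1])

inner = list(replicated_join(daily, divs, by_exchange_symbol, by_exchange_symbol))
outer = list(replicated_join(daily, divs, by_exchange_symbol, by_exchange_symbol,
                             left_outer=True))
```

The sketch also suggests why right outer joins are not supported: each task sees only its own fragment of the large input, so it cannot tell whether a row of the in-memory input went unmatched everywhere.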
- skew join
In many data sets, there are a few keys that have three or more orders of magnitude more records than other keys. This results in one or two reducers that will take much longer than the rest.
Skew join works by first sampling one input for the join. In that input it identifies any keys that have so many records that skew join estimates it will not be able to fit them all into memory. Then, in a second MapReduce job, it does the join. For all records except those identified in the sample, it does a standard join, collecting records with the same key onto the same reducer. Those keys identified as too large are treated differently. Based on how many records were seen for a given key, those records are split across the appropriate number of reducers. The number of reducers is chosen based on Pig’s estimate of how wide the data must be split such that each reducer can fit its split into memory. For the input to the join that is not split, those keys that were split are then replicated to each reducer that contains that key.
users = load 'users' as (name:chararray, city:chararray);
cinfo = load 'cityinfo' as (city:chararray, population:int);
jnd = join cinfo by city, users by city using 'skewed';
Note: Skew join can be done on inner or outer joins. However, it can take only two join inputs. Pig looks at the record sizes in the sample and assumes it can use 30% of the JVM’s heap to materialize records that will be joined. This fraction can be changed with the pig.skewedjoin.reduce.memusage parameter.
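The splitting and replication described above can be illustrated with a small Python simulation. It is a simplified sketch under several assumptions: a full count stands in for sampling, a fixed `max_records_per_reducer` stands in for Pig's memory estimate, only the inner join is modelled, and all names are hypothetical. Which of the two Pig inputs plays the "split" role is deliberately left abstract.

```python
import math
from collections import Counter, defaultdict

def plan_splits(split_side, key, num_reducers, max_records_per_reducer):
    """Decide how many reducers each oversized key should be spread across."""
    counts = Counter(row[key] for row in split_side)  # stands in for sampling
    return {k: min(math.ceil(n / max_records_per_reducer), num_reducers)
            for k, n in counts.items() if n > max_records_per_reducer}

def skew_join(split_side, split_key, replicated_side, replicated_key,
              num_reducers, max_records_per_reducer):
    splits = plan_splits(split_side, split_key, num_reducers,
                         max_records_per_reducer)
    reducers = defaultdict(lambda: ([], []))
    seen = Counter()
    for row in split_side:
        k = row[split_key]
        fan_out = splits.get(k, 1)
        # Hot keys rotate over `fan_out` reducers; other keys hash to one.
        offset = seen[k] % fan_out
        seen[k] += 1
        reducers[(hash(k) + offset) % num_reducers][0].append(row)
    for row in replicated_side:
        k = row[replicated_key]
        # Rows with a hot key are copied to every reducer holding that key.
        for offset in range(splits.get(k, 1)):
            reducers[(hash(k) + offset) % num_reducers][1].append(row)
    joined = []
    for lefts, rights in reducers.values():  # ordinary join on each reducer
        for left in lefts:
            for right in rights:
                if left[split_key] == right[replicated_key]:
                    joined.append(left + right)
    return joined

# Made-up data: one city has far more users than the other.
users = [("user%d" % i, "NYC") for i in range(10)] + [("solo", "Boston")]
cinfo = [("NYC", 1000), ("Boston", 500)]

plan = plan_splits(users, 1, 4, 3)
result = skew_join(users, 1, cinfo, 0, num_reducers=4, max_records_per_reducer=3)
```

With these numbers the ten NYC users are spread over four reducers and the NYC row of `cinfo` is copied to each of them, while Boston is joined on a single reducer as usual.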
- merge join
daily = load 'NYSE_daily_sorted' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
divs = load 'NYSE_dividends_sorted' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);
jnd = join daily by symbol, divs by symbol using 'merge';
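A merge join relies on both inputs already being sorted on the join key, as the file names above suggest. The textbook sort-merge algorithm behind the idea can be sketched in plain Python; this is the classic algorithm on in-memory lists, not Pig's actual implementation, and the function name and sample rows are made up.

```python
def merge_join(left, right, left_key, right_key):
    """Inner join of two lists that are both sorted ascending on their key."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1  # left key is behind: advance left
        elif lk > rk:
            j += 1  # right key is behind: advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][left_key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Emit every pairing within the two runs, then skip past them.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    joined.append(l_row + r_row)
            i, j = i_end, j_end
    return joined

# Made-up rows, sorted by symbol: (symbol, close) and (symbol, dividend).
daily = [("CAT", 30.0), ("CAT", 31.0), ("IBM", 90.0), ("XOM", 70.0)]
divs = [("CAT", 0.42), ("GE", 0.10), ("IBM", 0.55), ("IBM", 0.50)]

jnd = merge_join(daily, divs, 0, 0)
```

Because each input is read once, in order, neither side has to be held in memory or re-sorted, which is the benefit of having pre-sorted inputs.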