storm trident api

blackproof


Trident API

Partition-local operations: these are applied to each partition of a batch independently and involve no network I/O.

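Functions: equivalent to Pig's GENERATE. A function receives the selected input fields and emits zero or more tuples. The emitted fields are appended to the original tuple, and an input tuple for which nothing is emitted is dropped.

```java
mystream.each(new Fields("b"), new MyFunction(), new Fields("d"));
```

```java
// Pre-1.0 Storm package names (Storm 1.0+ uses org.apache.storm.*).
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// For an input value n in field "b", emits 0, 1, ..., n-1 as field "d".
public class MyFunction extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        for (int i = 0; i < tuple.getInteger(0); i++) {
            collector.emit(new Values(i));
        }
    }
}
```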
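The field semantics of `each` can be checked without a cluster. The following is a minimal plain-Java sketch, assuming tuples are modeled as `List<Integer>` in field order (a, b, c); `EachSimulation` and its methods are hypothetical names for illustration, not part of Trident. It mirrors `MyFunction` applied to field "b" and appends each emitted value as field "d".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, dependency-free model of each(inputFields, function, outputFields).
// A tuple is modeled as List<Integer>; fields are addressed by position.
class EachSimulation {

    // Mirrors MyFunction: for input value n, emit 0, 1, ..., n-1.
    static List<Integer> myFunction(int n) {
        List<Integer> emitted = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            emitted.add(i);
        }
        return emitted;
    }

    // Applies myFunction to the field at inputIndex and appends each emitted
    // value as a new trailing field ("d"). Inputs that emit nothing disappear.
    static List<List<Integer>> each(List<List<Integer>> stream, int inputIndex) {
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> tuple : stream) {
            for (int d : myFunction(tuple.get(inputIndex))) {
                List<Integer> extended = new ArrayList<>(tuple);
                extended.add(d);
                out.add(extended);
            }
        }
        return out;
    }
}
```

With the input tuples [1, 2, 3], [4, 1, 6] and [3, 0, 8], calling `EachSimulation.each(input, 1)` yields [1, 2, 3, 0], [1, 2, 3, 1] and [4, 1, 6, 0]: b = 2 emits twice, b = 1 once, and b = 0 emits nothing, so [3, 0, 8] is dropped.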

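Filters: equivalent to Pig's FILTER. A filter receives the selected fields and decides whether to keep the tuple; kept tuples pass through unchanged.

```java
mystream.each(new Fields("b", "a"), new MyFilter());
```

```java
import storm.trident.operation.BaseFilter;
import storm.trident.tuple.TridentTuple;

// Keeps tuples where b == 1 and a == 2. Field 0 is "b" and field 1 is "a",
// following the order given in new Fields("b", "a").
public class MyFilter extends BaseFilter {
    public boolean isKeep(TridentTuple tuple) {
        return tuple.getInteger(0) == 1 && tuple.getInteger(1) == 2;
    }
}
```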
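A similar dependency-free sketch shows how `MyFilter` sees its input: the filter receives only the selected fields, in the order listed in `new Fields("b", "a")`, but the full tuple is what passes through. `FilterSimulation` is a hypothetical name, and tuples are modeled as `List<Integer>` in field order (a, b, c).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of each(new Fields("b", "a"), new MyFilter()).
// Tuples are List<Integer> in field order (a, b, c).
class FilterSimulation {

    static final int A = 0, B = 1;

    // Mirrors MyFilter.isKeep: the filter sees the selected fields (b, a),
    // so index 0 is b and index 1 is a.
    static boolean isKeep(List<Integer> selected) {
        return selected.get(0) == 1 && selected.get(1) == 2;
    }

    static List<List<Integer>> each(List<List<Integer>> stream) {
        List<List<Integer>> kept = new ArrayList<>();
        for (List<Integer> tuple : stream) {
            // Build the filter's input in the order the Fields were listed: ("b", "a").
            List<Integer> selected = List.of(tuple.get(B), tuple.get(A));
            if (isKeep(selected)) {
                kept.add(tuple); // the full original tuple passes through unchanged
            }
        }
        return kept;
    }
}
```

For [1, 2, 3], [2, 1, 1] and [2, 3, 4], only [2, 1, 1] is kept, since it is the only tuple with b = 1 and a = 2.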

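partitionAggregate

Roughly equivalent to a combiner in Pig (partial aggregation inside each partition). partitionAggregate runs an aggregator on each partition of a batch, and unlike a function, its output replaces the input tuples. Trident provides three aggregator interfaces: CombinerAggregator, ReducerAggregator and Aggregator.

```java
mystream.partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"));

// Run several aggregators over the same partition in one pass;
// the output tuple has the fields ["count", "sum"].
mystream.chainedAgg()
        .partitionAggregate(new Count(), new Fields("count"))
        .partitionAggregate(new Fields("b"), new Sum(), new Fields("sum"))
        .chainEnd();
```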
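The difference between `each` and `partitionAggregate` is that the aggregator's output replaces the partition's tuples. The sketch below models a batch as a list of partitions, each holding only the "b" values, and reproduces what `Sum` and the chained Count + Sum would emit per partition. `PartitionAggregateSimulation` is a hypothetical name; this is not Trident code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of partitionAggregate on a batch split into partitions.
// Each inner list holds the "b" values of one partition.
class PartitionAggregateSimulation {

    // partitionAggregate(new Fields("b"), new Sum(), new Fields("sum")):
    // each partition is replaced by a single tuple [sum].
    static List<Long> sumPerPartition(List<List<Integer>> partitions) {
        List<Long> out = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            long sum = 0;
            for (int b : partition) {
                sum += b;
            }
            out.add(sum);
        }
        return out;
    }

    // chainedAgg() with Count then Sum: each partition is replaced by a
    // single tuple [count, sum], fields in chain order.
    static List<List<Long>> countAndSumPerPartition(List<List<Integer>> partitions) {
        List<List<Long>> out = new ArrayList<>();
        List<Long> sums = sumPerPartition(partitions);
        for (int p = 0; p < partitions.size(); p++) {
            out.add(List.of((long) partitions.get(p).size(), sums.get(p)));
        }
        return out;
    }
}
```

For partitions [1, 2], [3, 8] and [1, 9, 10], `sumPerPartition` returns [3, 11, 20] and `countAndSumPerPartition` returns [2, 3], [2, 11] and [3, 20].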

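CombinerAggregator: init turns each tuple into a value, combine merges two values, and zero supplies the result for an empty partition.

```java
import storm.trident.operation.CombinerAggregator;
import storm.trident.tuple.TridentTuple;

public class Count implements CombinerAggregator<Long> {
    public Long init(TridentTuple tuple) {
        return 1L; // each tuple counts as 1
    }

    public Long combine(Long val1, Long val2) {
        return val1 + val2;
    }

    public Long zero() {
        return 0L; // result for an empty partition
    }
}
```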

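ReducerAggregator (an alternative way to write Count): init produces the initial value once, and reduce folds each input tuple into it.

```java
import storm.trident.operation.ReducerAggregator;
import storm.trident.tuple.TridentTuple;

public class Count implements ReducerAggregator<Long> {
    public Long init() {
        return 0L;
    }

    public Long reduce(Long curr, TridentTuple tuple) {
        return curr + 1;
    }
}
```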
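Because `reduce` folds tuples one at a time into a running value, a ReducerAggregator is essentially a sequential fold. A minimal sketch, assuming a hypothetical `SimpleReducer` interface with the same two methods (the tuple type is left generic instead of `TridentTuple`):

```java
import java.util.List;

// Hypothetical stand-in for the shape of ReducerAggregator<T>;
// the tuple type is generic here instead of TridentTuple.
interface SimpleReducer<T, V> {
    T init();
    T reduce(T curr, V tuple);
}

class ReducerSimulation {

    // Sequential fold: init() once, then reduce() for every tuple in order.
    static <T, V> T run(SimpleReducer<T, V> reducer, List<V> tuples) {
        T acc = reducer.init();
        for (V tuple : tuples) {
            acc = reducer.reduce(acc, tuple);
        }
        return acc;
    }

    // Same logic as the ReducerAggregator version of Count.
    static final SimpleReducer<Long, Object> COUNT = new SimpleReducer<Long, Object>() {
        public Long init() { return 0L; }
        public Long reduce(Long curr, Object tuple) { return curr + 1; }
    };
}
```

The fold has no way to merge two partial results, which is why Trident cannot split a ReducerAggregator into per-partition partial aggregates the way it can with a CombinerAggregator.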

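Aggregator (usually extended via BaseAggregator): the most general interface. Every method receives a collector, so it can emit any number of tuples at any point.

```java
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseAggregator;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;

// The nested state class must be qualified in the extends clause (CountAgg.CountState).
public class CountAgg extends BaseAggregator<CountAgg.CountState> {
    static class CountState {
        long count = 0;
    }

    // Called once before the batch partition is processed.
    public CountState init(Object batchId, TridentCollector collector) {
        return new CountState();
    }

    // Called for every tuple in the batch partition.
    public void aggregate(CountState state, TridentTuple tuple, TridentCollector collector) {
        state.count += 1;
    }

    // Called after all tuples in the batch partition have been aggregated.
    public void complete(CountState state, TridentCollector collector) {
        collector.emit(new Values(state.count));
    }
}
```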
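The call order for an Aggregator (init once, aggregate per tuple, complete at the end) can be sketched as below. `SimpleAggregator` and `CountAggSimulation` are hypothetical stand-ins, a plain `List<Object>` plays the role of `TridentCollector`, and `runBatch` is an illustrative driver, not a Trident method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lifecycle of Aggregator<T>;
// a plain List plays the role of TridentCollector.
interface SimpleAggregator<S> {
    S init(Object batchId, List<Object> collector);
    void aggregate(S state, Object tuple, List<Object> collector);
    void complete(S state, List<Object> collector);
}

class CountAggSimulation implements SimpleAggregator<CountAggSimulation.CountState> {

    static class CountState {
        long count = 0;
    }

    public CountState init(Object batchId, List<Object> collector) {
        return new CountState(); // fresh state for each batch
    }

    public void aggregate(CountState state, Object tuple, List<Object> collector) {
        state.count += 1; // nothing is emitted per tuple
    }

    public void complete(CountState state, List<Object> collector) {
        collector.add(state.count); // one output value at the end of the batch
    }

    // Drives one batch through init -> aggregate (per tuple) -> complete.
    static <S> List<Object> runBatch(SimpleAggregator<S> agg, Object batchId, List<?> tuples) {
        List<Object> collector = new ArrayList<>();
        S state = agg.init(batchId, collector);
        for (Object tuple : tuples) {
            agg.aggregate(state, tuple, collector);
        }
        agg.complete(state, collector);
        return collector;
    }
}
```

Since the collector is available in all three methods, a different aggregator could just as well emit from `aggregate` (for example, a running total per tuple) instead of only from `complete`.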

---------------------

stateQuery and partitionPersist: these query and update a source of state, respectively.

--------------------------

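Projection

project keeps only the listed fields. For a stream with fields ["a", "b", "c", "d"], the following outputs tuples containing only "b" and "d":

```java
mystream.project(new Fields("b", "d"));
```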
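Projection can be sketched as selecting columns by name. In this hypothetical `ProjectSimulation`, a tuple is a `List<Object>` aligned with a schema list of field names; raising an error for an unknown field name is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of project(new Fields(...)): keep only the named fields,
// in the order they are listed. Tuples are List<Object> aligned with schema.
class ProjectSimulation {

    static List<List<Object>> project(List<String> schema, List<List<Object>> stream, List<String> keep) {
        // Resolve field names to positions once, up front.
        List<Integer> indexes = new ArrayList<>();
        for (String field : keep) {
            int idx = schema.indexOf(field);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            indexes.add(idx);
        }
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : stream) {
            List<Object> projected = new ArrayList<>();
            for (int idx : indexes) {
                projected.add(tuple.get(idx));
            }
            out.add(projected);
        }
        return out;
    }
}
```

With schema [a, b, c, d] and tuples [1, 2, 3, 4] and [5, 6, 7, 8], projecting on [b, d] gives [2, 4] and [6, 8].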

---------------------------

Repartitioning operations

Repartitioning operations change how tuples are distributed across partitions. Unlike partition-local operations, they involve network transfer.

shuffle: Uses a random round-robin algorithm to redistribute tuples evenly across all target partitions.

broadcast: Every tuple is replicated to all target partitions. This can be useful during DRPC, for example if you need to do a stateQuery on every partition of data.

partitionBy: Takes a set of fields and partitions semantically on those fields. The field values are hashed and taken modulo the number of target partitions to select the target partition, so tuples with the same values for those fields always go to the same target partition.

global: All tuples are sent to the same partition. The same partition is chosen for all batches in the stream.

batchGlobal: All tuples in a batch are sent to the same partition. Different batches in the stream may go to different partitions.

partition: Takes a custom partitioning function that implements backtype.storm.grouping.CustomStreamGrouping.
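The partitionBy rule (hash the field values, then take the result modulo the partition count) can be sketched as follows. This is a simplified, hypothetical model: `PartitionBySimulation` is not a Storm class, and Storm's own grouping code may use a different hash function. `Math.floorMod` is used so that a negative `hashCode()` still yields a valid index.

```java
import java.util.List;

// Simplified, hypothetical model of partitionBy: hash the grouping-field values
// and take the result modulo the number of target partitions. Storm's real
// grouping code may hash differently.
class PartitionBySimulation {

    static int targetPartition(List<Object> groupingValues, int numPartitions) {
        // floorMod keeps the index in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numPartitions);
    }
}
```

Because the result depends only on the values, calling `targetPartition` twice with equal values always returns the same index, which is the guarantee partitionBy provides.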

----------------------------

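Aggregation operations

aggregate runs the aggregator over each batch as a whole (a global aggregation per batch). With a CombinerAggregator, Trident first computes partial aggregates inside each partition and transfers only those; with a ReducerAggregator or Aggregator, the stream is first repartitioned into a single partition.

```java
mystream.aggregate(new Count(), new Fields("count"));
```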
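With a CombinerAggregator such as the Count combiner, a global aggregation can be split into two phases. The sketch below uses a hypothetical `SimpleCombiner` interface with the same `init`/`combine`/`zero` shape; `GlobalAggregateSimulation` is an illustrative name, partitions are plain lists, and nothing here is Trident API.

```java
import java.util.List;

// Hypothetical stand-in for the shape of CombinerAggregator<T>.
interface SimpleCombiner<T, V> {
    T init(V tuple);
    T combine(T val1, T val2);
    T zero();
}

class GlobalAggregateSimulation {

    // Same logic as the CombinerAggregator version of Count.
    static final SimpleCombiner<Long, Object> COUNT = new SimpleCombiner<Long, Object>() {
        public Long init(Object tuple) { return 1L; }
        public Long combine(Long val1, Long val2) { return val1 + val2; }
        public Long zero() { return 0L; }
    };

    // Phase 1: partial aggregate inside one partition (no network transfer needed).
    static <T, V> T partial(SimpleCombiner<T, V> agg, List<V> partition) {
        T acc = agg.zero();
        for (V tuple : partition) {
            acc = agg.combine(acc, agg.init(tuple));
        }
        return acc;
    }

    // Phase 2: only the per-partition partial results are combined.
    static <T, V> T aggregate(SimpleCombiner<T, V> agg, List<List<V>> partitions) {
        T acc = agg.zero();
        for (List<V> partition : partitions) {
            acc = agg.combine(acc, partial(agg, partition));
        }
        return acc;
    }
}
```

For partitions holding 2, 0 and 3 tuples, `aggregate` returns 5, with the empty partition contributing `zero()`. This split only works because `combine` can merge any two partial results.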

----------------------------

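Operations on grouped streams

groupBy: equivalent to Pig's GROUP BY. It repartitions the stream with partitionBy on the given fields and then, within each partition, groups together tuples whose grouping fields are equal. Aggregations that follow run once per group.

```java
mystream.groupBy(new Fields("word"))
```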
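A groupBy on "word" followed by an aggregation such as `Count` produces one result per distinct word. A minimal sketch of that result, assuming a hypothetical `GroupBySimulation` class and ignoring the partitioning step:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of groupBy(new Fields("word")) followed by a Count
// aggregation: the count is computed once per distinct "word" value.
class GroupBySimulation {

    static Map<String, Long> countByWord(List<String> words) {
        // TreeMap only makes the iteration order deterministic.
        Map<String, Long> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }
}
```

For the words the, cow, the, the result is {cow=1, the=2}.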

--------------------------------

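Merges and joins

Unlike a SQL join, a Trident join only joins tuples within the same batch.

Here's an example join between a stream containing fields ["key", "val1", "val2"] and another stream containing ["x", "val1"]:

```java
topology.join(stream1, new Fields("key"), stream2, new Fields("x"), new Fields("key", "a", "b", "c"));
```

The output lists the join field first ("key", matching "key" in stream1 and "x" in stream2), then the non-join fields of each stream in the order the streams were passed: "a" and "b" are val1 and val2 from stream1, and "c" is val1 from stream2.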
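An inner join of a single batch can be sketched as a nested-loop join producing [key, a, b, c], where a and b come from stream1's val1 and val2 and c from stream2's val1. `BatchJoinSimulation` is a hypothetical name; tuples are `List<Object>` in field order, and only tuples from the same batch are ever compared.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of topology.join(stream1, new Fields("key"), stream2,
// new Fields("x"), new Fields("key", "a", "b", "c")) for a single batch.
// stream1 tuples: [key, val1, val2]; stream2 tuples: [x, val1].
// Output: [key, stream1.val1 (a), stream1.val2 (b), stream2.val1 (c)].
class BatchJoinSimulation {

    static List<List<Object>> join(List<List<Object>> batch1, List<List<Object>> batch2) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> left : batch1) {
            for (List<Object> right : batch2) {
                if (left.get(0).equals(right.get(0))) { // inner join on key == x
                    out.add(List.of(left.get(0), left.get(1), left.get(2), right.get(1)));
                }
            }
        }
        return out;
    }
}
```

Joining [1, x, y] and [2, p, q] with [1, z] and [3, w] yields only [1, x, y, z]; key 2 and x = 3 have no partner in this batch.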

Posted 2015-06-23 16:02 (category: Enterprise architecture)