https://storm.apache.org/documentation/Trident-tutorial.html
The Trident data model is the TridentTuple which is a named list of values. During a topology, tuples are incrementally built up through a sequence of operations. Operations generally take in a set of input fields and emit a set of "function fields". The input fields are used to select a subset of the tuple as input to the operation, while the "function fields" name the fields the operation emits.
Consider this example. Suppose you have a stream called "stream" that contains the fields "x", "y", and "z". To run a filter MyFilter that takes in "y" as input, you would say:
stream.each(new Fields("y"), new MyFilter())
Suppose the implementation of MyFilter is this:
public class MyFilter extends BaseFilter {
public boolean isKeep(TridentTuple tuple) {
return tuple.getInteger(0) < 10;
}
}
This will keep all tuples whose "y" field is less than 10. The TridentTuple given as input to MyFilter will only contain the "y" field. Note that Trident is able to project a subset of a tuple extremely efficiently when selecting the input fields: the projection is essentially free.
Let's now look at how "function fields" work. Suppose you had this function:
public class AddAndMultiply extends BaseFunction {
public void execute(TridentTuple tuple, TridentCollector collector) {
int i1 = tuple.getInteger(0);
int i2 = tuple.getInteger(1);
collector.emit(new Values(i1 + i2, i1 * i2));
}
}
This function takes two numbers as input and emits two new values: the addition of the numbers and the multiplication of the numbers. Suppose you had a stream with the fields "x", "y", and "z". You would use this function like this:
stream.each(new Fields("x", "y"), new AddAndMultiply(), new Fields("added", "multiplied"));
The output of functions is additive: the fields are added to the input tuple. So the output of this each call would contain tuples with the five fields "x", "y", "z", "added", and "multiplied". "added" corresponds to the first value emitted by AddAndMultiply, while "multiplied" corresponds to the second value.
With aggregators, on the other hand, the function fields replace the input tuples. So if you had a stream containing the fields "val1" and "val2", and you did this:
stream.aggregate(new Fields("val2"), new Sum(), new Fields("sum"))
The output stream would only contain a single tuple with a single field called "sum", representing the sum of all "val2" fields in that batch.
With grouped streams, the output will contain the grouping fields followed by the fields emitted by the aggregator. For example:
stream.groupBy(new Fields("val1"))
.aggregate(new Fields("val2"), new Sum(), new Fields("sum"))
In this example, the output will contain the fields "val1" and "sum".
相关推荐
**PG-Strom:利用GPU加速PostgreSQL查询执行** PG-Strom是PostgreSQL数据库的一个扩展,它引入了Foreign Data Wrapper (FDW)模块,允许数据库利用GPU的并行计算能力进行异步超并行查询处理。这个创新技术显著提高了...
strom飞哥研究Strom大数据处理系统
斯特罗姆简单的流状态管理器。 受到启发。 浏览器必须支持new Map , new Set和Symbol... modify ( ( value , state ) => { // Perform modifications here and return modified state return { ... state , ... value
【标题】"workshop-tinkerforge-strom"是一个关于通过JavaFX进行电流和电压可视化的研讨会,主要针对TinkerForge设备和Devoxx4Kids活动。这个项目旨在教育孩子们理解和探索电子学的基本概念,同时也为成年人提供了一...
3. **fail()**:如果Bolt无法处理一个tuple,调用此方法通知Strom,系统会尝试重新发送该tuple。 4. **TopologyBuilder**:用于构建拓扑的类,通过添加spouts和bolts,设置stream groupings。 **三、工作流程** 1...
- 安装编译好的 PG_strom: `make install`。 ##### 3. 修改配置文件 - 修改 `/usr/pgsql-9.5/data/postgresql.conf` 文件,添加 `shared_preload_libraries='$libdir/pg_strom'` 选项,以启用 PG_strom 扩展。 #### ...
【Strom:强大的WebService接口测试工具】 WebService接口测试是软件开发过程中不可或缺的一环,它确保了服务间的通信正常且高效。Strom是一款优秀的测试工具,专为开发者设计,用于快速、方便地对WebService接口...
- **Tuple**:Tuple是Storm中的基本数据单元,类似于键值对,用于在Spouts和Bolts之间传递数据。 2. **Strom优化策略**: - **并行度优化**:调整Spouts和Bolts的worker进程和task数量,以充分利用硬件资源。更高...
strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的jar包strom的...
【大数据与云计算教程课件】中的“31.Strom”部分详细介绍了实时数据处理框架Storm。Storm是由Twitter开源的,旨在解决随着互联网急剧发展而产生的海量数据实时处理需求。相较于传统的Hadoop,Storm在实时计算方面...
【Strom webService测试工具】 在IT行业中,Web服务测试是确保应用程序质量的重要环节,而Strom webService测试工具就是一款专为此目的设计的高效工具。与广为人知的soapUI相比,Strom可能提供了独特的特性和优势,...
Big Data:Principles and best practices of scala 1. A new paradigm for Big Data - FREE 2. Data model for Big Data - AVAILABLE 3. Data storage on the batch layer 4. MapReduce and batch processing 5. ...
标题中的"Strom项目依赖"指的是Apache Storm项目在开发过程中所依赖的各种库文件,这些文件通常是Java Archive (JAR) 文件,用于包含Java类和其他资源,使得不同项目可以在运行时共享代码。Apache Storm是一个分布式...
strom zookeeper kafka 部署文档 原理解析
**PostgreSQL数据库插件PG-Strom** PG-Strom是一款针对PostgreSQL数据库的高性能计算扩展,它利用GPU(图形处理器)的并行计算能力,优化数据库的查询处理,尤其是在大数据量和复杂计算场景下表现优越。PG-Strom的...
strom介绍,包括出现背景,应用场景,环境搭建,基本架构。
##介绍 XMemcached 是一个高性能、易用的 Java 阻塞多线程 memcached 客户端。 它基于 nio 并经过精心设计以获得最佳性能。 ##新闻和下载 。 Maven 依赖: ... <artifactId>xmemcached <version>{version} ...
【标题】"超级简单入门的storm的java代码demo"提供了对Apache Storm的初步理解与实践。Apache Storm是一个开源的分布式实时计算系统,它允许开发者处理无界数据流,具有高容错性和可扩展性。本示例项目适用于Java...
在安装Strom之前,我们需要对基础环境进行配置。首先,要修改服务器的主机名,这可以通过编辑`/etc/sysconfig/network`文件实现。找到`HOSTNAME`行,并将其值更改为所需的主机名,例如`storm1`、`storm2`或`storm3`...
kafka-and-strom-event-processing-in-realtime-131023085422-phpapp01.pdf