Stream Processing: Requirements and Comparison

Notes on reading The 8 Requirements of Real-Time Stream Processing

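The paper presents eight requirements that a real-time stream processing system should meet, then compares how many of those eight a traditional DBMS (or an in-memory DBMS), a rule engine, and a stream processing engine (SPE) can satisfy when handling streaming data.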

1. Eight Rules for Stream Processing

Rule 1: Keep the data moving

The first requirement for a real-time stream processing system is to process messages “in-stream”, without any requirement to store them to perform any operation or sequence of operations. Ideally the system should also use an active (i.e., non-polling) processing model.
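
As a rough illustration of the active (non-polling) model described in Rule 1, the sketch below pushes each message to its handlers the moment it arrives, without storing it first. The `PushStream` class, the `on_tick` handler, and the price threshold are hypothetical names invented for this example; they do not come from the paper.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]


class PushStream:
    """Hypothetical in-memory stream that calls subscribers as messages arrive."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Message], None]] = []

    def subscribe(self, handler: Callable[[Message], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: Message) -> None:
        # The message goes straight to the handlers: nothing is written to
        # storage first, and no consumer has to poll for new data.
        for handler in self._subscribers:
            handler(message)


alerts: List[str] = []


def on_tick(tick: Message) -> None:
    # Assumed example rule: flag any tick priced above 100.
    if tick["price"] > 100.0:
        alerts.append(tick["symbol"])


stream = PushStream()
stream.subscribe(on_tick)
for tick in [{"symbol": "A", "price": 99.0}, {"symbol": "B", "price": 101.5}]:
    stream.publish(tick)
```

After the loop, `alerts` holds only the symbol whose price crossed the threshold.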

Rule 2: Query using SQL on Streams (StreamSQL)

The second requirement is to support a high-level “StreamSQL” language with built-in extensible stream-oriented primitives and operators.
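
One stream-oriented primitive of the kind Rule 2 has in mind is a window, which bounds the part of an unbounded stream that an aggregate looks at. The sketch below is a minimal tumbling-window average, assuming events arrive in timestamp order; the function name `tumbling_avg` and the sample data are made up for illustration and do not represent any StreamSQL implementation.

```python
from typing import Iterable, Iterator, Optional, Tuple


def tumbling_avg(
    events: Iterable[Tuple[int, float]], width: int
) -> Iterator[Tuple[int, float]]:
    """Yield (window_start, average) for fixed, non-overlapping windows.

    Assumes events are (timestamp, value) pairs in non-decreasing
    timestamp order. A window's result is emitted as soon as the first
    event of a later window arrives.
    """
    current_start: Optional[int] = None
    total = 0.0
    count = 0
    for ts, value in events:
        start = (ts // width) * width
        if current_start is not None and start != current_start:
            yield current_start, total / count
            total, count = 0.0, 0
        current_start = start
        total += value
        count += 1
    if count:
        # Flush the last, still-open window when the input ends.
        yield current_start, total / count


ticks = [(0, 10.0), (3, 20.0), (5, 30.0), (9, 50.0), (12, 7.0)]
averages = list(tumbling_avg(ticks, width=5))
```

Because the function is a generator, each window's average becomes available while later events are still arriving, rather than after the whole input has been collected.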

Rule 3: Handle stream imperfections (delayed, missing, and out-of-order data)

The third requirement is to have built-in mechanisms to provide resiliency against stream “imperfections”, including missing and out-of-order data, which are commonly present in real-world data streams.
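
One common way to tolerate out-of-order data is a bounded reorder buffer: hold events briefly, release them in timestamp order once they can no longer be overtaken, and drop anything that arrives too late. The sketch below is an assumption-laden illustration of that idea, not the mechanism the paper prescribes; `reorder` and `max_delay` are hypothetical names.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple


def reorder(
    events: Iterable[Tuple[int, str]], max_delay: int
) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, payload) in timestamp order.

    Events may arrive up to `max_delay` time units late. Anything older
    than (highest timestamp seen - max_delay) is dropped.
    """
    heap: List[Tuple[int, int, str]] = []
    max_seen: Optional[int] = None
    seq = 0  # tie-breaker so equal timestamps keep arrival order
    for ts, payload in events:
        if max_seen is not None and ts < max_seen - max_delay:
            continue  # too late: results up to this point were already emitted
        heapq.heappush(heap, (ts, seq, payload))
        seq += 1
        max_seen = ts if max_seen is None else max(max_seen, ts)
        watermark = max_seen - max_delay
        # Everything at or below the watermark can safely be released.
        while heap and heap[0][0] <= watermark:
            t, _, p = heapq.heappop(heap)
            yield t, p
    while heap:  # input ended: flush what is still buffered
        t, _, p = heapq.heappop(heap)
        yield t, p


arrivals = [(1, "a"), (3, "c"), (2, "b"), (6, "f"), (1, "late"), (5, "e")]
ordered = list(reorder(arrivals, max_delay=2))
```

The `max_delay` bound is the trade-off knob: a larger value tolerates more disorder but delays every result by the same amount.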

Rule 4: Generate Predictable Outcomes

The fourth requirement is that a stream processing engine must guarantee predictable and repeatable outcomes.
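
A simple case where repeatability matters is merging several input streams: if the result depends on which message happened to arrive first, replaying the same inputs can give a different answer. The sketch below merges time-ordered streams by timestamp and breaks ties by stream index, so the output order is fixed. It is only an illustration under the assumption that each input is already sorted by timestamp; `deterministic_merge` is a hypothetical name.

```python
import heapq
from typing import Iterable, Iterator, Tuple


def _tag(index: int, stream: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, int, str]]:
    # Attach the stream index so ties can be broken deterministically.
    for ts, payload in stream:
        yield ts, index, payload


def deterministic_merge(*streams: Iterable[Tuple[int, str]]) -> Iterator[Tuple[int, str]]:
    """Merge time-ordered streams into one time-ordered stream.

    Events with equal timestamps are ordered by the position of their
    stream in the argument list, never by arrival interleaving.
    """
    tagged = [_tag(i, s) for i, s in enumerate(streams)]
    for ts, _, payload in heapq.merge(*tagged, key=lambda item: (item[0], item[1])):
        yield ts, payload


stream_a = [(1, "a1"), (3, "a3")]
stream_b = [(1, "b1"), (2, "b2")]
merged = list(deterministic_merge(stream_a, stream_b))
```

Running the merge again on the same inputs always yields the same sequence, which is the property Rule 4 asks for.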

Rule 5: Integrate Stored and Streaming Data

The fifth requirement is to have the capability to efficiently store, access, and modify state information, and combine it with live streaming data. For seamless integration, the system should use a uniform language when dealing with either type of data.
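
To make the idea of combining state with live data concrete, the sketch below updates a stored position table for each incoming trade and checks the result against stored limits. Plain dictionaries stand in for the state store, and the names `track_positions`, `limits`, and `positions` are hypothetical; a real engine would express this in its own uniform query language.

```python
from typing import Dict, Iterable, Iterator


def track_positions(
    trades: Iterable[Dict[str, object]],
    limits: Dict[str, int],
    positions: Dict[str, int],
) -> Iterator[Dict[str, object]]:
    """Join each live trade with stored state and emit an enriched record."""
    for trade in trades:
        symbol = trade["symbol"]
        # Modify stored state: accumulate the running position per symbol.
        positions[symbol] = positions.get(symbol, 0) + trade["qty"]
        # Read stored state: look up the configured limit, if any.
        limit = limits.get(symbol)
        yield {
            "symbol": symbol,
            "position": positions[symbol],
            "over_limit": limit is not None and positions[symbol] > limit,
        }


limits = {"A": 100}
positions: Dict[str, int] = {}
trades = [
    {"symbol": "A", "qty": 60},
    {"symbol": "A", "qty": 50},
    {"symbol": "B", "qty": 10},
]
enriched = list(track_positions(trades, limits, positions))
```

The second trade pushes the position for `A` past its limit, so only that record is flagged.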

Rule 6: Guarantee Data Safety and Availability

The sixth requirement is to ensure that the applications are up and available, and the integrity of the data maintained at all times, despite failures.

Rule 7: Partition and Scale Applications Automatically

The seventh requirement is to have the capability to distribute processing across multiple processors and machines to achieve incremental scalability. Ideally, the distribution should be automatic and transparent.
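
A basic building block for distributing a stream is key-based partitioning: every event with the same key goes to the same worker, so per-key state stays local. The sketch below uses `zlib.crc32` as a stable hash (Python's built-in `hash()` is randomized per process for strings). The function names and the fixed partition count are assumptions for illustration; the automatic, transparent distribution the paper asks for involves much more than this.

```python
import zlib
from typing import Iterable, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(
    events: Iterable[Tuple[str, float]], num_partitions: int
) -> List[List[Tuple[str, float]]]:
    """Distribute (key, value) events across partitions by key."""
    partitions: List[List[Tuple[str, float]]] = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


events = [("IBM", 1.0), ("MSFT", 2.0), ("IBM", 3.0), ("ORCL", 4.0)]
routed = route(events, num_partitions=3)
```

Both `IBM` events end up in the same partition, in their original order, whichever partition that happens to be.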

Rule 8: Process and Respond Instantaneously

The eighth requirement is that a stream processing system must have a highly-optimized, minimal-overhead execution engine to deliver real-time response for high-volume applications.

2. Comparing DBMS, Rule Engine, and SPE

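A DBMS follows a "process-after-store" model: data is stored first and processed afterwards. This makes it inherently ill-suited to real-time data streams. An in-memory database can ease the performance disadvantage, and a DBMS also offers triggers, but none of these is scalable enough.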

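A rule engine can handle real-time data streams to some extent, but its rule language falls short: it lacks SQL-like expressive power, so the operations it can apply to a stream are limited.

Only the SPE is purpose-built for real-time streaming data, with many built-in features designed specifically for processing and manipulating streams.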

The following table compares the three:

Requirement	DBMS	Rule engine	SPE
Keep the data moving	No	Yes	Yes
SQL on streams	No	No	Yes
Handle stream imperfections	Difficult	Possible	Possible
Predictable outcome	Difficult	Possible	Possible
High availability	Possible	Possible	Possible
Stored and streamed data	No	No	Yes
Distribution and scalability	Possible	Possible	Possible
Instantaneous response	Possible	Possible	Possible