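Processing Flows of Information: From Data Stream to Complex Event Processing
Notes on reading "Processing Flows of Information: From Data Stream to Complex Event Processing"
I came across this paper by chance. It gives a fairly comprehensive introduction to and survey of current data stream management systems (DSMSs) and complex event processing (CEP) systems, and compares the characteristics of the representative products in each category.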
1. Introduction
An increasing number of distributed applications require processing of
continuously flowing data from geographically distributed sources at
unpredictable rates to obtain timely responses to complex queries. After
several years of research and development, two models have emerged and are
competing today: the data stream processing model [Babcock et al. 2002] and
the complex event processing model [Luckham 2001].
DSMSs specialize in dealing with transient data that is continuously
updated. The complex event processing model, on the other hand, sees flowing
information items as notifications of events that happened in the external
world, which have to be filtered and combined to deduce what is happening in
terms of higher-level events.
2. Background and motivation
With the term Information Flow Processing (IFP) we refer to an application domain in
which users need to collect information produced by multiple, distributed
sources, to process it in a timely way, in order to extract new knowledge as
soon as the relevant information is collected.
As we mentioned, IFP has attracted the
attention of researchers coming from different fields. The first contributions
came from the database community in the form of active database systems, which were introduced to allow actions to
automatically execute when given conditions arise. Data Stream Management Systems (DSMSs) pushed this idea further, to
perform query processing in the presence of continuous data streams. In the
same years that saw the development of DSMSs, researchers with different backgrounds
identified the need to develop systems capable of processing not generic
data but event notifications, coming from different sources, to identify interesting
situations [Luckham 2001]. These systems are usually known as Complex Event Processing (CEP) Systems.
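Active Database Systems. A traditional DBMS is Human-Active, Database-Passive (HADP): the database acts only when a user or application explicitly asks it to. Active database systems overcome this limitation.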
Data Stream Management Systems. The active database systems mentioned above are still limited to statically stored data; DSMSs remove this limitation. Users do not have to explicitly ask for updated information; rather, the system actively notifies them according to installed queries. This form of interaction is also called Database-Active, Human-Passive (DAHP).
Complex Event Processing Systems. The DSMSs described above leave the semantics of the data being processed to the client application. CEP systems, in contrast, give the information items a precise meaning: they are notifications of events that happened in the external world and were observed by sources. The CEP engine is responsible for filtering and combining such notifications to deduce what is happening in terms of higher-level events (sometimes also called composite events or situations), which are then notified to sinks that act as event consumers.
DSMSs and CEP engines. The former focus mainly on flowing data and data transformations. CEP engines, whether developed as extensions of publish-subscribe middleware or as entirely new systems, focus on processing event notifications with their ordering relationships to capture complex event patterns, and on the communication aspects involved in event processing.
IFP therefore needs to combine the strengths of DSMSs and CEP: effective data processing, including the ability to capture complex ordering relationships among data, as well as efficient event delivery, including the ability to process data in a strongly distributed fashion.
3. A modelling framework for IFP systems
Functional model of IFP:
In summary, an IFP engine operates as follows: each time a new item (including those periodically produced by the Clock) enters the engine through the Receiver, a detection-production cycle is performed. Such a cycle first (detection phase) evaluates all the rules currently present in the Rules store to find those whose condition part is true. At the end of this phase we have a set of rules that have to be executed. The Producer then takes this information and executes each triggered rule (production phase).
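The cycle described above can be sketched roughly as follows. This is a minimal illustration, not the paper's architecture: the names `Rule`, `Engine`, `receive`, and `tick` are made up for these notes, items are modelled as plain dicts, and rules look only at the current item (a real engine would also keep partial results between cycles).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]  # an information item, modelled here as a plain dict (assumption)


@dataclass
class Rule:
    name: str
    condition: Callable[[Item], bool]  # condition part, checked in the detection phase
    action: Callable[[Item], Item]     # action part, run in the production phase


class Engine:
    def __init__(self, rules: List[Rule]) -> None:
        self.rules = list(rules)       # plays the role of the Rules store
        self.sent: List[Item] = []     # items handed on to the sinks

    def receive(self, item: Item) -> List[Item]:
        """Receiver entry point: run one detection-production cycle for `item`."""
        # Detection phase: find every rule whose condition part is true.
        triggered = [rule for rule in self.rules if rule.condition(item)]
        # Production phase: execute each triggered rule.
        produced = [rule.action(item) for rule in triggered]
        self.sent.extend(produced)
        return produced

    def tick(self, now: float) -> List[Item]:
        """Clock: periodically produced items go through the same cycle."""
        return self.receive({"type": "clock", "time": now})


# Hypothetical rule: raise an alarm when a temperature reading exceeds 40.
overheat = Rule(
    name="overheat",
    condition=lambda item: item.get("type") == "temperature" and item["value"] > 40,
    action=lambda item: {"type": "alarm", "value": item["value"]},
)
engine = Engine([overheat])
alarms = engine.receive({"type": "temperature", "value": 45})
nothing = engine.tick(now=1.0)  # a clock item triggers no rule here
```

Keeping detection and production as two separate steps mirrors the description: the full set of triggered rules is known before any of them is executed.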
Processing model: selection policy; consumption policy (zero consumption policy, selected consumption policy).
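As I understand the two consumption policies: under zero consumption an item that helped trigger a rule stays available for later triggerings, while under selected consumption it is used up. The sketch below illustrates the difference for a single hypothetical rule, "an A followed by a B". The function name `match_a_then_b`, the `(kind, id)` tuple encoding, and the choice to pair every still-valid A with each B are assumptions made for this example.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (kind, id), e.g. ("A", 1); encoding chosen for this sketch


def match_a_then_b(stream: List[Event], consume: bool) -> List[Tuple[Event, Event]]:
    """Detect "A followed by B".

    consume=False: zero consumption, an A stays valid after it matched.
    consume=True:  selected consumption, an A is discarded once it matched.
    """
    pending_a: List[Event] = []  # A items seen so far that are still valid
    matches: List[Tuple[Event, Event]] = []
    for event in stream:
        kind, _ = event
        if kind == "A":
            pending_a.append(event)
        elif kind == "B":
            # Every still-valid A pairs with this B.
            matches.extend((a, event) for a in pending_a)
            if consume:
                pending_a.clear()  # the matched A items cannot be reused
    return matches


stream = [("A", 1), ("B", 1), ("B", 2)]
zero = match_a_then_b(stream, consume=False)
selected = match_a_then_b(stream, consume=True)
```

With this input, zero consumption reuses A1 for both B items, while selected consumption matches A1 only with B1.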
Deployment model: centralized vs. distributed; distributed engines are either clustered or networked.
Clustered and networked engines focus on different aspects: the former on increasing the available processing power by sharing the workload among a set of well-connected machines, the latter on minimizing bandwidth usage by processing information as close as possible to the sources.
Interaction model: push/pull.
Data model: tuples, records; homogeneous information flows vs. heterogeneous flows.
Time model: stream-only, absolute, causal, interval.
Rule model: transforming rules and detecting rules.
Language type: transforming languages and detecting (pattern-based) languages.
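To make the two rule styles concrete, here is a small sketch under my own reading of the distinction: a transforming rule turns input flows into output flows through operators, while a detecting rule separates a condition (a pattern over incoming items) from an action (what to emit when it holds). Both functions, their names, and the example pattern are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, Union


def moving_average(stream: Iterable[float], size: int) -> Iterator[float]:
    """Transforming style: map an input flow to an output flow of window averages."""
    window: deque = deque(maxlen=size)  # keeps only the last `size` values
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


def detect_sustained_high(
    stream: Iterable[float], threshold: float
) -> Iterator[Dict[str, Union[str, float]]]:
    """Detecting style: condition = two consecutive readings above `threshold`,
    action = emit a higher-level event notification."""
    previous = None
    for value in stream:
        if previous is not None and previous > threshold and value > threshold:
            yield {"event": "sustained_high", "value": value}
        previous = value


averages = list(moving_average([1, 2, 3, 4], size=2))
events = list(detect_sustained_high([1, 5, 6, 2, 7], threshold=4))
```

The first function always produces output for every input; the second stays silent until its pattern occurs, which is the practical difference between the two styles.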