http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup
1.2
How to use the documentation
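artifact: text, audio, video, images, etc.
Analysis Engines (AEs): analyze artifacts.
Analysis results: the output produced by AEs; they are metadata about the original artifact. Depending on what you want to analyze, you get a series of statements, for example: "Bush" denotes a person; the topic of the document is "Bush and Golf".
Type: a pre-defined term, such as person or the topic of the document. It names the kind of result you want the AE to produce.
Annotation Type: carries begin and end positions in the document.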
tightly-coupled: running in the same process
loosely-coupled: running in separate processes or even on different machines
step 2:
Different component analytics solve different parts of the analysis task, for example one analyzes person names and another analyzes relationships between persons. These component analytics must be easy to assemble.
Annotators: the core of an AE. The analysis work is done by annotators; inside an annotator, developers define what analysis to perform.
CAS (Common Analysis Structure): represents analysis results.
Component Descriptors (CD): written in XML; contain metadata describing the component, its identity, structure and behavior.
delegate analysis engines: The internal AEs specified in an aggregate are also called the delegate analysis engines.
Analysis Engine Assembler: We refer to the development role associated with building an aggregate from delegate AEs as the Analysis Engine Assembler.
Collection Reader: its job is to connect to and iterate through a source collection, acquiring documents and initializing CASes for analysis.
CAS Consumers: their job is to do the final CAS processing. A CAS Consumer may be implemented, for example, to index CAS contents in a search engine.
Collection Processing Engine (CPE): an aggregate component that specifies a "source to sink" flow from a Collection Reader through a set of analysis engines and then to a set of CAS Consumers.
CPE Descriptors: CPEs are specified by XML files called CPE Descriptors.
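The "source to sink" flow described above can be pictured with a small plain-Java sketch. This is only a simplified model, not the UIMA API: `MiniCas` and `MiniCpe` are hypothetical names, the reader is modeled as an iterator of document strings, and results are plain strings.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a CAS: the document text plus accumulated results.
class MiniCas {
    final String text;
    final List<String> results = new ArrayList<>();

    MiniCas(String text) {
        this.text = text;
    }
}

// Hypothetical model of a CPE: reader -> analysis engines -> consumers.
class MiniCpe {
    static void run(Iterator<String> reader,
                    List<UnaryOperator<MiniCas>> engines,
                    List<Consumer<MiniCas>> consumers) {
        while (reader.hasNext()) {
            // The reader acquires a document and a CAS is initialized for it.
            MiniCas cas = new MiniCas(reader.next());
            // Each analysis engine adds its results to the CAS in turn.
            for (UnaryOperator<MiniCas> engine : engines) {
                cas = engine.apply(cas);
            }
            // Consumers do the final processing, e.g. indexing or storing.
            for (Consumer<MiniCas> consumer : consumers) {
                consumer.accept(cas);
            }
        }
    }
}
```

In real UIMA the wiring is declared in a CPE Descriptor rather than in code like this; the sketch only shows the order in which the three roles touch each CAS.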
2. UIMA is the bridge between unstructured information and structured information. Its role is to analyze the content of unstructured information and produce structured information.
3. For each person found in the body of a document, the AE would create a Person object in the CAS and link it to the span of text where the person was mentioned in the document.
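A minimal plain-Java sketch of "a Person object linked to a span of text" might look as follows. `SpanAnnotation` and `PersonSpotter` are hypothetical names, not UIMA classes, and the sketch assumes the end offset is exclusive so that the covered text is `substring(begin, end)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation: a type name plus character offsets.
class SpanAnnotation {
    final String type;
    final int begin; // offset of the first character of the span
    final int end;   // offset just past the last character (assumed exclusive)

    SpanAnnotation(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    // Recover the text this annotation points at.
    String coveredText(String document) {
        return document.substring(begin, end);
    }
}

class PersonSpotter {
    // Marks every occurrence of a known name as a "Person" annotation.
    static List<SpanAnnotation> annotate(String document, String name) {
        List<SpanAnnotation> found = new ArrayList<>();
        if (name.isEmpty()) {
            return found; // nothing to search for
        }
        int from = document.indexOf(name);
        while (from >= 0) {
            found.add(new SpanAnnotation("Person", from, from + name.length()));
            from = document.indexOf(name, from + name.length());
        }
        return found;
    }
}
```

The annotation stores only offsets, never a copy of the text, so the original document stays untouched and several annotations can overlap the same span.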
4. UIMA Context: ensures that different annotators working together in an aggregate flow can share the same instance of an external file.
5. A simple or primitive UIMA Analysis Engine (AE) contains a single annotator. AEs, however, may be defined to contain other AEs organized in a workflow.
6. Users of this AE need not know how it is constructed internally but only need its name and its published input requirements and output types. These must be declared in the aggregate AE's descriptor. Aggregate AE's descriptors declare the components they contain and a flow specification. The flow specification defines the order in which the internal component AEs should be run.
7. We refer to the development role associated with building an aggregate from delegate AEs as the Analysis Engine Assembler.
8. The UIMA framework implementation has two built-in flow implementations: one that supports a linear flow between components, and one with conditional branching based on the language of the document. It also supports user-provided flow controllers.
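The difference between the two built-in flow styles can be sketched in plain Java. This is a deliberately simplified model, not the framework's flow controller API: `DelegateSketch` and `FlowSketch` are hypothetical names, and the language-based flow is reduced to "skip delegates that do not declare the document's language".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical delegate description: a name plus the languages it declares.
class DelegateSketch {
    final String name;
    final Set<String> languages;

    DelegateSketch(String name, Set<String> languages) {
        this.name = name;
        this.languages = languages;
    }
}

class FlowSketch {
    // Linear flow: every delegate runs, in the declared order.
    static List<String> fixedFlow(List<DelegateSketch> delegates) {
        List<String> order = new ArrayList<>();
        for (DelegateSketch d : delegates) {
            order.add(d.name);
        }
        return order;
    }

    // Language-conditional flow (simplified): same order, but delegates
    // that do not declare the document language are skipped.
    static List<String> languageFlow(List<DelegateSketch> delegates, String language) {
        List<String> order = new ArrayList<>();
        for (DelegateSketch d : delegates) {
            if (d.languages.contains(language)) {
                order.add(d.name);
            }
        }
        return order;
    }
}
```

For a German document, a pipeline declared as tokenizer, English tagger, German tagger would run only the tokenizer and the German tagger under the second flow.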
9. The application then decides what to do with the returned CAS. There are many possibilities. For instance the application could: display the results, store the CAS to disk for post processing, extract and index analysis results as part of a search or database application, etc.
10. An Analysis Engine (AE) may contain a single annotator (this is referred to as a Primitive AE), or it may be a composition of others and therefore contain multiple annotators (this is referred to as an Aggregate AE).
Annotators produce their analysis results in the form of typed Feature Structures, which are simply data structures that have a type and a set of (attribute, value) pairs.
All feature structures, including annotations, are represented in the UIMA Common Analysis Structure (CAS).
UIMA provides a native Java interface to the CAS called the JCas. The JCas represents each feature structure as a Java object.
Keep in mind that the CAS can represent arbitrary types of feature structures, and feature structures can refer to other feature structures.
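As a rough mental model, a feature structure can be pictured as a type name plus a map of feature values, where a value may itself be another feature structure. The class below is a hypothetical plain-Java sketch for illustration; it is not how the CAS or JCas is actually implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a type plus a set of (attribute, value) pairs.
class FeatureStructureSketch {
    final String type;
    private final Map<String, Object> features = new LinkedHashMap<>();

    FeatureStructureSketch(String type) {
        this.type = type;
    }

    // Sets a feature; the value may be a primitive, a string,
    // or a reference to another feature structure.
    FeatureStructureSketch set(String feature, Object value) {
        features.put(feature, value);
        return this;
    }

    Object get(String feature) {
        return features.get(feature);
    }
}
```

With this model, a `Topic` structure whose `mentions` feature points at a `Person` structure shows how one feature structure refers to another without copying it.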
UIMA defines basic primitive types such as Boolean, Byte, Short, Integer, Long, Float, and Double, as well as Arrays of these primitive types. UIMA also defines the built-in types TOP, which is the root of the type system, analogous to Object in Java; FSArray, which is an array of Feature Structures (i.e. an array of instances of TOP); and Annotation.
The built-in Annotation type declares three fields (called Features in CAS terminology). The features begin and end store the character offsets of the span of text to which the annotation refers. The feature sofa (Subject of Analysis) indicates which document the begin and end offsets point into.
Annotator implementations all implement a standard interface (AnalysisComponent), having several methods, the most important of which are: 1) initialize, 2) process, 3) destroy.
initialize is called by the framework once when it first creates an instance of the annotator class. process is called once per item being processed. destroy may be called by the application when it is done using your annotator.
There is a default implementation of this interface for annotators using the JCas, called JCasAnnotator_ImplBase, which has implementations of all required methods except for the process method.
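The lifecycle described above can be mimicked in plain Java to make the call order concrete. All names here (`ComponentSketch`, `ComponentBaseSketch`, `RecordingComponent`, `LifecycleDriver`) are hypothetical and the method signatures are simplified; the real interface takes a context and a CAS rather than no arguments and a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified version of the three lifecycle methods.
interface ComponentSketch {
    void initialize();
    void process(String item);
    void destroy();
}

// Plays the role of a base class that supplies everything except process.
abstract class ComponentBaseSketch implements ComponentSketch {
    @Override
    public void initialize() { }

    @Override
    public void destroy() { }
}

// Records each call so the order can be inspected.
class RecordingComponent extends ComponentBaseSketch {
    final List<String> calls = new ArrayList<>();

    @Override
    public void initialize() { calls.add("initialize"); }

    @Override
    public void process(String item) { calls.add("process:" + item); }

    @Override
    public void destroy() { calls.add("destroy"); }
}

class LifecycleDriver {
    // initialize once, process once per item, destroy when done.
    static void run(ComponentSketch component, List<String> items) {
        component.initialize();
        try {
            for (String item : items) {
                component.process(item);
            }
        } finally {
            component.destroy();
        }
    }
}
```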
We call annotation.addToIndexes() to add the new annotation to the indexes maintained in the CAS. By default, the CAS implementation used for analysis of text documents keeps an index of all annotations in their order from beginning to end of the document. Subsequent annotators or applications use the indexes to iterate over the annotations.
If you don't add the instance to the indexes, it cannot be retrieved by down-stream annotators.
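The effect of adding (or not adding) an annotation to the index can be shown with a small plain-Java model. The classes below are hypothetical stand-ins, not the CAS index API, and the index here sorts by begin offset only, which is a simplification of "order from beginning to end of the document".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical annotation holding only its offsets.
class IndexedSpan {
    final int begin;
    final int end;

    IndexedSpan(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }
}

// Hypothetical index that keeps spans ordered by begin offset.
class AnnotationIndexSketch {
    private final List<IndexedSpan> index = new ArrayList<>();

    void addToIndexes(IndexedSpan span) {
        index.add(span);
        index.sort(Comparator.comparingInt((IndexedSpan s) -> s.begin));
    }

    // Downstream code can only see what was added to the index.
    List<IndexedSpan> all() {
        return Collections.unmodifiableList(new ArrayList<>(index));
    }
}

class DigitRunAnnotator {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Creates one span per run of digits and adds it to the index.
    static void process(String text, AnnotationIndexSketch index) {
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            index.addToIndexes(new IndexedSpan(m.start(), m.end()));
        }
    }
}
```

A span that is constructed but never passed to `addToIndexes` simply does not appear in `all()`, which mirrors why a downstream annotator cannot retrieve an un-indexed annotation.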
On the Capabilities page, we define our annotator's inputs and outputs, in terms of the types in the type system. Although capabilities come in sets, having multiple sets is deprecated; here we're just using one set. The RoomNumberAnnotator is very simple. It requires no input types, as it operates directly on the document text, which is supplied as a part of the CAS initialization.
UIMA allows annotators to declare configuration parameters in their descriptors. The descriptor also specifies default values for the parameters, though these can be overridden at runtime.
The initialize method is a good place to read configuration parameter values.
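The "defaults in the descriptor, overridable at runtime, read once in initialize" idea can be sketched in plain Java. This is an assumption-laden model: `ConfigurableAnnotatorSketch`, the two maps, and the parameter names `Patterns` and `CaseSensitive` are illustrative only and do not reflect how the framework actually stores or resolves parameters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotator that reads its settings once, in initialize.
class ConfigurableAnnotatorSketch {
    private String pattern;
    private boolean caseSensitive;

    // descriptorDefaults: values declared alongside the parameters.
    // runtimeOverrides: values supplied later that take precedence.
    void initialize(Map<String, Object> descriptorDefaults,
                    Map<String, Object> runtimeOverrides) {
        Map<String, Object> effective = new HashMap<>(descriptorDefaults);
        effective.putAll(runtimeOverrides); // overrides win over defaults
        pattern = (String) effective.get("Patterns");
        caseSensitive = Boolean.TRUE.equals(effective.get("CaseSensitive"));
    }

    String pattern() {
        return pattern;
    }

    boolean caseSensitive() {
        return caseSensitive;
    }
}
```

Reading the values once in `initialize` keeps `process` free of lookup logic, which matters because `process` runs once per document.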
The UIMA framework ensures that an Annotator instance is called by only one thread at a time. An instance never has to worry about running some method on one thread, and then asynchronously being called using another thread. When multiple threading is wanted, for performance, multiple instances of the Annotator are created, each one running on just one thread.
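The "one instance per thread" model can be illustrated with plain `java.util.concurrent` code. This is a sketch of the general technique, not of how the framework schedules annotators: `CountingAnnotator` and `PerThreadInstances` are hypothetical, and a `ThreadLocal` is used here simply as one convenient way to give each worker thread its own instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical annotator with unsynchronized per-instance state.
class CountingAnnotator {
    private final Thread owner = Thread.currentThread();
    private int processed = 0; // plain field: only the owning thread writes it

    void process(String document) {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("instance shared across threads");
        }
        processed++;
    }

    int processed() {
        return processed;
    }
}

class PerThreadInstances {
    // Processes all documents on a pool where each worker thread owns its
    // own annotator. Returns the total processed count, or -1 on failure.
    static int processAll(List<String> documents, int threads) {
        Queue<CountingAnnotator> instances = new ConcurrentLinkedQueue<>();
        ThreadLocal<CountingAnnotator> local = ThreadLocal.withInitial(() -> {
            CountingAnnotator created = new CountingAnnotator();
            instances.add(created);
            return created;
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> local.get().process(doc)));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the task and makes its writes visible
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } catch (ExecutionException e) {
            return -1;
        } finally {
            pool.shutdown();
        }
        int total = 0;
        for (CountingAnnotator a : instances) {
            total += a.processed();
        }
        return total;
    }
}
```

Because no instance is ever touched by two threads, the counter needs no locking; throughput comes from running several instances side by side rather than from sharing one.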