- 浏览: 149494 次
- 性别:
- 来自: 杭州
文章分类
最新评论
-
fei33423:
流程引擎的流转比状态机要复杂太多了把. 我觉得你即可以看成平行 ...
工作流引擎是否应该建立在有限状态机(Finite State Machine, FSM)引擎之上? -
c601097836:
为什么你们都说FSM,而不是UML State machine ...
工作流引擎是否应该建立在有限状态机(Finite State Machine, FSM)引擎之上? -
ronghao:
di1984HIT 写道类似啊,我现在也有这个困惑、我的工作流 ...
工作流引擎是否应该建立在有限状态机(Finite State Machine, FSM)引擎之上? -
di1984HIT:
不错啊。学习了。
[转]hadoop使用中的几个小细节(一) -
di1984HIT:
好的、
工作流20种基本模式的理解
22 Feb 2012, by Bright Zheng (IT进行时)
写在这章前面的几点牢骚或感慨:
1. 我发现建模是比较别扭的一件事情,尤其是你的脑子里都是RDBMS的ERD的时候;
2. 本人试图通过两者的建模过程体现思考要点,但感觉在NoSQL的建模上有点“那个”——如果不在大型项目上吃亏过或者直接受教于前辈,总感觉缺那么点味道;
3. 这篇是我写的最郁闷的一篇,而且可能后面需要无数个补丁,但管不了了,有错误才有感悟
5. Data Modeling
Data Modeling is one of the most important things in experiencing Cassandra, especially to those who have lots of experiences with RDBMS data modeling.
By admiring Twissandra project, we name it as Jtwissandra as an example. If possible, I’ll try to create and implement it and share it in GitHub.
This is a simple example to showcase the NoSQL concepts by admiring the Twitter via Cassandra.
5.1. Tranditional RDBMS Data Modeling
Following are the core Entities & Relationships if we’re modeling in RDBMS concepts.
Here are some pseudo codes for demonstrating the business logic/requirements:
1. Adding a new user:
USER.insert(user_id, user_name, user_password, create_timestamp); |
2. Following a friend:
FRIEND.insert(user_id, followed_id, create_timestamp) as ($current_user_id, user_id, create_timestamp); FOLLOWER.insert(user_id, follower_id, create_timestamp) as (user_id, $current_user_id, create_timestamp); |
3. Tweetting:
FRIEND.insert(user_id, followed_id, create_timestamp) as ($current_user_id, user_id, create_timestamp); FOLLOWER.insert(user_id, follower_id, create_timestamp) as (user_id, $current_user_id, create_timestamp); |
4. Getting Tweets (that are twitted by self and friends):
select * from TWEET t where t.user_id = $current_user_id or t.user_id in ( select followed_id from FRIEND where user_id = $current_user_id ) |
Comment:: What a bottleneck is here!! That’s also the most important reason why Twitter has to migrate to NoSQL solutions.
5.2. NoSQL Data Modeling
Before we go deeper of NoSQL data modeling with Cassandre, we must understand the key design points of it.
1. Cassandra is a key-value based model
2. Cassandra supports more complex modeling by importing the concept of Super Column
3. The data can be stored in two ways: as column names or as values (it’s really confusing for the beginners sometimes, but you will be free if you understand more especially on the indexing)
4. The Columns, normal Columns or Super ones, in the Column Family is sorted by Column Names, not values
So let’s get started.
We need to create the Keyspace first.
create keyspace JTWISSANDRA with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = [{replication_factor:1}]; |
Under this Keyspace, we’ll be working on the data modeling one by one.
5.2.1. User
The key points should be under consideration:
- The key we can simply use Time UUID
- The user_name must be (secondary) indexed because we may use it for search
- The create_timestamp should be (secondary) indexed because we may use it for search or some kinds of partitioning
- The password must be encoded as base64. No more CSDN story please.
So the sample data model will be as following:
ColumnFamily: USER |
||
Key |
Columns |
|
550e8400-e29b-41d4-a716-446655440000 |
name |
value |
|
“user_name” |
“itstarting” |
|
“password” |
"******" |
|
“create_timestamp” |
1329836819890000 |
550e8400-e29b-41d4-a716-446655440001 |
name |
value |
|
“user_name” |
“test1” |
|
“password” |
"******" |
|
“create_timestamp” |
1329836819890001 |
Here is the create script:
create column family USER with comparator = UTF8Type and key_validation_class = UTF8Type and default_validation_class = UTF8Type and column_metadata = [ {column_name: user_name, validation_class: UTF8Type, index_name:user_name_idx, index_type:KEYS } {column_name: user_password, validation_class: UTF8Type} {column_name: create_timestamp, validation_class: LongType, index_name:create_timestamp_idx, index_type:KEYS} ]; |
And the insert script/CLI for showcase only:
// insert user 550e8400-e29b-41d4-a716-446655440000 set USER[‘550e8400-e29b-41d4-a716-446655440000’][‘user_name’] = ‘itstarting’; set USER[‘550e8400-e29b-41d4-a716-446655440000’][‘password’] = ‘111222’; set USER[‘550e8400-e29b-41d4-a716-446655440000’][‘create_timestamp’] = 1329836819890000;
// insert user 550e8400-e29b-41d4-a716-446655440001 set USER[‘550e8400-e29b-41d4-a716-446655440001’][‘user_name’] = ‘test1; set USER[‘550e8400-e29b-41d4-a716-446655440001’][‘password’] = ‘222111’; set USER[‘550e8400-e29b-41d4-a716-446655440001’][‘create_timestamp’] = 1329836819890001; |
5.2.2. Friend
The friends mean: who are the user X following?
The key points should be under consideration:
- The key should be the uuid of the user X
- The timestamp when the relationship is built is the column (for friend sorting) and the friend’s uuid is the value. Wow again here. Right?
Let’s say the two users we created are friends each other.
So the sample data model will be as following:
ColumnFamily: FRIEND |
||
Key |
Columns |
|
550e8400-e29b-41d4-a716-446655440000 |
name |
value |
|
“1329836819859000” |
“550e8400-e29b-41d4-a716-446655440001” |
|
If the guy has more friends, insert colums here |
|
550e8400-e29b-41d4-a716-446655440001 |
name |
value |
|
“1329836819781000” |
“550e8400-e29b-41d4-a716-446655440000” |
|
If the guy has more friends, insert colums here |
The first record means the user X is 550e8400-e29b-41d4-a716-446655440000 and his/her friend is 550e8400-e29b-41d4-a716-446655440001 and the relationship is established at timestamp of 1329836819859000.
Here is the create script:
create column family FRIEND with comparator = UTF8Type LongType and key_validation_class = UTF8Type and default_validation_class = UTF8Type; |
No more column name definitions here? Yes, Cassandra is a so-called schema-free data store. Wow!
And the insert script/CLI for showcase only:
set FRIEND[‘550e8400-e29b-41d4-a716-446655440000’][‘1329836819859000’] = ‘550e8400-e29b-41d4-a716-446655440001;
set FRIEND[‘550e8400-e29b-41d4-a716-446655440001’][‘1329836819781000’] = ‘550e8400-e29b-41d4-a716-446655440000; |
5.2.3. Follower
The Follower is a reversed concept compared to Friend: Who are following user X?
The key points should be under consideration:
- The key should be the uuid of the user X
- The timestamp when the relationship is built is the column (for follower sorting) and the follower’s user uuid is the value.
Actually the logic should be within the same transaction of friend creation. So we’d like to follow the sample in Friend chapter.
So the sample data model will be as following:
ColumnFamily: FOLLOWER |
||
Key |
Columns |
|
550e8400-e29b-41d4-a716-446655440001 |
name |
value |
|
“1329836819859000” |
“550e8400-e29b-41d4-a716-446655440000” |
|
If the guy has more friends, insert colums here |
|
550e8400-e29b-41d4-a716-446655440000 |
name |
value |
|
“1329836819781000” |
“550e8400-e29b-41d4-a716-446655440001” |
|
If the guy has more friends, insert colums here |
Here is the create script:
create column family FOLLOWER with comparator = UTF8Type LongType and key_validation_class = UTF8Type and default_validation_class = UTF8Type; |
And the insert script/CLI for showcase only:
set FOLLOWER[‘550e8400-e29b-41d4-a716-446655440001’][‘1329836819859000’’] = ‘550e8400-e29b-41d4-a716-446655440000;
set FOLLOWER[‘550e8400-e29b-41d4-a716-446655440000’][‘329836819781000’] = ‘550e8400-e29b-41d4-a716-446655440001; |
5.2.4. Tweet & Timeline
The tweets are the soul of Twitter.
The key points should be under consideration:
- How to get my tweets?
- How to get my friends’ tweets without join?
- How to sort all tweets including mine and my friends’.
That’s why Twitter imported the concept of Timeline.
Let’s imagine something like this (please correct me if I’m wrong on following discussions):
<!--
Copied from: http://user.services.openoffice.org/en/forum/viewtopic.php?f=9&t=32185
All the events (tweets) are going along the time.
The Timeline means the line with the specified user’s all related events including
- The events (tweets) I sent
- The events (tweets) my friends sent
So the tweets are inserted to CF of Tweet but need to add one more CF: Timeline.
CAUTION: The following learning experiences/exercises might be not correct, please take your own risks if you still want to read on. But of course, any feedback is welcome.
5.2.4.1. Tweet
ColumnFamily: TWEET |
||
Key |
Columns |
|
550e8400-e29b-41d4-a716-446655440011 |
name |
value |
|
“user_uuid” |
“550e8400-e29b-41d4-a716-446655440000” |
|
“tweet_content” |
“Hello world: 11” |
550e8400-e29b-41d4-a716-446655440012 |
name |
value |
|
“user_uuid” |
“550e8400-e29b-41d4-a716-446655440000” |
|
“tweet_content” |
“Hello world: 12” |
550e8400-e29b-41d4-a716-446655440021 |
name |
value |
|
“user_uuid” |
“550e8400-e29b-41d4-a716-446655440001” |
|
“tweet_content” |
“Hello world: 21” |
550e8400-e29b-41d4-a716-446655440022 |
name |
value |
|
“user_uuid” |
“550e8400-e29b-41d4-a716-446655440001” |
|
“tweet_content” |
“Hello world: 22” |
Here is the create script:
create column family TWEET with comparator = UTF8Type and key_validation_class = UTF8Type and default_validation_class = UTF8Type and column_metadata = [ {column_name: user_uuid, validation_class: UTF8Type} {column_name: tweet_content, validation_class: UTF8Type} ]; |
And the insert script/CLI for showcase only:
set TWEET['550e8400-e29b-41d4-a716-446655440011']['user_uuid'] = '550e8400-e29b-41d4-a716-446655440000'; set TWEET['550e8400-e29b-41d4-a716-446655440011']['tweet_content'] = 'Hello world: 11';
set TWEET['550e8400-e29b-41d4-a716-446655440012']['user_uuid'] = '550e8400-e29b-41d4-a716-446655440000'; set TWEET['550e8400-e29b-41d4-a716-446655440012']['tweet_content'] = 'Hello world: 12';
set TWEET['550e8400-e29b-41d4-a716-446655440021']['user_uuid'] = '550e8400-e29b-41d4-a716-446655440001'; set TWEET['550e8400-e29b-41d4-a716-446655440021']['tweet_content'] = 'Hello world: 21';
set TWEET['550e8400-e29b-41d4-a716-446655440022']['user_uuid'] = '550e8400-e29b-41d4-a716-446655440001'; set TWEET['550e8400-e29b-41d4-a716-446655440022']['tweet_content'] = 'Hello world: 22'; |
5.2.4.2. Timeline
ColumnFamily: TIMELINE |
||
Key |
Columns |
|
550e8400-e29b-41d4-a716-446655440000 |
name |
value |
|
“1329883039824000” |
“550e8400-e29b-41d4-a716-446655440011” |
|
“1329883039825000” |
“550e8400-e29b-41d4-a716-446655440021” |
|
“1329883039934000” |
“550e8400-e29b-41d4-a716-446655440012” |
|
“1329883039935000” |
“550e8400-e29b-41d4-a716-446655440022” |
550e8400-e29b-41d4-a716-446655440001 |
name |
value |
|
“1329883039824000” |
“550e8400-e29b-41d4-a716-446655440011” |
|
“1329883039825000” |
“550e8400-e29b-41d4-a716-446655440021” |
|
“1329883039934000” |
“550e8400-e29b-41d4-a716-446655440012” |
|
“1329883039935000” |
“550e8400-e29b-41d4-a716-446655440022” |
Here is the create script:
create column family TIMELINE with comparator = UTF8Type LongType and key_validation_class = UTF8Type and default_validation_class = UTF8Type; |
And the insert script/CLI for showcase only:
set TIMELINE['550e8400-e29b-41d4-a716-446655440000']['1329883039824000'] = '550e8400-e29b-41d4-a716-446655440011'; set TIMELINE['550e8400-e29b-41d4-a716-446655440000']['1329883039825000'] = '550e8400-e29b-41d4-a716-446655440021'; set TIMELINE['550e8400-e29b-41d4-a716-446655440000']['1329883039834000'] = '550e8400-e29b-41d4-a716-446655440012'; set TIMELINE['550e8400-e29b-41d4-a716-446655440000']['1329883039835000'] = '550e8400-e29b-41d4-a716-446655440022';
set TIMELINE['550e8400-e29b-41d4-a716-446655440001']['1329883039824000'] = '550e8400-e29b-41d4-a716-446655440011'; set TIMELINE['550e8400-e29b-41d4-a716-446655440001']['1329883039825000'] = '550e8400-e29b-41d4-a716-446655440021'; set TIMELINE['550e8400-e29b-41d4-a716-446655440001']['1329883039834000'] = '550e8400-e29b-41d4-a716-446655440012'; set TIMELINE['550e8400-e29b-41d4-a716-446655440001']['1329883039835000'] = '550e8400-e29b-41d4-a716-446655440022'; |
发表评论
-
Apache Cassandra Learning Step by Step (5): 实战性的JTwissandra项目
2012-02-25 22:08 2629在完成了Apache Cassandra的四个基本学习步骤之后 ... -
Apache Cassandra Learning Step by Step (3): Samples ABC
2012-02-16 16:48 2777====16 Feb 2012, by Bright Zhen ... -
Apache Cassandra Learning Step by Step (2): Core Concepts
2012-02-15 21:04 2338====15 Feb 2012, by Bright Zhen ... -
Apache Cassandra Learning Step by Step (1)
2012-02-14 21:58 2691By Bright Zheng (IT进行时) 1. A ...
相关推荐
Learning Apache Cassandra - Second Edition by Sandeep Yarabarla English | 25 Apr. 2017 | ASIN: B01N52R0B5 | 360 Pages | AZW3 | 10.68 MB Key Features Install Cassandra and set up multi-node clusters ...
Spring Data for Apache Cassandra API。 Spring Data for Apache Cassandra 开发文档
通过学习《Learning Apache Cassandra 2015》这本书,我们不仅能够了解到Cassandra的基本概念和特性,还能深入了解如何使用Cassandra解决实际问题,包括如何设计表结构、组织数据以及执行高效查询等。此外,书中还...
在本文档中,标题“Learning_Apache_Cassandra”透露了内容的主题,即学习Apache Cassandra。Cassandra是一个开源的NoSQL分布式数据库管理系统,它以高可用性和分布式架构著称。该书详细介绍了Cassandra的基本概念、...
### Apache Cassandra 掌控指南 #### 一、引言 在大数据时代,高效的数据存储与管理变得至关重要。《Mastering Apache Cassandra》这本书旨在帮助读者掌握 Apache Cassandra 的核心技术和最佳实践,使其能够在处理...
Title: Mastering Apache Cassandra, 2nd Edition Author: Nishant Neeraj Length: 322 pages Edition: 2 Language: English Publisher: Packt Publishing Publication Date: 2015-02-27 ISBN-10: 1784392618 ISBN-...
Beginning Apache Cassandra Development introduces you to one of the most robust and best-performing NoSQL database platforms on the planet. Apache Cassandra is a document database following the JSON ...
Apache Cassandra是一个分布式的NoSQL数据库管理系统,它被设计用来处理大量的数据跨越多个数据中心。Cassandra对高性能、高可用性、可扩展性有着出色的支持,因此它特别适合于那些需要不断增长和变化的数据集的应用...
Cassandra is the technology of choice for such data-driven organizations as Netflix, eBay, Constant Contact, Comcast, and scores of others. In NOSQL WEB DEVELOPMENT WITH APACHE CASSANDRA, you will ...
Apache Cassandra 是一个分布式数据库系统,特别设计用于处理大规模数据,具备高可用性、线性可扩展性和优秀的性能。在这个"apache-cassandra-3.11.13"版本中,我们探讨的是Cassandra项目的其中一个稳定版本,它包含...
Apache Cassandra 是一个分布式NoSQL数据库系统,以高可用性、可扩展性和高性能著称。它设计用于处理大规模数据,尤其适合大数据分析和实时应用程序。在MATLAB开发环境中,与Apache Cassandra的集成允许用户通过...
Cassandra(apache-cassandra-3.11.11-bin.tar.gz)是一套开源分布式NoSQL数据库系统。它最初由Facebook开发,用于储存收件箱等简单格式数据,集GoogleBigTable的数据模型与Amazon Dynamo的完全分布式的架构于一身...
Apache Cassandra是一个开源的分布式NoSQL数据库管理系统,它最初由Facebook开发,并在2008年被捐献给了Apache软件基金会。Cassandra旨在解决大规模数据存储的问题,特别适用于那些需要高性能、可伸缩性以及高可用性...
Apache Cassandra is the perfect choice for building fault tolerant and scalable databases. Implementing Cassandra will enable you to take advantage of its features which include replication of data ...