====16 Feb 2012, by Bright Zheng (IT进行时)====
4. Samples ABC
We learn it step by step, covering the concepts and the Java API usage, by means of the following (a shared setup sketch is given right after this list):
1. Concept Introduction
2. CLI
3. Java Sample Code
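All the samples below assume that a keyspace field and a couple of serializer fields (stringSerializer, longSerializer) already exist; in the original tutorial project they are presumably wired up by TutorialRunner. As a reference only, a minimal Hector setup might look roughly like this (the class name, the cluster name "Test Cluster" and the host "localhost:9160" are my assumptions, not taken from the tutorial):

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class TutorialSetup {
    // serializers reused by all the queries below
    static final StringSerializer stringSerializer = StringSerializer.get();
    static final LongSerializer longSerializer = LongSerializer.get();

    public static Keyspace createKeyspace() {
        // cluster name and host:port are assumptions; adjust to your environment
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        // "Tutorial" is the keyspace used throughout this article
        return HFactory.createKeyspace("Tutorial", cluster);
    }
}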
4.1. Get a Single Column by a Key
4.1.1. Sample Code
public QueryResult<HColumn<String,String>> execute() {
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily("Npanxx");
    columnQuery.setKey("512204");
    columnQuery.setName("city");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    return result;
}
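Reading the result is then a one-liner. A minimal sketch (not part of the original tutorial), assuming the execute() method above:

// QueryResult.get() unwraps the single column (or returns null if the column is missing)
QueryResult<HColumn<String, String>> result = execute();
HColumn<String, String> column = result.get();
if (column != null) {
    System.out.println(column.getName() + " = " + column.getValue());  // prints: city = Austin
}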
4.1.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
HColumn(city=Austin)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.1.3. CLI
[default@Tutorial] get Npanxx['512204']['city'];
=> (column=city, value=Austin, timestamp=1329234388328000)
Elapsed time: 16 msec(s).
4.2. Get Multiple Columns by a Key
4.2.1. Sample Code
public QueryResult<ColumnSlice<Long,String>> execute() {
    SliceQuery<String, Long, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);
    sliceQuery.setColumnFamily("StateCity");
    sliceQuery.setKey("TX Austin");

    // way 1: set multiple columnNames
    sliceQuery.setColumnNames(202L, 203L, 204L);

    // way 2: use setRange
    // change 'reversed' to true to get the columns in reverse order
    //sliceQuery.setRange(202L, 204L, false, 5);

    QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();
    return result;
}
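To consume the slice, ColumnSlice exposes the columns as a list. A minimal sketch (an addition of mine, not from the tutorial), assuming the execute() method above:

QueryResult<ColumnSlice<Long, String>> result = execute();
// ColumnSlice.getColumns() returns the HColumns in comparator (LongType) order
for (HColumn<Long, String> column : result.get().getColumns()) {
    System.out.println(column.getName() + " -> " + column.getValue());
    // 202 -> 30.27x097.74, 203 -> 30.27x097.74, 204 -> 30.32x097.73
}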
4.2.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.2.3. CLI (TODO)
TODO: Referring to the CLI syntax, it seems Cassandra's CLI cannot fetch multiple specific columns in a single 'get' command?
4.3. Get Multiple Rows by a Set of Keys
4.3.1. Sample Code
public QueryResult<Rows<String,String,String>> execute() {
    MultigetSliceQuery<String, String, String> multigetSlicesQuery =
        HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    multigetSlicesQuery.setColumnFamily("Npanxx");
    multigetSlicesQuery.setColumnNames("city","state","lat","lng");
    multigetSlicesQuery.setKeys("512202","512203","512205","512206");
    QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();
    return results;
}
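Rows is iterable, so walking the multiget result is straightforward. A small sketch (not from the original tutorial), assuming the execute() method above:

QueryResult<Rows<String, String, String>> results = execute();
// Rows is Iterable<Row>; note that multiget results come back in no particular key order
for (Row<String, String, String> row : results.get()) {
    System.out.print(row.getKey() + ": ");
    for (HColumn<String, String> column : row.getColumnSlice().getColumns()) {
        System.out.print(column.getName() + "=" + column.getValue() + " ");
    }
    System.out.println();
}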
4.3.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.3.3. CLI (TODO)
TODO: N/A?
4.4. Get Slices from a Range of Rows by Key
4.4.1. Sample Code
GetRangeSlicesForStateCity.java
public QueryResult<OrderedRows<String,String,String>> execute() {
    RangeSlicesQuery<String, String, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    rangeSlicesQuery.setColumnFamily("Npanxx");
    rangeSlicesQuery.setColumnNames("city","state","lat","lng");
    rangeSlicesQuery.setKeys("512202", "512205");
    rangeSlicesQuery.setRowCount(5);
    QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();
    return results;
}
Important Note: the result is actually NOT meaningful (one might expect the 4 rows 512202 to 512205, but that is not what comes back), because row keys are ordered by the RandomPartitioner (the partitioner can be changed in conf/cassandra.yaml, but doing so is not recommended). See the output under "Sample Code run by Maven" below.
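With RandomPartitioner, range queries are still useful for paging over all rows in token order. A hedged paging sketch (not part of the original tutorial; the page size and column names are arbitrary choices of mine):

// Page through the whole "Npanxx" column family, pageSize rows at a time.
int pageSize = 100;
String startKey = "";   // empty key = start from the beginning of the ring
while (true) {
    RangeSlicesQuery<String, String, String> query =
        HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    query.setColumnFamily("Npanxx");
    query.setColumnNames("city", "state");
    query.setKeys(startKey, "");          // empty end key = up to the end of the ring
    query.setRowCount(pageSize + 1);      // fetch one extra row: it becomes the next start key
    OrderedRows<String, String, String> rows = query.execute().get();

    String lastKey = null;
    for (Row<String, String, String> row : rows) {
        lastKey = row.getKey();
        if (!row.getKey().equals(startKey)) {
            // process row ... (the overlap row repeated from the previous page is skipped)
        }
    }
    if (rows.getCount() <= pageSize) {
        break;                            // fewer rows than requested: this was the last page
    }
    startKey = lastKey;                   // continue from the last key seen (token order, not key order)
}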
4.4.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))
})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.4.3. CLI (TODO)
TODO: N/A
4.5. Get Slices from a Range of Rows by Columns
4.5.1. Sample Code
GetSliceForAreaCodeCity.java
public QueryResult<ColumnSlice<String,String>> execute() {
    SliceQuery<String, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("AreaCode");
    sliceQuery.setKey("512");
    // change the order argument to 'true' to get the last 2 columns in descending order
    // gets the first 4 columns "between" Austin and Austin__204 according to comparator
    sliceQuery.setRange("Austin", "Austin__204", false, 5);

    QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
    return result;
}
4.5.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([
HColumn(Austin__202=30.27x097.74),
HColumn(Austin__203=30.27x097.74),
HColumn(Austin__204=30.32x097.73)
])
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.5.3. CLI
N/A
4.6. Get Slices from Indexed Columns
4.6.1. Sample Code
GetIndexedSlicesForCityState.java
public QueryResult<OrderedRows<String, String, String>> execute() {
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    indexedSlicesQuery.setColumnFamily("Npanxx");
    indexedSlicesQuery.setColumnNames("city","lat","lng");
    indexedSlicesQuery.addEqualsExpression("state", "TX");
    indexedSlicesQuery.addEqualsExpression("city", "Austin");
    indexedSlicesQuery.addGteExpression("lat", "30.30");
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    return result;
}
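Two caveats worth noting: a Cassandra indexed query needs at least one equality expression on a column that carries a secondary index (in this schema presumably 'state' or 'city'), and with UTF8 validators the addGteExpression on 'lat' is compared lexicographically rather than numerically, so it only behaves like a numeric filter while all values share the same format. A hedged variant that additionally caps the result size (setRowCount is my addition, not part of the original sample):

// Same query as above, but return at most 2 matching rows
indexedSlicesQuery.setRowCount(2);
QueryResult<OrderedRows<String, String, String>> limited = indexedSlicesQuery.execute();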
4.6.2. Sample Code run by Maven
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({512204=Row(
512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.6.3. CLI
[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';
-------------------
RowKey: 512204
=> (column=city, value=Austin, timestamp=1329299521508000)
=> (column=lat, value=30.32, timestamp=1329299521540000)
=> (column=lng, value=097.73, timestamp=1329299521555000)
=> (column=state, value=TX, timestamp=1329299521524000)
-------------------
RowKey: 512206
=> (column=city, value=Austin, timestamp=1329299521618000)
=> (column=lat, value=30.32, timestamp=1329299521633000)
=> (column=lng, value=097.73, timestamp=1329299522491000)
=> (column=state, value=TX, timestamp=1329299521618000)
-------------------
RowKey: 512205
=> (column=city, value=Austin, timestamp=1329299521555000)
=> (column=lat, value=30.32, timestamp=1329299521586000)
=> (column=lng, value=097.73, timestamp=1329299521602000)
=> (column=state, value=TX, timestamp=1329299521571000)

3 Rows Returned.
Elapsed time: 16 msec(s).
4.7. Insertion
4.7.1. Sample Code
InsertRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    mutator.addInsertion("CA Burlingame", "StateCity",
        HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
    mutator.addInsertion("650", "AreaCode",
        HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));

    MutationResult mr = mutator.execute();
    return null;
}
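Note that the Mutator batches all addInsertion() calls into a single round trip when execute() is called. For a single column, Hector also offers a one-shot insert; a minimal sketch (the key and values here are made up for illustration):

Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
// insert() sends the mutation immediately instead of batching it
mutator.insert("650223", "Npanxx", HFactory.createStringColumn("city", "San Mateo"));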
4.7.2. Sample Code run by Maven
Omitted.
4.7.3. CLI
[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';
[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';
[default@Tutorial] set Npanxx['650222']['lat']='37.57';
…
4.8. Deletion
4.8.1. Sample Code
DeleteRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    // Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)
    // A null columnName means: delete the whole row.
    mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
    mutator.addDeletion("650", "AreaCode", null, stringSerializer);
    mutator.addDeletion("650222", "Npanxx", null, stringSerializer);
    // adding a non-existent key like the following will cause the insertion of a tombstone
    // mutator.addDeletion("652", "AreaCode", null, stringSerializer);
    MutationResult mr = mutator.execute();
    return null;
}
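To delete a single column rather than the whole row, pass the column name instead of null. A small sketch of mine (not from the tutorial):

Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
// Deletes only the 'lat' column of row 650222 in Npanxx; the rest of the row survives
mutator.addDeletion("650222", "Npanxx", "lat", stringSerializer);
mutator.execute();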
4.8.2. Sample Code run by Maven
Omitted.
4.8.3. CLI
[default@Tutorial] del StateCity['CA Burlingame'];
[default@Tutorial] del AreaCode['650'];
[default@Tutorial] del Npanxx['650222'];
Important Note: whichever you use, Java code or the CLI, the deletion still leaves the deleted row key behind, marked as a tombstone (a fitting name), and it can still be brought back by the 'list' command, like this:
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)

2 Rows Returned.
Elapsed time: 16 msec(s).
As you can see, two rows are returned, even though the 'CA Burlingame' row has been deleted.
Even worse, deleting a non-existent key causes the so-called 'insertion of a tombstone', which adds one more (empty) row to the column family!
Fortunately, the 'get' command no longer retrieves it.
[default@Tutorial] get StateCity['CA Burlingame'];
Returned 0 results.
Elapsed time: 0 msec(s).
Go deeper? Please read on.
When will Cassandra remove these tombstones? As far as I know, there are two ways:
1. Wait until gc_grace_seconds expires (not verified yet)
gc_grace_seconds is set per column family and can be updated without a restart.
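For example, it should be possible to change gc_grace_seconds programmatically through Hector's schema API. A hedged sketch only: I have not verified this against the tutorial project, the cluster name and host are assumptions, and the method names are from memory:

Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
KeyspaceDefinition ksDef = cluster.describeKeyspace("Tutorial");
for (ColumnFamilyDefinition cfDef : ksDef.getCfDefs()) {
    if ("StateCity".equals(cfDef.getName())) {
        cfDef.setGcGraceSeconds(3600);        // shorten the grace period to 1 hour
        cluster.updateColumnFamily(cfDef);    // applied without a restart
    }
}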
How to get gc_grace_seconds? Simply use CLI:
[default@Tutorial] show schema;
…
create column family StateCity
  with column_type = 'Standard'
  and comparator = 'LongType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 200000.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000    // 10 days, OMG
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
…
2. Compaction (under investigation, no luck yet)
Compaction is triggered automatically.
But how do you trigger compaction manually? Use nodetool:
C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial
Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial
Starting NodeTool
Then we can see some logging messages in the Cassandra console.
But as I found, the tombstones are still there. (Why???)
C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db
{
  "4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],
  "54582041757374696e": [["202","30.27x097.74",1329297768323000],
                         ["203","30.27x097.74",1329297768338000],
                         ["204","30.32x097.73",1329297768354000],
                         ["205","30.32x097.73",1329297768370000],
                         ["206","30.32x097.73",1329297768385000]],
  "616263": []
}
And they still show up in the 'list' command. (They simply refuse to go away. Big why???)
[default@Tutorial] list statecity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)
-------------------
RowKey: abc

3 Rows Returned.
Elapsed time: 31 msec(s).
A few gripes at this point:
1. Perhaps because I have not studied it deeply enough yet, the CLI feels rather weak; it seems suitable mainly for initial schema modeling (DDL) and simple data inspection;
2. The tombstone clean-up question has not been fully verified. I am leaving it as an open case for now and will add or correct the answer here later.