
Apache Cassandra Learning Step by Step (3): Samples ABC


====16 Feb 2012, by Bright Zheng (IT进行时)====

4. Samples ABC

We will learn Cassandra step by step, covering the concepts and the Java API usage by means of:

1. Concept Introduction

2. CLI

3. Java Sample Code

4.1. Get a Single Column by a Key

4.1.1. Sample Code

public QueryResult<HColumn<String,String>> execute() {       

        ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);

        columnQuery.setColumnFamily("Npanxx");

        columnQuery.setKey("512204");

        columnQuery.setName("city");

        QueryResult<HColumn<String, String>> result = columnQuery.execute();

       

        return result;

    }


4.1.2.  Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---

HColumn(city=Austin)

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.1.3.  CLI

[default@Tutorial] get Npanxx['512204']['city'];

=> (column=city, value=Austin, timestamp=1329234388328000)

Elapsed time: 16 msec(s).

4.2. Get multiple columns by a Key

4.2.1. Sample Code

public QueryResult<ColumnSlice<Long,String>> execute() {

        SliceQuery<String, Long, String> sliceQuery =

            HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);

        sliceQuery.setColumnFamily("StateCity");

        sliceQuery.setKey("TX Austin");

       

        //way 1: set multiple columnNames

        sliceQuery.setColumnNames(202L, 203L, 204L);

       

        //way 2: use setRange

        // change 'reversed' to true to get the columns in reverse order

        //sliceQuery.setRange(202L, 204L, false, 5);

        

        QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();

        return result;

    }
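The two ways of selecting columns differ in what they ask for: explicit names fetch exactly those columns, while a range fetches whatever lies between two bounds, capped by a count. The sketch below models this with a plain sorted map instead of a live cluster. The class `RowSliceSketch` and its methods are hypothetical names invented for illustration, not part of Hector, and the inclusive bounds and comparator ordering are modeled assumptions about how a LongType column family behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of one row; not part of Hector or Cassandra.
final class RowSliceSketch {
    // Columns are kept sorted by name, as a LongType comparator would sort them.
    private final NavigableMap<Long, String> columns = new TreeMap<>();

    void put(long name, String value) {
        columns.put(name, value);
    }

    // Models setColumnNames(...): only the named columns, returned in comparator order.
    List<String> byNames(long... names) {
        TreeSet<Long> wanted = new TreeSet<>();
        for (long name : names) {
            wanted.add(name);
        }
        List<String> out = new ArrayList<>();
        for (Long name : wanted) {
            String value = columns.get(name);
            if (value != null) {
                out.add(name + "=" + value);
            }
        }
        return out;
    }

    // Models setRange(start, finish, false, count): inclusive bounds, at most 'count' columns.
    List<String> byRange(long start, long finish, int count) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : columns.subMap(start, true, finish, true).entrySet()) {
            if (out.size() == count) {
                break;
            }
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }

    // The 'TX Austin' row of StateCity, using the values shown in this article.
    static RowSliceSketch sampleRow() {
        RowSliceSketch row = new RowSliceSketch();
        row.put(202L, "30.27x097.74");
        row.put(203L, "30.27x097.74");
        row.put(204L, "30.32x097.73");
        row.put(205L, "30.32x097.73");
        row.put(206L, "30.32x097.73");
        return row;
    }

    public static void main(String[] args) {
        RowSliceSketch row = sampleRow();
        System.out.println(row.byNames(202L, 203L, 204L));
        System.out.println(row.byRange(202L, 204L, 5));
        System.out.println(row.byRange(202L, 206L, 2)); // the count caps the slice
    }
}
```

With this data, both ways return the same three columns for 202 to 204; the count only matters once the range holds more columns than requested.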

4.2.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---

ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.2.3. CLI(TODO)

TODO: According to the CLI syntax, it seems a single ‘get’ command cannot fetch several named columns at once?

4.3. Get multiple rows by a set of Keys

4.3.1. Sample Code

public QueryResult<Rows<String,String,String>> execute() {

        MultigetSliceQuery<String, String, String> multigetSlicesQuery =

            HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        multigetSlicesQuery.setColumnFamily("Npanxx");

        multigetSlicesQuery.setColumnNames("city","state","lat","lng");       

        multigetSlicesQuery.setKeys("512202","512203","512205","512206");

        QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();

        return results;

    }

4.3.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),

512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.3.3. CLI(TODO)

TODO: N/A?

4.4. Get Slices from a Range of Rows by Key

4.4.1. Sample Code

GetRangeSlicesForStateCity.java

public QueryResult<OrderedRows<String,String,String>> execute() {

        RangeSlicesQuery<String, String, String> rangeSlicesQuery =

            HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        rangeSlicesQuery.setColumnFamily("Npanxx");

        rangeSlicesQuery.setColumnNames("city","state","lat","lng");       

        rangeSlicesQuery.setKeys("512202", "512205");

        rangeSlicesQuery.setRowCount(5);

        QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();

        return results;

    }

Important Note: The result is NOT meaningful as a key range. One might expect the four rows 512202 to 512205, but that is not what comes back, because with RandomPartitioner the rows are ordered by the hash (token) of the key rather than by the key itself. The partitioner can be changed in /conf/cassandra.yaml, but doing so is not recommended. The actual result is shown under “Sample Code run by Maven”.
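To see why a key range looks scrambled, it helps to sort the sample keys by a hash instead of by their text. The sketch below assumes that RandomPartitioner derives a token as the absolute value of the key's MD5 digest read as a big integer; that derivation is my understanding rather than something verified here, and `TokenOrderSketch` is a hypothetical name. Whether the printed order reproduces the Maven output depends on that assumption.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Cassandra or Hector.
final class TokenOrderSketch {

    // Assumed token derivation: |MD5(key)| interpreted as a signed big integer.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // Returns a copy of the keys sorted by token instead of by their text.
    static List<String> inTokenOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(TokenOrderSketch::token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("512202", "512203", "512204", "512205", "512206");
        for (String key : inTokenOrder(keys)) {
            System.out.println(key + " -> " + token(key));
        }
    }
}
```

A range query from "512202" to "512205" walks the rows in this token order, so keys that sit between the two bounds as text may fall outside the range, and keys outside may fall inside.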

4.4.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({

512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))

})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.4.3. CLI(TODO)

TODO: N/A

4.5. Get Slices from a Range of Rows by Columns

4.5.1. Sample Code

GetSliceForAreaCodeCity.java

public QueryResult<ColumnSlice<String,String>> execute() {

        SliceQuery<String, String, String> sliceQuery =

            HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        sliceQuery.setColumnFamily("AreaCode");

        sliceQuery.setKey("512");

        // returns up to 5 columns (the count argument) between "Austin" and "Austin__204", in comparator order

        // to read in descending order, set 'reversed' to true and swap the start and finish arguments

        sliceQuery.setRange("Austin", "Austin__204", false, 5);

 

        QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();

 

        return result;

    }
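The reversed flag is easy to get wrong, because the bounds are given in the order of traversal: for a descending read, the start is the higher column name. The sketch below models that rule over String column names with a sorted map. `ReversedSliceSketch` is a hypothetical name, plain Java string ordering stands in for the column family's comparator, and the columns Austin__205 and Austin__206 are assumed to exist for the sake of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model; not part of Hector or Cassandra.
final class ReversedSliceSketch {

    // Models setRange(start, finish, reversed, count) with inclusive bounds.
    static List<String> slice(NavigableMap<String, String> row,
                              String start, String finish, boolean reversed, int count) {
        // Bounds are expressed in traversal order, so a reversed read starts at the high end.
        String low = reversed ? finish : start;
        String high = reversed ? start : finish;
        if (low.compareTo(high) > 0) {
            throw new IllegalArgumentException("finish must come after start in the order of traversal");
        }
        NavigableMap<String, String> range = row.subMap(low, true, high, true);
        if (reversed) {
            range = range.descendingMap();
        }
        List<String> names = new ArrayList<>();
        for (String name : range.keySet()) {
            if (names.size() == count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    // Row '512' of AreaCode; the last two columns are assumed for illustration.
    static NavigableMap<String, String> sampleRow() {
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("Austin__202", "30.27x097.74");
        row.put("Austin__203", "30.27x097.74");
        row.put("Austin__204", "30.32x097.73");
        row.put("Austin__205", "30.32x097.73");
        row.put("Austin__206", "30.32x097.73");
        return row;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = sampleRow();
        System.out.println(slice(row, "Austin", "Austin__204", false, 5));
        System.out.println(slice(row, "Austin__204", "Austin", true, 2));
    }
}
```

The model does not cover empty start or finish values, which in a real query stand for an open end of the range.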

4.5.2. Sample Code run by Maven

C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

ColumnSlice([

HColumn(Austin__202=30.27x097.74),

HColumn(Austin__203=30.27x097.74),

HColumn(Austin__204=30.32x097.73)

])

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.5.3. CLI

N/A

4.6. Get Slices from Indexed Columns

4.6.1. Sample Code

GetIndexedSlicesForCityState.java

public QueryResult<OrderedRows<String, String, String>> execute() {

        IndexedSlicesQuery<String, String, String> indexedSlicesQuery =

            HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

        indexedSlicesQuery.setColumnFamily("Npanxx");

        indexedSlicesQuery.setColumnNames("city","lat","lng");

        indexedSlicesQuery.addEqualsExpression("state", "TX");

        indexedSlicesQuery.addEqualsExpression("city", "Austin");

        indexedSlicesQuery.addGteExpression("lat", "30.30");

        QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();

        

        return result;

    }
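The three expressions are combined with AND, and the comparison on lat is made on the stored values as they are, not on parsed numbers. The sketch below filters the sample rows in memory to show the effect. `IndexExpressionSketch` is a hypothetical name, and it assumes the values are compared as text, which holds only if the columns are validated as strings; a real cluster evaluates this on the server through its secondary indexes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; none of these names belong to Hector or Cassandra.
final class IndexExpressionSketch {

    // Models addEqualsExpression: the column must exist and hold exactly this value.
    static boolean eq(Map<String, String> row, String column, String expected) {
        return expected.equals(row.get(column));
    }

    // Models addGteExpression under the assumption that values compare as text.
    static boolean gte(Map<String, String> row, String column, String bound) {
        String value = row.get(column);
        return value != null && value.compareTo(bound) >= 0;
    }

    // Keys of all rows matching state = TX AND city = Austin AND lat >= 30.30.
    static List<String> matchingKeys(Map<String, Map<String, String>> rows) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : rows.entrySet()) {
            Map<String, String> row = e.getValue();
            if (eq(row, "state", "TX") && eq(row, "city", "Austin") && gte(row, "lat", "30.30")) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // The Npanxx rows as they appear in the outputs of this article.
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("512202", row("30.27", "097.74"));
        rows.put("512203", row("30.27", "097.74"));
        rows.put("512204", row("30.32", "097.73"));
        rows.put("512205", row("30.32", "097.73"));
        rows.put("512206", row("30.32", "097.73"));
        return rows;
    }

    private static Map<String, String> row(String lat, String lng) {
        return Map.of("city", "Austin", "state", "TX", "lat", lat, "lng", lng);
    }

    public static void main(String[] args) {
        System.out.println(matchingKeys(sampleRows()));
        // Pitfall of text comparison: "9.5" sorts after "30.30".
        System.out.println(gte(Map.of("lat", "9.5"), "lat", "30.30"));
    }
}
```

The model returns matches in insertion order, whereas the real query returns them in the cluster's own row order, so only the set of keys is comparable with the Maven output.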

4.6.2. Sample Code run by Maven

 

The output is:

[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---

Rows({512204=Row(

512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),

512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),

512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

4.6.3. CLI

[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';

-------------------

RowKey: 512204

=> (column=city, value=Austin, timestamp=1329299521508000)

=> (column=lat, value=30.32, timestamp=1329299521540000)

=> (column=lng, value=097.73, timestamp=1329299521555000)

=> (column=state, value=TX, timestamp=1329299521524000)

-------------------

RowKey: 512206

=> (column=city, value=Austin, timestamp=1329299521618000)

=> (column=lat, value=30.32, timestamp=1329299521633000)

=> (column=lng, value=097.73, timestamp=1329299522491000)

=> (column=state, value=TX, timestamp=1329299521618000)

-------------------

RowKey: 512205

=> (column=city, value=Austin, timestamp=1329299521555000)

=> (column=lat, value=30.32, timestamp=1329299521586000)

=> (column=lng, value=097.73, timestamp=1329299521602000)

=> (column=state, value=TX, timestamp=1329299521571000)

 

3 Rows Returned.

Elapsed time: 16 msec(s).

 

4.7. Insertion

4.7.1. Sample Code

InsertRowsForColumnFamilies.java

public QueryResult<?> execute() {

        Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

       

        mutator.addInsertion("CA Burlingame", "StateCity", HFactory.createColumn(650L, "37.57x122.34",longSerializer,stringSerializer));

        mutator.addInsertion("650", "AreaCode", HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));

        mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));                

       

        MutationResult mr = mutator.execute();

        return null;

    }

4.7.2. Sample Code run by Maven

Omitted

4.7.3. CLI

[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';

[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';

[default@Tutorial] set Npanxx['650222']['lat']='37.57';

4.8. Deletion

4.8.1. Sample Code

DeleteRowsForColumnFamily.java

public QueryResult<?> execute() {

        Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

       

        //Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)

        //columnName as null means to delete the whole row.

        mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);

        mutator.addDeletion("650", "AreaCode", null, stringSerializer);

        mutator.addDeletion("650222", "Npanxx", null, stringSerializer);

        // adding a non-existent key like the following will cause the insertion of a tombstone

        // mutator.addDeletion("652", "AreaCode", null, stringSerializer);

        MutationResult mr = mutator.execute();

        return null;

 

    }

4.8.2. Sample Code run by Maven

Omitted…

4.8.3. CLI

[default@Tutorial] del StateCity['CA Burlingame'];

[default@Tutorial] del AreaCode['650'];

[default@Tutorial] del Npanxx['650222'];

Important Note: Whether you delete through Java code or through the CLI, the row key of the deleted row stays in place, marked as a tombstone (hehe, a really fitting name), and it still shows up in the output of the ‘list’ command, like this:

[default@Tutorial] list StateCity;

Using default limit of 100

-------------------

RowKey: CA Burlingame

-------------------

RowKey: TX Austin

=> (column=202, value=30.27x097.74, timestamp=1329297768323000)

=> (column=203, value=30.27x097.74, timestamp=1329297768338000)

=> (column=204, value=30.32x097.73, timestamp=1329297768354000)

=> (column=205, value=30.32x097.73, timestamp=1329297768370000)

=> (column=206, value=30.32x097.73, timestamp=1329297768385000)


2 Rows Returned.

Elapsed time: 16 msec(s).

As you can see, two rows are returned, even though the row ‘CA Burlingame’ has been deleted.

Even worse, deleting a non-existent key causes the ‘insertion of a tombstone’, which means one more row key shows up in the Column Family!!!

Fortunately, the ‘get’ command does not return the deleted row any more.

[default@Tutorial] get StateCity['CA Burlingame'];

Returned 0 results.

Elapsed time: 0 msec(s).
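The difference between ‘list’ and ‘get’ can be pictured with a toy model: a delete does not erase the row key, it marks the row and drops its columns, so a scan over all keys still sees the key while a lookup finds no columns. The sketch below is only that picture. `TombstoneSketch` is a hypothetical name, keys are sorted as text rather than by token, and timestamps are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a column family; not how Cassandra stores data.
final class TombstoneSketch {

    private static final class Row {
        final Map<String, String> columns = new TreeMap<>();
        boolean tombstone;
    }

    private final Map<String, Row> rows = new TreeMap<>();

    // Models the CLI 'set': writes one column and makes the row live again.
    void set(String key, String column, String value) {
        Row row = rows.computeIfAbsent(key, k -> new Row());
        row.tombstone = false;
        row.columns.put(column, value);
    }

    // Models 'del': the key is kept (or even created) and marked as a tombstone.
    void delete(String key) {
        Row row = rows.computeIfAbsent(key, k -> new Row());
        row.columns.clear();
        row.tombstone = true;
    }

    // Models 'get': a tombstoned or unknown row yields no columns.
    Map<String, String> get(String key) {
        Row row = rows.get(key);
        return row == null ? new TreeMap<>() : new TreeMap<>(row.columns);
    }

    // Models 'list': every stored key is reported, tombstoned or not.
    List<String> list() {
        return new ArrayList<>(rows.keySet());
    }

    public static void main(String[] args) {
        TombstoneSketch stateCity = new TombstoneSketch();
        stateCity.set("TX Austin", "202", "30.27x097.74");
        stateCity.set("CA Burlingame", "650", "37.57x122.34");
        stateCity.delete("CA Burlingame"); // existing row
        stateCity.delete("abc");           // never existed, still leaves a key behind
        System.out.println(stateCity.list());
        System.out.println(stateCity.get("CA Burlingame"));
    }
}
```

In this model, as in the CLI session, deleting the never-written key "abc" adds a third key to the listing while ‘get’ on either deleted key returns nothing.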

 

Go deeper? Please read on.

When will Cassandra remove these tombstones? As far as I know, two things are involved:

1. Wait until gc_grace_seconds has elapsed (not verified yet)

gc_grace_seconds is set per column family and can be updated without a restart.

How do you find the current gc_grace_seconds? Simply use the CLI:

[default@Tutorial] show schema;

create column family StateCity

  with column_type = 'Standard'

  and comparator = 'LongType'

  and default_validation_class = 'UTF8Type'

  and key_validation_class = 'UTF8Type'

  and rows_cached = 0.0

  and row_cache_save_period = 0

  and row_cache_keys_to_save = 2147483647

  and keys_cached = 200000.0

  and key_cache_save_period = 14400

  and read_repair_chance = 1.0

  and gc_grace = 864000   // 10 days, OMG

  and min_compaction_threshold = 4

  and max_compaction_threshold = 32

  and replicate_on_write = true

  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'

  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
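The gc_grace value is given in seconds, so 864000 works out to 10 days. The sketch below does that arithmetic and models the rule that a tombstone only becomes eligible for removal by a compaction that runs after the deletion time plus gc_grace. `GcGraceSketch` is a hypothetical name, the deletion time is made up for illustration, and a real cluster applies further conditions that are not modeled here.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of Cassandra.
final class GcGraceSketch {

    // Earliest moment from which a compaction may drop the tombstone.
    static Instant purgeableAt(Instant deletedAt, long gcGraceSeconds) {
        return deletedAt.plusSeconds(gcGraceSeconds);
    }

    // True only when the compaction runs strictly after the grace period has ended.
    static boolean purgeable(Instant deletedAt, long gcGraceSeconds, Instant compactionTime) {
        return compactionTime.isAfter(purgeableAt(deletedAt, gcGraceSeconds));
    }

    public static void main(String[] args) {
        long gcGrace = 864000L; // value from the schema output
        Instant deletedAt = Instant.parse("2012-02-15T12:00:00Z"); // assumed deletion time

        System.out.println(Duration.ofSeconds(gcGrace).toDays() + " days");
        System.out.println("purgeable from " + purgeableAt(deletedAt, gcGrace));
        // A compaction one hour after the delete is far too early.
        System.out.println(purgeable(deletedAt, gcGrace, deletedAt.plus(Duration.ofHours(1))));
    }
}
```

Under this rule, a manual compaction shortly after a delete leaves the tombstone in place; lowering gc_grace for a test column family would be one way to observe the purge sooner.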

 

2. The compaction event (under investigation, but no luck yet)

Compaction is triggered automatically.

To trigger it manually, use nodetool: flush first, then compact.

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial

Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial

Starting NodeTool

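Then we can see some log messages in the Cassandra console.

But as I found, the tombstones are still there. (WHY???) A likely reason: compaction only purges tombstones that are older than gc_grace, and the 10 days have not passed yet.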

C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db

{

"4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],

"54582041757374696e": [["202","30.27x097.74",1329297768323000], ["203","30.27x097.74",1329297768338000], ["204","30.32x097.73",1329297768354000], ["205","30.32x097.73",1329297768370000], ["206","30.32x097.73",1329297768385000]],

"616263": []

}

And it still appears in the output of the list command. (Argh, a ghost that refuses to leave? Big why???)

[default@Tutorial] list statecity;

Using default limit of 100

-------------------

RowKey: CA Burlingame

-------------------

RowKey: TX Austin

=> (column=202, value=30.27x097.74, timestamp=1329297768323000)

=> (column=203, value=30.27x097.74, timestamp=1329297768338000)

=> (column=204, value=30.32x097.73, timestamp=1329297768354000)

=> (column=205, value=30.32x097.73, timestamp=1329297768370000)

=> (column=206, value=30.32x097.73, timestamp=1329297768385000)

-------------------

RowKey: abc

 

3 Rows Returned.

Elapsed time: 31 msec(s).

 

 

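Let me grumble a little here:

1. Perhaps because I have not dug deep enough yet, the CLI feels rather weak; it suits initial data modeling (DDL) and simple data analysis.

2. The tombstone cleanup question has not been conclusively verified yet. I am shelving it for now as an open case and will add to or correct this post once I have an answer.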

 
