====16 Feb 2012, by Bright Zheng (IT进行时)====
4. Samples ABC
We will learn it step by step, understanding the concepts and Java API usage by means of:
1. Concept Introduction
2. CLI
3. Java Sample Code
4.1. Get a Single Column by a Key
4.1.1. Sample Code
public QueryResult<HColumn<String, String>> execute() {
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily("Npanxx");
    columnQuery.setKey("512204");
    columnQuery.setName("city");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    return result;
}
4.1.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
HColumn(city=Austin)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.1.3. CLI
[default@Tutorial] get Npanxx['512204']['city'];
=> (column=city, value=Austin, timestamp=1329234388328000)
Elapsed time: 16 msec(s).
4.2. Get multiple columns by a Key
4.2.1. Sample Code
public QueryResult<ColumnSlice<Long, String>> execute() {
    SliceQuery<String, Long, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);
    sliceQuery.setColumnFamily("StateCity");
    sliceQuery.setKey("TX Austin");

    // way 1: set multiple column names
    sliceQuery.setColumnNames(202L, 203L, 204L);

    // way 2: use setRange
    // change 'reversed' to true to get the columns in reverse order
    //sliceQuery.setRange(202L, 204L, false, 5);

    QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();
    return result;
}
4.2.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)])
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.2.3. CLI(TODO)
TODO: Referring to the CLI syntax, it seems the Cassandra CLI can't get multiple specific columns in one 'get' command?
4.3. Get multiple rows by a set of Keys
4.3.1. Sample Code
public QueryResult<Rows<String, String, String>> execute() {
    MultigetSliceQuery<String, String, String> multigetSlicesQuery =
        HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    multigetSlicesQuery.setColumnFamily("Npanxx");
    multigetSlicesQuery.setColumnNames("city", "state", "lat", "lng");
    multigetSlicesQuery.setKeys("512202", "512203", "512205", "512206");
    QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();
    return results;
}
4.3.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.3.3. CLI(TODO)
TODO: N/A?
4.4. Get Slices from a Range of Rows by Key
4.4.1. Sample Code
GetRangeSlicesForStateCity.java
public QueryResult<OrderedRows<String, String, String>> execute() {
    RangeSlicesQuery<String, String, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    rangeSlicesQuery.setColumnFamily("Npanxx");
    rangeSlicesQuery.setColumnNames("city", "state", "lat", "lng");
    rangeSlicesQuery.setKeys("512202", "512205");
    rangeSlicesQuery.setRowCount(5);
    QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();
    return results;
}
Important Note: The result is actually NOT meaningful (the expected return might be the 4 rows 512202 through 512205, but it isn't) since the keys are ordered by RandomPartitioner, not lexically (the partitioner can be changed in /conf/cassandra.yaml, but doing so is not recommended). The result can be seen under "Sample Code run by Maven".
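To make this note concrete, here is a minimal, self-contained sketch (not Hector code; the class and method names are mine) of why key ranges are not meaningful under RandomPartitioner: the partitioner positions each row by an MD5-derived token of its key, so token order generally disagrees with lexical key order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: RandomPartitioner derives each row's position from an
// MD5-based token of its key, so rows are NOT stored in lexical key order.
public class TokenOrder {

    // Compute a non-negative 128-bit "token" from the MD5 hash of the key.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        List<String> lexical = Arrays.asList("512202", "512203", "512205", "512206");
        List<String> byToken = new ArrayList<>(lexical);
        // Sort the keys the way the partitioner stores them: by token value.
        byToken.sort(Comparator.comparing(TokenOrder::token));
        System.out.println("lexical order: " + lexical);
        System.out.println("token order:   " + byToken);
    }
}
```

This is why `setKeys("512202", "512205")` above does not bound a meaningful range: the rows that fall between those two tokens are essentially arbitrary.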
4.4.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))
})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.4.3. CLI(TODO)
TODO: N/A
4.5. Get Slices from a Range of Rows by Columns
4.5.1. Sample Code
GetSliceForAreaCodeCity.java
public QueryResult<ColumnSlice<String, String>> execute() {
    SliceQuery<String, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("AreaCode");
    sliceQuery.setKey("512");
    // change the 'reversed' argument to true to get the last 2 columns in descending order
    // gets the first columns "between" Austin and Austin__204 according to the comparator
    sliceQuery.setRange("Austin", "Austin__204", false, 5);

    QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
    return result;
}
4.5.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([
HColumn(Austin__202=30.27x097.74),
HColumn(Austin__203=30.27x097.74),
HColumn(Austin__204=30.32x097.73)
])
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.5.3. CLI
N/A
4.6. Get Slices from Indexed Columns
4.6.1. Sample Code
GetIndexedSlicesForCityState.java
public QueryResult<OrderedRows<String, String, String>> execute() {
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    indexedSlicesQuery.setColumnFamily("Npanxx");
    indexedSlicesQuery.setColumnNames("city", "lat", "lng");
    indexedSlicesQuery.addEqualsExpression("state", "TX");
    indexedSlicesQuery.addEqualsExpression("city", "Austin");
    indexedSlicesQuery.addGteExpression("lat", "30.30");
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    return result;
}
4.6.2. Sample Code run by Maven
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({512204=Row(
512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.6.3. CLI
[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';
-------------------
RowKey: 512204
=> (column=city, value=Austin, timestamp=1329299521508000)
=> (column=lat, value=30.32, timestamp=1329299521540000)
=> (column=lng, value=097.73, timestamp=1329299521555000)
=> (column=state, value=TX, timestamp=1329299521524000)
-------------------
RowKey: 512206
=> (column=city, value=Austin, timestamp=1329299521618000)
=> (column=lat, value=30.32, timestamp=1329299521633000)
=> (column=lng, value=097.73, timestamp=1329299522491000)
=> (column=state, value=TX, timestamp=1329299521618000)
-------------------
RowKey: 512205
=> (column=city, value=Austin, timestamp=1329299521555000)
=> (column=lat, value=30.32, timestamp=1329299521586000)
=> (column=lng, value=097.73, timestamp=1329299521602000)
=> (column=state, value=TX, timestamp=1329299521571000)
3 Rows Returned.
Elapsed time: 16 msec(s).
4.7. Insertion
4.7.1. Sample Code
InsertRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    mutator.addInsertion("CA Burlingame", "StateCity",
        HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
    mutator.addInsertion("650", "AreaCode", HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));

    MutationResult mr = mutator.execute();
    return null;
}
4.7.2. Sample Code run by Maven
Omitted
4.7.3. CLI
[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';
[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';
[default@Tutorial] set Npanxx['650222']['lat']='37.57';
…
4.8. Deletion
4.8.1. Sample Code
InsertRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    // Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)
    // A null columnName means: delete the whole row.
    mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
    mutator.addDeletion("650", "AreaCode", null, stringSerializer);
    mutator.addDeletion("650222", "Npanxx", null, stringSerializer);

    // Deleting a non-existent key like the following will cause the insertion of a tombstone:
    // mutator.addDeletion("652", "AreaCode", null, stringSerializer);

    MutationResult mr = mutator.execute();
    return null;
}
4.8.2. Sample Code run by Maven
Omitted…
4.8.3. CLI
[default@Tutorial] del StateCity['CA Burlingame'];
[default@Tutorial] del AreaCode['650'];
[default@Tutorial] del Npanxx['650222'];
Important Note: Whichever you use, Java code or CLI, the deletion will still leave the deleted row's key there, marked as a Tombstone (hehe, literally a gravestone, a really good name), and the key can still be retrieved with the 'list' command like this:
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)
2 Rows Returned.
Elapsed time: 16 msec(s).
As you see, two rows are returned, even though the row 'CA Burlingame' has been deleted!
Even worse, deleting a non-existent key causes the 'insertion of a tombstone', which means it adds one more row to the Column Family!!!
Fortunately, the 'get' command won't retrieve it back any more.
[default@Tutorial] get StateCity['CA Burlingame'];
Returned 0 results.
Elapsed time: 0 msec(s).
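To summarize the behavior observed above, here is a toy, self-contained model (not Cassandra code; all class and method names are mine) of tombstone semantics: a delete writes markers instead of erasing the row, so a key listing still shows the row key while a column read filters the dead columns out and returns nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// A toy model of one Column Family with tombstones. A null value for a
// column models a tombstone marker left behind by a deletion.
public class TombstoneModel {
    // rowKey -> (columnName -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    public void set(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).put(column, value);
    }

    // Deletion does not remove the row; it replaces each value with a marker.
    // Deleting an unknown key still creates the (empty) row entry, which
    // mirrors the "insertion of a tombstone" effect described above.
    public void delete(String key) {
        Map<String, String> row = rows.computeIfAbsent(key, k -> new HashMap<>());
        for (String col : new ArrayList<>(row.keySet())) {
            row.put(col, null);
        }
    }

    // 'get' sees only live columns, so an all-tombstone row looks empty.
    public Map<String, String> get(String key) {
        Map<String, String> live = new HashMap<>();
        rows.getOrDefault(key, Collections.emptyMap())
            .forEach((c, v) -> { if (v != null) live.put(c, v); });
        return live;
    }

    // 'list' walks the raw row keys, so a deleted row's key still appears.
    public Set<String> listKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        TombstoneModel cf = new TombstoneModel();
        cf.set("CA Burlingame", "650", "37.57x122.34");
        cf.set("TX Austin", "202", "30.27x097.74");
        cf.delete("CA Burlingame");
        System.out.println("listed keys: " + cf.listKeys().size());          // still 2
        System.out.println("live columns: " + cf.get("CA Burlingame").size()); // 0
    }
}
```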
Go deeper? Please read on.
When will Cassandra remove these tombstones? As far as I know, there are two ways:
1. Wait until gc_grace_seconds expires (not verified yet)
The gc_grace_seconds is set per CF and can be updated without a restart.
How to get gc_grace_seconds? Simply use CLI:
[default@Tutorial] show schema;
…
create column family StateCity
  with column_type = 'Standard'
  and comparator = 'LongType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 200000.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000 // 10 days, OMG
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
…
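Since gc_grace_seconds can be updated without a restart, the CLI should also be able to lower it, e.g. to one hour for testing. A hedged sketch (cassandra-cli syntax circa 1.0.x; verify against your version, and the value 3600 is purely illustrative):

```
[default@Tutorial] update column family StateCity with gc_grace = 3600;
```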
2. Compaction (under investigation, but no luck yet)
Compaction is triggered automatically.
But how can compaction be triggered manually? Use nodetool as well:
C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial
Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial
Starting NodeTool
Then we can see some logging messages in the Cassandra console.
But as I found, the tombstones are still there. (WHY???)
C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db
{
"4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],
"54582041757374696e": [["202","30.27x097.74",1329297768323000],
                       ["203","30.27x097.74",1329297768338000],
                       ["204","30.32x097.73",1329297768354000],
                       ["205","30.32x097.73",1329297768370000],
                       ["206","30.32x097.73",1329297768385000]],
"616263": []
}
And it still appears in the 'list' command. (Kao, the ghost just won't leave? Big why???)
[default@Tutorial] list statecity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)
-------------------
RowKey: abc
3 Rows Returned.
Elapsed time: 31 msec(s).
Let me vent a little here:
1. Perhaps because I haven't studied it deeply enough, the CLI feels rather weak; it seems suited to initial DDL-style modeling and simple data inspection.
2. The tombstone-cleanup question has not been fully verified yet, so I'll leave it open as a cold case for now and supplement or correct this post once I have an answer.