====16 Feb 2012, by Bright Zheng (IT进行时)====
4. Samples ABC
We will learn it step by step, understanding the concepts and Java API usage by means of:
1. Concept Introduction
2. CLI
3. Java Sample Code
4.1. Get a Single Column by a Key
4.1.1. Sample Code
public QueryResult<HColumn<String, String>> execute() {
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily("Npanxx");
    columnQuery.setKey("512204");
    columnQuery.setName("city");
    QueryResult<HColumn<String, String>> result = columnQuery.execute();
    return result;
}
4.1.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
HColumn(city=Austin)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.1.3. CLI
[default@Tutorial] get Npanxx['512204']['city'];
=> (column=city, value=Austin, timestamp=1329234388328000)
Elapsed time: 16 msec(s).
4.2. Get multiple columns by a Key
4.2.1. Sample Code
public QueryResult<ColumnSlice<Long, String>> execute() {
    SliceQuery<String, Long, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, longSerializer, stringSerializer);
    sliceQuery.setColumnFamily("StateCity");
    sliceQuery.setKey("TX Austin");

    // way 1: set multiple column names
    sliceQuery.setColumnNames(202L, 203L, 204L);

    // way 2: use setRange
    // change 'reversed' to true to get the columns in reverse order
    //sliceQuery.setRange(202L, 204L, false, 5);

    QueryResult<ColumnSlice<Long, String>> result = sliceQuery.execute();
    return result;
}
4.2.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_sc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.1.2-Beta1:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([HColumn(202=30.27x097.74), HColumn(203=30.27x097.74), HColumn(204=30.32x097.73)])
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.2.3. CLI(TODO)
TODO: Referring to the CLI syntax, it seems the Cassandra CLI can't get multiple specific columns in one 'get' command?
4.3. Get multiple rows by a set of Keys
4.3.1. Sample Code
public QueryResult<Rows<String, String, String>> execute() {
    MultigetSliceQuery<String, String, String> multigetSlicesQuery =
        HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    multigetSlicesQuery.setColumnFamily("Npanxx");
    multigetSlicesQuery.setColumnNames("city", "state", "lat", "lng");
    multigetSlicesQuery.setKeys("512202", "512203", "512205", "512206");
    QueryResult<Rows<String, String, String>> results = multigetSlicesQuery.execute();
    return results;
}
4.3.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="multiget_slice" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512203=Row(512203,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.3.3. CLI(TODO)
TODO: N/A?
4.4. Get Slices from a Range of Rows by Key
4.4.1. Sample Code
GetRangeSlicesForStateCity.java
public QueryResult<OrderedRows<String, String, String>> execute() {
    RangeSlicesQuery<String, String, String> rangeSlicesQuery =
        HFactory.createRangeSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    rangeSlicesQuery.setColumnFamily("Npanxx");
    rangeSlicesQuery.setColumnNames("city", "state", "lat", "lng");
    rangeSlicesQuery.setKeys("512202", "512205");
    rangeSlicesQuery.setRowCount(5);
    QueryResult<OrderedRows<String, String, String>> results = rangeSlicesQuery.execute();
    return results;
}
Important Note: The result is actually NOT meaningful (the expected return might be the 4 rows 512202 through 512205, but it isn't) since the keys are ordered by RandomPartitioner, not lexically (the partitioner can be changed in /conf/cassandra.yaml, but doing so is not recommended). The result can be seen under "Sample Code run by Maven".
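To make this note concrete, here is a minimal, self-contained sketch (not Hector code; the class and method names are mine) of why key ranges are not meaningful under RandomPartitioner: the partitioner positions each row by an MD5-derived token of its key, so token order generally disagrees with lexical key order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: RandomPartitioner derives each row's position from an
// MD5-based token of its key, so rows are NOT stored in lexical key order.
public class TokenOrder {

    // Compute a non-negative 128-bit "token" from the MD5 hash of the key.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        List<String> lexical = Arrays.asList("512202", "512203", "512205", "512206");
        List<String> byToken = new ArrayList<>(lexical);
        // Sort the keys the way the partitioner stores them: by token value.
        byToken.sort(Comparator.comparing(TokenOrder::token));
        System.out.println("lexical order: " + lexical);
        System.out.println("token order:   " + byToken);
    }
}
```

This is why `setKeys("512202", "512205")` above does not bound a meaningful range: the rows that fall between those two tokens are essentially arbitrary.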
4.4.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_range_slices" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({
512202=Row(512202,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.27), HColumn(lng=097.74), HColumn(state=TX)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73), HColumn(state=TX)]))
})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.4.3. CLI(TODO)
TODO: N/A
4.5. Get Slices from a Range of Rows by Columns
4.5.1. Sample Code
GetSliceForAreaCodeCity.java
public QueryResult<ColumnSlice<String, String>> execute() {
    SliceQuery<String, String, String> sliceQuery =
        HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    sliceQuery.setColumnFamily("AreaCode");
    sliceQuery.setKey("512");
    // change the 'reversed' argument to true to get the last 2 columns in descending order
    // gets the first columns "between" Austin and Austin__204 according to the comparator
    sliceQuery.setRange("Austin", "Austin__204", false, 5);

    QueryResult<ColumnSlice<String, String>> result = sliceQuery.execute();
    return result;
}
4.5.2. Sample Code run by Maven
C:\projects_learning\learning-cassandra-tutorial>mvn -e exec:java -Dexec.args="get_slice_acc" -Dexec.mainClass="com.datastax.tutorial.TutorialRunner"
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
ColumnSlice([
HColumn(Austin__202=30.27x097.74),
HColumn(Austin__203=30.27x097.74),
HColumn(Austin__204=30.32x097.73)
])
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.5.3. CLI
N/A
4.6. Get Slices from Indexed Columns
4.6.1. Sample Code
GetIndexedSlicesForCityState.java
public QueryResult<OrderedRows<String, String, String>> execute() {
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
    indexedSlicesQuery.setColumnFamily("Npanxx");
    indexedSlicesQuery.setColumnNames("city", "lat", "lng");
    indexedSlicesQuery.addEqualsExpression("state", "TX");
    indexedSlicesQuery.addEqualsExpression("city", "Austin");
    indexedSlicesQuery.addGteExpression("lat", "30.30");
    QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
    return result;
}
4.6.2. Sample Code run by Maven
The output is:
[INFO] --- exec-maven-plugin:1.2:java (default-cli) @ cassandra-tutorial ---
Rows({512204=Row(
512204,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512206=Row(512206,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)])),
512205=Row(512205,ColumnSlice([HColumn(city=Austin), HColumn(lat=30.32), HColumn(lng=097.73)]))})
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
4.6.3. CLI
[default@Tutorial] get npanxx where state='TX' and city='Austin' and lat>'30.30';
-------------------
RowKey: 512204
=> (column=city, value=Austin, timestamp=1329299521508000)
=> (column=lat, value=30.32, timestamp=1329299521540000)
=> (column=lng, value=097.73, timestamp=1329299521555000)
=> (column=state, value=TX, timestamp=1329299521524000)
-------------------
RowKey: 512206
=> (column=city, value=Austin, timestamp=1329299521618000)
=> (column=lat, value=30.32, timestamp=1329299521633000)
=> (column=lng, value=097.73, timestamp=1329299522491000)
=> (column=state, value=TX, timestamp=1329299521618000)
-------------------
RowKey: 512205
=> (column=city, value=Austin, timestamp=1329299521555000)
=> (column=lat, value=30.32, timestamp=1329299521586000)
=> (column=lng, value=097.73, timestamp=1329299521602000)
=> (column=state, value=TX, timestamp=1329299521571000)
3 Rows Returned.
Elapsed time: 16 msec(s).
4.7. Insertion
4.7.1. Sample Code
InsertRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    mutator.addInsertion("CA Burlingame", "StateCity",
        HFactory.createColumn(650L, "37.57x122.34", longSerializer, stringSerializer));
    mutator.addInsertion("650", "AreaCode", HFactory.createStringColumn("Burlingame__650", "37.57x122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lat", "37.57"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("lng", "122.34"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("city", "Burlingame"));
    mutator.addInsertion("650222", "Npanxx", HFactory.createStringColumn("state", "CA"));

    MutationResult mr = mutator.execute();
    return null;
}
4.7.2. Sample Code run by Maven
Omitted
4.7.3. CLI
[default@Tutorial] set StateCity['CA Burlingame']['650']='37.57x122.34';
[default@Tutorial] set AreaCode['650']['Burlingame__650']='37.57x122.34';
[default@Tutorial] set Npanxx['650222']['lat']='37.57';
…
4.8. Deletion
4.8.1. Sample Code
InsertRowsForColumnFamilies.java
public QueryResult<?> execute() {
    Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

    // Mutator.addDeletion(String key, String cf, String columnName, Serializer<String> nameSerializer)
    // A null columnName means: delete the whole row.
    mutator.addDeletion("CA Burlingame", "StateCity", null, stringSerializer);
    mutator.addDeletion("650", "AreaCode", null, stringSerializer);
    mutator.addDeletion("650222", "Npanxx", null, stringSerializer);

    // Deleting a non-existent key like the following will cause the insertion of a tombstone:
    // mutator.addDeletion("652", "AreaCode", null, stringSerializer);

    MutationResult mr = mutator.execute();
    return null;
}
4.8.2. Sample Code run by Maven
Omitted…
4.8.3. CLI
[default@Tutorial] del StateCity['CA Burlingame'];
[default@Tutorial] del AreaCode['650'];
[default@Tutorial] del Npanxx['650222'];
Important Note: Whichever you use, Java code or CLI, the deletion will still leave the deleted row's key there, marked as a Tombstone (hehe, literally a gravestone, a really good name), and the key can still be retrieved with the 'list' command like this:
[default@Tutorial] list StateCity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)
2 Rows Returned.
Elapsed time: 16 msec(s).
As you see, two rows are returned, even though the row 'CA Burlingame' has been deleted!
Even worse, deleting a non-existent key causes the 'insertion of a tombstone', which means it adds one more row to the Column Family!!!
Fortunately, the 'get' command won't retrieve it back any more.
[default@Tutorial] get StateCity['CA Burlingame'];
Returned 0 results.
Elapsed time: 0 msec(s).
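To summarize the behavior observed above, here is a toy, self-contained model (not Cassandra code; all class and method names are mine) of tombstone semantics: a delete writes markers instead of erasing the row, so a key listing still shows the row key while a column read filters the dead columns out and returns nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// A toy model of one Column Family with tombstones. A null value for a
// column models a tombstone marker left behind by a deletion.
public class TombstoneModel {
    // rowKey -> (columnName -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    public void set(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).put(column, value);
    }

    // Deletion does not remove the row; it replaces each value with a marker.
    // Deleting an unknown key still creates the (empty) row entry, which
    // mirrors the "insertion of a tombstone" effect described above.
    public void delete(String key) {
        Map<String, String> row = rows.computeIfAbsent(key, k -> new HashMap<>());
        for (String col : new ArrayList<>(row.keySet())) {
            row.put(col, null);
        }
    }

    // 'get' sees only live columns, so an all-tombstone row looks empty.
    public Map<String, String> get(String key) {
        Map<String, String> live = new HashMap<>();
        rows.getOrDefault(key, Collections.emptyMap())
            .forEach((c, v) -> { if (v != null) live.put(c, v); });
        return live;
    }

    // 'list' walks the raw row keys, so a deleted row's key still appears.
    public Set<String> listKeys() {
        return rows.keySet();
    }

    public static void main(String[] args) {
        TombstoneModel cf = new TombstoneModel();
        cf.set("CA Burlingame", "650", "37.57x122.34");
        cf.set("TX Austin", "202", "30.27x097.74");
        cf.delete("CA Burlingame");
        System.out.println("listed keys: " + cf.listKeys().size());          // still 2
        System.out.println("live columns: " + cf.get("CA Burlingame").size()); // 0
    }
}
```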
Go deeper? Please read on.
When will Cassandra remove these tombstones? As far as I know, there are two ways:
1. Wait until gc_grace_seconds expires (not verified yet)
The gc_grace_seconds is set per CF and can be updated without a restart.
How to get gc_grace_seconds? Simply use CLI:
[default@Tutorial] show schema;
…
create column family StateCity
  with column_type = 'Standard'
  and comparator = 'LongType'
  and default_validation_class = 'UTF8Type'
  and key_validation_class = 'UTF8Type'
  and rows_cached = 0.0
  and row_cache_save_period = 0
  and row_cache_keys_to_save = 2147483647
  and keys_cached = 200000.0
  and key_cache_save_period = 14400
  and read_repair_chance = 1.0
  and gc_grace = 864000 // 10 days, OMG
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = true
  and row_cache_provider = 'ConcurrentLinkedHashCacheProvider'
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy';
…
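Since gc_grace_seconds can be updated without a restart, the CLI should also be able to lower it, e.g. to one hour for testing. A hedged sketch (cassandra-cli syntax circa 1.0.x; verify against your version, and the value 3600 is purely illustrative):

```
[default@Tutorial] update column family StateCity with gc_grace = 3600;
```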
2. Compaction (under investigation, but no luck yet)
Compaction is triggered automatically.
But how can compaction be triggered manually? Use nodetool as well:
C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost flush Tutorial
Starting NodeTool

C:\java\apache-cassandra-1.0.7\bin>nodetool -h localhost compact Tutorial
Starting NodeTool
Then we can see some logging messages in the Cassandra console.
But as I found, the tombstones are still there. (WHY???)
C:\java\apache-cassandra-1.0.7\bin>sstable2json ..\runtime\data\Tutorial\StateCity-hc-9-Data.db
{
"4341204275726c696e67616d65": [["650","37.57x122.34",1329316454906000]],
"54582041757374696e": [["202","30.27x097.74",1329297768323000],
                       ["203","30.27x097.74",1329297768338000],
                       ["204","30.32x097.73",1329297768354000],
                       ["205","30.32x097.73",1329297768370000],
                       ["206","30.32x097.73",1329297768385000]],
"616263": []
}
And it still appears in the 'list' command. (Kao, the ghost just won't leave? Big why???)
[default@Tutorial] list statecity;
Using default limit of 100
-------------------
RowKey: CA Burlingame
-------------------
RowKey: TX Austin
=> (column=202, value=30.27x097.74, timestamp=1329297768323000)
=> (column=203, value=30.27x097.74, timestamp=1329297768338000)
=> (column=204, value=30.32x097.73, timestamp=1329297768354000)
=> (column=205, value=30.32x097.73, timestamp=1329297768370000)
=> (column=206, value=30.32x097.73, timestamp=1329297768385000)
-------------------
RowKey: abc
3 Rows Returned.
Elapsed time: 31 msec(s).
Let me vent a little here:
1. Perhaps because I haven't studied it deeply enough, the CLI feels rather weak; it seems suited to initial DDL-style modeling and simple data inspection.
2. The tombstone-cleanup question has not been fully verified yet, so I'll leave it open as a cold case for now and supplement or correct this post once I have an answer.