[Repost] Tokyo Cabinet Observations

Read more: http://parand.com/say/index.php/2009/04/09/tokyo-cabinet-observations/



I'm using Tokyo Cabinet with Python tc for a decent-sized amount of data (~19G in a single hash table) on OS X. A few observations and oddities:

    * Writes slow down significantly as the database size grows. I'm writing 97 roughly equal-sized batches to the tch table. The first batch takes ~40 seconds, and processing time seems to increase fairly linearly, with the last batch taking ~14 minutes. Not sure why this would be the case, but it's discouraging. I'll probably write a simple partitioning scheme to split the data into multiple databases and keep each one small, but it seems like this should be handled out of the box for me.
    * [Update] I implemented a simple partitioning scheme, and sure enough it makes a big difference. Apparently keeping the file size small (where small is < 500G) is important. Surprising: why doesn't TC implement partitioning itself if it's susceptible to performance problems at larger file sizes? Is this a Python tc issue or a Tokyo Cabinet issue? (A sketch of one such scheme follows this list.)
    * [Also] Seems I can only open 53-54 tc.HDB()’s before I get an ‘mmap error’, limiting how much I can partition.
    * Records that have already been read from the tch come back much faster on the second access (roughly an order of magnitude faster). I suspect this is the disk cache at work, but if anyone has more info on this please enlighten me.
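
A minimal partitioning sketch, not the author's actual code: it assumes the tc binding exposes HDB.open(), put() and get() plus the HDBOWRITER/HDBOCREAT flags mirroring the C API, and the class and file names are made up for illustration. Each key is hashed to one of N shard files so that no single .tch file grows large:

    import hashlib
    import tc

    class PartitionedHDB(object):
        def __init__(self, path_prefix, num_shards=16):
            # One small .tch file per shard instead of one huge database.
            self.shards = []
            for i in range(num_shards):
                db = tc.HDB()
                db.open('%s-%02d.tch' % (path_prefix, i),
                        tc.HDBOWRITER | tc.HDBOCREAT)
                self.shards.append(db)

        def _shard_for(self, key):
            # Hash the key so records spread evenly across the shard files.
            digest = hashlib.md5(key).hexdigest()
            return self.shards[int(digest, 16) % len(self.shards)]

        def put(self, key, value):
            self._shard_for(key).put(key, value)

        def get(self, key):
            return self._shard_for(key).get(key)

    db = PartitionedHDB('/tmp/mydata', num_shards=16)
    db.put('user:42', 'some value')
    print(db.get('user:42'))

Note that the mmap limit mentioned above (~53-54 open tc.HDB() handles) caps how many shards a single process can keep open.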

Another somewhat surprising aspect: using the tc library you're essentially embedding Tokyo Cabinet in your app; I had assumed access would be network-based, but it's not. You can get network access either via the memcached protocol or via pytyrant, as sketched below.
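
For the network route, a minimal sketch, assuming a Tokyo Tyrant server (ttserver) is already running on its default port 1978; the dict-style interface shown is the one the pytyrant package documents, and the key/value names are illustrative only:

    import pytyrant

    # Connect to a running ttserver instance (default port 1978); the data
    # now lives in the remote server process rather than embedded in the app.
    t = pytyrant.PyTyrant.open('127.0.0.1', 1978)
    t['user:42'] = 'some value'
    print(t['user:42'])

    # The same server also speaks the memcached protocol, so a standard
    # memcached client pointed at the same host/port works as well.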


Comments
#5 iunknown 2009-06-01
http://tokyocabinet.sourceforge.net/spex-en.html
improves robustness: the database file is not corrupted even in catastrophic situations.
#4 iunknown 2009-06-01
http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/

If all goes well, the counter variable will be set to the number of records in the database. This function is slightly more complex than using tchdbiternext() but you are guaranteed to iterate atomically, which is pretty important for a table scanner.
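
As a rough Python rendering of the idea in that post (a sketch, assuming the binding exposes tranbegin()/tranabort() and iterinit()/iternext() mirroring the C functions tchdbtranbegin, tchdbiterinit and tchdbiternext; how the iterator signals exhaustion varies by binding):

    import tc

    db = tc.HDB()
    db.open('/tmp/mydata.tch', tc.HDBOWRITER)

    counter = 0
    db.tranbegin()                  # take the lock so the scan sees a consistent view
    try:
        db.iterinit()
        while True:
            try:
                key = db.iternext()
            except Exception:       # some bindings raise when the iterator is exhausted
                break
            if key is None:         # others return None instead
                break
            counter += 1
    finally:
        db.tranabort()              # read-only scan, so abort rather than commit

    print('records: %d' % counter)
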
#3 iunknown 2009-06-01
http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/

As can be seen from those results, CDB kills all comers in this simulation of our normal workload. Perhaps there are ways to tune Tokyo Cabinet to perform better on large data sets?
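
On the tuning question, one knob worth trying (a sketch, assuming the Python binding exposes tune() with the same arguments as the C function tchdbtune): the Tokyo Cabinet docs suggest setting the bucket number to roughly 0.5-4x the expected record count, and it has to be set before the database file is created:

    import tc

    db = tc.HDB()
    # Expecting ~10 million records: oversize the bucket array up front.
    # apow/fpow of -1 keep the library defaults; HDBTLARGE allows files > 2GB.
    db.tune(20000000, -1, -1, tc.HDBTLARGE)
    db.open('/tmp/bigdata.tch', tc.HDBOWRITER | tc.HDBOCREAT)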
