A Rough Guide to CouchDB

zuroc

CouchDB Python MySQL Erlang data-mining

==== Startup ====
balin couchdb # ./utils/run
The available options are:

-h          display a short help message and exit
-V          display version information and exit
-a FILE     add configuration FILE to chain
-A DIR      add configuration DIR to chain
-n          reset configuration file chain (including system default)
-c          print configuration file chain and exit
-i          use the interactive Erlang shell
-b          spawn as a background process
-p FILE     set the background PID FILE (overrides system default)
-r SECONDS  respawn background process after SECONDS (defaults to no respawn)
-o FILE     redirect background stdout to FILE (defaults to $STDOUT_FILE)
-e FILE     redirect background stderr to FILE (defaults to $STDERR_FILE)
-s          display the status of the background process
-k          kill the background process, will respawn if needed
-d          shutdown the background process

=== Configuration ===
balin couchdb # vi etc/couchdb/local_dev.ini
The port number and similar settings are specified here.
Commonly used settings:

[httpd]
port = 12345
bind_address = 0.0.0.0

[admins]
username = password
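
As a rough illustration of how these settings are used by a client, the sketch below (Python 3) parses an ini fragment like the one above with the standard configparser module and derives the URL to connect to. The helper name server_url and the default host are made up for this example, and CouchDB itself does not read its configuration this way.

```python
import configparser

SAMPLE_INI = """
[httpd]
port = 12345
bind_address = 0.0.0.0
"""

def server_url(ini_text, host="127.0.0.1"):
    """Build a client URL from the [httpd] section (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    port = parser.getint("httpd", "port")
    # bind_address 0.0.0.0 means "listen on all interfaces"; a client still
    # needs a concrete host name or address to connect to.
    return "http://%s:%d/" % (host, port)

print(server_url(SAMPLE_INI))
```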

=== Usage ===
http://123.123.123.123:12345/_utils/
Databases can be created from this web interface.

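===== Using it from Python =====
http://123.123.123.123:12345/_utils/database.html?python-tests

This database is used for the demonstration below. A few things in the Python library do not work, such as update([...]) and username/password authentication, so the library may need some patching...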

from couchdb import client
from couchdb.client import Document
server = client.Server('http://123.123.123.123:12345/')

# Open the database
db = server['python-tests']

# Create a document
doc_id = db.create({'type': 'Person', 'name': 'John Doe'})

# Fetch a document; the doc interface behaves like a dict
doc = db[doc_id]

# _rev is the revision, _id is a UUID
doc.items()
[(u'_rev', u'1-2963977070'),
(u'_id', u'4a36f238f4facbe08762b1a958cef39e'),
(u'type', u'Person'),
(u'name', u'John Doe')]

# You can also choose the primary key yourself
db['JohnDoe'] = {'type': 'person', 'name': 'John Doe'}

db['JohnDoe'].items()
[(u'_rev', u'1-2744716443'),
(u'_id', u'JohnDoe'),
(u'type', u'person'),
(u'name', u'John Doe')]

# Update
badman = db['JohnDoe']
badman['age'] = 1234
db['JohnDoe'] = badman
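
The update goes through because the document fetched from the database carries its current _rev, and CouchDB rejects a write whose _rev is stale (HTTP 409 Conflict). Here is a minimal in-memory sketch of that check, in Python 3. The TinyStore class and its integer revision counter are invented for illustration; they are not part of couchdb-python and do not reproduce CouchDB's real revision format.

```python
class Conflict(Exception):
    """Raised when a write carries a stale or missing revision."""

class TinyStore:
    """Toy document store illustrating revision checks (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored document.
        return dict(self._docs[doc_id])

    def put(self, doc_id, doc):
        current = self._docs.get(doc_id)
        if current is not None and doc.get('_rev') != current['_rev']:
            raise Conflict(doc_id)
        # Simplified revision: a counter instead of CouchDB's "N-hash" string.
        next_rev = 1 if current is None else current['_rev'] + 1
        self._docs[doc_id] = dict(doc, _id=doc_id, _rev=next_rev)
        return next_rev

store = TinyStore()
store.put('JohnDoe', {'type': 'person', 'name': 'John Doe'})

first = store.get('JohnDoe')
second = store.get('JohnDoe')

first['age'] = 1234
store.put('JohnDoe', first)        # succeeds, revision becomes 2

try:
    store.put('JohnDoe', second)   # still carries revision 1
    outcome = 'written'
except Conflict:
    outcome = 'conflict'
print(outcome)
```

The second writer would have to fetch the document again and reapply its change, which is exactly the fetch, modify, store sequence used above.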

# Delete; db.delete(doc) works as well
del db['JohnDoe']

# Iterate over all documents
for row in db.view('_all_docs'):
    print row.id

# Show database information
db.info()
{u'compact_running': False,
u'db_name': u'python-tests',
u'disk_size': 24381,
u'doc_count': 13,
u'doc_del_count': 0,
u'instance_start_time': u'1241518867280531',
u'purge_seq': 0,
u'update_seq': 21}

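# Documents can have binary attachments; upload them with put_attachment

# Query: map_fun is a JavaScript function and emit is emit(key, value). Both key and value may be null
# The web page has a "Select view" query where this can be tested directly
# It seems the string has to be unicode, otherwise nothing is found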

db['/logo/xxx1.jpg']={"type":"logo","size":1}
db['/logo/xxx2.jpg']={"type":"logo","size":2}
db['/logo/xxx3.jpg']={"type":"logo","size":3}
db['/logo/xxx4.jpg']={"type":"logo","size":4}

map_fun = u'''
function(doc) {
    if (doc.type=='logo')
        emit(doc._id, doc.size);
}
'''

for row in db.query(map_fun):
    print row
Output:
<Row id=u'logo/xxx1.jpg', key=u'logo/xxx1.jpg', value=1>
<Row id=u'logo/xxx2.jpg', key=u'logo/xxx2.jpg', value=2>
<Row id=u'logo/xxx3.jpg', key=u'logo/xxx3.jpg', value=3>
<Row id=u'logo/xxx4.jpg', key=u'logo/xxx4.jpg', value=4>

We can also add a reduce function, for example:

reduce_fun = u'''
function(keys, values, rereduce) {
    return sum(values)
}
'''
for row in db.query(map_fun,reduce_fun):
    print row
Output:
<Row key=None, value=10>

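The rereduce argument of a reduce function means the following:

   1. rereduce is false

        * keys is an array whose elements are [key, id] pairs, where key is the key emitted by the map function and id is the ID of the corresponding document
        * values is an array whose elements are the values emitted by the map function
        * e.g. reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)

   2. rereduce is true

        * keys is null
        * values is an array whose elements are the results returned by earlier reduce calls
        * e.g. reduce(null, [intermediate1, intermediate2, intermediate3], true)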
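
The two phases can be imitated in plain Python 3 to see how they fit together. This is only a sketch: the Python map_fun and reduce_fun stand in for the JavaScript functions above, and splitting the rows into two chunks is an arbitrary assumption (CouchDB decides the chunking itself, based on its index structure).

```python
def map_fun(doc):
    """Python stand-in for the JavaScript map function shown earlier."""
    if doc.get('type') == 'logo':
        yield doc['_id'], doc['size']

def reduce_fun(keys, values, rereduce):
    # Summing is valid in both phases: raw values and partial sums.
    return sum(values)

docs = [{'_id': 'logo/xxx%d.jpg' % n, 'type': 'logo', 'size': n}
        for n in range(1, 5)]

# Map phase: collect ([key, id], value) rows, the shape reduce receives.
rows = [([key, doc['_id']], value)
        for doc in docs
        for key, value in map_fun(doc)]

# Reduce phase on two chunks (rereduce is False).
chunks = [rows[:2], rows[2:]]
partials = [reduce_fun([k for k, _ in chunk], [v for _, v in chunk], False)
            for chunk in chunks]

# Rereduce phase: keys is None, values are the partial results.
total = reduce_fun(None, partials, True)
print(partials, total)
```

A reduce function that is not safe to apply to its own output (for example one that counts rows with values.length) has to branch on rereduce; a plain sum does not.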

Here are some map/reduce demo examples that are fairly easy to follow:
http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html

==== Creating Views ====

A view can be thought of as an index, although this index is not updated in real time...

Continuing the example above:

db["_design/test"]={
"views":
{
    "all": {
      "map": "function(doc) { if (doc.type == 'logo') emit(null, doc) }"
    },
    "size_large_than_2": {
      "map": "function(doc) { if (doc.size && parseInt(doc.size)>2) emit(null,doc) }"
    },
    "total_size": {
      "map": "function(doc) { emit(null,parseInt(doc.size)) }",
      "reduce": "function(keys,values) { return sum(values) }"
    }
}
}

Then refresh
http://123.123.123.123:12345/_utils/database.html?python-tests

"test" now shows up under Select view.

The view can also be accessed directly:
http://123.123.123.123:12345/python-tests/_design/test/_view/all
Parameters such as limit can be added:
http://123.123.123.123:12345/python-tests/_design/test/_view/all?limit=2
Click through and have a look:
http://123.123.123.123:12345/python-tests/_design/test/_view/all?limit=2&skip=1
This makes pagination possible, however (http://stackoverflow.com/questions/312163/pagination-in-couchdb):
"""A simpler method of doing this is to use the skip parameter to work
out the starting document for the page, however this method should be
used with caution. The skip parameter simply causes the internal
engine to not return entries that it is iterating over. While this
gives the desired behaviour it is much slower than finding the first
document for the page by key. The more documents that are skipped, the
slower the request will be."""
So it is best to use skip only in combination with parameters such as startkey, described below.
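
One common way to page by key is to request limit + 1 rows and use the extra row as the starting point of the next page. The sketch below imitates this on an in-memory sorted list in Python 3, so it runs without a server. The ROWS data and the fetch_page helper are invented for illustration; against a real view the start pair would be sent as startkey and startkey_docid.

```python
import bisect

# Rows of a view, kept sorted by (key, doc id) as in a view index.
ROWS = sorted((size, 'logo/xxx%d.jpg' % size) for size in range(1, 8))

def fetch_page(rows, limit, start=None):
    """Return one page plus the (key, id) pair that starts the next page.

    `start` plays the role of startkey + startkey_docid.
    """
    begin = 0 if start is None else bisect.bisect_left(rows, start)
    # Ask for one extra row: if present, it marks where the next page begins.
    window = rows[begin:begin + limit + 1]
    next_start = window[limit] if len(window) > limit else None
    return window[:limit], next_start

start = None
pages = []
while True:
    page, start = fetch_page(ROWS, 3, start)
    pages.append([doc_id for _, doc_id in page])
    if start is None:
        break
print(pages)
```

Each page is located by a key lookup rather than by walking over skipped rows, so the cost does not grow with the page number.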

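Similar parameters include:

Sort order: descending=false
Range start and end: startkey="abc"&endkey="abcZZZZZZZZZ"
A doc ID can be used as well: startkey_docid=null

group=true is somewhat more complicated to use, see the link below; it merges the reduce results per key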
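
Note that key values are JSON in the query string, which is why "abc" is written with its quotes. A small Python 3 sketch of building such a URL follows. The view_url helper is hypothetical; the encoding rule it applies (key, startkey and endkey always JSON-encoded, other string values passed through unchanged) is my understanding of what CouchDB expects.

```python
import json
from urllib.parse import urlencode

JSON_PARAMS = ('key', 'startkey', 'endkey')

def encode(name, value):
    # Keys are always JSON; other parameters only when they are not strings.
    if name in JSON_PARAMS or not isinstance(value, str):
        return json.dumps(value)
    return value

def view_url(base, db, design, view, **params):
    """Build a view URL with encoded query parameters (hypothetical helper)."""
    query = urlencode([(name, encode(name, value))
                       for name, value in sorted(params.items())])
    url = '%s/%s/_design/%s/_view/%s' % (base.rstrip('/'), db, design, view)
    return url + ('?' + query if query else '')

url = view_url('http://123.123.123.123:12345', 'python-tests', 'test', 'all',
               startkey='abc', endkey='abcZZZZZZZZZ',
               descending=False, limit=2)
print(url)
```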
http://jchrisa.net/drl/_design/sofa/_show/post/markov_chains_using_couchdb_s_g
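
In short, without group the reduce function produces a single row for the whole view, while group=true produces one reduced row per distinct key. The difference can be imitated in Python 3 as below. The rows, including the 'photo' entries, are invented sample data, and run_reduce is only a stand-in for what the server does.

```python
from itertools import groupby

# (key, value) rows as a map function might emit them, e.g. (type, size).
rows = [('logo', 1), ('photo', 10), ('logo', 3), ('photo', 5), ('logo', 2)]

def reduce_fun(values):
    return sum(values)

def run_reduce(rows, group=False):
    ordered = sorted(rows, key=lambda row: row[0])  # views are sorted by key
    if not group:
        # One row for everything, with a null key.
        return [(None, reduce_fun([v for _, v in ordered]))]
    # One row per distinct key.
    return [(key, reduce_fun([v for _, v in members]))
            for key, members in groupby(ordered, key=lambda row: row[0])]

ungrouped = run_reduce(rows)
grouped = run_reduce(rows, group=True)
print(ungrouped)
print(grouped)
```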

Keys can be complex, for example:
The query startkey=["foo"]&endkey=["foo",{}] will match most array
keys with "foo" in the first element, such as ["foo","bar"] and
["foo",["bar","baz"]]. However it will not match
["foo",{"an":"object"}]
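
The reason is the order in which CouchDB sorts keys of different types: null, false, true, numbers, strings, arrays, objects. An empty object {} therefore sorts after every string and array, but before any non-empty object. The Python 3 sketch below imitates that ordering to check the examples from the quote. It is a simplification: strings are compared by code point here, whereas CouchDB uses ICU collation, and collation_key and in_range are names made up for this example.

```python
def collation_key(value):
    """Simplified view collation: null < false < true < numbers
    < strings < arrays < objects."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        return (4, value)  # assumption: code-point order, not ICU collation
    if isinstance(value, list):
        return (5, [collation_key(item) for item in value])
    if isinstance(value, dict):
        return (6, [(collation_key(k), collation_key(v))
                    for k, v in value.items()])
    raise TypeError(type(value))

def in_range(key, startkey, endkey):
    return collation_key(startkey) <= collation_key(key) <= collation_key(endkey)

candidates = [
    ["foo", "bar"],
    ["foo", ["bar", "baz"]],
    ["foo", {"an": "object"}],
    ["fop"],
]
matches = [key for key in candidates if in_range(key, ["foo"], ["foo", {}])]
print(matches)
```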

Click around and try it.

From Python the view can be accessed like this:

for row in db.view('_design/test/_view/all'):
    print row.id

Output:
logo/xxx1.jpg
logo/xxx2.jpg
logo/xxx3.jpg
logo/xxx4.jpg

Another example:
for row in db.view('_design/test/_view/size_large_than_2'):
    print row

<Row id=u'logo/xxx3.jpg', key=None, value={u'_rev': u'1-3347158087', u'_id': u'logo/xxx3.jpg', u'type': u'logo', u'size': 3}>
<Row id=u'logo/xxx4.jpg', key=None, value={u'_rev': u'1-1107796651', u'_id': u'logo/xxx4.jpg', u'type': u'logo', u'size': 4}>

==== Online resources ====

Here is an introduction in Chinese that is worth reading as background:
http://hi.baidu.com/freeway2000/blog/item/8f76ed11f26bc8c1a6ef3f53.html

CouchDB: The Definitive Guide
http://books.couchdb.org/relax/

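=== Notes ===
1.
According to benchmarks found online, CouchDB's
write speed is 4 times slower than MySQL's
index creation speed is 50 times slower than MySQL's

2.
CouchDB only appends and never deletes in place.
It needs periodic compaction,
a copy-then-delete process similar to garbage collection,
so a large amount of disk space has to be kept free.

3.
Indexes are not real-time,
so what you see may be stale data.

My personal take:
Judged on performance alone, CouchDB is indeed far from ideal.
But CouchDB can present data through views: whatever you need, you create a view for it.
With this index-as-you-please approach, in quite a few applications,
persisting query results as views
can greatly reduce the repeated, near-identical queries of the traditional approach.

To give an example,
for something like a friends' broadcast feed,
creating one view per user
might just work...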
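
Regarding note 2: the cleanup is started over HTTP with a POST to the database's _compact resource. The Python 3 sketch below only prepares such a request and does not send it, so it runs without a server. The compaction_request helper is made up for this example, and the Content-Type header is what newer CouchDB releases expect; whether the version used in this post requires it is an assumption.

```python
import urllib.request

def compaction_request(base_url, db_name):
    """Prepare (but do not send) the HTTP request that starts compaction."""
    return urllib.request.Request(
        '%s/%s/_compact' % (base_url.rstrip('/'), db_name),
        data=b'',
        headers={'Content-Type': 'application/json'},
        method='POST',
    )

request = compaction_request('http://123.123.123.123:12345/', 'python-tests')
print(request.get_method(), request.full_url)

# To actually trigger compaction against a running server (admin rights may
# be required): urllib.request.urlopen(request)
```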

2009-05-06 00:07

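#2 JeffreyHsu 2009-07-27

It is outrageously slow, and it cannot sort.

A novelty, completely unusable in practice.

#1 Arbow 2009-05-06

Quote:

According to benchmarks found online, CouchDB's
write speed is 4 times slower than MySQL's
index creation speed is 50 times slower than MySQL's

It really is quite slow. Perhaps speed is what has held back its adoption in China; it seems that mid-sized companies weigh speed and performance heavily when evaluating technical options. On that basis people will pick a key-value DB rather than a document DB.

Quote:

But CouchDB can present data through views: whatever you need, you create a view for it.
With this index-as-you-please approach, in quite a few applications,
persisting query results as views
can greatly reduce the repeated, near-identical queries of the traditional approach.

I feel this kind of need has not been tapped yet; people are still more comfortable using SQL to fetch the data they need and will not consider something this unfamiliar.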
