
my CouchDB tutorial

   Posted: 2010-07-23  



History


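    "Couch" is an acronym for "Cluster Of Unreliable Commodity Hardware". The name reflects CouchDB's goals: high scalability, high availability, and high reliability, even when running on failure-prone hardware. CouchDB was originally written in C++, but in April 2008 the project moved to the Erlang OTP platform for its emphasis on fault tolerance.

    In an interview, Katz said that CouchDB is essentially the core of Lotus Notes, extracted and refined.

    IBM once funded CouchDB, which allowed Katz to work on the project full time.

    In 2009, Katz, Chris Anderson, and several colleagues founded a company called Relaxed. In November of that year the company received 2 million US dollars in venture capital and was renamed Couchio.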




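Features:

  • NoSQL, document-oriented database
  • Append-only database
  • Decentralized (???)

  • Multiversion concurrency control (MVCC)
    • It gives each client a snapshot of the latest version of the database. This means other users cannot see changes until the transaction is committed. Many modern databases have started to move from locking to MVCC, including Oracle (since V7) and Microsoft® SQL Server 2005 and later.
  • HTTP interface, accessed through a JSON API
  • Powerful B-tree storage engine
  • Map/Reduce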
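
To make the append-only and MVCC bullets more concrete, here is a minimal in-memory sketch. It is not how CouchDB is implemented: the class name TinyMvccStore, the ConflictError exception, and the integer revision numbers are all hypothetical, chosen only to show that a write must name the revision it was based on and that old revisions are appended to, never overwritten.

```python
class ConflictError(Exception):
    """Raised when a writer presents a stale revision (hypothetical name)."""


class TinyMvccStore(object):
    """Hypothetical in-memory model: every write appends a new revision."""

    def __init__(self):
        self._history = {}  # doc_id -> list of (rev, body), oldest first

    def put(self, doc_id, body, rev=None):
        revisions = self._history.setdefault(doc_id, [])
        current_rev = revisions[-1][0] if revisions else None
        if rev != current_rev:
            # The writer worked from an outdated snapshot of the document.
            raise ConflictError("expected rev %r, got %r" % (current_rev, rev))
        new_rev = len(revisions) + 1
        revisions.append((new_rev, dict(body)))  # append, never overwrite
        return new_rev

    def get(self, doc_id):
        # Readers always see the latest committed revision.
        rev, body = self._history[doc_id][-1]
        return rev, dict(body)


store = TinyMvccStore()
rev1 = store.put("apple", {})
rev2 = store.put("apple", {"a": 3}, rev=rev1)
try:
    store.put("apple", {"a": 4}, rev=rev1)  # rev1 is stale by now
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True
print(store.get("apple"))
print(stale_write_rejected)
```

The same idea shows up later in the curl examples, where updating a document requires sending its current _rev.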

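    PS. The features described above are evolving rapidly: authentication, filtered replication, URL mapping, and so on. Everything one might need is being added quickly. (It is already obvious that CouchDB will evolve into an AppServer.)
I don't like this. I think it should be a database with a Map/Reduce framework, not an AppServer (CouchApp). -Lin Yang 7/23/10 7:57 PM 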

Examples


Get CouchDB server info:

curl http://127.0.0.1:5984/

 

{"couchdb":"Welcome","version":"0.11.0"}

Create a DB:

curl -X PUT http://127.0.0.1:5984/wiki

    CouchDB will reply with the following message, if the database does not exist:

{"ok":true}

    or, with a different response message, if the database already exists:

{"error":"file_exists","reason":"The database could not be created, the file already exists."}
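
A small sketch of how a client might tell these two replies apart. The function name database_was_created is my own, and it only handles the two reply bodies quoted above; anything else is treated as unexpected.

```python
import json


def database_was_created(reply_text):
    """Return True for {"ok":true}, False for a file_exists error."""
    reply = json.loads(reply_text)
    if reply.get("ok") is True:
        return True
    if reply.get("error") == "file_exists":
        return False
    raise ValueError("unexpected reply: %s" % reply_text)


created = database_was_created('{"ok":true}')
existed = database_was_created(
    '{"error":"file_exists","reason":"The database could not be created, '
    'the file already exists."}')
print(created)
print(existed)
```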

Get DB info:

curl -X GET http://127.0.0.1:5984/wiki

 

{"db_name":"wiki","doc_count":0,"doc_del_count":0,"update_seq":0,
"purge_seq":0,"compact_running":false,"disk_size":79,
"instance_start_time":"1272453873691070","disk_format_version":5}

Delete a DB:

curl -X DELETE http://127.0.0.1:5984/wiki

 

{"ok":true}

Create a document named apple in wiki:
curl -X PUT http://127.0.0.1:5984/wiki/apple -H "Content-Type: application/json" -d '{}'
 
{"ok":true,"id":"apple","rev":"1801185866"}

Get the document:
curl -X GET http://127.0.0.1:5984/wiki/apple
{"_id":"apple","_rev":"1801185866"}


Update the document:
curl -X PUT http://localhost:5984/wiki/apple -H "Content-Type: application/json" -d '{"_rev":"1801185866" ,"a":3}'
{"ok":true,"id":"apple","rev":"2-b5be0b773091"}
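
The update above only works because the request body carries the _rev returned by the earlier GET. A minimal sketch of building that body in Python follows; make_update_body is a hypothetical helper, and it does no HTTP itself.

```python
import json


def make_update_body(current_doc_json, changes):
    """Build the JSON body for an update: the current _rev plus new fields."""
    current = json.loads(current_doc_json)
    body = {"_rev": current["_rev"]}
    body.update(changes)
    # sort_keys only makes the output stable for display.
    return json.dumps(body, sort_keys=True)


fetched = '{"_id":"apple","_rev":"1801185866"}'
update_body = make_update_body(fetched, {"a": 3})
print(update_body)
```

The resulting string is what would be passed to curl with -d, or as the doc argument of a PUT helper.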

A typical CouchDB view (query)

map: function(doc) {
    if (doc._attachments) {
         emit("with attachment", 1);
     }
     else {
         emit("without attachment", 1);
     }
}
reduce: function(keys, values) {
    return sum(values);
}

curl -s -i -X POST -H 'Content-Type: application/json' \
-d '{"map": "function(doc){if(doc._attachments) {emit(\"with\",1);} else {emit(\"without\",1);}}",
"reduce": "function(keys, values) {return sum(values);}"}' \
'http://localhost:5984/somedb/_temp_view?group=true'
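
To see what this view computes without a running server, here is a rough Python simulation of the same map and reduce with group=true. The sample documents and the function names map_doc and run_grouped_view are made up for illustration; the real work is done by CouchDB's JavaScript view server.

```python
from collections import defaultdict


def map_doc(doc):
    """Python stand-in for the JavaScript map function above."""
    if doc.get("_attachments") is not None:
        yield ("with attachment", 1)
    else:
        yield ("without attachment", 1)


def run_grouped_view(docs):
    """Emit rows for every document, then sum the values per key."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return [{"key": key, "value": sum(values)}
            for key, values in sorted(grouped.items())]


sample_docs = [
    {"_id": "a", "_attachments": {"logo.png": {}}},
    {"_id": "b"},
    {"_id": "c"},
]
rows = run_grouped_view(sample_docs)
print(rows)
```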



Futon interface:


http://localhost:5984/_utils/


About views / Map Reduce:


A view is defined by a JavaScript function that maps view keys to values.

Two kinds of view

  • permanent view
    • stored inside special documents called design documents
    • create: PUT a document at http://localhost:5984/{dbname}/_design/my_views
    • query: GET /{dbname}/{docid}/_view/{viewname}
    •        e.g. /{dbname}/_design/my_views/_view/all_docs

  • temporary view
    • slow, very expensive to compute
    • POST /{dbname}/_temp_view
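
The two URL shapes in the list above can be sketched as small helpers. The function names are hypothetical; they only build the method and path, and leave sending the request to whatever HTTP code is in use.

```python
def permanent_view_request(db_name, design_name, view_name):
    """Method and path for querying a view stored in a design document."""
    path = "/%s/_design/%s/_view/%s" % (db_name, design_name, view_name)
    return ("GET", path)


def temporary_view_request(db_name):
    """Method and path for a temporary view; the view JSON goes in the body."""
    return ("POST", "/%s/_temp_view" % db_name)


print(permanent_view_request("wiki", "my_views", "all_docs"))
print(temporary_view_request("wiki"))
```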

Creating a view:

In Futon, go to Overview > wiki.
In the top-right corner, select view: Temporary View.
(If your Futon web client acts funny, clear the cookies Futon created.)

Map & Reduce

map
function(doc) {
    emit(null, doc);
}


    A view function should accept a single argument: the document object. To produce results, it should call the implicitly available emit(key, value) function. For every invocation of that function, a result row is added to the view.

    To be able to filter or sort the view by some document property, you would use that property for the key. For example, the following view would allow you to look up customer documents by the LastName or FirstName fields:

 

function(doc) {
    if (doc.Type == "customer") {
        emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
        emit(doc.FirstName, {LastName: doc.LastName, Address: doc.Address});
    }
}
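
Here is a rough Python model of what this map function produces and how a lookup by key then works. The sample customers and the helper names build_view and lookup are invented for the illustration, and Python's string ordering stands in for CouchDB's own key collation.

```python
def map_customer(doc):
    """Python stand-in for the JavaScript map function above."""
    rows = []
    if doc.get("Type") == "customer":
        rows.append((doc.get("LastName"),
                     {"FirstName": doc.get("FirstName"), "Address": doc.get("Address")}))
        rows.append((doc.get("FirstName"),
                     {"LastName": doc.get("LastName"), "Address": doc.get("Address")}))
    return rows


def build_view(docs):
    """Collect (key, doc_id, value) rows, ordered by key and then doc id."""
    rows = []
    for doc in docs:
        for key, value in map_customer(doc):
            rows.append((key, doc["_id"], value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def lookup(view_rows, key):
    """Return the ids of documents that emitted the given key."""
    return [doc_id for row_key, doc_id, _ in view_rows if row_key == key]


docs = [
    {"_id": "c1", "Type": "customer", "FirstName": "Ada",
     "LastName": "Lovelace", "Address": "London"},
    {"_id": "c2", "Type": "customer", "FirstName": "Grace",
     "LastName": "Hopper", "Address": "Arlington"},
    {"_id": "o1", "Type": "order", "Total": 10},
]
view_rows = build_view(docs)
print(lookup(view_rows, "Hopper"))
print(lookup(view_rows, "Ada"))
```

Each customer yields two rows, one per name field, which is why either name finds the document. The order document emits nothing.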
reduce
function (key, values, rereduce) {
    return sum(values);
}


Reduce functions must handle two cases:

1. When rereduce is false:

  • key will be an array whose elements are arrays of the form [key,id], where key is a key emitted by the map function and id is that of the document from which the key was generated.

  • values will be an array of the values emitted for the respective elements in keys

  • i.e. reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)

2. When rereduce is true:

  • key will be null

  • values will be an array of values returned by previous calls to the reduce function

  • i.e. reduce(null, [intermediate1,intermediate2,intermediate3], true)
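
A count is a handy case where the two branches really differ, so here is a Python sketch of the contract just described. count_reduce is a made-up name, and the sample keys and values are placeholders.

```python
def count_reduce(keys, values, rereduce):
    """Count emitted rows, handling both cases described above."""
    if rereduce:
        # values are partial counts returned by earlier calls; keys is None.
        return sum(values)
    # values are raw emitted values; keys is a list of [key, doc_id] pairs.
    return len(values)


partial_1 = count_reduce([["a", "id1"], ["b", "id2"], ["c", "id3"]],
                         ["v1", "v2", "v3"], False)
partial_2 = count_reduce([["d", "id4"]], ["v4"], False)
total = count_reduce(None, [partial_1, partial_2], True)
print(total)
```

Returning len(values) in the rereduce branch would count the partial results instead of the rows, which is the usual mistake when the flag is ignored.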


Notes:

  • A view may have only a map function
  • A reduce function must reduce the input values to a smaller output value (the reduce function has to handle both the emitted results and its own return values)
  • The key passed to emit can be an array: [X,Y,1]

Reduce Vs reReduce

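Reportedly, suppose a map function produces the following key -> value pairs:
[X, Y, 0] -> Object_A
[X, Y, 1] -> Object_B
[X, Y, 2] -> Object_C
[X, Y, 3] -> Object_D
Then the reduce function may receive the following 3 calls, where Object_AB and Object_CD are the results of the first two calls:
reduce( [  [[X,Y,0], id0] , [[X,Y,1], id1]  ], [Object_A, Object_B], false)
reduce( [  [[X,Y,2], id2] , [[X,Y,3], id3]  ], [Object_C, Object_D], false)
reduce( null                                 , [Object_AB, Object_CD], true)


    I still don't understand what this rereduce is for, because I thought reduce would be called once for each key,
and never a second time after that.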
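
One way to picture it, as far as I understand: CouchDB keeps view rows in a B-tree and does not necessarily hand all rows to reduce in a single call. It can reduce them in chunks and then combine the partial results with rereduce set to true. The sketch below imitates that with a fixed chunk size. The chunk size, the numeric values, and the names min_reduce and reduce_in_chunks are assumptions for illustration only, not CouchDB's actual algorithm.

```python
def min_reduce(keys, values, rereduce):
    """min works unchanged on raw values and on partial minimums."""
    return min(values)


def reduce_in_chunks(rows, reduce_fn, chunk_size):
    """Reduce rows chunk by chunk, then rereduce the partial results."""
    calls = []
    partials = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        keys = [[key, doc_id] for key, doc_id, _ in chunk]
        values = [value for _, _, value in chunk]
        calls.append((keys, values, False))
        partials.append(reduce_fn(keys, values, False))
    calls.append((None, partials, True))
    return reduce_fn(None, partials, True), calls


rows = [
    (["X", "Y", 0], "id0", 40),
    (["X", "Y", 1], "id1", 10),
    (["X", "Y", 2], "id2", 30),
    (["X", "Y", 3], "id3", 20),
]
result, calls = reduce_in_chunks(rows, min_reduce, chunk_size=2)
print(result)
for call in calls:
    print(call)
```

With a chunk size of 2 this produces the same three-call pattern as the example above: two calls with rereduce false, then one with rereduce true.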

group

group=true makes the reduce function group its results by the keys that the map function emitted.

See the test code at the end for the effect.
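
The difference between group=false, group=true and a numeric group level (used later in the test code with array keys) can be modeled like this. This is a simplified sketch: the function names and sample rows are invented, the reduce is hard-wired to sum, and Python's list ordering stands in for CouchDB's key collation.

```python
def group_key(key, group_level):
    """Pick the grouping key for one row."""
    if group_level == 0:
        return None            # like group=false: everything in one group
    if group_level == "exact" or not isinstance(key, list):
        return key             # like group=true: one group per distinct key
    return key[:group_level]   # numeric level N: first N elements of an array key


def query(rows, group_level):
    """Sort rows by key, group adjacent rows, and sum each group's values."""
    groups = []
    for key, value in sorted(rows, key=lambda row: row[0]):
        current = group_key(key, group_level)
        if groups and groups[-1][0] == current:
            groups[-1][1].append(value)
        else:
            groups.append([current, [value]])
    return [(current, sum(values)) for current, values in groups]


rows = [
    (["Nokia", "s40"], 1),
    (["Nokia", "s40"], 1),
    (["Nokia", "s60"], 1),
    (["HTC", "Android"], 1),
    (["Samsung", "Android"], 1),
]
print(query(rows, 0))
print(query(rows, "exact"))
print(query(rows, 1))
```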


Debugging Views

There is a log function that can be used to output debug information:
{
     "map": "function(doc) { log(doc); }"
}
tail -f /var/log/couchdb/couch.log


Main documentation on views (I find these three documents really hard to understand...)

Here is a decent online demo. Just decent, not the best. -Lin Yang 7/23/10 7:47 PM 
http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html



Installation:

Ubuntu

Available in the package repositories.

Building from source:

So many dependencies...
yum install js-devel icu libicu libicu-devel
wget http://curl.haxx.se/download/curl-7.21.0.tar.gz && tar vzxf curl-7.21.0.tar.gz && cd curl-7.21.0 && ./configure && make  && make install


# /usr/local/bin/couchdb -b    (-b: run in background)
Apache CouchDB has started, time to relax.

If it is not available in yum ... install SpiderMonkey

Erlang Client

It is an HTTP interface, so no dedicated client is needed, but having a client is better...


couchbeam

eCouch

 

erlcouch ..


 



benchmark

Someone else's results:
CouchDB inserts ~2-3k documents / second in a >100k documents database
-------- 0.3-0.4ms / doc
CouchDB inserts get slower on bigger databases

On my 8-core, 8 GB machine (4 GB free):
ab -n 10000 -c 100 http://localhost:5984/wiki/apple   (read requests)
Server Software:        CouchDB/0.11.0
Server Hostname:        localhost
Document Length:        118 bytes
Requests per second:    3066.17 [#/sec] (mean)
Time per request:       32.614 [ms] (mean)
Time per request:       0.326 [ms] (mean, across all concurrent requests)
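
The three numbers ab reports are tied together by simple arithmetic, which this short check reproduces from the figures above. The variable names are mine.

```python
requests_per_second = 3066.17  # from the ab output above
concurrency = 100              # the -c 100 option

# Mean time per request across all concurrent requests, in milliseconds.
ms_per_request_overall = 1000.0 / requests_per_second

# Mean time each request took from one client's point of view.
ms_per_request_per_client = concurrency * ms_per_request_overall

print(round(ms_per_request_overall, 3))
print(round(ms_per_request_per_client, 3))
```

So the 32.614 ms figure is the latency each of the 100 concurrent clients sees, and 0.326 ms is that latency divided by the concurrency.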





Good documentation:


http://wiki.apache.org/couchdb/Getting_started_with_Erlang
http://en.wikipedia.org/wiki/CouchDB  has examples
http://www.ibm.com/developerworks/cn/opensource/os-couchdb/  introduction + examples
http://www.ibm.com/developerworks/cn/opensource/os-cn-couchdb/index.html  (long, ready)

A blog translating CouchDB: The Definitive Guide, by 时之刻痕

http://wiki.apache.org/couchdb/FrontPage  main documentation!!!

clients:
http://wiki.apache.org/couchdb/Getting_started_with_Python  Python sample code
http://wiki.apache.org/couchdb/API_Cheatsheet  API
http://news.csdn.net/a/20100714/219109.html

Book:

http://books.couchdb.org/relax/

 

My own test code, attached:

 

#!/usr/bin/python
#coding:utf-8

# example:
# do_request2('10.99.60.91:8080', '/home', 'PUT', '', {"Content-type": "application/x-www-form-urlencoded"} )
def do_request2(netloc, path, method, data='', headers={}):
    import httplib
    conn = httplib.HTTPConnection(netloc)
    conn.request(method, path, data, headers)
    response = conn.getresponse()
    if response.status // 100 == 2:
        data = response.read()
        return data
    print 'ERROR: response.status = %d'% response.status
    print 'response data is ', response.read()

def do_request(url, method, data='', headers={}):
    print '>>>>>>>>>>>>> do_request: ', url
    from urlparse import urlparse 
    o = urlparse(url)
    path = o.path
    if o.query:
        path = path + '?' + o.query
    return do_request2(o.netloc, path, method, data, headers)


def delete_db(db_name):
    return do_request('http://127.0.0.1:5984/%s'%db_name, 'DELETE')

def create_db(db_name):
    return do_request('http://127.0.0.1:5984/%s'%db_name, 'PUT')

def create_doc(db_name, doc_id, doc):
    print 'create_doc %s ' % doc_id
    return do_request('http://127.0.0.1:5984/%s/%s'%(db_name, doc_id), 'PUT', doc, {'Content-Type': 'application/json'})

def get_doc(db_name, doc_id):
    return do_request('http://127.0.0.1:5984/%s/%s'%(db_name, doc_id), 'GET', '', {})
#query_string='?group=false'
def query_temp_view(db_name, doc, query_string=''):
    url = 'http://127.0.0.1:5984/%s/_temp_view%s'%(db_name, query_string)
    return do_request(url, 'POST', doc, {'Content-Type': 'application/json; charset=UTF-8'})

# This API creates several views at once, in a single design document
def create_permanent_view(db_name, view_name, views_json):
    return create_doc(db_name, view_name, views_json) 



def test():
    print delete_db('phone')
    print create_db('phone')
    print create_doc('phone', 'Nokia-5200','''
        {"make": "Nokia", 
        "price": 100, 
        "os": "s40"}
            ''')

    print create_doc('phone', 'Nokia-1661','''
        {"make": "Nokia", 
        "price": 32.5, 
        "os": "s40"}
            ''')

    print create_doc('phone', 'Nokia-E63','''
        {"make": "Nokia", 
        "price": 500, 
        "os": "s60"}
            ''')

    print create_doc('phone', 'HTC-Wildfire','''
        {"make": "HTC", 
        "price": 200, 
        "os": "Android"}
            ''')

    print create_doc('phone', 'BlackBerry-Bold','''
        {"make": "BlackBerry", 
        "price": 300, 
        "os": "BlackBerry-OS"}
            ''')

    print create_doc('phone', 'Samsung-Galaxy-S','''
        {"make": "Samsung", 
        "price": 400, 
        "os": "Android"}
            ''')
    print create_doc('phone', 'iPhone4','''
        {"make": "Apple", 
        "price": 1000, 
        "os": "Mac"}
            ''')
    ############################################################################
    # get all docs
    view = ''' 
{
    "map" : "function(doc){
        emit(doc.price, doc);
    }"
}
    '''
    print query_temp_view('phone', view);


    ############################################################################
    # select sum(price) from phone 
    view = ''' 
{
    "map" : "function(doc){
        emit('all-price', doc.price);
    }",
    "reduce" : "function(key, values, rereduce){
        return sum(values);
    }"
}
    '''
    print query_temp_view('phone', view);


    ############################################################################
    # test permanent_view
    views = '''
{
    "language": "javascript",
    "views": {
        "all_phones": {
            "map" : "function(doc){
                emit(doc.price, doc);
            }"
        },
        "sum_price": {
            "map" : "function(doc){
                emit('all-price', doc.price);
            }",
            "reduce" : "function(key, values, rereduce){
                return sum(values);
            }"
        }
    }
}
    '''
    create_permanent_view('phone', '_design/my_views', views)
    print ':::: retrieve the views DOC: '
    print get_doc('phone', '_design/my_views')
    print ':::: now let us try to query the view "all_phones"'
    print get_doc('phone', '_design/my_views/_view/all_phones')

    print ':::: And query the view "sum_price"'
    print get_doc('phone', '_design/my_views/_view/sum_price')

    ############################################################################
    print ':::: we use temp view for test'
    print ':::: get all phones of Nokia: '
    view = ''' 
{
    "map" : "function(doc){
        //log(doc); // debug fun
        if (doc.make == 'Nokia')
            emit(null, doc);
    }"
}
    '''
    print query_temp_view('phone', view);


    ############################################################################
    print ':::: get phones count of every os : '
    view = ''' 
{
    "map" : "function(doc){
        emit(doc.os, 1);
    }",
    "reduce" : "function(key, values, rereduce){
        log('reduce called!!!!');
        log(key);
        log(values);
        log(rereduce);
        return sum(values); 
    }"
}
    '''
    print query_temp_view('phone', view, '?group=true');

    print ':::: let us look at the param group.. if we set group=false : '
    print query_temp_view('phone', view, '?group=false');

    ############################################################################
    print ':::: let us get a list of unique os , just like SQL: SELECT DISTINCT(os) FROM phone'
    view = ''' 
{
    "map" : "function(doc){
        emit(doc.os, null);
    }",
    "reduce" : "function(key, values, rereduce){
        return null;
    }"
}
    '''
    print query_temp_view('phone', view, '?group=true');
   

    ############################################################################
    print ':::: let us get all phones sorted by price || SELECT _id FROM phone ORDER BY price'
    view = ''' 
{
    "map" : "function(doc){
        emit(doc.price, doc._id);
    }"
}
    '''
    print query_temp_view('phone', view);


    ############################################################################
    print ':::: let us get min price || SELECT min(price) FROM phone '
    view = ''' 
{
    "map" : "function(doc){
        emit('p', doc.price);
    }",
    "reduce" : "function(key, values, rereduce){
        return Math.min.apply( Math, values);
            //the computing min width/height example (simulated in JS) at 
            //http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html does not work for me
    }"
}
    '''
    print query_temp_view('phone', view, '?group=true');



    ############################################################################
    print ':::: I have to try emitting an array, like emit([a,b,c], value)'

    view = ''' 
{
    "map" : "function(doc){
        emit(['p', 'min'], doc.price);
        emit(['p', 'max'], doc.price);
    }",
    "reduce" : "function(key, values, rereduce){
        log('reduce called!!!!');
        log(key);
        log(values);
        log(rereduce);

        return Math.min.apply( Math, values);
    }"
}
    '''
    print query_temp_view('phone', view, '?group=true&group_level=2');

    print ':::: if group_level==1 '
    print ':::: [p, min] and [p, max] will arrive together in one reduce call, like this'
    print ':::: [[["p","max"],"BlackBerry-Bold"],[["p","max"],"HTC-Wildfire"],[["p","max"],"iPhone4"],[["p","max"],"Nokia-1661"],[["p","max"],"Nokia-5200"],[["p","max"],"Nokia-E63"],[["p","max"],"Samsung-Galaxy-S"],[["p","min"],"BlackBerry-Bold"],[["p","min"],"HTC-Wildfire"],[["p","min"],"iPhone4"],[["p","min"],"Nokia-1661"],[["p","min"],"Nokia-5200"],[["p","min"],"Nokia-E63"],[["p","min"],"Samsung-Galaxy-S"]]'

    print ':::: if group_level==2 '
    print ':::: [p, min] and [p, max] will arrive separately.......... like this'
    print '[[["p","min"],"Samsung-Galaxy-S"],[["p","min"],"Nokia-E63"],[["p","min"],"Nokia-5200"],[["p","min"],"Nokia-1661"],[["p","min"],"iPhone4"],[["p","min"],"HTC-Wildfire"],[["p","min"],"BlackBerry-Bold"]]' 

    # TODO ############################################################################
    print ':::: let us retrieve top N os'









if __name__ == "__main__":
    test()
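
One caveat about the view strings in the script above: they put raw line breaks inside JSON string values, and strict JSON does not allow that (nor trailing commas), so a stricter parser may reject them. A safer way to produce the same documents is to build them as Python dicts and let json.dumps do the escaping. The sketch below shows this for the design document; make_view is a hypothetical helper, and the JavaScript sources are the ones from the test script.

```python
import json


def make_view(map_source, reduce_source=None):
    """Build one view entry; the reduce function is optional."""
    view = {"map": map_source}
    if reduce_source is not None:
        view["reduce"] = reduce_source
    return view


design_doc = {
    "language": "javascript",
    "views": {
        "all_phones": make_view(
            "function(doc){\n    emit(doc.price, doc);\n}"),
        "sum_price": make_view(
            "function(doc){ emit('all-price', doc.price); }",
            "function(key, values, rereduce){ return sum(values); }"),
    },
}

# json.dumps escapes the line breaks inside the function source as \n.
views_json = json.dumps(design_doc, indent=4)
print(views_json)
```

The resulting views_json could then be passed as the third argument of create_permanent_view in place of the hand-written string.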
 
   Posted: 2010-07-23  
CouchDB released version 1.0 a couple of days ago; reportedly, insertion speed for large documents improved by 300%...