CouchDB Database Compaction

 

http://wiki.apache.org/couchdb/Compaction

 

 

Database Compaction

 

Compaction compresses the database file by removing unused sections created during updates. Old revisions of documents are also removed from the database, though a small amount of metadata is kept for use in conflict resolution during replication. The number of revisions kept (default 1000) can be configured using the _revs_limit URL endpoint, available since version 0.8-incubating.
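
For example, the current limit can be read with a GET and changed with a PUT against _revs_limit (a quick sketch against the my_db database used below; the exact output may differ):

curl -X GET http://localhost:5984/my_db/_revs_limit
#=> 1000
curl -X PUT -H "Content-Type: application/json" http://localhost:5984/my_db/_revs_limit -d "1500"
#=> {"ok":true}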

Compaction is manually triggered per database. Support for queued compaction of multiple databases is planned. Please note that compaction will be run as a background task.

 

Example

 

Compaction is triggered by an HTTP POST request to the _compact sub-resource of your database. On success, HTTP status 202 is returned immediately. Although the request body is not used, you must still specify "application/json" as the Content-Type for the request.

 

curl -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_compact
#=> {"ok":true}

 

A GET request to your database's base URL (see HTTP_database_API#Database_Information) returns a status object that looks like this:

 

curl -X GET http://localhost:5984/my_db
#=> {"db_name":"my_db", "doc_count":1, "doc_del_count":1, "update_seq":4, "purge_seq":0, "compact_running":false, "disk_size":12377, "instance_start_time":"1267612389906234", "disk_format_version":5}

 

The compact_running key will be set to true during compaction.
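
If a script needs to wait for compaction to finish, one option is to poll that flag, for example (a rough sketch that greps the raw JSON of the my_db info document; a JSON-aware tool would be more robust):

# poll until compact_running goes back to false
while curl -s http://localhost:5984/my_db | grep -q '"compact_running":true'; do
  sleep 5
done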

 

Compaction of write-heavy databases

 

It is not a good idea to attempt compaction on a database node that is near full capacity for its write load. The problem is that the compaction process may never catch up with the writes if they never let up, and the node will eventually run out of disk space.

Compaction should be attempted when the write load is below full capacity; read load, however, won't affect its ability to complete. To have the least impact possible on clients, the database remains online and fully functional to readers and writers during compaction. It is a design limitation that database compaction can't complete when the node is at capacity for write load, so it may be reasonable to schedule compactions during off-peak hours.
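
On systems with cron, one simple way to do that is a crontab entry that issues the compaction request at night (a sketch only; the host, database name, and time are placeholders):

# compact my_db every day at 03:00
0 3 * * * curl -s -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_compact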

In a clustered environment the write load can be switched off for any node before compaction, and the node can be brought back up to date with replication once compaction completes.
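
Such a node can be caught up with a one-off replication from a node that kept taking writes, for example (a sketch; the node names are placeholders):

curl -H "Content-Type: application/json" -X POST http://node1:5984/_replicate -d '{"source":"http://node2:5984/my_db", "target":"my_db"}'
#=> {"ok":true, ...}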

In the future, a single CouchDB node may be able to stop or fail incoming updates if the write load is too heavy for compaction to complete in a reasonable time.

 

View compaction

 

Views need compaction just like databases. View compaction was introduced with CouchDB 0.10.0:

 

curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_compact/designname
#=> {"ok":true}

 

This compacts the view index from the current version of the design document. The HTTP response code is 202 Accepted (like compaction for databases) and a compaction background task will be created. Information on running compactions can be fetched with HTTP_view_API#Getting_Information_about_Design_Documents_(and_their_Views).
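
For example, the view index status, including a compact_running flag and its sizes, can be read from the design document's _info sub-resource (a sketch with the dbname/designname placeholders from above; the output shown is abbreviated and illustrative):

curl -X GET http://localhost:5984/dbname/_design/designname/_info
#=> {"name":"designname","view_index":{"compact_running":true,"disk_size":123456, ...}}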

View indexes on disk are named after the MD5 hash of the view definition. When you change a view, old indexes remain on disk. To clean up all outdated view indexes (files named after the MD5 hashes of view definitions that no longer exist), you can trigger a view cleanup:

 

curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_view_cleanup
#=> {"ok":true}

 

 

Automatic Compaction

 

Since CouchDB 1.2 it is possible to configure automatic compaction, so that compaction of databases and views is triggered automatically based on various criteria. Automatic compaction is configured in CouchDB's configuration files. The compaction daemon is responsible for triggering the compactions; it is started automatically but is disabled by default:

 

[daemons]
#...
compaction_daemon={couch_compaction_daemon, start_link, []}

 

 

[compaction_daemon]
; The delay, in seconds, between each check for which database and view indexes
; need to be compacted.
check_interval = 300
; If a database or view index file is smaller than this value (in bytes),
; compaction will not happen. Very small files always have a very high
; fragmentation, therefore it's not worth compacting them.
min_file_size = 131072

 

The criteria for triggering compactions are configured in the "compactions" section.

 

[compactions]
; List of compaction rules for the compaction daemon.
; The daemon compacts databases and their respective view groups when all the
; condition parameters are satisfied. Configuration can be per database or
; global, and it has the following format:
;
; database_name = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ]
; _default = [ {ParamName, ParamValue}, {ParamName, ParamValue}, ... ]

 

 

Possible Parameters

 

  • db_fragmentation: If the ratio (as an integer percentage) of the amount of old data (and its supporting metadata) to the database file size is equal to or greater than this value, this database compaction condition is satisfied. This value is computed as
    (file_size - data_size) / file_size * 100
    The data_size and file_size values can be obtained when querying a database's information URI (GET /dbname/); a small sketch after this list shows the computation.

  • view_fragmentation: If the ratio (as an integer percentage) of the amount of old data (and its supporting metadata) to the view index (view group) file size is equal to or greater than this value, this view index compaction condition is satisfied. This value is computed as:
    (file_size - data_size) / file_size * 100
    The data_size and file_size values can be obtained when querying a view group's information URI (GET /dbname/_design/groupname/_info).

  • from and to: The period during which compaction of a database (and its view groups) is allowed. The value for these parameters must obey the format HH:MM (HH in [0..23], MM in [0..59]).

  • strict_window: If a compaction is still running after the end of the allowed period, it will be canceled if this parameter is set to 'true'. It defaults to 'false' and is meaningful only if a compaction period (from/to) is also specified.

  • parallel_view_compaction: If set to 'true', the database and its views are compacted in parallel. This is only useful on certain setups, for example when the database and view index directories point to different disks. It defaults to 'false'.

Before a compaction is triggered, an estimate of how much free disk space is needed is computed. This estimate corresponds to twice the data size of the database or view index. When there is not enough free disk space to compact a particular database or view index, a warning message is logged.
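
As a rough sketch of these checks, the numbers from the database info document can be plugged into the fragmentation formula above and into the free-space estimate; this uses the jq tool (not part of CouchDB) and assumes the disk_size/data_size fields of GET /my_db correspond to file_size/data_size:

# fragmentation = (file_size - data_size) / file_size * 100
# free space needed before compaction ~ 2 * data_size
curl -s http://localhost:5984/my_db | jq '{fragmentation: ((.disk_size - .data_size) / .disk_size * 100), free_space_needed: (2 * .data_size)}'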

 

Examples

 

  1. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]
    The foo database is compacted if its fragmentation is 70% or more. Any view index of this database is compacted only if its fragmentation is 60% or more.

  2. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}]
    Similar to the preceding example but a compaction (database or view index) is only triggered if the current time is between midnight and 4 AM.

  3. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}]
    Similar to the preceding example - a compaction (database or view index) is only triggered if the current time is between midnight and 4 AM. If at 4 AM the database or one of its views is still compacting, the compaction process will be canceled.

  4. [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "00:00"}, {to, "04:00"}, {strict_window, true}, {parallel_view_compaction, true}]
    Similar to the preceding example, but a database and its views can be compacted in parallel.

 

Default Configuration

 

The default configuration, if enabled, applies to all databases. For example:

 

_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}]

 
