Original article: http://www.metabrew.com/article/on-bulk-loading-data-into-mnesia/
On bulk loading data into Mnesia
Consider this a work-in-progress; I will update this post if I find a ‘better’ way to do fast bulk loading
The time has come to replace my ets-based storage backend with something non-volatile. I considered a dets/ets hybrid, but I really need this to be replicated to at least a second node for HA / failover. Mnesia beckoned.
The problem:
15 million [fairly simple] records
1 Mnesia table: bag, disc_copies, just 1 node, 1 additional index
Hardware is a quad-core 2GHz CPU, 16GB Ram, 8x 74Gig 15k rpm scsi disks in RAID-6
Takes ages* to load and spews a load of “Mnesia is overloaded” warnings
* My definition of ‘takes ages’: Much longer than PostgreSQL \copy or MySQL LOAD DATA INFILE
At this point all I want is a quick way to bulk-load some data into a disc_copies table on a single node, so I can get on with running some tests.
Here is the table creation code:
mnesia:create_table(subscription,
[
{disc_copies, [node()]},
{attributes, record_info(fields, subscription)},
{index, [subscribee]}, %index subscribee too
{type, bag}
]
)
The subscription record is fairly simple:
#subscription{subscriber = {resource, user, 123}, subscribee = {resource, artist, 456}}
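For completeness, a minimal setup sketch that the snippets above assume (the module and function names are illustrative; the -record definition and the create_schema/start calls are implied rather than shown):
-module(sub_store).                         %% hypothetical module name
-export([init/0]).

-record(subscription, {subscriber, subscribee}).

init() ->
    ok = mnesia:create_schema([node()]),    %% writes the schema into the -mnesia dir
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(subscription,
        [
            {disc_copies, [node()]},
            {attributes, record_info(fields, subscription)},
            {index, [subscribee]},          %% index subscribee too
            {type, bag}
        ]).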
I’m starting erlang like so:
erl +A 128 -mnesia dir '"/home/erlang/mnesia_dir"' -boot start_sasl
The interesting thing there is really the +A 128 flag, which sets the size of the async thread pool - this spreads the load better across the 4 cores.
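A quick sanity check from the shell shows what the VM was actually started with (the values below are just what I'd expect on this box):
1> erlang:system_info(thread_pool_size).   %% async thread pool size set by +A
128
2> erlang:system_info(schedulers).         %% one scheduler per core by default
4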
Attempt 0) ‘by the book’ one transaction to rule them all
Something like this:
mnesia:transaction(fun()-> [ mnesia:write(S) || S <- Subs ] end)
Time taken: Too long, I gave up after 12 hours
Number of “Mnesia overloaded” warnings: lots
Conclusion: Must be a better way
TODO: actually run this test and time it.
Attempt 1) dirty_write
There isn’t really any need to do this in a transaction, so I tried dirty_write.
[ mnesia:dirty_write(S) || S <- Subs ]
And here’s the warning in full:
=ERROR REPORT==== 13-Oct-2008::16:53:57 ===
Mnesia('mynode@myhost'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}
Time taken: 890 secs
Number of “Mnesia overloaded” warnings: lots
Conclusion: Workable, but nothing to boast about. Those warnings are annoying
Attempt 2) dirty_write, defer index creation
A common trick with a traditional RDBMS is to bulk load the data into the table and add the indexes afterwards; in some scenarios this avoids costly incremental index updates. If you are doing the load in one gigantic transaction it shouldn’t matter, and I’m not really sure how mnesia works under the hood (something I plan to rectify if I end up using it for real).
I tried a similar approach by commenting out the {index, [subscribee]} line above, doing the load, then calling mnesia:add_table_index(subscription, subscribee) afterwards to add the index once all the data was loaded. Note that mnesia was still building the primary index on the fly, but that can’t be helped.
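In code, the deferred-index variant looks roughly like this (a sketch, reusing the Subs list from above):
%% table created as before, but without the {index, [subscribee]} option
[ mnesia:dirty_write(S) || S <- Subs ],
{atomic, ok} = mnesia:add_table_index(subscription, subscribee).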
Time taken: 883 secs (679s load + 204s index creation)
Number of “Mnesia overloaded” warnings: lots
Conclusion: Insignificant improvement, meh
Attempt 3) mnesia:ets() trickery
This is slightly perverted, but I tried it because I suspected that incrementally updating the on-disk data wasn’t especially optimal. The idea is to make a ram_copies table and use the mnesia:ets() function to write directly to the underlying ets table (it doesn’t get much faster than ets). The table can then be converted to disc_copies. There are caveats - to quote The Fine Manual:
Call the Fun in a raw context which is not protected by a transaction. The Mnesia function calls performed in the Fun are performed directly on the local ets tables on the assumption that the local storage type is ram_copies and the tables are not replicated to other nodes. Subscriptions are not triggered and checkpoints are not updated, but it is extremely fast.
I can live with that. I don’t mind if replication takes a while to set up when I put this into production - I’ll gladly take any optimisations I can get at this stage (testing/development).
Loading a list of subscriptions looks like this:
mnesia:ets(fun()-> [mnesia:dirty_write(S) || S <- Subs] end).
And to convert this into disc_copies once data is loaded in:
mnesia:change_table_copy_type(subscription, node(), disc_copies).
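Putting Attempt 3 together, the rough end-to-end flow is below (a sketch; it assumes the table was created with {ram_copies, [node()]} instead of {disc_copies, [node()]}, and that the secondary index is added at the end as in Attempt 2):
%% 1. write straight into the local ets table, bypassing transactions/logging
mnesia:ets(fun() -> [ mnesia:dirty_write(S) || S <- Subs ] end),

%% 2. turn the ram_copies table into disc_copies so it survives restarts
{atomic, ok} = mnesia:change_table_copy_type(subscription, node(), disc_copies),

%% 3. add the secondary index once the data is on disc (as in Attempt 2)
{atomic, ok} = mnesia:add_table_index(subscription, subscribee).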
Time taken: 745 secs (699s load + 46s convert to disc_copies)
Number of “Mnesia overloaded” warnings: none!
Conclusion: Fastest yet, bit hacky
Summary
At least the ets() trick doesn’t spew a million warnings. I also need to examine the output of mnesia:dump_to_textfile and see if loading data from that format is any faster.
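For reference, the textfile round trip would look something like this (untested; the filename is arbitrary):
mnesia:dump_to_textfile("/tmp/subscriptions.dump"),   %% dumps all local tables as plain text
mnesia:load_textfile("/tmp/subscriptions.dump").      %% reads the records back in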
TODO:
Examine / test using the dump_to_textfile method
Run full transactional load and time it
Try similar thing with PostgreSQL