
How to Fix Erlang Out of Memory Crashes When Using Mnesia

Original article: http://streamhacker.wordpress.com/2008/12/20/how-to-fix-erlang-out-of-memory-crashes-when-using-mnesia/
December 20, 2008 at 9:53 am (erlang) (database, dets, ets, iteration, memory, mnesia, records, transactions)
If you’re getting erlang out of memory crashes when using mnesia, chances are you’re doing it wrong, for various values of it. These out of memory crashes look something like this:

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 999999999 bytes of memory (of type "heap")
Possible Causes

You’re doing it wrong
Someone else is doing it wrong
You don’t have enough RAM
While it’s possible that the crash is due to not having enough RAM, or that some other program or process is using too much RAM for itself, chances are it’s your fault.

One of the reasons these crashes can catch you by surprise is that the Erlang VM uses a lot more memory than you might think. Erlang is a functional language with single assignment and no shared memory between processes. A major consequence is that data is copied rather than modified in place: sending a message to another process copies the term onto the receiver's heap, and "changing" a variable really means building a new term. An operation as simple as dict:update_counter("foo", 1, Dict1) returns a new dict; it shares the unchanged parts with Dict1, but both versions stay on the heap until the old one is garbage collected, and garbage collection itself copies live data to a new heap, so a process holding a large term can briefly need about twice that term's size. And anything you do with ets, dets, or mnesia will result in at least 2 copies of every term: 1 copy on your process heap, and 1 copy in each table. This is because mnesia uses ets and/or dets for storage, and both keep their data outside your process: ets tables live in their own memory area in the VM, and each open dets table is managed by its own process. That means every table operation copies your term into the table or back out of it. So that's why Erlang may be running out of memory. Here's how to fix it.
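
As a rough illustration of the "one copy per table" point, the sketch below measures how much an ets table grows when a term is inserted. The module and function names (copy_demo, ets_words) are made up for this example; note that ets:info(Tab, memory) reports words, not bytes.

```erlang
-module(copy_demo).
-export([ets_words/1]).

%% Returns how many words a fresh ets table grows by when Term is inserted.
%% The caller still holds its own copy of Term on its heap, so the total
%% footprint is at least the term's size twice over.
ets_words(Term) ->
    Tab = ets:new(copy_demo, [set, private]),
    Before = ets:info(Tab, memory),
    true = ets:insert(Tab, {key, Term}),
    After = ets:info(Tab, memory),
    true = ets:delete(Tab),
    After - Before.
```

For a 1000-element list of small integers, which occupies 2000 words on the heap, the reported growth should be at least 2000 words, because the table stores its own full copy.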

Use Dirty Operations

If you’re doing anything in a transaction, try to figure out how to do it dirty, or at least move as many operations as possible out of the transaction. Mnesia transactions are separate processes with their own temporary ets tables. That means there’s the original term(s) that must be passed in to the transaction or read from other tables, any updated copies that your code creates, copies of terms that are written to the temporary ets table, the final copies of terms that are written to actual table(s) at the end of the transaction, and copies of any terms that are returned from the transaction process. Here’s an example to illustrate:

example() ->
    T = fun() ->
        % mnesia:read/2 returns a list of the records stored under the key
        [Var1] = mnesia:read(example_table, "foo"),
        Var2 = update(Var1), % a user-defined function to update Var1
        ok = mnesia:write(Var2),
        Var2
    end,
    {atomic, Var2} = mnesia:transaction(T),
    Var2.
First off, we already have a copy of Var1 in example_table. It gets sent to the transaction process when you do mnesia:read, creating a second copy. Var1 is then updated, resulting in Var2, which I'll assume has the same memory footprint as Var1. So now we have 3 copies. Var2 is then written to a temporary ets table because mnesia:write is called within a transaction, creating a fourth copy. The transaction ends, sending Var2 back to the original process, and also overwriting Var1 with Var2 in example_table. That's 2 more copies, resulting in a total of 6 copies. Let's compare that to a dirty operation.

example() ->
    % mnesia:dirty_read/2 also returns a list of records
    [Var1] = mnesia:dirty_read(example_table, "foo"),
    Var2 = update(Var1),
    ok = mnesia:dirty_write(Var2),
    Var2.
Doing it dirty results in only 4 copies: the original Var1 in example_table, the copy sent to your process, the updated Var2, and the copy sent to mnesia to be written. Dirty operations like this will generally have about 2/3 the memory footprint of the same operations done in a transaction (4 copies instead of 6).

Reduce Record Size

Figuring out how to reduce your record size by using different data structures can create huge gains by drastically reducing the memory footprint of each operation, and possibly removing the need to use transactions. For example, let's say you're storing a large record in mnesia, and using transactions to update it. If the size of the record grows by 1 byte, then each transactional operation like the above will require an additional 5 bytes of memory, or dirty operations will require an additional 3 bytes. For multi-megabyte records, this adds up very quickly. The solution is to figure out how to break that record up into many small records. Mnesia can use any term as a key, so for example, if you're storing a record with a dict in mnesia such as {dict_record, "foo", Dict}, you can split that up into many records like [{tuple_record, {"foo", Key1}, Val1}]. Each of these small records can be accessed independently, which could eliminate the need to use transactions, or at least drastically reduce the memory footprint of each transaction.
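
The split described above can be sketched as follows. The module name split_demo and the helper names are hypothetical, and bump/2 assumes a mnesia table named tuple_record of type set whose records look like {tuple_record, {Id, Key}, Integer}.

```erlang
-module(split_demo).
-export([split/2, bump/2]).

%% Turns the contents of one big {dict_record, Id, Dict} record into a list
%% of small records, one per dict entry, keyed by {Id, Key}.
split(Id, Dict) ->
    [{tuple_record, {Id, Key}, Val} || {Key, Val} <- dict:to_list(Dict)].

%% Increments a single counter without reading or rewriting its siblings.
%% mnesia:dirty_update_counter/3 only works on records shaped like
%% {Tab, Key, Integer}; the tuple_record table is assumed to exist.
bump(Id, Key) ->
    mnesia:dirty_update_counter(tuple_record, {Id, Key}, 1).
```

With this layout, updating one counter touches a record of a few words instead of copying the whole dict in and out of the table.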

Iterate in Batches

Instead of getting a whole bunch of records from mnesia all at once, using mnesia:dirty_match_object or mnesia:dirty_select, iterate over the records in batches. This is analogous to using lists operations on mnesia tables. The match_object functions may return a huge number of records, and all those records have to be copied from the table to your process, doubling the amount of memory required. By iteratively doing operations on batches of records, you're only accessing a portion at a time, reducing the amount of memory being used at once. Here are some code examples that only access the records for 1 key at a time. Note that if the table changes during iteration, the behavior is undefined. You could also use mnesia:select/4 to process records in batches of NObjects at a time.

Dirty Mnesia Foldl

dirty_foldl(F, Acc0, Table) ->
    dirty_foldl(F, Acc0, Table, mnesia:dirty_first(Table)).

dirty_foldl(_, Acc, _, '$end_of_table') ->
    Acc;
dirty_foldl(F, Acc, Table, Key) ->
    Acc2 = lists:foldl(F, Acc, mnesia:dirty_read(Table, Key)),
    dirty_foldl(F, Acc2, Table, mnesia:dirty_next(Table, Key)).
Dirty Mnesia Foreach

dirty_foreach(F, Table) ->
    dirty_foreach(F, Table, mnesia:dirty_first(Table)).

dirty_foreach(_, _, '$end_of_table') ->
    ok;
dirty_foreach(F, Table, Key) ->
    lists:foreach(F, mnesia:dirty_read(Table, Key)),
    dirty_foreach(F, Table, mnesia:dirty_next(Table, Key)).
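
The batched select mentioned earlier can be sketched like this, using mnesia:select/4 and its continuation form mnesia:select/1 inside an async_dirty activity. The module and function names are hypothetical, and the match spec simply selects every record in the table.

```erlang
-module(batch_select).
-export([fold_batches/4]).

%% Folds F over Table in batches of roughly BatchSize records.
%% F is called as F(Records, Acc) and must return the new accumulator.
fold_batches(F, Acc0, Table, BatchSize) ->
    Work = fun() ->
        %% Match every record and return it whole.
        Pattern = mnesia:table_info(Table, wild_pattern),
        MatchSpec = [{Pattern, [], ['$_']}],
        loop(F, Acc0, mnesia:select(Table, MatchSpec, BatchSize, read))
    end,
    %% mnesia:select/4 must run inside an activity; async_dirty takes no locks.
    mnesia:activity(async_dirty, Work).

loop(_F, Acc, '$end_of_table') ->
    Acc;
loop(F, Acc, {Records, Cont}) ->
    loop(F, F(Records, Acc), mnesia:select(Cont)).
```

Only one batch is held by the calling process at a time, so peak memory depends on BatchSize rather than on the size of the table. The same caveat about tables changing during iteration applies.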
Conclusion

It’s probably your fault
Do as little as possible inside transactions
Use dirty operations instead of transactions
Reduce record size
Iterate in small batches

Ulf said,
January 7, 2009 at 1:44 pm
Recommending dirty operations over transactions should come with a very big caveat: you change the semantics and forego safety, esp. if you have a replicated system. Dirty operations do not guarantee that replication works, for example. It may even work partially, given certain error situations, causing database inconsistency.

I normally advise people to use mnesia:activity(Type, F) rather than mnesia:transaction(F), and to always start with real transactions, then measure and - only if really necessary (and safe!), switch to dirty where needed. This can then be done by just changing Type from ‘transaction’ to ‘async_dirty’.
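
A minimal sketch of that approach, reusing example_table from the article. The module name activity_demo and the UpdateFun argument are hypothetical. Unlike mnesia:transaction/1, mnesia:activity/2 returns the fun's value directly rather than wrapping it in {atomic, Value}, and exits if the transaction aborts.

```erlang
-module(activity_demo).
-export([update/3]).

%% AccessType is transaction or async_dirty; the body is identical for both,
%% because mnesia:read/2 and mnesia:write/1 adapt to the enclosing activity.
update(AccessType, Key, UpdateFun) ->
    F = fun() ->
        [Old] = mnesia:read(example_table, Key),
        New = UpdateFun(Old),
        ok = mnesia:write(New),
        New
    end,
    mnesia:activity(AccessType, F).
```

Switching from safe to dirty is then a one-word change at the call site, for example from activity_demo:update(transaction, "foo", F) to activity_demo:update(async_dirty, "foo", F).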

In my experience, the “iterate in small batches” should be one of the first points. It is very good advice. Also, monitor ets and mnesia tables to see if they keep growing. Inserting temporary objects and forgetting to delete them is a fairly common source of memory growth.
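
One simple way to watch for growing tables is to poll mnesia's own table statistics, as in this sketch. The module name table_report is hypothetical; the memory figure is in words for ram_copies and disc_copies tables and in bytes for disc_only_copies tables.

```erlang
-module(table_report).
-export([report/0]).

%% Returns [{Table, NumberOfRecords, Memory}] for every mnesia table known
%% to this node. Call it periodically and compare successive results.
report() ->
    [{T, mnesia:table_info(T, size), mnesia:table_info(T, memory)}
     || T <- mnesia:system_info(tables)].
```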

In other cases, a form of load control may well be what’s needed, making sure that the system doesn’t take on more work than it can handle (easy to do in an asynchronous environment). One very simple such device would be a gen_server that workers ask (synchronously) for permission before starting a new task. The server can monitor the ‘run_queue’ to guard against cpu overload, memory usage, number of running processes, etc., depending on where your bottlenecks are. Keep it very simple.
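
A very small version of such a gate might look like the sketch below. The module name load_gate, the ask/0 API, and the MaxRunQueue threshold are all assumptions made for this example; it only checks the run queue, and a real system would choose whichever limits match its own bottlenecks.

```erlang
-module(load_gate).
-behaviour(gen_server).
-export([start_link/1, ask/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% MaxRunQueue is the largest run queue length at which new work is allowed.
start_link(MaxRunQueue) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, MaxRunQueue, []).

%% Workers call this synchronously before starting a task.
%% Returns ok when the task may start, busy otherwise.
ask() ->
    gen_server:call(?MODULE, ask).

init(MaxRunQueue) ->
    {ok, MaxRunQueue}.

handle_call(ask, _From, MaxRunQueue) ->
    %% erlang:statistics(run_queue) is the number of processes ready to run.
    Reply = case erlang:statistics(run_queue) of
        N when N =< MaxRunQueue -> ok;
        _ -> busy
    end,
    {reply, Reply, MaxRunQueue}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

A worker would call load_gate:ask() and back off or retry later on busy. Because the call is synchronous, the gate's own message queue also acts as natural back-pressure.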

