I stumbled across this in lib/mnesia/doc/misc/implementation.txt. For a deep understanding of how mnesia is implemented, it is an excellent companion to reading the source code.
Mnesia
1 Introduction
This document aims to give a brief introduction to the implementation
of mnesia, its data and its functions.
Håkan has written other mnesia papers of interest (see ~hakan/public_html/):
o Resource consumption (mnesia_consumption.txt)
o What to think about when changing mnesia (mnesia_upgrade_policy.txt)
o Mnesia internals course (mnesia_internals_slides.pdf)
o Mnesia overview (mnesia_overview.pdf)
1.1. Basic concepts
In a mnesia cluster all nodes are equal; there is no concept of
master or backup nodes. That said, when mixing disc-based nodes (which
use the disc to store meta information) with ram-based nodes (which do
not use the disc at all), the disc-based ones sometimes take precedence
over the ram-based ones.
2 Meta Data
Mnesia has two types of global meta data, static and dynamic.
All the meta data is stored in the ets table mnesia_gvar.
2.1 Static Meta Data
The static data is the schema information, usually kept in the
'schema.DAT' file. The data is created with
mnesia:create_schema(Nodes) for disc nodes (i.e. nodes which use the
disc). Ram-based mnesia nodes create an empty schema at startup.
The static data, i.e. the schema, contains information about which
nodes are involved in the cluster and which type (ram or disc) they
have. It also contains information about which tables exist on which
nodes, and so on.
The schema information (static data) must always be the same on all
active nodes in the mnesia cluster. Schema information is updated via
schema functions, e.g. mnesia:add_table_copy/3,
mnesia:change_table_copy_type/3, etc.
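A minimal sketch of how these schema functions are typically used; the
node list and the table name 'foo' are hypothetical examples, it assumes
at least one other connected node, and mnesia:create_schema/1 must be run
before mnesia is started on the disc nodes:

    %% Create the static schema (schema.DAT) on the disc nodes,
    %% then start mnesia and create a table on the local node only.
    Nodes = [node() | nodes()],
    ok = mnesia:create_schema(Nodes),
    rpc:multicall(Nodes, mnesia, start, []),
    {atomic, ok} = mnesia:create_table(foo, [{disc_copies, [node()]}]),
    %% Schema functions update the static meta data on all active nodes:
    {atomic, ok} = mnesia:add_table_copy(foo, hd(nodes()), ram_copies),
    {atomic, ok} = mnesia:change_table_copy_type(foo, hd(nodes()), disc_copies).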
2.2 Dynamic Meta Data
The dynamic data is transient and is local to each mnesia node
in the cluster. Examples of dynamic data are: the currently active
mnesia nodes, which tables are currently available, and where they are
located. Dynamic data is updated internally by each mnesia node during
its lifetime, i.e. when nodes go up and down or are added to or
deleted from the mnesia cluster.
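Much of this dynamic meta data can be inspected at runtime; a small
sketch (the table name 'foo' is a hypothetical example):

    %% Inspecting dynamic meta data on a running mnesia node.
    mnesia:system_info(running_db_nodes),   % currently active mnesia nodes
    mnesia:system_info(tables),             % tables known to this node
    mnesia:table_info(foo, where_to_read),  % node a read of foo would go to
    mnesia:table_info(foo, where_to_write). % nodes with an active replica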
3 Processes and Files
The most important processes in mnesia are mnesia_monitor,
mnesia_controller, mnesia_tm and mnesia_locker.
Mnesia_monitor acts as a supervisor and monitors all resources. It
listens for nodeup and nodedown and keeps links to other mnesia nodes;
if a node goes down it forwards the information to all the necessary
processes, e.g. mnesia_controller, mnesia_locker, mnesia_tm and all
transactions. During start it negotiates the protocol version with
the other nodes and keeps track of which node uses which version. The
monitor process also detects and warns about partitioned networks; it
is then up to the user to deal with them. It is the owner of all open
files, ets tables and so on.
The mnesia_controller process is responsible for loading tables,
keeping the dynamic meta data updated, and synchronizing dangerous work
such as schema transactions versus dump-log operations versus table
loading/sending.
The last two processes are involved in all transactions: the
mnesia_locker process manages transaction locks, and mnesia_tm manages
all transaction work.
4 Startup and Table Loading
The early startup is mostly driven by the mnesia_tm process/module:
logs are dumped (see log dumping), node names of other nodes in the
cluster are retrieved from the static meta data or from environment
parameters, and initial connections are made to the other mnesia
nodes.
The rest of startup is driven by the mnesia_controller process, where
the schema (static meta data) is merged between the nodes; this is
done to keep the schema consistent across all nodes in the
cluster. When the schema has been merged, all local tables are put in a
loading queue; tables which are only available locally or have local
content are loaded directly from disc, or created if they are of type
ram_copies.
The other tables are kept in the queue until mnesia decides whether to
load them from disc or from another node. If another mnesia node has
already loaded the table, i.e. has a copy in ram or an open dets file,
the table is always loaded from that node to keep the data consistent.
If no other node has a loaded copy of the table, some mnesia node has
to load it first, and the other nodes can then copy the table from the
first node. Mnesia keeps information about when other nodes went down;
a starting mnesia will check which nodes have been down, and if some of
the nodes have not been down, the starting node will let those nodes
load the table first. If all other nodes have been down, then the
starting mnesia will load the table. The node that is allowed to load
the table will load it, and the other nodes will copy it from that node.
If a node from which the starter node has no 'mnesia_down' note is
itself down, the starter node will have to wait until that node comes
up and a decision can be taken; this behavior can be overruled by user
settings. The order of table loading can be described as follows
(a sketch of the related user settings appears after the list):
1. Mnesia downs: normally decides from where mnesia should load tables.
2. Master nodes (overrides mnesia downs).
3. Force load (overrides master nodes).
1) If possible, load the table from the active master nodes,
2) if no master node is active, load from any active node,
3) if no active node has an active copy of the table, use the local copy
   (if the table is ram-based, create an empty one).
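A minimal sketch of the user-level knobs mentioned above; the table
name 'foo' is a hypothetical example:

    %% Influence table loading decisions at startup.
    ok  = mnesia:set_master_nodes(foo, [node()]), % master nodes for one table
    ok  = mnesia:set_master_nodes([node()]),      % or for all tables
    yes = mnesia:force_load_table(foo).           % stop waiting for other nodes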
Currently mnesia can handle one download and one upload at the same
time. Dumping and loading/sending may run simultaneously, but neither
of them may run during a schema commit. Loaders/senders may not start
if a schema commit is enqueued. That synchronization is made to prevent
the schema transaction from modifying the meta data while the
prerequisites of the table loading change.
The actual loading of a table is implemented in 'mnesia_loader.erl'.
It currently works as follows:
   Receiver                               Sender
   --------                               ------
   Spawned
   Find sender node
   Queue sender request       ---->
                                          Spawned
*) Spawn real receiver        <----       Send init info
*) Grab schema lock for                   Grab write table lock
   that table to avoid                    Subscribe receiver
   deadlock with schema transactions        to table updates
   Create table (ets or dets)             Release lock
   Get data (with ack         ---->
   as flow control)           <----       Burst data to receiver
                                          Send no_more
   Apply subscription messages
   Store copy on disc                     Grab read lock
   Create index, snmp data                Update meta data info
   and checkpoints if needed              Cleanup
   no_more                    ---->
                                          Release lock
*) Don't spawn or grab schema lock if operation is add_table_copy,
it's already a schema operation.
5 Transactions
Transactions are normally driven from the client process, i.e. the
process that calls 'mnesia:transaction'. The client first acquires a
globally unique transaction id (tid) and a temporary transaction store
(ts, an ets table) from mnesia_tm, and then executes the transaction
fun. Mnesia API calls such as 'mnesia:write/1' and 'mnesia:read'
contain the code for acquiring the needed locks. Intermediate database
states and acquired locks are kept in the transaction store, and all
mnesia operations have to be "patched" against that store, i.e. a write
operation in a transaction should be seen within (and only within)
that transaction if the same key is read after the write.
After the transaction fun has completed, the ts is analyzed to see
which nodes are involved in the transaction and what type of commit
protocol shall be used. Then the result is committed, and additional
work such as snmp, checkpoint and index updates is performed. The
transaction is finished by releasing all resources.
An example:
Example = fun(X) ->
              %% mnesia:read/2 returns a list of matching records
              [{table1, key, Value}] = mnesia:read(table1, key),
              %% mnesia:write/1 takes the record itself
              ok = mnesia:write({table1, key, Value + X}),
              [{table1, key, Updated}] = mnesia:read(table1, key),
              Updated
          end,
mnesia:transaction(Example, [10]).
A message overview of a simple successful asynchronous transaction
                                                               non local
Client Process          mnesia_tm (local)      mnesia_locker   mnesia_tm
------------------------------------------------------------------------
Get tid                 ---->
                        <---- Tid and ts
Get read lock
from available node     ------------------------------->
Value                   <---------- Value or restart trans ----
Patch value against ts
Get write lock
from all nodes          ------------------------------->
                        ------------------------------->
ok's                    <<--------- ok's or restart trans -----
write data in ts
Get read lock, already done.
Read data Value
'Patch' data with ts
Fun returns Value+X.

If everything is ok,
commit transaction:
Find the nodes that the transaction
needs to be committed on and
collect every update from ts.
Ask for commit          ----------->
                        ----------------------------------------------->
Ok's                    <<--------- ------------------------------------
Commit                  ----------------------------------------------->
                        log commit decision on disk
                        Commit locally:
                          update snmp
                          update checkpoints
                          notify subscribers
                          update index
                        Release locks ------------------------------->
Release transaction     ----->
Return trans result
------------------------------------------------------------------------
If all needed resources are available, i.e. the needed tables are
loaded somewhere in the cluster during the transaction, and the user
code doesn't crash, a transaction in mnesia won't fail. If something
happens in the mnesia cluster, such as the node holding the replica the
transaction was about to read from going down, or a lock that couldn't
be acquired while the transaction was not allowed to be queued on that
lock, the transaction is restarted, i.e. all resources are released
and the fun is called again. By default a transaction can be
restarted infinitely many times, but the user may choose to limit
the number of restarts.
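The restart limit is the optional third argument to
mnesia:transaction/3; a minimal sketch (table and key are hypothetical):

    %% Allow at most 5 restarts instead of the default 'infinity'.
    F = fun() -> mnesia:read(foo, some_key) end,
    {atomic, _Records} = mnesia:transaction(F, [], 5).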
The dirty operations don't do any of the above; they just find out
where to write the data, log the operation to disk and cast (or call,
in the case of a sync_dirty operation) the data to those nodes.
Therefore the dirty operations have the drawback that each write or
delete sends one message per operation to the involved nodes.
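For comparison, a sketch of the dirty API; the table 'foo' and its
records are hypothetical examples:

    %% No tid, no locks: one message per operation to the replica nodes.
    ok = mnesia:dirty_write({foo, some_key, 42}),
    [{foo, some_key, 42}] = mnesia:dirty_read(foo, some_key),
    %% sync_dirty waits for the remote nodes to acknowledge the update.
    ok = mnesia:sync_dirty(fun() -> mnesia:write({foo, some_key, 43}) end).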
There is also a synchronous variant of the 2-phase commit protocol
which waits for an additional ack message after the transaction has
been committed on every node. The intention is to provide the user with
a way to handle overload problems.
A 3-phase commit protocol is used for schema transactions, or if the
transaction result is going to be committed in an asymmetrical way,
i.e. a transaction that writes to tables a and b where a and b
have replicas on different nodes. The outcome of such transactions is
stored temporarily in an ets table and in the log file.
6 Schema transactions
Schema transactions are handled differently than ordinary
transactions; they are implemented in mnesia_schema (and in
mnesia_dumper). The schema operation is always run in a spawned
process, to protect against the client process dying during the
transaction.
The actual transaction fun checks the pre-conditions, acquires the
needed locks and notes the operation in the transaction store. During
the commit, the schema transaction runs a schema prepare operation (on
every node) that does the needed prerequisite work. Then the operation
is logged to disc, and the actual commit work is done by dumping the
log. Every schema operation has a special clause in mnesia_dumper to
handle the finishing work. Every schema prepare operation has a
matching undo_prepare operation which needs to be invoked if the
transaction is aborted.
7 Locks
"The locking algorithm is a traditional 'two-phase locking'* and the
deadlock prevention is 'wait-die'*, time stamps for the wait-die algorithm
is 'Lamport clock'* maintained by mnesia_tm. The Lamport clock is kept
when the transaction is restarted to avoid starving."
* References can be found in the paper mnesia_overview.pdf
Klacke, Håkan and Hans wrote about mnesia.
What the quote above means is that read locks are acquired on the
replica that mnesia reads from, while write locks are acquired on all
nodes which have a replica. Several read locks can lock the same
object, but write locks are exclusive. The transaction identifier (tid)
is an ever-increasing, system-unique counter which has the same sort
order on every node (a Lamport clock), which enables mnesia_locker to
order the lock requests. When a lock request arrives, mnesia_locker
checks whether the lock is available; if it is, a 'granted' is sent
back to the client and the lock is noted as taken in an ets table. If
the lock is already occupied, its tid is compared with the tid of the
transaction holding the lock. If the tid of the holding transaction is
greater than the tid of the asking transaction, the request is allowed
to be put in the lock queue (another ets table) and no response is sent
back until the lock is released; if not, the transaction gets a
negative response and mnesia_tm restarts the transaction after it has
slept for a random time.
Sticky locks work almost like a write lock: the first time a sticky
lock is acquired, a request is sent to all nodes. The lock is marked as
taken by the requesting node (not the transaction); when the lock is
later released it is only released on the node that holds the sticky
lock, so the next time a transaction requests the lock it does not need
to ask the other nodes. If another node wants the lock it has to
request a lock release first, before it can acquire the lock.
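Sticky locks are requested explicitly with the sticky_write lock kind;
a minimal sketch (table, key and record are hypothetical):

    %% After the first acquisition the lock stays on this node, so
    %% later transactions on the same key avoid the extra round trips.
    F = fun() ->
            [{foo, some_key, V}] = mnesia:read(foo, some_key, read),
            mnesia:write(foo, {foo, some_key, V + 1}, sticky_write)
        end,
    {atomic, ok} = mnesia:transaction(F).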
8 Fragmented tables
Fragmented tables are used to split a large table into smaller parts.
They are implemented as a layer between the client and mnesia which
extends the meta data with additional properties and maps a {table,
key} tuple to a table fragment.
The default mapping is based on erlang:phash/2, but the user may
provide his own mapping function in order to predict which records are
stored in which table fragment, e.g. the client may want to steer where
a record generated from a certain device is placed.
The foreign key is used to co-locate other tables on the same node.
The other additional table attributes are also used to distribute the
table fragments.
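A minimal sketch of creating and accessing a fragmented table through
the mnesia_frag access module; the table name, fragment count and node
pool are illustrative only:

    %% Create a table split into 8 fragments over the given node pool.
    {atomic, ok} =
        mnesia:create_table(foo,
                            [{frag_properties, [{n_fragments, 8},
                                                {node_pool, [node() | nodes()]}]}]),
    %% Access must go through mnesia_frag so that {table, key} is
    %% mapped to the right fragment.
    mnesia:activity(transaction,
                    fun() -> mnesia:write({foo, some_key, 1}) end,
                    [], mnesia_frag).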
9 Log Dumping
All operations on disc tables are stored in a log, 'LATEST.LOG', on
disk, so mnesia can redo the transactions if the node goes down.
Dumping the log means that mnesia moves the committed data from the
general log to the table-specific disk storage. To avoid the log
growing too large, using a lot of disk space and making startup slow,
mnesia dumps the log during its uptime. There are two triggers that
start the log dumping, a timeout and the number of commits since the
last dump, both of which are user configurable.
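Both triggers are ordinary mnesia application parameters; a sketch of
setting them before mnesia is started (the values are just examples):

    %% Dump the log at least every 3 minutes, or after 50000 committed
    %% transactions, whichever comes first.
    application:set_env(mnesia, dump_log_time_threshold, 3 * 60 * 1000),
    application:set_env(mnesia, dump_log_write_threshold, 50000),
    mnesia:start().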
Disc copies tables are implemented with two disk_log files, one
'table.DCD' (disc copies data) and one 'table.DCL' (disc copies log).
The dcd contains raw records, and the dcl contains operations on that
table, i.e. '{write, {table, key, value}}' or '{delete, {table,
key}}'. The first time a record for a specific table is found while
dumping the log, the sizes of both the dcd and the dcl files are
checked. If sizeof(dcl)/sizeof(dcd) is greater than a threshold, the
current ram table is dumped to the 'table.DCD' file, the corresponding
dcl file is deleted, and all other records in the general log that
belong to that table are ignored. If the threshold is not met, the
operations in the general log for that table are appended to the dcl
file. On startup both files are read: first the contents of the dcd are
loaded into an ets table, then it is modified by the operations stored
in the corresponding dcl file.
Disc only copies tables update the 'dets' file directly when
committing the data, so those entries can be ignored during normal log
dumping; they are only applied to the 'dets' file during startup, when
mnesia doesn't know the state of the disk table.
10 Checkpoints and backups
Checkpoints are created to be able to take snapshots of the database,
which is useful when you want consistent backups, i.e. you don't
want half of a transaction in the backup. The checkpoint creates a
shadow table (called a retainer) for each table involved in the
checkpoint. When a checkpoint is requested, it will not start until all
ongoing transactions have completed. New transactions will update both
the real table and the shadow table, the latter with operations to undo
the changes on the real table, the first time a key is modified.
I.e. when the write operation '{table, a, 14}' is made, the shadow
table is checked to see whether key 'a' already has an undo operation;
if it has, nothing more is done. If not, a {write, {table, a,
OLD_VALUE}} is added to the shadow table if the real table had an old
value; otherwise a {delete, {table, a}} operation is added to the
shadow table.
The backup is taken by copying every record in the real table and then
appending every operation in the shadow table to the backup, thus
undoing the changes that were made since the checkpoint was started.
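A sketch of activating a checkpoint and taking a backup from it; the
checkpoint name, table and backup file are hypothetical:

    %% Activate a checkpoint spanning table foo, back it up, deactivate.
    {ok, Name, _Nodes} =
        mnesia:activate_checkpoint([{name, my_cp},
                                    {max, [foo]},   % retain the whole table
                                    {ram_overrides_dump, true}]),
    ok = mnesia:backup_checkpoint(Name, "/tmp/foo.BUP"),
    ok = mnesia:deactivate_checkpoint(Name).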
Comments
#6 mryufeng 2009-10-19
whrllm wrote:
Boss, when will you also dig out the other documents mentioned in the text? Looking forward to it...
o Resource consumption (mnesia_consumption.txt)
o What to think about when changing mnesia (mnesia_upgrade_policy.txt)
o Mnesia internals course (mnesia_internals_slides.pdf)
o Mnesia overview (mnesia_overview.pdf)
All of the materials above can be found via Google.
#5 whrllm 2009-10-19
Boss, when will you also dig out the other documents mentioned in the text? Looking forward to it...
o Resource consumption (mnesia_consumption.txt)
o What to think about when changing mnesia (mnesia_upgrade_policy.txt)
o Mnesia internals course (mnesia_internals_slides.pdf)
o Mnesia overview (mnesia_overview.pdf)
#4 litaocheng 2009-09-23
Really meticulous and thorough... nice.
#3 mryufeng 2009-09-16
R13B02 reworked mnesia quite heavily; I think it's time to make this thing even better.
#2 bachmozart 2009-09-16
Looking forward to the boss "stumbling across" something like this every day.
#1 dennis_zane 2009-09-16
Thanks for sharing, boss.