A Report on Concurrency Programming

lukeshei

From: Taipei

Blog category: Concurrency-Programming, erlang

Tags: Erlang, network applications, multithreading, C++, C

I. How I came across Erlang
1. RFID middleware

2. Jabber (XML streams, http://zh.wikipedia.org/wiki/Jabber)

3. ejabberd (http://www.process-one.net/en/)

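II. Problems facing today's commercial environment (web servers)
1. The number of connections keeps rising

2. Connections stay open for a long time
Traditionally, httpd has used preforking to cope with dense bursts of short connections. That approach is now seriously challenged: architectures such as HTTP streaming, server push, and COMET need long-lived connections, which reduces the number of connections httpd can serve. The biggest problem with forked processes is that each one takes up too much memory, so other server architectures have risen (lighttpd, ghttpd, ...).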

The C10K problem (http://www.kegel.com/c10k.html)
It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.
And computers are big, too. You can buy a 1000MHz machine with 2 gigabytes of RAM and an 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.08 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck???
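The per-client figures in the quote come from dividing each resource evenly among 20,000 clients. A worked check, assuming decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes):

```latex
\begin{aligned}
\frac{1000\ \text{MHz}}{20\,000} &= 50\ \text{kHz per client} \\
\frac{2\ \text{GB}}{20\,000} &= 100\ \text{KB per client} \\
\frac{1000\ \text{Mbit/s}}{20\,000} &= 50\ \text{kbit/s per client} \\
4\ \text{KB/s} \times 8\ \text{bit/byte} &= 32\ \text{kbit/s} < 50\ \text{kbit/s}
\end{aligned}
```

The last line is why sending four kilobytes per second to each client fits inside the per-client network budget.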

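III. Concurrency Programming
1. fork
Original program:
(code + data) --fork (makes a copy)--> (code + data)

After a fork, the child inherits a copy of the parent's data, and from then on the two are independent. To pass information between them you need a pipe, shared memory, or a Unix domain socket.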

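2. Threads
In fact, on Linux a thread is just a lightweight process. The Linux kernel creates a new process with the fork() system call, while a thread is created with the clone() system call. The difference is that clone() lets the caller specify which resources are shared with the parent; when all of them are shared, the result is effectively a thread. Because threads share resources with their parent and with one another, they are very prone to problems such as deadlocks and race conditions.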
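For contrast with shared-memory threads, here is a minimal Erlang sketch in which a single process owns a counter and every update is a message, so there is no shared variable to race on and no lock to forget. The module name counter and its functions are hypothetical, chosen only for illustration.

```erlang
-module(counter).
-export([start/0, incr/1, value/1]).

%% Hypothetical module: the count lives only inside the counter process.
start() -> spawn(fun() -> loop(0) end).

%% Asynchronous increment: just a message to the owner process.
incr(Pid) -> Pid ! incr, ok.

%% Synchronous read: tag the request with a unique reference and wait
%% for the reply carrying that same reference.
value(Pid) ->
    Ref = make_ref(),
    Pid ! {value, self(), Ref},
    receive {Ref, N} -> N end.

%% The mailbox serializes all updates, one message at a time.
loop(N) ->
    receive
        incr ->
            loop(N + 1);
        {value, From, Ref} ->
            From ! {Ref, N},
            loop(N)
    end.
```

Because only the counter process ever touches N, increments sent from many processes at once are applied one after another in mailbox order.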

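3. Lightweight threads (http://www.defmacro.org/ramblings/concurrency.html)
An Erlang process is a lightweight thread, so it can be started and terminated very cheaply, and switching between processes is fast. Under the hood a process is little more than a function being evaluated, so Erlang avoids most of the overhead of OS context switching and only switches between functions inside the VM (Erlang processes are VM-level threads, not OS threads).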
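A rough illustration of how cheap processes are to create: spawn N processes, have each send one message back, and count the replies. This is a sketch, not a benchmark; the module name spawn_many is hypothetical.

```erlang
-module(spawn_many).
-export([run/1]).

%% Hypothetical module: spawn N short-lived processes, each of which sends
%% {Pid, done} to the parent and exits, then wait for every reply.
run(N) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), done} end) || _ <- lists:seq(1, N)],
    %% Pid is bound by the fun head, so each receive waits for that
    %% particular process's message.
    lists:foldl(fun(Pid, Count) ->
                        receive {Pid, done} -> Count + 1 end
                end, 0, Pids).
```

To get a feel for the cost, `timer:tc(spawn_many, run, [100000])` returns `{Microseconds, 100000}`; the time depends on the machine.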

This document gives a brief overview of Erlang:
http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/252.pdf

IV. Erlang (http://www.erlang.org/)
1. Below is how the "About Erlang" page describes the language:

Erlang is a programming language which has many features more commonly associated with an operating system than with a programming language: concurrent processes, scheduling, memory management, distribution, networking, etc.
The initial open-source Erlang release contains the implementation of Erlang, as well as a large part of Ericsson's middleware for building distributed high-availability systems.
Erlang is characterized by the following features:
Concurrency - Erlang has extremely lightweight processes whose memory requirements can vary dynamically. Processes have no shared memory and communicate by asynchronous message passing. Erlang supports applications with very large numbers of concurrent processes. No requirements for concurrency are placed on the host operating system.
Distribution - Erlang is designed to be run in a distributed environment. An Erlang virtual machine is called an Erlang node. A distributed Erlang system is a network of Erlang nodes (typically one per processor). An Erlang node can create parallel processes running on other nodes, which perhaps use other operating systems. Processes residing on different nodes communicate in exactly the same way as processes residing on the same node.
Soft real-time - Erlang supports programming "soft" real-time systems, which require response times in the order of milliseconds. Long garbage collection delays in such systems are unacceptable, so Erlang uses incremental garbage collection techniques.
Hot code upgrade - Many systems cannot be stopped for software maintenance. Erlang allows program code to be changed in a running system. Old code can be phased out and replaced by new code. During the transition, both old code and new code can coexist. It is thus possible to install bug fixes and upgrades in a running system without disturbing its operation.
Incremental code loading - Users can control in detail how code is loaded. In embedded systems, all code is usually loaded at boot time. In development systems, code is loaded when it is needed, even when the system is running. If testing uncovers bugs, only the buggy code need be replaced.
External interfaces - Erlang processes communicate with the outside world using the same message passing mechanism as used between Erlang processes. This mechanism is used for communication with the host operating system and for interaction with programs written in other languages. If required for reasons of efficiency, a special version of this concept allows e.g. C programs to be directly linked into the Erlang runtime system.
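Hot code upgrade is usually exploited by making a server loop call itself through its module name. A minimal sketch, with a hypothetical module hot_loop: a local call such as loop(...) keeps running the code version the process is already in, while the fully qualified call ?MODULE:loop(...) jumps to the newest loaded version on the next iteration.

```erlang
-module(hot_loop).
-export([start/0, loop/1]).   % loop/1 must be exported for ?MODULE:loop/1

%% Hypothetical module illustrating the hot-code-upgrade idiom.
start() -> spawn(?MODULE, loop, [0]).

loop(Count) ->
    receive
        {From, ping} ->
            From ! {self(), pong, Count},
            %% Fully qualified call: after a new version of hot_loop is
            %% loaded, this switches to it while keeping Count.
            ?MODULE:loop(Count + 1);
        stop ->
            ok
    end.
```

After editing and reloading the module (for example with `c(hot_loop)` in the shell), the running process keeps its Count and answers the next ping with the new code.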

2. An overview of the Erlang language
Book: http://pragmaticprogrammer.com/titles/jaerlang/index.html

[ Sequential Erlang ]

Example 1:

Consider the factorial function N! defined by:
N! = N * (N-1)!  when N > 0
N! = 1           when N = 0

-module(math1).
-export([fac/1]).

%% Each clause mirrors one line of the definition above.
fac(N) when N > 0 -> N * fac(N-1);
fac(0) -> 1.
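For comparison, a tail-recursive version that carries the running product in an accumulator, in the same style as sum2/2 in the math2 module below. The module name math1_acc is hypothetical; its guard also rejects non-integer arguments explicitly.

```erlang
-module(math1_acc).
-export([fac/1]).

%% Hypothetical tail-recursive factorial.
fac(N) when is_integer(N), N >= 0 -> fac(N, 1).

%% Acc holds the product of the factors consumed so far.
fac(0, Acc) -> Acc;
fac(N, Acc) when N > 0 -> fac(N - 1, N * Acc).
```

Neither version has a clause for negative N, so both fail with a function_clause error for a call such as fac(-1).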

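Example 2:

-module(math2).
-export([sum1/1, sum2/1]).

%% sum1/1: body-recursive; each addition waits for the recursive call.
sum1([H | T]) -> H + sum1(T);
sum1([]) -> 0.

%% sum2/1: tail-recursive; the running total is carried in N.
sum2(L) -> sum2(L, 0).

sum2([], N) -> N;
sum2([H | T], N) -> sum2(T, H+N).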
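The accumulator pattern used by sum2/2 is common enough that the standard library provides it as lists:foldl/3 (and the sum itself as lists:sum/1). A sketch; the module name math3 is hypothetical:

```erlang
-module(math3).
-export([sum3/1]).

%% Hypothetical module: lists:foldl/3 walks the list left to right,
%% threading the accumulator exactly as sum2/2 does by hand.
sum3(L) -> lists:foldl(fun(X, Acc) -> X + Acc end, 0, L).
```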

[ Concurrency Programming ]

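Example 3:

-module(concurrency).
-export([start/0, say_something/2]).

%% Print What on its own line, Times times.
say_something(_What, 0) ->
    done;
say_something(What, Times) ->
    io:format("~p~n", [What]),
    say_something(What, Times - 1).

%% Spawn two processes that run say_something/2 concurrently.
start() ->
    spawn(concurrency, say_something, [hello, 3]),
    spawn(concurrency, say_something, [goodbye, 3]).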
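In the concurrency module above, start/0 returns as soon as the second process has been spawned (its return value is that Pid), and the two processes print in an interleaving chosen by the scheduler. A hypothetical variant, module concurrency2, in which each worker reports back when it is finished so the caller can wait for both:

```erlang
-module(concurrency2).
-export([start/0]).

%% Hypothetical variant: workers send {Pid, done} to the spawning process.
start() ->
    Parent = self(),
    P1 = spawn(fun() -> say_something(hello, 3), Parent ! {self(), done} end),
    P2 = spawn(fun() -> say_something(goodbye, 3), Parent ! {self(), done} end),
    %% Block until both have finished; the order of the printed lines
    %% between the two workers is still up to the scheduler.
    receive {P1, done} -> ok end,
    receive {P2, done} -> ok end,
    both_done.

say_something(_What, 0) ->
    done;
say_something(What, Times) ->
    io:format("~p~n", [What]),
    say_something(What, Times - 1).
```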

Example 4:

-module(area_server).
-export([loop/0]).

loop() ->
    receive
        {rectangle, Width, Ht} ->
            io:format("Area of rectangle is ~p~n", [Width * Ht]),
            loop();
        {circle, R} ->
            io:format("Area of circle is ~p~n", [3.14159 * R * R]),
            loop();
        Other ->
            io:format("I don't know what the area of a ~p is ~n", [Other]),
            loop()
    end.

We can create a process which evaluates loop/0 in the shell:

Pid = spawn(area_server,loop,[]).
Pid ! {rectangle, 6, 10}.
Pid ! {circle, 23}.
Pid ! {triangle,2,4,5}.
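area_server only prints its answers, so the sender never gets a value back. A common refinement, in the spirit of the client-server examples in Programming Erlang, is to include the sender's Pid in each request and have the server reply to it. The module and function names below are hypothetical:

```erlang
-module(area_server_rpc).
-export([start/0, area/2, loop/0]).

start() -> spawn(?MODULE, loop, []).

%% Client side: send {self(), Request} and wait for the reply from Pid.
area(Pid, Request) ->
    Pid ! {self(), Request},
    receive
        {Pid, Response} -> Response
    end.

%% Server side: answer each request by messaging the sender.
loop() ->
    receive
        {From, {rectangle, Width, Ht}} ->
            From ! {self(), Width * Ht},
            loop();
        {From, {circle, R}} ->
            From ! {self(), 3.14159 * R * R},
            loop();
        {From, Other} ->
            From ! {self(), {error, Other}},
            loop()
    end.
```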

4. Erlang-style processes or an event-based model for actors (http://lambda-the-ultimate.org/node/1615)
(http://lamp.epfl.ch/~phaller/doc/haller07coord.pdf)

Message passing
Each process has its own input queue for messages it receives. New messages received are put at the end of the queue. When a process executes a receive, the first message in the queue is matched against the first pattern in the receive, if this matches, the message is removed from the queue and the actions corresponding to the pattern are executed.
However, if the first pattern does not match, the second pattern is tested, if this matches the message is removed from the queue and the actions corresponding to the second pattern are executed. If the second pattern does not match the third is tried and so on until there are no more patterns to test. If there are no more patterns to test, the first message is kept in the queue and we try the second message instead. If this matches any pattern, the appropriate actions are executed and the second message is removed from the queue (keeping the first message and any other messages in the queue). If the second message does not match we try the third message and so on until we reach the end of the queue. If we reach the end of the queue, the process blocks (stops execution) and waits until a new message is received and this procedure is repeated.
Of course the Erlang implementation is "clever" and minimizes the number of times each message is tested against the patterns in each receive.
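The queue scan described above can be observed with a process that sends two messages to itself. A hypothetical sketch, module selective: the first receive only has a pattern for second, so first is skipped and left at the head of the queue for the next receive.

```erlang
-module(selective).
-export([demo/0]).

%% Hypothetical demo of selective receive.
demo() ->
    self() ! first,
    self() ! second,
    %% Scans past `first` (no matching pattern) and takes `second`.
    A = receive second -> second end,
    %% `first` is still in the queue and is matched now.
    B = receive first -> first end,
    [A, B].
```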
V. Erlang resources
Website:
Open Source Erlang
http://www.erlang.org
http://www.process-one.net/en/projects/

Mailing list:
Erlang-questions -- Erlang/OTP discussions
http://www.erlang.org/mailman/listinfo/erlang-questions

BOOK:
Concurrent programming in Erlang
http://www.erlang.org/download/erlang-book-part1.pdf
Programming Erlang Software for a Concurrent World
http://pragmaticprogrammer.com/titles/jaerlang/index.html

MY BLOG: http://rd-program.blogspot.com

2007-04-28 07:07
Category: Programming languages

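#46 Trustno1 2007-07-27

A continuation is really a way of calling functions, or in other words a coding style:

void g(int result)
{
    /* ... continue the computation with result ... */
}

void f(/* ...., */ void (*g)(int))
{
    int result = 0;   /* result = .....; */
    g(result);        /* instead of returning result, hand it to the continuation g */
}

This is the simplest form of continuation style.
call/cc is just syntactic sugar for implementing continuation style.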
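The same idea written in Erlang, as a sketch: funs are ordinary values, so continuation-passing style needs no extra machinery. The module cps is hypothetical; fac/2 never returns its result directly but passes it to the continuation K.

```erlang
-module(cps).
-export([fac/2, demo/0]).

%% Hypothetical continuation-passing factorial: the result is handed to K.
fac(0, K) -> K(1);
fac(N, K) when N > 0 ->
    %% Build a new continuation that multiplies by N, then calls K.
    fac(N - 1, fun(R) -> K(N * R) end).

%% The identity fun as the final continuation yields the plain value.
demo() -> fac(5, fun(X) -> X end).
```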

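ucontext / fiber / callcc / yield (Python) are all techniques that can be used to implement coroutines.

Erlang and coroutines are different: with coroutines, the code itself decides when to switch, whereas Erlang processes are time-sliced by the Erlang scheduler.

A subroutine is really just a function, except one that does not return a result; in Pascal it is called a procedure.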

#45 pi1ot 2007-07-27

qiezi wrote:

Trustno1 wrote:

Sorry, I misremembered the API: it should be makecontext, swapcontext and the related functions in ucontext.h.

Thanks! You always manage to be a guiding light for others.

Should ucontext / fiber / callcc / yield (Python) and Erlang's process scheduling all be counted as lightweight (user-level) threads?
Could you also explain whether continuation / coroutine / subroutine describe the same thing? These terms are really confusing...

Same here, I find them confusing too.

#44 qiezi 2007-07-27

Trustno1 wrote:

Sorry, I misremembered the API: it should be makecontext, swapcontext and the related functions in ucontext.h.

#43 Trustno1 2007-07-27

Sorry, I misremembered the API: it should be makecontext, swapcontext and the related functions in ucontext.h.

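#42 qiezi 2007-07-27

Trustno1 wrote:

You can implement coroutines with fibers on Windows or ptx_switch_context on Linux.

Where can I find documentation on this ptx_switch_context? I can't find anything at all; every search result points back to this thread...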

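#41 mryufeng 2007-07-21

Erlang's I/O is very efficient: it is a system that uses epoll, writev and readv, and even takes delayed sends into account. Message dispatch uses an async thread pool, and SMP support runs through the entire emulator implementation. How many C++ frameworks do that? None that I know of. For file I/O, the Erlang library even dedicates a separate process to reading and writing. There is also an option +c: "Disable compensation for sudden changes of system time"; for a time-driven network program, the impact of a sudden change in system time hardly needs describing. Only production-grade systems are built with this much attention to detail.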

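#40 dcaoyuan 2007-06-11

Because Erlang is a functional language, variable values basically never change (a variable can be bound only once). As a result, Erlang's share-nothing lightweight processes actually share EVERYTHING except mutable data, including code.

Compare that with Ruby/Rails' share nothing: Ruby/Rails really does share NOTHING. If on a multi-core CPU you run one Rails instance per CPU, each instance has to load and compile its own copy of the code, so memory consumption is much higher.

In my view, simply running one Rails instance per core is a flawed way to solve Rails' future parallelism problems on multi-core CPUs.

As for Erlang, since SMP support was added in OTP R11B, you no longer need to worry about multi-core versus single-core yourself: OTP handles everything behind the scenes, and handles it simply.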

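#39 pufan 2007-06-10

Quote:

With C++, efficiency drops sharply as the number of threads grows. One thread per connection is easy to write and makes timeouts easy to handle. Erlang started with -smp on a 4-CPU machine starts only 4 threads, which is of course the optimal arrangement. By optimizing to reduce the thread count, C++ can certainly come out ahead, but my requirements keep varying: Erlang lets me handle them all in one uniform style, while in C++ I have to rack my brains to implement each one, and code written with great effort is often not much faster, while the code size and development time certainly go up.

I am more interested in how efficient Erlang's thread scheduling is. Scheduling on its own still means saving context, so I'd guess it is less efficient than the operating system's scheduler.
If so, Erlang's concurrency performance should be lower than Java's (NIO, native threads), and the only reason left to use Erlang would be its built-in cluster support. Could someone explain how that is implemented?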

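#38 Elminster 2007-05-28

I feel the main value of things like ACE is in solving cross-platform problems; its reactive/proactive frameworks are not pleasant to use.
For concurrent I/O, the best solution I have seen so far is I/O Completion Ports + Fibers on Windows. It comes close to using asynchronous I/O for efficiency while letting the programmers who write the upper-layer code treat I/O as a synchronous operation. That is already quite ideal; the main problems are 1. the amount of work, 2. portability, and 3. it cannot extend naturally to a cluster of several machines.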

#37 potian 2007-05-28

My habit is to reach for reverse first, basically never consider flatten, and use ++ only outside of loops.

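#36 dcaoyuan 2007-05-28

dcaoyuan wrote:

AvinDev wrote:

For example, when testing merging several characters into a list, [$a] ++ [$b] ++ ... is less efficient than lists:reverse([$z | [$y | [... | [$a]]]])

lists:flatten([$a, $b, "A String", $z])

avindev ran a test and found flatten to be less efficient than ++, so I was being subjective; I had assumed otherwise because ejabberd likes to use flatten. Link:
http://avindev.iteye.com/blog/82560

Having used Erlang more recently, I understand lists better, so let me correct what I said about merging characters into a list:

1. lists:reverse([A|Acc]) is the most efficient, but it is not very intuitive when merging many characters;
2. lists:flatten(DeepList) should be used to flatten lists with depth > 1;
3. for a list with depth = 1, use lists:append to flatten it; it is more efficient than lists:flatten;
4. lists:append/2 and ++ are equivalent.

These points are covered in the Efficiency Guide and in the documentation of the lists module.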
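The approaches in the list above, side by side. This is a hypothetical helper module, build_list; it only shows that the approaches build the same string and does not measure their relative speed, for which the Efficiency Guide is the reference.

```erlang
-module(build_list).
-export([by_reverse/1, by_append/1, by_flatten/1]).

%% Hypothetical helpers: each joins a list of strings into one string.

%% Point 1: prepend onto an accumulator (cheap), then reverse once.
%% lists:reverse(Part, A) prepends Part reversed onto A.
by_reverse(Parts) ->
    Acc = lists:foldl(fun(Part, A) -> lists:reverse(Part, A) end, [], Parts),
    lists:reverse(Acc).

%% Points 3 and 4: for a list of lists (depth 1), lists:append/1 joins
%% the pieces, the same result as chaining them with ++.
by_append(Parts) -> lists:append(Parts).

%% Point 2: lists:flatten/1 is meant for arbitrarily deep lists.
by_flatten(DeepList) -> lists:flatten(DeepList).
```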

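#35 pi1ot 2007-05-27

qiezi wrote:

When writing multithreaded C++, reducing the number of threads means using asynchronous I/O extensively, which is also hard to write, and algorithms are even harder to split up, especially when the work has to be spread evenly across several CPUs and the results gathered at the end. I don't know whether Erlang has an advantage here. I guess you still can't avoid spawning processes yourself, but judging from that pmap implementation, it should make it easy to split up tasks and then aggregate the results. Last time I read an article in Programmer magazine saying that when Erlang sends a message, it schedules the processes onto the same thread. I wonder whether they get moved back afterwards; surely after a round of messages they don't all end up on one thread?

Details like these should be weighed and resolved by the application itself.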

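#34 qiezi 2007-05-27

When writing multithreaded C++, reducing the number of threads means using asynchronous I/O extensively, which is also hard to write, and algorithms are even harder to split up, especially when the work has to be spread evenly across several CPUs and the results gathered at the end. I don't know whether Erlang has an advantage here. I guess you still can't avoid spawning processes yourself, but judging from that pmap implementation, it should make it easy to split up tasks and then aggregate the results.

Last time I read an article in Programmer magazine saying that when Erlang sends a message, it schedules the processes onto the same thread. I wonder whether they get moved back afterwards; surely after a round of messages they don't all end up on one thread?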
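A minimal parallel map in the spirit of the pmap mentioned above (not that implementation; the module name pmap_sketch is hypothetical): one process per element, with the results gathered back in input order.

```erlang
-module(pmap_sketch).
-export([pmap/2]).

%% Hypothetical parallel map: spawn one worker per element. Each reply is
%% tagged with a unique reference so it cannot be confused with other
%% messages in the caller's mailbox.
pmap(F, L) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- L],
    %% Receiving by Ref, in list order, restores the input order even
    %% though the workers may finish in any order.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

If F crashes inside a worker, that reply never arrives and pmap/2 waits forever; a sturdier version would link to or monitor the workers, or add a timeout.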

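#33 Trustno1 2007-05-26

AvinDev wrote:

For the approach where each job has its own queue, my colleague thinks it incurs thread-switching and locking overhead, and that it is inconvenient to debug.

Generally speaking, for the one-queue-per-job approach, writing your own scheduler on Linux is a good way to avoid both context switches and lock overhead. My previous company built softswitches, and that is exactly what they did. Of course, that amounts to writing your own Erlang scheduler.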

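#32 Trustno1 2007-05-26

Setting aside ACE's cross-platform features, ACE does very little to reduce the complexity of network applications. Callback-handling approaches such as ACE_Task work fine for simple applications, but once the communication logic gets complex, an interaction flow that was perfectly clear as synchronous communication has to be split up by hand and maintained as a state machine. Worse, to keep the communicating endpoints in sync, after sending a message in almost every state you must not only wait for the feedback message but also maintain several timeouts to handle error cases, which more than doubles the workload. Sometimes, to prevent logic errors caused by messages arriving out of order, you even have to give each state a message sequence number to tell which messages you want and which are stale. That amounts to maintaining several sub-states within each state.
That is the first problem. Second, how the flow gets split depends on each state's response time: if an operation in one state takes too long, you have to break what was a single piece of logic into two. This kind of split usually does not show up during system design or coding, but in the final performance-tuning phase, when you invariably find some state responding slowly under heavy concurrency and overflowing the message queue. Then you have to split that state again, and the amount of work is enormous.
Of course, you can implement coroutines with fibers on Windows or ptx_switch_context on Linux, but the reduction in complexity is still quite limited.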
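In Erlang, the timeout and stale-message bookkeeping described in this comment is typically handled with receive ... after and a unique reference per request. A sketch with hypothetical names (module call_timeout, functions call/3 and start_echo/1):

```erlang
-module(call_timeout).
-export([call/3, start_echo/1]).

%% Hypothetical synchronous request: tag it with a fresh reference, wait
%% only for the reply carrying that reference, give up after Timeout ms.
%% A late reply to an abandoned request carries an old reference, so a
%% later call never mistakes it for its own answer.
call(Pid, Request, Timeout) ->
    Ref = make_ref(),
    Pid ! {self(), Ref, Request},
    receive
        {Ref, Reply} -> {ok, Reply}
    after Timeout ->
        {error, timeout}
    end.

%% Hypothetical test server: echoes each request back after Delay ms.
start_echo(Delay) ->
    spawn(fun() -> echo_loop(Delay) end).

echo_loop(Delay) ->
    receive
        {From, Ref, Request} ->
            timer:sleep(Delay),
            From ! {Ref, Request},
            echo_loop(Delay)
    end.
```

A reply that arrives after its timeout is never matched, but it does stay in the caller's mailbox until something removes it.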

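#31 AvinDev 2007-05-26

I see, I understand what you mean now.
BTW, potian, if you have time, why not share some of your experience and thoughts from using Erlang in company projects :)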

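#30 potian 2007-05-26

Our company has used ACE for a long time too.
Each task does not necessarily correspond to one thread; a thread pool or a single thread both work. What I mean is a queue of the data that each task has to process.
When you need decoupling, and tasks need to communicate and synchronize with one another, these queues are very likely to be necessary.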

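#29 AvinDev 2007-05-26

At my company an engineer with more than ten years of telecom development experience is using ACE to build a high-performance proxy in a single-threaded, event-driven style. For applications whose logic is relatively simple and compute-intensive, and especially where the back end makes no blocking calls, this model fits very well. But for more complex applications, especially ones that need multithreading, I personally lean toward a solution like Erlang.

For the approach where each job has its own queue, my colleague thinks it incurs thread-switching and locking overhead, and that it is inconvenient to debug.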

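#28 potian 2007-05-26

Decoupling can of course be done, but it takes a lot of work. (Even before Erlang, I chose the one-queue-per-job approach; even if you don't do it that way, you need some other means of decoupling.)

This inevitably involves maintaining queues and copying data, or something like Erlang's references to large data. Once you go that way, you have to deal with a whole series of problems: maintaining reference counts (or the copying algorithm), keeping the queues synchronized, and so on. That is exactly Erlang's strength. Of course, you can special-case a particular application, but in general we tend to abstract step by step and build up an in-house framework, and I doubt that most people can handle this better than Erlang does.

There are many ways to build complex network applications. For example, besides TCP-based "multicast", our streaming media system also has to support clients that frequently cycle through sources, meaning the video source behind the same connection changes. Modelling this with processes and messages was very easy, and the result is very efficient.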

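#27 Elminster 2007-05-25

potian wrote:

I don't think C++ still has the advantage in the performance of complex network applications under high concurrency. If the system logic is simple, for example when the clients connected to the server have little to do with one another, C++ networking may have the edge. But in complex network applications, network processing speed, logic complexity, and synchronization all have a major effect on performance. Programs that handle network I/O asynchronously find complex logic very hard to handle: they are hard to debug and extend, and that in itself hurts performance. From the standpoint of handling concurrency, message processing can greatly improve a system's concurrency. Erlang also has a unique advantage in how well its parallelism scales.

I disagree.
From the I/O perspective, calling the asynchronous APIs provided by the OS kernel directly is the most efficient approach (as long as the kernel's implementation is not too bad), and that is what C/C++ does best. On the other hand, data I/O and the logical processing of that data can be decoupled, so I don't think complex logic is an unsolvable problem. The real issues here are two: first, the amount of work, since a C/C++ implementation needs considerably more effort to achieve the same result; second, a C/C++ implementation cannot scale out naturally to a cluster.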
