dcaoyuan

Reading File in Parallel in Erlang (Was Tim Bray's Erlang Exercise - Round III)


My first solution for Tim's exercise tried to read the file in parallel, but I just realized, by reading the file module's source code, that file:open(FileName, Options) returns a process rather than an IO device. This implies a lot:

  • It's a process, so when you request more data from it, you are actually sending it a message. Since you only send two integers, the offset and the length, sending the message is very fast. But this process (File) then waits to receive the data from disk I/O, and within one process that receiving is sequential rather than parallel.
  • If we view Erlang processes as active objects that send/receive messages/data asynchronously, then, since receiving is sequential within one process, requesting/waiting around one process (or object) is mostly safe for parallel programming; you usually do not need to worry about locking/unlocking (except for the outside world).
  • We can open many File processes to read data in parallel; the bound is disk I/O and the OS's resource limits.
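As a quick sanity check (a minimal shell sketch; "/etc/hosts" is just a stand-in for any readable file), file:open/2 without the raw option indeed hands back a process:

```erlang
%% Without the raw option, file:open/2 returns the pid of an io-server
%% process; every read is a message round-trip to that process.
{ok, File} = file:open("/etc/hosts", [read, binary]),
true = is_pid(File),
ok = file:close(File).
```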

I wrote some code to test reading a file in parallel, discarding the disk cache. With multiple reading processes the speedup over a single process approaches 200% (197% with 4 processes on a 4-CPU machine; see the results below).

The code:

-module(file_pread).

-compile([native]).

-export([start/2]).

-include_lib("kernel/include/file.hrl").

start(FileName, ProcNum) ->
    [start(FileName, ProcNum, Fun) || Fun <- [fun read_file/3, fun pread_file/3]].


start(FileName, ProcNum, Fun) ->
    Start = now(),  

    Main = self(),
    Collector = spawn(fun () -> collect_loop(Main) end),

    Fun(FileName, ProcNum, Collector),
    
    %% do not terminate; wait here until all tasks are done.
    receive
        stop -> io:format("time: ~10.2f ms~n", [timer:now_diff(now(), Start) / 1000]) 
    end.

collect_loop(Main) -> collect_loop_1(Main, undefined, 0).
collect_loop_1(Main, ChunkNum, ChunkNum) -> 
    Main ! stop;
collect_loop_1(Main, ChunkNum, ProcessedNum) ->
    receive
        {chunk_num, ChunkNumX} ->
            collect_loop_1(Main, ChunkNumX, ProcessedNum);
        {seq, _Seq} ->
            collect_loop_1(Main, ChunkNum, ProcessedNum + 1)
    end.

get_chunk_size(FileName, ProcNum) ->
    {ok, #file_info{size=Size}} = file:read_file_info(FileName),
    Size div ProcNum.

read_file(FileName, ProcNum, Collector) ->
    ChunkSize = get_chunk_size(FileName, ProcNum),
    {ok, File} = file:open(FileName, [raw, binary]),
    read_file_1(File, ChunkSize, 0, Collector).
    
read_file_1(File, ChunkSize, I, Collector) ->
    case file:read(File, ChunkSize) of
        eof ->
            file:close(File),
            Collector ! {chunk_num, I};
        {ok, _Bin} -> 
            Collector ! {seq, I},
            read_file_1(File, ChunkSize, I + 1, Collector)
    end.


pread_file(FileName, ProcNum, Collector) ->
    ChunkSize = get_chunk_size(FileName, ProcNum),
    pread_file_1(FileName, ChunkSize, ProcNum, Collector).
       
pread_file_1(FileName, ChunkSize, ProcNum, Collector) ->
    [spawn(fun () ->
                   %% if it's the last chunk, read all the bytes left,
                   %% which will not exceed ChunkSize * 2
                   Length = if  I == ProcNum - 1 -> ChunkSize * 2;
                                true -> ChunkSize end,
                   {ok, File} = file:open(FileName, [read, binary]),
                   {ok, _Bin} = file:pread(File, ChunkSize * I, Length),
                   Collector ! {seq, I},
                   file:close(File)
           end) || I <- lists:seq(0, ProcNum - 1)],
    Collector ! {chunk_num, ProcNum}.

The pread_file/3 version is parallelized: it opens a new File process for each reading process instead of sharing one opened File process among all of them. The read_file/3 version is the non-parallelized baseline.
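As a side note (a hedged sketch, not part of the benchmark above): opening the file with the raw option bypasses the io-server process entirely, so file:pread/3 is performed directly by the calling process with no message hop; each reading process then truly owns its descriptor. Again using "/etc/hosts" as a stand-in:

```erlang
%% With raw, file:open/2 returns a #file_descriptor{} record instead of
%% a pid; pread runs in the calling process, no io server involved.
{ok, Fd} = file:open("/etc/hosts", [raw, read, binary]),
{ok, _Bin} = file:pread(Fd, 0, 16),
ok = file:close(Fd).
```

Note that a raw file can only be used by the process that opened it.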

To evaluate (run each test at least twice, to even out the effect of disk/IO caches):

$ erlc -smp file_pread.erl
$ erl -smp

1> file_pread:start("o100k.ap", 2).
time:     691.72 ms
time:      44.37 ms
[ok,ok]
2> file_pread:start("o100k.ap", 2).
time:      74.50 ms
time:      43.59 ms
[ok,ok]
3> file_pread:start("o1000k.ap", 2).
time:    1717.68 ms
time:     408.48 ms
[ok,ok]
4> file_pread:start("o1000k.ap", 2).
time:     766.00 ms
time:     393.71 ms
[ok,ok]
5> 

Let's compare the results for each file (picking the second run of each). The speedup:

  • o100k.ap, 20M: 74.50 / 43.59 - 1 = 70%
  • o1000k.ap, 200M: 766.00 / 393.71 - 1 = 95%

On another 4-CPU Debian machine, with 4 processes, the best result I got was:

4> file_pread:start("o1000k.ap", 4).
time:     768.59 ms
time:     258.57 ms
[ok, ok]
5>

The parallelized reading speedup is 768.59 / 258.57 - 1 = 197%.

I've updated my first solution according to this testing, opening a new File process for each reading process instead of sharing the same File process. Of course, there are still the issues that I pointed out in Tim Bray's Erlang Exercise on Large Dataset Processing - Round II.

Although the above result can also be achieved in other languages, I find that coding parallelization in Erlang is a pleasure.
