Perl Huge XML Solution(1)Split Files and Multiple Threads
1. Upgrade Perl
> sudo yum install cpan
> sudo cpan
cpan> install Bundle::CPAN
cpan> reload cpan
cpan> upgrade
The upgrade failed with this error message:
make NO isa perl
Solution:
> sudo yum install perl-Config*
Upgrading Perl itself still did not work, but I could install the modules I needed one by one:
cpan> install Time::Piece
cpan> install Path::Class
cpan> install autodie
cpan> install Thread::Queue
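Before running the scripts, it can help to confirm that each module really loads. This is a minimal sketch using only core Perl; the module list mirrors the install commands above, and `module_available` is a hypothetical helper name, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Time::Piece -> Time/Piece.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report the status of every module the scripts rely on.
for my $module (qw(Time::Piece Path::Class autodie Thread::Queue)) {
    printf "%-15s %s\n", $module, module_available($module) ? "ok" : "MISSING";
}
```

Any module reported as MISSING can then be installed from the cpan shell as shown above.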
2. Split The File
split_hero.pl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;
use Path::Class;
use autodie; # die if problem reading or writing a file

my $OutputSize   = 0;
my $OutputCount  = 0;
my $MaxSize      = 100_000_000;   # cut a chunk once it passes about 100 MB
my $HugeFileName = "data/728";    # reads data/728.xml, writes chunks into data/728/

print localtime->strftime('%Y-%m-%d %X') . "\n";

mkdir $HugeFileName unless -d $HugeFileName;   # the output directory must exist

my $out;
open(my $in, '<', $HugeFileName . '.xml') or die "input: $!\n";
while (<$in>) {
    if (!$out) {
        # start the next chunk
        $OutputCount++;
        $OutputSize = 0;
        open($out, '>', $HugeFileName . "/output$OutputCount.xml") or die "output: $!\n";
        # chunk 1 keeps the header lines copied from the input; later chunks need their own
        unless ($OutputCount == 1) {
            print $out qq{<?xml version='1.0' encoding='UTF-8'?>\n};
            print $out qq{<source>\n};
        }
    }
    print $out $_;
    $OutputSize += length($_);
    # only cut after a closing </job>, so no record is split across files
    if (m|</job>|i) {
        if ($OutputSize > $MaxSize) {
            print $out "</source>\n";
            close($out);
            $out = undef;
        }
    }
}
close($out) if $out;   # flush the last, partially filled chunk
close($in);

# append the chunk names to data/728/file_list
my @files = glob($HugeFileName . "/*.xml");
my $dir = dir($HugeFileName);
my $list_file = $dir->file("file_list");
my $list_file_handle = $list_file->open('>>');
foreach my $file (@files) {
    $list_file_handle->print($file . "\n");
    print "$file\n";
}
$list_file_handle->close;
print localtime->strftime('%Y-%m-%d %X') . "\n";
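The cut rule used by the script (close a chunk only on a line that contains `</job>`, and only once the chunk has grown past the size limit) can be tried on in-memory data. The sketch below is only an illustration: `chunk_lines` is a hypothetical helper, the limit of 40 bytes is deliberately tiny, and it leaves out the XML declaration and `<source>` wrapper that the real script writes.

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into chunks, cutting only after a line
# that closes a <job> record and only once the chunk exceeds $max_size bytes.
sub chunk_lines {
    my ($max_size, @lines) = @_;
    my (@chunks, @current);
    my $size = 0;
    for my $line (@lines) {
        push @current, $line;
        $size += length $line;
        if ($line =~ m{</job>}i && $size > $max_size) {
            push @chunks, [@current];
            @current = ();
            $size    = 0;
        }
    }
    push @chunks, [@current] if @current;    # trailing partial chunk
    return @chunks;
}

# Five one-line records of 22 bytes each, with a 40-byte limit.
my @lines  = map { "<job><id>$_</id></job>\n" } 1 .. 5;
my @chunks = chunk_lines(40, @lines);
print scalar(@chunks), " chunks\n";
print "chunk ", $_ + 1, ": ", scalar(@{ $chunks[$_] }), " records\n" for 0 .. $#chunks;
```

Because a cut happens only after the limit has been passed, every chunk except the last is slightly larger than the limit, and no record is ever split across two chunks.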
3. Multiple Threads in Perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $nthreads  = 5;
my $process_q = Thread::Queue->new();
my $failed_q  = Thread::Queue->new();

# This is a subroutine, but it runs 'as a thread'.
# When it starts, it inherits the program state 'as is', e.g.
# the variable declarations above all apply, but changes to
# values within the program are 'thread local' unless the
# variable is defined as 'shared'.
# Behind the scenes, Thread::Queue objects are 'shared' arrays.
sub worker {
    # NB: this will sit in a loop indefinitely until you close the queue
    # with $process_q->end(). We do that once we have queued everything
    # we want to process, and the sub then completes and exits neatly.
    # If you _don't_ end the queue, this will sit waiting forever.
    # defined() keeps the loop running for false-but-valid items such as "0".
    while ( defined( my $server = $process_q->dequeue() ) ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/sbin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

# Insert tasks into the thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
my @tasks = <$input_fh>;
close($input_fh);
print "number of tasks = " . scalar(@tasks) . "\n";
$process_q->enqueue(@tasks);

# We 'end' process_q: once we do, no more items may be inserted,
# and dequeue() returns undef when the queue is emptied.
# This means our worker threads (in their 'while' loop) will then exit.
$process_q->end();

# Start some threads.
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

# Wait for all threads to finish processing.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

# Collate results ('synchronise' operation).
while ( defined( my $server = $failed_q->dequeue_nb() ) ) {
    print "$server failed to ping\n";
}
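The same queue pattern can be tried without a network or a server list. The sketch below assumes a Perl built with thread support and a Thread::Queue version that provides `end()`. The squaring task and the names `$work_q` and `$result_q` are made up for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work_q   = Thread::Queue->new();
my $result_q = Thread::Queue->new();

# Each worker squares the numbers it dequeues. The defined() test keeps the
# loop alive for false-but-valid items such as 0.
sub worker {
    while ( defined( my $n = $work_q->dequeue() ) ) {
        $result_q->enqueue( $n * $n );
    }
}

$work_q->enqueue( 0 .. 9 );
$work_q->end();    # dequeue() returns undef once the queue is drained

my @workers = map { threads->create( \&worker ) } 1 .. 3;
$_->join() for @workers;

# Results arrive in whatever order the workers finished, so sort them.
my @squares;
while ( defined( my $r = $result_q->dequeue_nb() ) ) {
    push @squares, $r;
}
@squares = sort { $a <=> $b } @squares;
print "@squares\n";
```

Ending the queue before the workers start is fine: the items are already queued, and each worker simply exits when nothing is left.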
I changed the worker slightly so that it calls a PHP script for each item instead of ping:
my $result = `php src/import.php 728 $server`;
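When the worker runs an external program, the exit code is what decides whether an item goes onto the failed queue. The sketch below shows one way to get it with the list form of `system`, which passes arguments straight to the program without a shell. `run_command` is a hypothetical helper, and the running perl binary (`$^X`) stands in for `php` so the example runs anywhere. Unlike backticks, `system` does not capture the program's output.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its exit code,
# or -1 if it could not be started or was killed by a signal.
sub run_command {
    my (@cmd) = @_;
    my $status = system(@cmd);     # list form: arguments are not shell-parsed
    return -1 if $status == -1;    # failed to start
    return -1 if $status & 127;    # terminated by a signal
    return $status >> 8;           # normal exit code
}

# $^X is the running perl binary, used here as a stand-in for `php`.
my $ok   = run_command( $^X, '-e', 'exit 0' );
my $fail = run_command( $^X, '-e', 'exit 3' );
print "first: $ok, second: $fail\n";
```

A non-zero return value plays the same role here as the `if ($?)` test in the worker.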
4. Test Result
Splitting the huge XML file (4.5G) on a machine with a 2-core CPU and 4G of memory took 00:02:05:
04:17:24
04:19:29
Sending the data to Redis/SQS on a machine with a 2-core CPU and 4G of memory took 00:03:12:
04:23:46
04:26:58
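The durations can be checked from the printed timestamps with Time::Piece, which the split script already uses. In this sketch `elapsed_seconds` is a hypothetical helper, and the throughput line assumes 4.5G means roughly 4500 MB.

```perl
use strict;
use warnings;
use Time::Piece;

# Hypothetical helper: seconds between two HH:MM:SS stamps on the same day.
sub elapsed_seconds {
    my ($start, $end) = @_;
    my $t0 = Time::Piece->strptime($start, '%H:%M:%S');
    my $t1 = Time::Piece->strptime($end,   '%H:%M:%S');
    return ($t1 - $t0)->seconds;
}

my $split = elapsed_seconds('04:17:24', '04:19:29');    # 125 s = 00:02:05
my $send  = elapsed_seconds('04:23:46', '04:26:58');    # 192 s = 00:03:12
printf "split: %d s, send: %d s\n", $split, $send;

# Assumption: 4.5G is treated as 4500 MB for a rough throughput figure.
printf "split throughput: about %.0f MB/s\n", 4500 / $split;
```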