Perl Project Improvement(3)Perl and XML, IDE and Regex and SQS

 

1 XML PERL Operation
Search for the lines containing <referencenumber>, strip the <![CDATA[ ]]> wrapper if there is one, and write the extracted values to a single file
> time perl -ne 'if (/referencenumber/){ s/<!\[CDATA\[//; s/]]>//; s/.*?>//; s/<.*//; print;}' 1052.xml > referencenumber.xml
real 0m7.773s
user 0m7.084s
sys 0m0.564s
time simply measures how long the command takes to run. It took only about 7 seconds to search a 2 GB XML file.
The result has 749,999 lines, 749,999 words and 24,625,055 bytes.
> wc referencenumber.xml
  749999  749999 24625055 referencenumber.xml
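
To make the substitution chain in the one-liner easier to follow, here is a minimal sketch that applies the same four substitutions to a few sample lines. The sample lines and the helper name extract_reference_number are assumptions for illustration; the real feed layout is not shown in this post.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real feed content is an assumption.
my @lines = (
    "  <referencenumber><![CDATA[ABC-123]]></referencenumber>\n",
    "  <referencenumber>XYZ-789</referencenumber>\n",
    "  <title>Not a reference number</title>\n",
);

# Hypothetical helper that mirrors the body of the one-liner.
sub extract_reference_number {
    my ($line) = @_;
    return unless $line =~ /referencenumber/;   # skip unrelated lines
    chomp $line;
    $line =~ s/<!\[CDATA\[//;   # drop the CDATA opener, if present
    $line =~ s/]]>//;           # drop the CDATA closer, if present
    $line =~ s/.*?>//;          # drop everything up to the end of the opening tag
    $line =~ s/<.*//;           # drop the closing tag and whatever follows it
    return $line;
}

# A bare return yields an empty list here, so non-matching lines are skipped.
my @refs = map { extract_reference_number($_) } @lines;
print "$_\n" for @refs;
```

With these sample lines the script prints ABC-123 and XYZ-789, one per line.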

An alternative using grep and awk (slower: about 44 seconds on the same file)
> grep 'referencenumber' /data/12001.xml | awk -F"</?referencenumber>" '{ print $2}'

> time grep 'referencenumber' /data/1052.xml | awk -F"</?referencenumber>" '{ print $2}' > /data/referencenumber.xml

real 0m44.050s
user 0m45.492s
sys 0m0.809s
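
The awk field separator trick can be expressed in Perl with split, which may help to see what $2 actually contains. This is only a sketch with made-up sample lines. Note that splitting on the tags alone leaves any CDATA wrapper in place, unlike the perl one-liner above.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real feed is not shown in the post.
my @lines = (
    "<referencenumber>XYZ-789</referencenumber>",
    "<referencenumber><![CDATA[ABC-123]]></referencenumber>",
);

# Same idea as awk -F"</?referencenumber>" '{ print $2 }':
# split on the opening or closing tag and take the second field.
my @second_fields = map { (split m{</?referencenumber>}, $_)[1] } @lines;

print "$_\n" for @second_fields;
```

The first field is the (empty) text before the opening tag, so the value sits in the second field, exactly as in awk.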

2 Env IDE Setting Up
Plugin for Perl on Eclipse (EPIC)
http://www.epic-ide.org/download.php
Download a small Eclipse package; the Java-only one is enough
http://www.eclipse.org/downloads/
Set up the plugin
http://www.epic-ide.org/running_perl_scripts_within_eclipse/eclipse-runperl-figure4.png

Once the latest Java-only Eclipse is in place, add the Perl plugin from this update site
http://www.epic-ide.org/updates/testing

After installing it, point the Eclipse Preferences at the Perl interpreter
Perl executable "/Users/carl/tool/perl-5.16.3/bin/perl"

Then open the Project Properties and set:
Perl Include Path -> Add to List ${project_loc}

Set up the unit tests
[Run] -> [External Tools] -> [External Tools Configurations] -> [Program] -> New
RunAllTest
      - Location: /Users/carl/tool/perl-5.16.3/prove
      - Working Directory: ${workspace_loc:/jobs-producer-perl}
      - Arguments: ${build_files:t/*}
SingleTest
      - Location: /Users/carl/tool/perl-5.16.3/perl
      - Working Directory: ${workspace_loc:/jobs-producer-perl}
      - Arguments: t/NumberUtil.t
PerlApp
      - Location: /Users/carl/tool/perl-5.16.3/perl
      - Working Directory: ${workspace_loc:/jobs-producer-perl}
      - Arguments: JobProducerApp.pl

3 Perl Regex and Command Supporting
A Perl sub that extracts the reference numbers from the big file and computes the difference against Redis.
sub generateReferenceNumbers {
die "Wrong arguments" if @_ != 2;

#services
my $logger = &loadLogger();

my $hugeFileName = $_[0];
my $source_id = $_[1];

#prepare 2 arrays
my @redisArray = ();
my @xmlArray;

#big File location should be from parameters
#output reference number file should be in the same directory
my $bigFile = "/data/1052.xml";
my $referencenumberFile = "/data/referencenumber.xml";

#command to regex the reference numbers
`perl -ne "if (/referencenumber/){ s/<referencenumber>//; s/<\\/referencenumber>//; s/<!\\[CDATA\\[//; s/]]>//; s/\\s*\\t*//; print; }" $bigFile > $referencenumberFile`;

#read and trim the reference numbers from file to array
open(my $fileHandler, "<", $referencenumberFile) or die "Failed to open file: $!\n";
while(<$fileHandler>) {
          chomp;
          push @xmlArray, $_;
}
close $fileHandler;

#find the differences
my @differencesArray = lib::CollectionUtil::differenceInArrays(\@xmlArray,\@redisArray);

#logging and testing the difference
#$logger->info("the difference array = @differencesArray");
#my $first = $differencesArray[0];
#$logger->info("===$first==");

#output the difference to XML and pass it to the next steps

}
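
The post does not show lib::CollectionUtil::differenceInArrays. A minimal sketch of what such a helper might look like, assuming it returns the elements of the first array that are missing from the second, is below. The name difference_in_arrays and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for lib::CollectionUtil::differenceInArrays.
# Takes two array references and returns the elements of the first
# that do not appear in the second.
sub difference_in_arrays {
    my ($left, $right) = @_;
    my %seen = map { $_ => 1 } @$right;    # hash lookup instead of nested loops
    return grep { !$seen{$_} } @$left;
}

my @xmlArray   = qw(A1 B2 C3 D4);   # sample reference numbers from the XML
my @redisArray = qw(B2 D4);         # sample reference numbers already in Redis

my @differencesArray = difference_in_arrays(\@xmlArray, \@redisArray);
print "@differencesArray\n";
```

Building a hash from the second array keeps the comparison roughly linear, which matters when the XML side holds about 750,000 entries.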

The most important part is this:
`perl -ne "if (/referencenumber/){ s/<referencenumber>//; s/<\\/referencenumber>//; s/<!\\[CDATA\\[//; s/]]>//; s/\\s*\\t*//; print; }" $bigFile > $referencenumberFile`

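-ne: -n wraps the code in a loop over every input line without printing the line automatically, and -e supplies the code on the command line.
s/<referencenumber>// removes <referencenumber> from the matching line, that is, replaces it with an empty string
s/<\\/referencenumber>// removes </referencenumber>
s/<!\\[CDATA\\[// removes <![CDATA[
s/]]>// removes ]]>
s/\\s*\\t*// removes the leading blanks and tabs of the line; there is no /g flag, so only the first match, at the start of the line, is replaced
The backslashes are doubled because the command sits inside Perl backticks, which interpolate like a double-quoted string; the shell receives a single backslash.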
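
As an alternative to shelling out with backticks, the same extraction can be done inside the running Perl process with a single capturing regex. This is only a sketch of a different technique, not the method used above. The temporary sample file and its contents are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a tiny sample feed (hypothetical content) in a temp file.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "\t <referencenumber><![CDATA[ABC-123]]></referencenumber>\n";
print {$out} "  <referencenumber>XYZ-789</referencenumber>\n";
print {$out} "  <title>ignored</title>\n";
close $out;

# Capture the value between the tags, with or without a CDATA wrapper.
my $pattern = qr{<referencenumber>\s*(?:<!\[CDATA\[)?(.*?)(?:\]\]>)?\s*</referencenumber>};

my @xmlArray;
open(my $in, '<', $path) or die "Failed to open $path: $!\n";
while (my $line = <$in>) {
    next unless $line =~ $pattern;   # skip lines without a reference number
    push @xmlArray, $1;
}
close $in;

print "$_\n" for @xmlArray;
```

This skips the intermediate file and the shell quoting, at the cost of running the regex in the main process.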

Read the lines of the file and push them onto the array
open(my $fileHandler, "<", $referencenumberFile) or die "Failed to open file: $!\n";
while(<$fileHandler>) {
          chomp;
          push @xmlArray, $_;
}
close $fileHandler;

4 SQS
http://search.cpan.org/~penfold/Amazon-SQS-Simple-2.04/lib/Amazon/SQS/Simple.pm
http://search.cpan.org/~penfold/Amazon-SQS-Simple-2.04/
> cpan -fi Amazon::SQS::Simple

Error Message:
ERROR [try ]: On calling SendMessage: 501 Protocol scheme 'https' is not supported (LWP::Protocol::https not installed) at lib/QueueClientHandler.pm line 39.

Solution:
> cpan -fi LWP::Protocol::https

Error Message:
t/QueueClientHandler.t (Wstat: 0 Tests: 2 Failed: 0)
  Parse errors: No plan found in TAP output

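
Solution:
Send the output through the logger instead of print. Raw print output on STDOUT gets mixed into the TAP stream that the test harness parses.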
Some core classes, QueueClientHandler.pm
package lib::QueueClientHandler;

use strict;
use warnings;

#IOC and Log::Log4perl are used below, so load them here as well
use IOC;
use Log::Log4perl;
use lib::CollectionUtil;
use Amazon::SQS::Simple;

sub init {
  my $configService = &loadService('configService');
  my $logger = &loadLogger();


  $logger->debug("init SQS connection-----");

  $logger->debug("--------------------------");

  my $access_key = 'AKIAIMxxxxxxx'; # Your AWS Access Key ID
  my $secret_key = 'BIr5Xlu1xxxxxxxx'; # Your AWS Secret Key

  my $register = IOC::Registry->instance();
  my $container = $register->getRegisteredContainer('JobsProducer');


  my $queueClient = Amazon::SQS::Simple->new($access_key, $secret_key);

  $container->register(IOC::Service->new('queueService'
               => sub { $queueClient }));

  return 1;
}

sub sendMessage(){
my $queueService = &loadService('queueService');
my $endpoint = 'https://sqs.us-east-1.amazonaws.com/216323611345/stage-tasks';
my $taskQueue = $queueService->GetQueue($endpoint);
my $response = $taskQueue->SendMessage('Hello world!');
}

sub fetchMessage(){
# Retrieve a message
my $queueService = &loadService('queueService');
my $logger = &loadLogger();


my $endpoint = 'https://sqs.us-east-1.amazonaws.com/216323611345/stage-tasks';
my $taskQueue = $queueService->GetQueue($endpoint);
    my $msg = $taskQueue->ReceiveMessage();

    #$msg->MessageBody
    #print $msg->MessageBody() ;
    if($msg){
    $logger->info("Message I get is = ". $msg->MessageBody());
    # Delete the message
    $taskQueue->DeleteMessage($msg->ReceiptHandle);
    }

}

sub loadService {
   #check parameters
   die "Wrong arguments" if @_ != 1;

   my $serviceName = $_[0];
   my $register = IOC::Registry->instance();

   my $service = $register->searchForService($serviceName)
        || die "Failed to find the service name = " . $serviceName . " in QueueClientHandler.";

   return $service;
}

sub loadLogger {
   my $logger = Log::Log4perl::get_logger("lib::RedisClientHandler");
   return $logger;
}

1;

__END__

Test Class to Send the Messages, QueueClientHandler.t
use strict;
use warnings;

use Test::More qw(no_plan);

use Log::Log4perl::Level;
use Log::Log4perl qw(:easy);

use YAML::XS qw(LoadFile);
use Data::Dumper;
use Cwd;

use IOC;

# Verify module can be included via "use" pragma
BEGIN { use_ok('lib::QueueClientHandler') };

# Verify module can be included via "require" pragma
require_ok( 'lib::QueueClientHandler' );

#init the test class
#logging
Log::Log4perl->init(cwd() ."/conf/log4perl-test.conf");
our $logger = Log::Log4perl::get_logger("JobsProducer");

#load configuration
my $config = LoadFile(cwd() .'/conf/config.yaml');
$logger->debug("----init configuration --------");
$logger->debug(Dumper($config));
$logger->debug("-------------------------------");

my $container = IOC::Container->new('JobsProducer');

$container->register(IOC::Service->new('configService'
               => sub { $config } ));

my $register = IOC::Registry->new();
$register->registerContainer($container);

# Test the Init Operation
lib::QueueClientHandler::init();

lib::QueueClientHandler::sendMessage();

#lib::QueueClientHandler::fetchMessage();

Consumer Pulling the Messages, TaskConsumerApp.pl
# import advertiser job feeds
#
# usage: $0  stop  stop after current batch
#        $0  start import loop

use strict;
use warnings;

use IOC;

use Log::Log4perl::Level;
use Log::Log4perl qw(:easy);

use YAML::XS qw(LoadFile);
use Data::Dumper;
use lib::MysqlDAOHandler;
use lib::RedisClientHandler;
use lib::FeedFileHandler;
use lib::JobImportHandler;
use lib::StringUtil;
use lib::NumberUtil;
use lib::QueueClientHandler;

use threads;
use threads::shared;
use Time::Piece;
use Cwd;

use constant FLAG_PID => 'JOBS_PRODUCER_RUNNING';

my $runningEnv =  $ENV{'RUNNING_ENV'};

#logging
Log::Log4perl->init(cwd() . "/conf/log4perl-${runningEnv}.conf");
my $logger = Log::Log4perl::get_logger("JobsProducer");

#IOC
my $container = IOC::Container->new('JobsProducer');
my $register = IOC::Registry->new();
$register->registerContainer($container);

#configuration
my $config = LoadFile(cwd() . "/conf/config-${runningEnv}.yaml");
$logger->debug("----init configuration --------");
$logger->debug(Dumper($config));
$logger->debug("-------------------------------");
$container->register(IOC::Service->new('configService'
               => sub { $config } ));

#receive params
my $pidFileName = $config->{pidFilePath} . FLAG_PID;

# data file path
my $dataFilePath = $config->{dataFilePath};

# php script path
my $phpScriptPath = $config->{phpScriptPath};

#my $MAX_SPLIT_SIZE = 100_000_000; #max split file size
my $MAX_SPLIT_SIZE = $config->{maxSplitFileSize};

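if (@ARGV == 1) {
  if ($ARGV[0] eq 'stop') {
    #create the flag file so the running instance stops after its current batch
    system 'touch ' . $pidFileName;
    $logger->info("Application is stopping.");
    #exit here, otherwise the unlink below removes the flag file again
    exit 0;
  }
  $logger->info("Application is running on $runningEnv\n");
} else {
  print "Usage: $0 start/stop\n";
  exit 1;
}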

unlink $pidFileName;

#init database connection
lib::MysqlDAOHandler::init();

#init redis connection
lib::RedisClientHandler::init();

#init queue connection
lib::QueueClientHandler::init();

#main thread pulling from mysql
#multiple thread downloading the file
#single thread split the file
#multiple threads execute the php import

##################################################################
# Main Processor
##################################################################

$logger->info("Start the Main thread.");

while (!-f $pidFileName) {
#keep running in main thread

$logger->info("Main-Thread - Scanning for tasks");

    lib::QueueClientHandler::fetchMessage();

sleep 15;
}

$logger->info("Main-Thread - JobsProducerApp stop running.");

__END__
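
The stop mechanism above relies on a flag file: the main loop keeps running until the file appears. A minimal self-contained sketch of that pattern follows. It uses a temporary directory instead of the configured pidFilePath, and a counter stands in for fetchMessage() and for the external stop command, so those parts are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical flag file location; the real app reads it from the config.
my $dir      = tempdir(CLEANUP => 1);
my $flagFile = File::Spec->catfile($dir, 'JOBS_PRODUCER_RUNNING');

my $iterations = 0;
while (!-f $flagFile) {
    $iterations++;
    # ... one batch of work would go here (stand-in for fetchMessage) ...

    # Simulate another process running "stop" during the third batch.
    if ($iterations == 3) {
        open(my $fh, '>', $flagFile) or die "Cannot create $flagFile: $!\n";
        close $fh;
    }
}

# Remove the flag so that the next start is not stopped immediately.
unlink $flagFile;
print "Stopped after $iterations iterations\n";
```

The current batch always finishes before the flag is checked again, which matches the "stop after current batch" behaviour described in the usage comment.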

References:
http://sillycat.iteye.com/blog/2304196
http://sillycat.iteye.com/blog/2304197