aaron_ch
Regular Expressions


Regular expressions

A regular expression (RE) is written between slashes, and matching is done with the =~ operator. The following expression is true if the string "the" appears in the variable $sentence.
$sentence =~ /the/
The RE is case sensitive, so if
$sentence = "The quick brown fox";
then the above match will be false. The operator !~ is used for spotting a non-match. In the above example
$sentence !~ /the/
is true because the string "the" does not appear in $sentence.
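As a minimal sketch of the two operators, the snippet below stores each result as 1 or 0 so it can be printed. The variable names and the second sentence are made up for illustration.

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox";

# =~ is true when the pattern occurs in the string. Matching is case
# sensitive, so lower-case /the/ does not find the capitalised "The".
my $found = ($sentence =~ /the/) ? 1 : 0;

# !~ is the negation: true when the pattern does not occur.
my $missing = ($sentence !~ /the/) ? 1 : 0;

# A second, made-up sentence that does contain lower-case "the".
my $other       = "Over the lazy dog";
my $found_other = ($other =~ /the/) ? 1 : 0;

print "found=$found missing=$missing found_other=$found_other\n";
# prints: found=0 missing=1 found_other=1
```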

 


The $_ special variable

We could use a match as the test in a conditional, such as
if ($sentence =~ /under/)
{
	print "We're talking about rugby\n";
}
which would print out a message if we had either of the following
$sentence = "Up and under";
$sentence = "Best winkles in Sunderland";
But it's often much easier to assign the sentence to the special variable $_, which is of course a scalar. If we do this then we can avoid writing the match and non-match operators, and the above can be written simply as
if (/under/)
{
	print "We're talking about rugby\n";
}
The $_ variable is the default for many Perl operations and tends to be used very heavily.
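One common way $_ gets set is by a foreach loop with no loop variable. The sketch below assumes that setup; the third sentence and the @rugby array are invented for the example.

```perl
use strict;
use warnings;

my @sentences = (
    "Up and under",
    "Best winkles in Sunderland",
    "Over the top",               # made-up sentence without "under"
);
my @rugby;

# foreach without a named loop variable puts each element in $_,
# so the bare /under/ is matched against the current sentence.
foreach (@sentences) {
    if (/under/) {
        push @rugby, $_;
    }
}

print "$_\n" for @rugby;
# prints the first two sentences
```

Note that "Sunderland" matches too, because /under/ looks for the letters anywhere in the string, not for a whole word.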

 


More on REs

In an RE there are plenty of special characters, and it is these that both give them their power and make them appear very complicated. It's best to build up your use of REs slowly; their creation can be something of an art form.

Here are some special RE characters and their meanings:

.	# Any single character except a newline
^	# The beginning of the line or string
$	# The end of the line or string
*	# Zero or more of the preceding character
+	# One or more of the preceding character
?	# Zero or one of the preceding character
and here are some example matches. Remember that they should be enclosed in /.../ slashes to be used.
t.e	# t followed by any single character followed by e
	# This will match the
	#                 tre
	#                 tle
	# but not te
	#         tale
^f	# f at the beginning of a line
^ftp	# ftp at the beginning of a line
e$	# e at the end of a line
tle$	# tle at the end of a line
und*	# un followed by zero or more d characters
	# This will match un
	#                 und
	#                 undd
	#                 unddd (etc)
.*	# Any string without a newline. This is because
	# the . matches anything except a newline and
	# the * means zero or more of these.
^$	# A line with nothing in it.
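The examples above can be checked with a short script. This is only a sketch: matches() is a hypothetical helper written for this example, and the test strings are chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

# [ pattern, test string ] pairs based on the table above.
my @cases = (
    [ 't.e',  'the'    ],   # 1: one character between t and e
    [ 't.e',  'te'     ],   # 0: nothing between t and e
    [ 't.e',  'tale'   ],   # 0: two characters between t and e
    [ '^ftp', 'ftp up' ],   # 1: ftp at the start
    [ '^ftp', 'sftp'   ],   # 0: ftp is not at the start
    [ 'tle$', 'little' ],   # 1: tle at the end
    [ 'und*', 'un'     ],   # 1: zero d characters is allowed
    [ '^$',   ''       ],   # 1: empty line
);

for my $case (@cases) {
    my ($pattern, $string) = @$case;
    printf "%-5s against '%s' => %d\n",
        $pattern, $string, matches($pattern, $string);
}
```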

There are even more options. Square brackets are used to match any one of the characters inside them. Inside square brackets a - indicates "between" and a ^ at the beginning means "not":

[qjk]		# Either q or j or k
[^qjk]		# Neither q nor j nor k
[a-z]		# Anything from a to z inclusive
[^a-z]		# Any character that is not a lower case letter
[a-zA-Z]	# Any letter
[a-z]+		# Any non-empty sequence of lower case letters
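A rough illustration of the bracket expressions follows. The matches() helper and the test words are assumptions made for this example; ^ and $ outside the brackets are the anchors described earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

print matches('[qjk]',    'joke'),  "\n";   # 1: contains a j
print matches('[qjk]',    'apple'), "\n";   # 0: no q, j or k
print matches('[^qjk]',   'qjk'),   "\n";   # 0: every character is q, j or k
print matches('[^qjk]',   'quiz'),  "\n";   # 1: u is none of q, j, k
print matches('^[a-z]+$', 'hello'), "\n";   # 1: only lower case letters
print matches('^[a-z]+$', 'Hello'), "\n";   # 0: H is upper case
print matches('[^a-z]',   'hello'), "\n";   # 0: nothing outside a to z
print matches('[a-zA-Z]', '1234'),  "\n";   # 0: no letters at all
```

A negated class such as [^qjk] still has to match one character, so it fails on a string made up entirely of the excluded characters.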
At this point you can probably skip to the end and do at least most of the exercise. The rest is mostly just for reference.

A vertical bar | represents an "or" and parentheses (...) can be used to group things together:

jelly|cream	# Either jelly or cream
(eg|le)gs	# Either eggs or legs
(da)+		# Either da or dada or dadada or...
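The following sketch tries alternation and grouping on a few words. matches() is again a hypothetical helper, and the patterns are anchored with ^ and $ where the whole word is meant to match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

print matches('jelly|cream', 'ice cream'), "\n";   # 1: contains cream
print matches('jelly|cream', 'custard'),   "\n";   # 0: neither word
print matches('^(eg|le)gs$', 'eggs'),      "\n";   # 1
print matches('^(eg|le)gs$', 'legs'),      "\n";   # 1
print matches('^(eg|le)gs$', 'pegs'),      "\n";   # 0: pe is not an option
print matches('^(da)+$',     'dadada'),    "\n";   # 1: da repeated 3 times
print matches('^(da)+$',     'dad'),       "\n";   # 0: trailing d left over
```

Without the parentheses, eg|legs would mean "eg or legs", because | splits the whole pattern unless it is grouped.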

Here are some more special characters:

\n		# A newline
\t		# A tab
\w		# Any alphanumeric (word) character.
		# The same as [a-zA-Z0-9_]
\W		# Any non-word character.
		# The same as [^a-zA-Z0-9_]
\d		# Any digit. The same as [0-9]
\D		# Any non-digit. The same as [^0-9]
\s		# Any whitespace character: space,
		# tab, newline, etc
\S		# Any non-whitespace character
\b		# A word boundary, outside [] only
\B		# No word boundary
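To see the backslash classes in action, here is a small sketch using plain ASCII test strings. The matches() helper and the strings are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

print matches('\d',      'room 101'),     "\n";  # 1: contains a digit
print matches('\D',      '101'),          "\n";  # 0: digits only
print matches('\s',      "tab\there"),    "\n";  # 1: a tab is whitespace
print matches('\S',      " \t "),         "\n";  # 0: whitespace only
print matches('^\w+$',   'foo_bar9'),     "\n";  # 1: letters, _ and digit
print matches('\W',      'foo_bar9'),     "\n";  # 0: no non-word character
print matches('\bthe\b', 'over the top'), "\n";  # 1: "the" as a whole word
print matches('\bthe\b', 'other'),        "\n";  # 0: "the" is inside a word
print matches('\Bthe\B', 'other'),        "\n";  # 1: no boundary either side
```

The patterns are written in single quotes so that Perl passes the backslashes through to the regular expression unchanged.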

Clearly characters like $, |, [, ), \, / and so on are special cases in regular expressions. If you want to match one of them literally, you have to precede it with a backslash. So:

\|		# Vertical bar
\[		# An open square bracket
\)		# A closing parenthesis
\*		# An asterisk
\^		# A caret symbol
\/		# A slash
\\		# A backslash
and so on.
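The effect of the backslash can be sketched with literal patterns. The test strings and variable names below are invented for the example; each result is stored as 1 or 0.

```perl
use strict;
use warnings;

# Each special character is escaped so that it matches itself.
my $has_bar       = ('a|b'     =~ /\|/) ? 1 : 0;
my $has_bracket   = ('list[0]' =~ /\[/) ? 1 : 0;
my $has_star      = ('2*3'     =~ /\*/) ? 1 : 0;
my $has_caret     = ('x^2'     =~ /\^/) ? 1 : 0;
my $has_slash     = ('a/b'     =~ /\//) ? 1 : 0;
my $has_backslash = ('C:\dir'  =~ /\\/) ? 1 : 0;

# The same applies to the dot: escaped it means a literal full stop,
# unescaped it means any single character.
my $literal_dot = ('abc' =~ /\./) ? 1 : 0;
my $any_char    = ('abc' =~ /./)  ? 1 : 0;

print "bar=$has_bar bracket=$has_bracket star=$has_star ",
      "caret=$has_caret slash=$has_slash backslash=$has_backslash\n";
print "literal_dot=$literal_dot any_char=$any_char\n";
# prints: literal_dot=0 any_char=1
```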

 


Some example REs

As was mentioned earlier, it's probably best to build up your use of regular expressions slowly. Here are a few examples. Remember that to use them for matching they should be put in /.../ slashes:
[01]		# Either "0" or "1"
\/0		# A division by zero: "/0"
\/ 0		# A division by zero with a space: "/ 0"
\/\s0		# A division by zero with one whitespace character:
		# "/ 0" where the space may be a tab etc.
\/ *0		# A division by zero with possibly some
		# spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0		# A division by zero with possibly some
		# whitespace.
\/\s*0\.0*	# As the previous one, but with decimal
		# point and maybe some 0s after it. Accepts
		# "/0." and "/0.0" and "/0.00" etc and
		# "/ 0." and "/  0.0" and "/   0.00" etc.
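As a sketch of how the last few patterns might be used, the two subs below wrap them in tests. The sub names and the sample expressions are hypothetical, not part of the original examples.

```perl
use strict;
use warnings;

# Hypothetical check: a slash, optional whitespace, then a 0.
sub divides_by_zero {
    my ($expr) = @_;
    return $expr =~ /\/\s*0/ ? 1 : 0;
}

# Hypothetical check: as above, but the 0 must be followed by a
# decimal point and possibly more 0s.
sub divides_by_decimal_zero {
    my ($expr) = @_;
    return $expr =~ /\/\s*0\.0*/ ? 1 : 0;
}

print divides_by_zero('x/0'),              "\n";  # 1
print divides_by_zero("x /\t0"),           "\n";  # 1: tab counts as whitespace
print divides_by_zero('x / 10'),           "\n";  # 0: the digit after / is 1
print divides_by_decimal_zero('x / 0.00'), "\n";  # 1
print divides_by_decimal_zero('x/0'),      "\n";  # 0: no decimal point
```

These patterns only look at the characters right after the slash, so divides_by_zero also reports 1 for an expression such as 'x/0.5'. A stricter check would need to look at what follows the 0.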