Regular expressions
Groovy in Action
Symbol | Meaning
. | Any character (except line terminators, unless single-line mode (?s) is on)
^ | Start of input (or start of any line, in multiline mode (?m))
$ | End of input (or end of any line, in multiline mode (?m))
\d | Digit character
\D | Any character except digits
\s | Whitespace character
\S | Any character except whitespace
\w | Word character
\W | Any character except word characters
\b | Word boundary
() | Grouping
(x|y) | x or y, as in (Groovy|Java|Ruby)
\1 | Backreference to group one; for example, find doubled characters with (.)\1
x* | Zero or more occurrences of x
x+ | One or more occurrences of x
x? | Zero or one occurrence of x
x{m,n} | At least m and at most n occurrences of x
x{m} | Exactly m occurrences of x
[a-f] | Character class containing the characters a, b, c, d, e, f
[^a] | Character class containing any character except a
[aeiou] | Character class representing lowercase vowels
[a-z&&[^aeiou]] | Lowercase consonants
[a-zA-Z0-9] | Uppercase or lowercase letter or digit
[+-]?(\d+(\.\d*)?|\.\d+) | Positive or negative floating-point number
^[\w.-]+@([\w-]+\.)+[\w-]{2,4}$ | Simple email validation
(?is:x) | Switches modes while evaluating x; i turns on case-insensitive matching, s turns on single-line (dotall) mode, in which . also matches line terminators
(?=regex) | Positive lookahead
(?<=text) | Positive lookbehind
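A few constructs from the table above can be checked with assertions. This is a small sketch, assuming Groovy 1.8 or later for the String.find and String.findAll convenience methods; the sample strings are made up for illustration. The floating-point check uses the form [+-]?(\d+(\.\d*)?|\.\d+): a class written as [+|-] would also accept a literal |, and without an outer group the top-level | splits the whole pattern into two alternatives.

```groovy
// Backreference: find doubled characters with (.)\1
// (findAll ignores capture groups and returns the full matches)
assert 'bookkeeper'.findAll(/(.)\1/) == ['oo', 'kk', 'ee']

// Mode switches: (?i:...) ignores case; (?s) lets . match a line terminator
assert 'GROOVY' ==~ /(?i:groovy)/
assert !("a\nb" ==~ /a.b/)
assert "a\nb" ==~ /(?s)a.b/

// Lookahead and lookbehind check the context without consuming it
assert 'price: 100USD'.find(/\d+(?=USD)/) == '100'
assert 'ticket #42'.find(/(?<=#)\d+/) == '42'

// Character-class intersection: lowercase consonants
assert 'groovy'.findAll(/[a-z&&[^aeiou]]/) == ['g', 'r', 'v', 'y']

// Floating-point numbers, with the sign class and the alternation grouped
def FLOAT = /[+-]?(\d+(\.\d*)?|\.\d+)/
assert ['3', '-2.5', '+.75', '1.'].every { it ==~ FLOAT }
assert !('.' ==~ FLOAT) && !('1|2' ==~ FLOAT)
```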
Beginning Groovy and Grails (Jun 2008)
Table 2-1. Summary of Regular-Expression Constructs
Construct | Matches
Characters
x | The character x
\\ | The backslash character
\t | The tab character (\u0009)
\n | The newline (line feed) character (\u000A)
\r | The carriage return character (\u000D)
\f | The form feed character (\u000C)
\e | The escape character (\u001B)
Character Classes
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)
Predefined Character Classes
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A nondigit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A nonword character: [^\w]
Boundary Matchers
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A nonword boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input but for the final terminator, if any
\z | The end of the input
Greedy Quantifiers
X? | X, once or not at all
X* | X, zero or more times
X+ | X, one or more times
X{n} | X, exactly n times
X{n,} | X, at least n times
X{n,m} | X, at least n but not more than m times
Reluctant Quantifiers
X?? | X, once or not at all
X*? | X, zero or more times
X+? | X, one or more times
X{n}? | X, exactly n times
X{n,}? | X, at least n times
X{n,m}? | X, at least n but not more than m times
Possessive Quantifiers
X?+ | X, once or not at all
X*+ | X, zero or more times
X++ | X, one or more times
X{n}+ | X, exactly n times
X{n,}+ | X, at least n times
X{n,m}+ | X, at least n but not more than m times
Logical Operators
XY | X followed by Y
X|Y | Either X or Y
(X) | X, as a capturing group
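The difference between the greedy, reluctant, and possessive quantifier groups is easiest to see on one input. The sketch below assumes Groovy 1.8 or later (for String.find and String.findAll); the HTML-like sample string is only illustrative.

```groovy
def html = '<b>bold</b><i>italic</i>'

// Greedy: .+ grabs as much as possible, then backs off to the last '>'
assert html.find(/<.+>/) == '<b>bold</b><i>italic</i>'

// Reluctant: .+? grabs as little as possible, stopping at the first '>'
assert html.find(/<.+?>/) == '<b>'
assert html.findAll(/<.+?>/) == ['<b>', '</b>', '<i>', '</i>']

// Possessive: a*+ keeps every 'a' it consumed and never backtracks,
// so the final 'a' in the pattern has nothing left to match
assert 'aaa' ==~ /a*a/
assert !('aaa' ==~ /a*+a/)

// Union of character classes: [a-d[m-p]] behaves like [a-dm-p]
assert 'abcmnoxyz'.findAll(/[a-d[m-p]]/).join() == 'abcmno'
```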
http://groovy.codehaus.org/Regular+Expressions
http://groovy.codehaus.org/Tutorial+4+-+Regular+expressions+basics and
http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
For a complete list of regular-expression constructs, see http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html .
some text -- Exactly “some text”.
some\s+text -- The word “some” followed by one or more whitespace characters followed by the word “text”.
^\d+(\.\d+)? (.*) -- Our introductory example: headings of level one or two. ^ denotes a line start, \d a digit, \d+ one or more digits. Parentheses are used for grouping. The question mark makes the first group optional. The second group contains the title, made of a dot for any character and a star for any number of such characters.
\d\d/\d\d/\d\d\d\d -- A date formatted as exactly two digits followed by a slash, two more digits followed by a slash, followed by exactly four digits.
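Each of these example patterns can be exercised directly. A minimal sketch; the heading and date strings are invented samples, and the date pattern escapes its slashes as \/ because it is written as a slashy string.

```groovy
// Exactly "some text", and "some" + whitespace + "text"
assert 'some text' ==~ /some text/
assert 'some \t  text' ==~ /some\s+text/

// Headings of level one or two: group 1 is the optional ".n" part, group 2 the title
def HEADING = /^\d+(\.\d+)? (.*)/
def level2 = '2.1 Getting started' =~ HEADING
assert level2.matches()
assert level2.group(1) == '.1'
assert level2.group(2) == 'Getting started'

def level1 = '2 Basics' =~ HEADING
assert level1.matches()
assert level1.group(1) == null        // the optional group did not take part
assert level1.group(2) == 'Basics'

// A date such as 12/31/2008 (\/ escapes the slash inside a slashy string)
def DATE = /\d\d\/\d\d\/\d\d\d\d/
assert '12/31/2008' ==~ DATE
assert !('1/31/2008' ==~ DATE)        // the month needs exactly two digits
```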
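What $ does:
def reg1 = ~/^[A-Z]{1}[a-zA-Z0-9]+$/
assert "Ad1WRldd" =~ reg1        // true
assert !("Ad1@#!ldd" =~ reg1)    // false: with $, every character up to the end must be a letter or digit
But without $:
def reg2 = ~/^[A-Z]{1}[a-zA-Z0-9]+/
assert "Ad1@#!ldd" =~ reg2       // true: the find operator only needs the prefix "Ad1" to match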
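The anchors matter because =~ only has to find a matching substring. A short sketch with the same sample strings showing that the match operator ==~ needs no anchors at all: it always has to match the entire string.

```groovy
def noAnchors = /[A-Z][a-zA-Z0-9]+/

// Find operator: a matching substring anywhere is enough
assert 'Ad1@#!ldd' =~ noAnchors

// Match operator: the entire string must match, as if ^ and $ were present
assert !('Ad1@#!ldd' ==~ noAnchors)
assert 'Ad1WRldd' ==~ noAnchors
```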
Groovy regular expressions (from Google Notebook…)
Sometimes the slashy syntax interferes with other valid Groovy expressions such as line comments (//) or numerical expressions with multiple slashes for division (/). When in doubt, put parentheses around your pattern, like (/pattern/). Parentheses force the parser to interpret the content as an expression.
■ The regex find operator =~
■ The regex match operator ==~
■ The regex pattern operator ~String
assert "abc" == /abc/
assert "\\d" == /\d/
def reference = "hello"
assert reference == /$reference/
assert "\$" == /$/
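The equalities above can be pushed a little further. A small sketch, with an invented file-name pattern, showing that a slashy string is an ordinary String whose backslashes are kept as written, that \/ escapes a slash, and that $name interpolation turns it into a GString.

```groovy
// A slashy string is an ordinary String: backslashes are kept as written
def slashy = /\d+\.\d+/
def quoted = '\\d+\\.\\d+'
assert slashy == quoted
assert slashy instanceof String

// \/ is the escape for a forward slash inside a slashy string
assert 'a/b' == /a\/b/

// $name interpolates, which turns the slashy string into a GString
def ext = 'txt'
def filePattern = /\w+\.$ext/
assert filePattern.toString() == '\\w+\\.txt'
assert 'notes.txt' ==~ filePattern
assert !('notes.csv' ==~ filePattern)
```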
twister = 'she sells sea shells at the sea shore of seychelles'
// twister must contain a substring of size 3
// that starts with s and ends with a
assert twister =~ /s.a/                    // regex find operator, usable in an if condition
finder = (twister =~ /s.a/)                // a find expression evaluates to a Matcher object
assert finder instanceof java.util.regex.Matcher
// twister must contain only words delimited by single spaces
assert twister ==~ /(\w+ \w+)*/            // regex match operator
WORD = /\w+/
matches = (twister ==~ /($WORD $WORD)*/)   // a match expression evaluates to a Boolean
assert matches instanceof java.lang.Boolean
assert (twister ==~ /s.e/) == false        // a match is full, not partial like find
wordsByX = twister.replaceAll(WORD, 'x')
assert wordsByX == 'x x x x x x x x x x'
words = twister.split(/ /)                 // split returns an array of words
assert words.size() == 10
assert words[0] == 'she'
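The three operators also work together: ~ compiles a Pattern once, and both =~ and ==~ accept that Pattern as their right operand. A small sketch with a made-up version string; the (all, group) closure parameters follow the style used for grouped matchers later in these notes.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

def text = 'Groovy 1.5 and Groovy 1.6'
def VERSION = ~/Groovy (\d\.\d)/               // pattern operator: compiled once

assert VERSION instanceof Pattern
assert (text =~ VERSION) instanceof Matcher    // find operator
assert !(text ==~ VERSION)                     // match operator: whole string must match
assert 'Groovy 1.6' ==~ VERSION

// Each match arrives as [full match, group 1], spread over the closure parameters
def versions = []
(text =~ VERSION).each { all, version -> versions << version }
assert versions == ['1.5', '1.6']
```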
Usage advice:
■ When things get complex (note, this is when, not if), comment verbosely.
■ Use the slashy syntax instead of the regular string syntax, or you will get lost in a forest of backslashes.
■ Don’t let your pattern look like a toothpick puzzle. Build your pattern from subexpressions like WORD in listing 3.5.
■ Put your assumptions to the test. Write some assertions or unit tests to test your regex against static strings. Please don’t send us any more flowers for this advice; an email with the subject “assertion saved my life today” will suffice.
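The advice can be applied to a small case. A sketch under stated assumptions: the date subexpressions are hypothetical names chosen for illustration, and the multi-line slashy string requires Groovy 1.8 or later. The (?x) flag is Java's comments mode, where whitespace is ignored and # starts a comment.

```groovy
// Build the pattern from named subexpressions instead of one long toothpick puzzle
def YEAR  = /\d{4}/
def MONTH = /(0[1-9]|1[0-2])/
def DAY   = /(0[1-9]|[12]\d|3[01])/
def ISO_DATE = /$YEAR-$MONTH-$DAY/

// Put the assumptions to the test with assertions against static strings
assert '2008-06-15' ==~ ISO_DATE
assert !('2008-13-01' ==~ ISO_DATE)   // there is no 13th month
assert !('08-06-15' ==~ ISO_DATE)     // the year needs four digits

// (?x) comments mode: whitespace is ignored and # starts a comment,
// so a complex pattern can be commented verbosely
def VERBOSE_DATE = ~/(?x)
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    -
    (\d{2})   # day
/
def m = '2008-06-15' =~ VERBOSE_DATE
assert m.matches()
assert m.group(1) == '2008' && m.group(3) == '15'
```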
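"\$abc." =~ /\$(.*)\./
// the same regex as a double-quoted string: "\$abc." =~ "\\\$(.*)\\."
'''${abc.}tttt${abc.}''' =~ /\$\{(.*)\.\}/   // .* is greedy: group 1 runs from the first ${ to the last .}
ddd = ~/\$\{(\w*)\.\}/
println ddd.class     // class java.util.regex.Pattern: explicitly constructs a Pattern object (which also builds the corresponding state machine; see the performance section)
regex = /\$\{(\w*)\.\}/   // () defines a capturing group
println regex.class   // class java.lang.String: a Pattern is built implicitly when matching (=~)
str = '''${abc.}tttt${def.}'''
str.eachMatch(regex){ match ->
    println match[0]  // prints ${abc.}, abc, ${def.}, def on successive lines
    println match[1]
}
regex2 = /\$\{\w*\.\}/   // parentheses removed: with no capturing group, the closure receives the whole match as a String
cloze = str.replaceAll(regex2){ ch ->
    println "all match = " + ch
    '44'                 // the closure's return value replaces the match
}
println "cloze = " + cloze   // prints: all match = ${abc.}, all match = ${def.}, cloze = 44tttt44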
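def message = '${Hello}, ${world}'
def reg = '\\$\\{\\w*\\}'    // in a single-quoted string every regex backslash must be doubled
def reg2 = /\$\{\w*\}/       // the same regex in slashy syntax
def rs = message.replaceAll(reg) { ch ->
    println "all match = " + ch
    '11'                     // the closure's return value replaces the match
}
println "rs = " + rs         // prints: rs = 11, 11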
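The ${...} experiments above are the core of a tiny template filler. A minimal sketch with a hypothetical template and value map: the placeholder regex captures the key, and the two-parameter closure receives the whole match plus that group. Unknown keys are replaced by a "[missing key]" marker, an arbitrary choice for this illustration.

```groovy
// Placeholders look like ${key.}, as in the notes above
def PLACEHOLDER = /\$\{(\w+)\.\}/

// Hypothetical template and values; single quotes prevent Groovy interpolation
def template = 'Dear ${name.}, order ${id.} ships to ${city.}.'
def values = [name: 'Ada', id: '42']

def filled = template.replaceAll(PLACEHOLDER) { all, key ->
    // Substitute known keys; mark unknown ones instead of failing
    values.containsKey(key) ? values[key] : "[missing $key]"
}
assert filled == 'Dear Ada, order 42 ships to [missing city].'
```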
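matcher = 'a b c' =~ /\S/
assert matcher[0] == 'a'
assert matcher[1..2] == ['b', 'c']   // a range index returns a list in current Groovy versions
assert matcher.count == 3
matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/   // note the difference between these two examples: with groups, each matcher element is a list
assert matcher.hasGroup()
assert matcher[0] == ['a:1', 'a', '1']
('xy' =~ /(.)(.)/).each { all, x, y ->     // the groups can also be unpacked (and printed) in a closure
    assert all == 'xy'
    assert x == 'x'
    assert y == 'y'
}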
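Unpacking groups in a closure makes simple parsing short. A sketch reusing the key:value sample from above to build a map; converting the values with toInteger() is an assumption about what the caller wants.

```groovy
// Parse key:value pairs with two capture groups into a map
def pairs = [:]
('a:1 b:2 c:3' =~ /(\S+):(\S+)/).each { all, key, value ->
    pairs[key] = value.toInteger()
}
assert pairs == [a: 1, b: 2, c: 3]

// Indexing a grouped matcher gives [full match, group 1, group 2]
def kv = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/
assert kv[2] == ['c:3', 'c', '3']
assert kv[2][1] == 'c'
```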
myFairStringy = 'The rain in Spain stays mainly in the plain!'
// words that end with 'ain': \b\w*ain\b
BOUNDS = /\b/
rhyme = /$BOUNDS\w*ain$BOUNDS/
found = ''
myFairStringy.eachMatch(rhyme) { match ->   // string.eachMatch(pattern_string)
    found += match + ' '                    // without groups, match is the matched String
}
assert found == 'rain Spain plain '
found = ''
(myFairStringy =~ rhyme).each { match ->    // matcher.each(closure)
    found += match + ' '
}
assert found == 'rain Spain plain '
cloze = myFairStringy.replaceAll(rhyme){ it - 'ain' + '___' }   // string.replaceAll(pattern_string, closure)
assert cloze == 'The r___ in Sp___ stays mainly in the pl___!'
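Groovy 1.8 and later (an assumption about the reader's setup) also add String.find and String.findAll, which return matches directly instead of through a closure. A short sketch on the same sentence:

```groovy
def myFairStringy = 'The rain in Spain stays mainly in the plain!'
def rhyme = /\b\w*ain\b/

// All matches as a list ("mainly" is skipped: there is no \b between "ain" and "l"),
// and the first match on its own
assert myFairStringy.findAll(rhyme) == ['rain', 'Spain', 'plain']
assert myFairStringy.find(rhyme) == 'rain'

// Without groups, the replaceAll closure receives each matched word
def shouted = myFairStringy.replaceAll(rhyme) { it.toUpperCase() }
assert shouted == 'The RAIN in SPAIN stays mainly in the PLAIN!'
```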
Performance
Listing 3.7 Increase performance with pattern reuse.
twister = 'she sells sea shells at the sea shore of seychelles'
// some more complicated regex:
// word that starts and ends with same letter
regex = /\b(\w)\w*\1\b/
start = System.currentTimeMillis()
100000.times{
twister =~ regex             // find operator with implicit pattern construction
}
first = System.currentTimeMillis() - start
start = System.currentTimeMillis()
pattern = ~regex             // explicit pattern construction
100000.times{
pattern.matcher(twister)     // apply the pattern to a String
}
second = System.currentTimeMillis() - start
assert first > second * 1.20
The pattern operator ~String
The pattern operator converts a String into a java.util.regex.Pattern object. For a given string, this pattern object can be asked for a matcher object.
The rationale behind this construction is that patterns are internally backed by a so-called finite state machine that does all the high-performance magic. This machine is compiled when the pattern object is created. The more complicated the pattern, the longer the creation takes. In contrast, the matching process as performed by the machine is extremely fast.
The pattern operator therefore separates pattern-creation time from pattern-matching time, so the compiled state machine can be reused.
As the example above shows, first constructs a pattern implicitly on every match, whereas second creates the pattern explicitly once before matching, which saves time.
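The same reuse idea applies outside timing loops: compile once with ~ and hand the Pattern to the Groovy operators or to Pattern.matcher. A sketch reusing the twister string and the same-first-and-last-letter regex from listing 3.7; the candidate word list is invented for illustration.

```groovy
import java.util.regex.Pattern

def twister = 'she sells sea shells at the sea shore of seychelles'

// Compiled once: the state machine is built here, not on every match
Pattern sameLetter = ~/\b(\w)\w*\1\b/

// Reuse 1: the find operator accepts a Pattern on its right-hand side
def words = []
(twister =~ sameLetter).each { all, firstLetter -> words << all }
assert words == ['sells', 'shells', 'seychelles']

// Reuse 2: a fresh Matcher from the same Pattern for each candidate string
def candidates = ['level', 'groovy', 'stats']
def hits = candidates.findAll { sameLetter.matcher(it).matches() }
assert hits == ['level', 'stats']
```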
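Patterns for classification
Groovy adds an isCase() method to Pattern objects; it returns true when the whole candidate string matches the pattern, which is what switch and grep rely on.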
Listing 3.8 Patterns in grep() and switch()
assert (~/..../).isCase('bear')
switch('bear'){
case ~/..../ : assert true; break
default : assert false
}
beasts = ['bear','wolf','tiger','regex']
assert beasts.grep(~/..../) == ['bear','wolf']
TIP Classifications read nicely in switch and grep. The direct use of classifier.isCase(candidate) happens rarely, but when it does, it is best read from right to left: “candidate is a case of classifier”.
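Several pattern cases in one switch give a compact classifier. A sketch under stated assumptions: classify and its token categories are hypothetical, and because each case goes through isCase, a token must match its pattern completely.

```groovy
// Hypothetical token classifier built from Pattern cases
def classify(String token) {
    switch (token) {
        case ~/[+-]?\d+/        : return 'integer'
        case ~/[+-]?\d*\.\d+/   : return 'decimal'
        case ~/[A-Za-z_]\w*/    : return 'identifier'
        default                 : return 'other'
    }
}

def tokens = ['42', '-3.5', 'x1', '#', '.5']
assert tokens.collect { classify(it) } == ['integer', 'decimal', 'identifier', 'other', 'decimal']

// grep with a Pattern keeps only the elements the pattern fully matches
assert tokens.grep(~/[+-]?\d+/) == ['42']
```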