Groovy Regular Expressions (Repost)

dky_rl


Regular expressions

Groovy In Action.pdf

Symbol                              Meaning
.                                   Any character
^                                   Start of line in multiline mode (?m); otherwise start of the input
$                                   End of line in multiline mode (?m); otherwise end of the input
\d                                  Digit character
\D                                  Any character except digits
\s                                  Whitespace character
\S                                  Any character except whitespace
\w                                  Word character
\W                                  Any character except word characters
\b                                  Word boundary
()                                  Grouping
(x|y)                               x or y as in (Groovy|Java|Ruby)
\1                                  Backmatch to group one; for example, find doubled characters with (.)\1
x*                                  Zero or more occurrences of x
x+                                  One or more occurrences of x
x?                                  Zero or one occurrence of x
x{m,n}                              At least m and at most n occurrences of x
x{m}                                Exactly m occurrences of x
[a-f]                               Character class containing the characters a, b, c, d, e, f
[^a]                                Character class containing any character except a
[aeiou]                             Character class representing lowercase vowels
[a-z&&[^aeiou]]                     Lowercase consonants
[a-zA-Z0-9]                         Uppercase or lowercase letter or digit
[+|-]?(\d+(\.\d*)?)|(\.\d+)         Positive or negative floating-point number
^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$    Simple email validation
(?is:x)                             Switches mode when evaluating x; i turns on ignoreCase, s is single-line mode
(?=regex)                           Positive lookahead
(?<=text)                           Positive lookbehind
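
A few rows of the table are easier to read with a concrete check next to them. The following is a minimal sketch, assuming the GDK methods String.findAll(regex) and the ==~ operator as found in current Groovy versions; the sample strings are made up for illustration.

```groovy
// Backreference: (.)\1 finds doubled characters.
def doubled = 'Groovy shells'.findAll(/(.)\1/)
assert doubled == ['oo', 'll']

// Character class intersection: lowercase letters that are not vowels.
def consonants = 'groovy'.findAll(/[a-z&&[^aeiou]]/)
assert consonants == ['g', 'r', 'v', 'y']

// Bounded quantifier: two or three occurrences, nothing else.
assert 'aaa' ==~ /a{2,3}/
assert !('aaaa' ==~ /a{2,3}/)

// Inline mode switch: i turns on case-insensitive matching for the group.
assert 'GROOVY' ==~ /(?i:groovy)/

// Lookahead and lookbehind check the context without consuming it.
def sample = 'price: 42 USD, 7 EUR'
assert sample.findAll(/\d+(?= USD)/) == ['42']
assert sample.findAll(/(?<=price: )\d+/) == ['42']
```

The lookaround matches contain only the digits, because the text inside (?=...) and (?<=...) is tested but never becomes part of the match.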

Beginning Groovy and Grails (Jun 2008).pdf

Table 2-1. Summary of Regular-Expression Constructs

Construct       Matches

Characters
x               The character x
\\              The backslash character
\t              The tab character (\u0009)
\n              The newline (line feed) character (\u000A)
\r              The carriage return character (\u000D)
\f              The form feed character (\u000C)
\e              The escape character (\u001B)

Character Classes
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)

Predefined Character Classes
.               Any character (may or may not match line terminators)
\d              A digit: [0-9]
\D              A nondigit: [^0-9]
\s              A whitespace character: [ \t\n\x0B\f\r]
\S              A non-whitespace character: [^\s]
\w              A word character: [a-zA-Z_0-9]
\W              A nonword character: [^\w]

Boundary Matchers
^               The beginning of a line
$               The end of a line
\b              A word boundary
\B              A nonword boundary
\A              The beginning of the input
\G              The end of the previous match
\Z              The end of the input but for the final terminator, if any
\z              The end of the input

Greedy Quantifiers
X?              X, once or not at all
X*              X, zero or more times
X+              X, one or more times
X{n}            X, exactly n times
X{n,}           X, at least n times
X{n,m}          X, at least n but not more than m times

Reluctant Quantifiers
X??             X, once or not at all
X*?             X, zero or more times
X+?             X, one or more times
X{n}?           X, exactly n times
X{n,}?          X, at least n times
X{n,m}?         X, at least n but not more than m times

Possessive Quantifiers
X?+             X, once or not at all
X*+             X, zero or more times
X++             X, one or more times
X{n}+           X, exactly n times
X{n,}+          X, at least n times
X{n,m}+         X, at least n but not more than m times

Logical Operators
XY              X followed by Y
X|Y             Either X or Y
(X)             X, as a capturing group
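
The greedy, reluctant, and possessive rows share the same descriptions, so the difference only shows up when the quantifier has to compete with what follows it. Here is a small sketch, assuming the GDK method String.find(regex), which returns the first match or null; the sample text is invented.

```groovy
def quoted = 'say "one" and "two"'

// Greedy: .* takes as much as it can, then backtracks to the last quote.
def greedy = quoted.find(/".*"/)
assert greedy == '"one" and "two"'

// Reluctant: .*? takes as little as it can, so it stops at the next quote.
def reluctant = quoted.find(/".*?"/)
assert reluctant == '"one"'

// Possessive: .*+ takes everything to the end of the input and never
// gives anything back, so the closing quote can no longer be matched.
def possessive = quoted.find(/".*+"/)
assert possessive == null

// \Z tolerates a final line terminator, \z does not.
assert "abc\n" =~ /abc\Z/
assert !("abc\n" =~ /abc\z/)
```

Possessive quantifiers are mostly useful when a failed match should fail quickly instead of trying every backtracking alternative.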

http://groovy.codehaus.org/Regular+Expressions

http://groovy.codehaus.org/Tutorial+4+-+Regular+expressions+basics

http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups

For a complete list of regular-expression constructs, see http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html .

some text -- Exactly “some text”.

some\s+text -- The word “some” followed by one or more whitespace characters followed by the word “text”.

^\d+(\.\d+)? (.*) -- Our introductory example: headings of level one or two. ^ denotes a line start, \d a digit, \d+ one or more digits. Parentheses are used for grouping. The question mark makes the first group optional. The second group contains the title, made of a dot for any character and a star for any number of such characters.

\d\d/\d\d/\d\d\d\d -- A date formatted as exactly two digits followed by a slash, two more digits followed by a slash, followed by exactly four digits.
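
The four patterns above can be tried directly in a script. This is only a sketch with made-up input strings; note that inside a slashy string the slashes of the date pattern have to be written as \/.

```groovy
// Literal text and whitespace.
assert 'here is some text' =~ /some text/
assert "some \t text" ==~ /some\s+text/

// Heading pattern: group 1 is the optional sub-level, group 2 the title.
def heading = '3.1 Working with strings' =~ /^\d+(\.\d+)? (.*)/
assert heading.find()
assert heading.group(1) == '.1'
assert heading.group(2) == 'Working with strings'

// A level-one heading leaves the optional group unset.
def topLevel = '3 Strings' =~ /^\d+(\.\d+)? (.*)/
assert topLevel.find()
assert topLevel.group(1) == null
assert topLevel.group(2) == 'Strings'

// Date pattern: only the shape is checked, not the calendar.
def datePattern = /\d\d\/\d\d\/\d\d\d\d/
assert '01/31/2008' ==~ datePattern
assert !('1/31/2008' ==~ datePattern)
assert '99/99/9999' ==~ datePattern
```

The last assertion shows the limit of the date pattern: it accepts any digits in the right positions, so real validation still needs a date parser.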

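The effect of $:

def reg1 = ~/^[A-Z]{1}[a-zA-Z0-9]+$/

assert "Ad1WRldd" =~ reg1          // true: the whole string fits between ^ and $

assert !("Ad1@#!ldd" =~ reg1)      // false: '@' breaks the run before $ is reached

But without the $:

def reg2 = ~/^[A-Z]{1}[a-zA-Z0-9]+/

assert "Ad1@#!ldd" =~ reg2         // true: find is satisfied by the prefix "Ad1"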
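
The anchors matter for the find operator =~ because find accepts a partial match. The match operator ==~ always requires the whole string to match, so it behaves as if the pattern were anchored. A short sketch, using a hypothetical unanchored pattern named `name` and invented sample text:

```groovy
// Hypothetical pattern without ^ and $, for illustration only.
def name = /[A-Z][a-zA-Z0-9]+/

assert 'Ad1WRldd' ==~ name                  // full match succeeds
assert !('Ad1@#!ldd' ==~ name)              // full match fails on '@'
assert ('Ad1@#!ldd' =~ name)[0] == 'Ad1'    // find stops at the first fit

// By default ^ and $ refer to the whole input, not to each line.
def text = 'first\nSecond1\nthird'
assert !(text =~ /^[A-Z][a-zA-Z0-9]+$/)

// With (?m) they match at every line boundary.
def secondLine = text.find(/(?m)^[A-Z][a-zA-Z0-9]+$/)
assert secondLine == 'Second1'
```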

Groovy regular expressions (from Google Notebook…)

Sometimes the slashy syntax interferes with other valid Groovy expressions such as line comments (//) or numerical expressions with multiple slashes for division (/). When in doubt, put parentheses around your pattern, like (/pattern/). Parentheses force the parser to interpret the content as an expression.

■ The regex find operator =~

■ The regex match operator ==~

■ The regex pattern operator ~String

assert "abc" == /abc/

assert "\\d" == /\d/

def reference = "hello"

assert reference == /$reference/

assert "\$" == /$/

twister = 'she sells sea shells at the sea shore of seychelles'

// twister must contain a substring of size 3
// that starts with s and ends with a
assert twister =~ /s.a/                     // regex find operator, usable in if
finder = (twister =~ /s.a/)                 // find expression evaluates to a Matcher object
assert finder instanceof java.util.regex.Matcher

// twister must contain only words delimited by single spaces
assert twister ==~ /(\w+ \w+)*/             // regex match operator
WORD = /\w+/
matches = (twister ==~ /($WORD $WORD)*/)    // match expression evaluates to a Boolean
assert matches instanceof java.lang.Boolean
assert (twister ==~ /s.e/) == false         // match is full, not partial like find

wordsByX = twister.replaceAll(WORD, 'x')
assert wordsByX == 'x x x x x x x x x x'

words = twister.split(/ /)                  // split returns an array of words
assert words.size() == 10
assert words[0] == 'she'

Usage tips:

■ When things get complex (note, this is when, not if), comment verbosely.

■ Use the slashy syntax instead of the regular string syntax, or you will get lost in a forest of backslashes.

■ Don’t let your pattern look like a toothpick puzzle. Build your pattern from subexpressions like WORD in listing 3.5.

■ Put your assumptions to the test. Write some assertions or unit tests to test your regex against static strings. Please don’t send us any more flowers for this advice; an email with the subject “assertion saved my life today” will suffice.
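
The last two tips can be combined: name the pieces, assemble them, and pin the behavior down with assertions. The sketch below is only an illustration; YEAR, MONTH, DAY, and ISO_DATE are hypothetical names that do not come from the book.

```groovy
// Hypothetical sub-patterns, named for illustration only.
def YEAR  = /\d{4}/
def MONTH = /(?:0[1-9]|1[0-2])/         // 01 to 12
def DAY   = /(?:0[1-9]|[12]\d|3[01])/   // 01 to 31

// The composed pattern stays readable: three captured parts and two dashes.
def ISO_DATE = /($YEAR)-($MONTH)-($DAY)/

// Assertions against static strings document the assumptions.
assert '2008-06-30' ==~ ISO_DATE
assert !('2008-13-01' ==~ ISO_DATE)     // no month 13
assert !('2008-6-30' ==~ ISO_DATE)      // month needs two digits

def m = '2008-06-30' =~ ISO_DATE
assert m.matches()
assert [m.group(1), m.group(2), m.group(3)] == ['2008', '06', '30']
```

Each sub-pattern can also be tested on its own, which makes it much easier to locate the faulty part when a composed pattern misbehaves.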

"\$abc." =~ /\$(.*)\./

//"\$abc."=~ \\\$(.*)\\.

'''${abc.}tttt${abc.}''' =~ /\$\{(.*)\.\}/

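ddd = ~/\$\{(\w*)\.\}/
println ddd.class        // class java.util.regex.Pattern: the Pattern is constructed explicitly (and its state machine is compiled; see the performance section)

regex = /\$\{(\w*)\.\}/  // the parentheses form a capturing group
println regex.class      // class java.lang.String: a Pattern is built implicitly when matching with =~

str = '''${abc.}tttt${def.}'''
str.eachMatch(regex) { match ->
    println match[0]     // whole match: ${abc.}, then ${def.}
    println match[1]     // group 1: abc, then def
}

regex2 = /\$\{\w*\.\}/   // the same pattern without the capturing group
cloze = str.replaceAll(regex2) { ch ->
    println "all match = " + ch    // all match = ${abc.}, then all match = ${def.}
    44                   // the closure's result replaces each match
}
println "cloze = " + cloze         // cloze = 44tttt44

def message = '${Hello}, ${world}'
def reg = '\\$\\{\\w*\\}'          // regular string: every backslash is doubled
def reg2 = /\$\{\w*\}/             // equivalent slashy form (not used below)
def rs = message.replaceAll(reg) { ch ->
    println "all match = " + ch
    11
}
println "rs = " + rs               // rs = 11, 11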

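matcher = 'a b c' =~ /\S/
assert matcher[0] == 'a'
assert matcher[1..2] == ['b', 'c']    // in current Groovy versions a range index returns a list of matches
assert matcher.count == 3

// Note the difference between these two examples: once the pattern contains groups, each match is a list
matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/
assert matcher.hasGroup()
assert matcher[0] == ['a:1', 'a', '1']    // whole match, then group 1 and group 2

('xy' =~ /(.)(.)/).each { all, x, y ->    // the values could also be printed here
    assert all == 'xy'
    assert x == 'x'
    assert y == 'y'
}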
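
Because each match of a grouped pattern arrives as the whole match followed by its groups, the closure parameters can be used to turn key:value text into a map. A minimal sketch reusing the sample string from above; the variable names `pairs`, `firstKey`, and `firstValue` are chosen for illustration.

```groovy
def pairs = [:]
('a:1 b:2 c:3' =~ /(\S+):(\S+)/).each { all, key, value ->
    pairs[key] = value      // group 1 becomes the key, group 2 the value
}
assert pairs == [a: '1', b: '2', c: '3']

// A single match can be unpacked with multiple assignment.
def (whole, firstKey, firstValue) = ('a:1 b:2 c:3' =~ /(\S+):(\S+)/)[0]
assert whole == 'a:1'
assert firstKey == 'a'
assert firstValue == '1'
```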

myFairStringy = 'The rain in Spain stays mainly in the plain!'

// words that end with 'ain': \b\w*ain\b
BOUNDS = /\b/
rhyme = /$BOUNDS\w*ain$BOUNDS/

found = ''
myFairStringy.eachMatch(rhyme) { match ->    // string.eachMatch(pattern_string, closure)
    found += match + ' '                     // without groups, current Groovy versions pass the matched String itself
}
assert found == 'rain Spain plain '

found = ''
(myFairStringy =~ rhyme).each { match ->     // matcher.each(closure)
    found += match + ' '
}
assert found == 'rain Spain plain '

cloze = myFairStringy.replaceAll(rhyme) { it - 'ain' + '___' }    // string.replaceAll(pattern_string, closure)
assert cloze == 'The r___ in Sp___ stays mainly in the pl___!'

Performance

Listing 3.7 Increase performance with pattern reuse.

twister = 'she sells sea shells at the sea shore of seychelles'

// some more complicated regex:
// word that starts and ends with same letter
regex = /\b(\w)\w*\1\b/

start = System.currentTimeMillis()
100000.times {
    twister =~ regex                // find operator with implicit pattern construction
}
first = System.currentTimeMillis() - start

start = System.currentTimeMillis()
pattern = ~regex                    // explicit pattern construction
100000.times {
    pattern.matcher(twister)        // apply the pattern on a String
}
second = System.currentTimeMillis() - start

assert first > second * 1.20        // timing-dependent: can fail on a very fast or busy machine

The pattern operator ~String

The pattern operator converts a string into a java.util.regex.Pattern object. For a given string, this pattern object can be asked for a matcher object.

The rationale behind this construction is that patterns are internally backed by a so-called finite state machine that does all the high-performance magic. This machine is compiled when the pattern object is created. The more complicated the pattern, the longer the creation takes. In contrast, the matching process as performed by the machine is extremely fast.

The pattern operator separates pattern-creation time from pattern-matching time, which makes it possible to reuse the finite state machine. In the example above, first constructs a pattern on every match, whereas second creates the pattern explicitly once before matching, which saves time.
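
In everyday code the same idea usually appears as a pattern that is compiled once and then applied to many inputs. A minimal sketch, assuming the GDK method String.findAll(Pattern); the name `sameEnds` and the sample lines are invented for illustration.

```groovy
import java.util.regex.Pattern

// Compile once: words that start and end with the same letter.
Pattern sameEnds = ~/\b(\w)\w*\1\b/

def lines = ['she sells sea shells', 'at the sea shore', 'of seychelles']

// Reuse the compiled pattern for every line instead of passing the
// pattern string, which would be compiled again on each call.
def hits = lines.collect { line -> line.findAll(sameEnds) }

assert hits == [['sells', 'shells'], [], ['seychelles']]
```

Whether the saving is noticeable depends on how complex the pattern is and how often it is applied, so it is worth measuring before restructuring code around it.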

Patterns for classification

The Pattern object has an isCase(String) method.

Listing 3.8 Patterns in grep() and switch()

assert (~/..../).isCase('bear')

switch('bear'){

case ~/..../ : assert true; break

default : assert false

}

beasts = ['bear','wolf','tiger','regex']

assert beasts.grep(~/..../) == ['bear','wolf']

TIP Classifications read nicely in switch and grep. The direct use of classifier.isCase(candidate) happens rarely, but when it does, it is best read from right to left: “candidate is a case of classifier”.
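
With several patterns in one switch, the classification reads like a small tokenizer. The following is only a sketch: `classify` and its categories are hypothetical, and it relies on Pattern.isCase requiring a full match, as the four-dot examples above suggest.

```groovy
// Hypothetical classifier, for illustration only.
def classify = { String token ->
    switch (token) {
        case ~/\d+/ :           return 'integer'
        case ~/\d+\.\d+/ :      return 'decimal'
        case ~/[A-Za-z_]\w*/ :  return 'identifier'
        default :               return 'other'
    }
}

def tokens = ['42', 'x1', '3.14', '+']
assert tokens.collect(classify) == ['integer', 'identifier', 'decimal', 'other']

// grep uses the same isCase check, so it filters by full match as well.
assert tokens.grep(~/\d+/) == ['42']
```

Because the match must cover the whole candidate, '3.14' is not caught by the integer case even though it starts with digits.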
