Regular Expressions

Regular expressions (“regexps”) match strings. When a match is successful, the return value is the position of the first matching character.

/abc/ =~ "abc"
→ 0

An if construct will count a successful match as true.

puts 'match' if /abc/ =~ "abc"
→ match

The matching substring can be anywhere in the string.

/abc/ =~ "cbaabc"
→ 3

When the string doesn’t match, the result is nil.

/abc/ =~ "ab!c"
→ nil

There may be more than one match in the string. Matching always returns the index of the first match.

/abc/ =~ "abc and abc"
→ 0

Case matters.

/cow/ =~ "Cow"
→ nil

The regular expression doesn’t have to be on the left.

"foofarah" =~ /foo/
→ 0
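The results above can be reproduced in a short script. This is only a minimal sketch: the helper name `match_position` is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: returns the index of the first match, or nil.
def match_position(pattern, string)
  pattern =~ string
end

p match_position(/abc/, "abc")      # match starts at the first character
p match_position(/abc/, "cbaabc")   # the match may start anywhere
p match_position(/abc/, "ab!c")     # no match, so nil
p match_position(/cow/, "Cow")      # case matters, so nil

# 0 is truthy in Ruby, so a match at position 0 still passes an if test.
puts "match" if match_position(/abc/, "abc")
```

Because only nil and false are falsy in Ruby, the `if` test works even when the match starts at index 0.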

10.1 Special Characters
You can anchor the match to the beginning of the string with ^ (the caret character, sometimes called “hat”).

/^abc/ =~ "!abc"
→ nil

You can also anchor the match to the end of the string with a dollar sign character, often abbreviated “dollar.”

/abc$/ =~ "abc!"
→ nil

Strictly speaking, in Ruby ^ and $ anchor to the beginning and end of a line; for single-line strings like these, that is the same as the beginning and end of the string. Special characters like the caret and dollar are what make regular expressions more powerful than something like "string".include?("ing").

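A small sketch of the anchors in action. In Ruby, ^ and $ match at line boundaries, while \A anchors only at the very start of the string; the difference shows up as soon as the string contains a newline. The helper names below are hypothetical.

```ruby
# Hypothetical helper names, for illustration only.
def line_start_index(string)
  /^abc/ =~ string     # ^ matches at the start of any line
end

def string_start_index(string)
  /\Aabc/ =~ string    # \A matches only at the start of the string
end

p line_start_index("!abc")        # nil: "abc" is not at the start
p line_start_index("x\nabc")      # 2: "abc" starts the second line
p string_start_index("x\nabc")    # nil: "abc" does not start the string

# include? can only ask "is it in there?"; an anchor can ask "where?"
p "stringy".include?("ing")       # true
p(/ing$/ =~ "stringy")            # nil: "ing" is not at the end
```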
\d Any digit
\D Any character except a digit
\s “whitespace”: space, tab, carriage return, newline (line feed), form feed, or vertical tab
\S Anything except whitespace
\w A “word character”: [A-Za-z0-9_]
\W Any character except a word character
                 Figure 10.1: Character Classes
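Each abbreviation in Figure 10.1 can be tried out with =~. The sketch below uses a hypothetical helper, `first_index`, and made-up sample strings.

```ruby
# Hypothetical helper: index of the first character the pattern matches.
def first_index(pattern, string)
  pattern =~ string
end

p first_index(/\d/, "room 101")    # 5: the first digit
p first_index(/\D/, "42 is")       # 2: the space is the first non-digit
p first_index(/\s/, "tab\there")   # 3: the tab counts as whitespace
p first_index(/\S/, "   x")        # 3: the first non-whitespace character
p first_index(/\w/, "--_a")        # 2: underscore is a word character
p first_index(/\W/, "ab_9!")       # 4: "!" is the first non-word character
```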

A period (“dot”) matches any character.

/a.c/ =~ "does abc match?"
→ 5

The asterisk character (“star”) matches any number of occurrences of the character preceding it.

/ab*c/ =~ "does abbbbc match?"
→ 5

“Any number” includes zero.

/ab*c/ =~ "does ac match?"
→ 5

Frequently, you’ll want to match one or more occurrences but not zero. That’s done with the plus character.

/ab+c/ =~ "does ac match?"
→ nil

The question mark character matches zero or one occurrence but not more than one.

/ab?c/ =~ "does ac match?"
→ 5
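To see star, plus, and question mark side by side, the following sketch runs all three against strings with zero, one, and three b’s. The helper `matches?` is a made-up name, and the patterns are anchored so the whole word must fit.

```ruby
# Hypothetical helper: true when the pattern matches somewhere in the string.
def matches?(pattern, string)
  !(pattern =~ string).nil?
end

# Compare the three quantifiers on zero, one, and three b's.
%w[ac abc abbbc].each do |word|
  star     = matches?(/^ab*c$/, word)   # zero or more b's
  plus     = matches?(/^ab+c$/, word)   # one or more b's
  question = matches?(/^ab?c$/, word)   # zero or one b
  puts "#{word}: star=#{star} plus=#{plus} question=#{question}"
end
```

Only "abc" satisfies all three patterns; "ac" fails the plus, and "abbbc" fails the question mark.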
Special characters can be combined. The combination of a dot and star is used to match any number of any kind of character.

/a.*b/ =~ "a ! b ! i j k b"
→ 0

To match any one of a set of characters (a character class), enclose them within square brackets.

/[0123456789]+/ =~ "number 55"
→ 7

Character classes containing alphabetically ordered runs of characters can be abbreviated with the dash.

/[0-9][a-f]/ =~ "5f"
→ 0

Within brackets, characters like the dot, plus, and star are not special.

/[.]/ =~ "b"
→ nil

Outside of brackets, special characters can be stripped of their powers by “escaping” them with a backslash.

/\[a\]\+/ =~ "[a]+"
→ 0

To include open and close brackets inside of brackets, escape them with a backslash. This expression matches any sequence of one or more characters, all of which must be either [, ], or =. (The two anchors ensure that there are no characters before or after the matching characters.)

/^[\[=\]]+$/ =~ '=]=[='
→ 0

Putting a caret at the beginning of a character class causes the set to contain all characters except the ones listed.

/[^ab]/ =~ "z"
→ 0

Some character classes are so common they’re given abbreviations. \d is the same character class as [0-9]. Other characters can be added to the abbreviation, in which case brackets are needed. See Figure 10.1 for a complete list of abbreviations.

/=\d=[x\d]=/ =~ "=5=x="
→ 0
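As an illustration of bracketed character classes, here is a minimal sketch that checks whether a string consists only of lowercase hexadecimal digits. The method name `hex_string?` and the sample strings are assumptions made for this example.

```ruby
# Hypothetical helper: true when every character is 0-9 or a-f.
def hex_string?(string)
  !(/^[0-9a-f]+$/ =~ string).nil?
end

p hex_string?("5f")            # true
p hex_string?("5g")            # false: g is outside a-f

# Inside brackets the dot is literal; outside, it matches any character.
p(/[.]/ =~ "a.b")              # 1
p(/./ =~ "a.b")                # 0

# A leading caret negates the class.
p(/[^0-9]/ =~ "123x")          # 3

# An abbreviation such as \d can be mixed with other characters.
p(/[x\d]+/ =~ "==5x5==")       # 2
```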
10.2 Grouping and Alternatives

Parentheses can group sequences of characters so that special characters apply to the whole sequence.

/(ab)+/ =~ "ababab"
→ 0

Special characters can appear within groups. Here, the group containing one a and any number of b’s is repeated one or more times.

/(ab*)+/ =~ "aababbabbb"
→ 0

The vertical bar character is used to allow alternatives. Here, either a or b matches.

/a|b/ =~ "a"
→ 0

A vertical bar divides the regular expression into two smaller regular expressions. A match means that either the entire left regexp matches or the entire right one does. This regular expression does not mean “Match either 'Fine birds ate.' or 'Fine cows ate.'” It actually matches either a string beginning with "Fine birds" or one ending in "cows ate."

/^Fine birds|cows ate\.$/ =~
      "Fine birds ate seeds."
→ 0

This regular expression matches only the two alternate sentences, not the infinite number of possibilities the previous example’s regexp does.

/^Fine (birds|cows) ate\.$/ =~
       "Fine birds ate seeds."
→ nil
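The difference between the ungrouped and grouped alternatives is easy to get wrong, so here is a sketch that runs both against a few sentences. The method names `loose_match?` and `strict_match?` and the "Purple cows" sentence are made up for illustration.

```ruby
# Hypothetical helpers wrapping the two regular expressions.
def loose_match?(sentence)
  # Either "starts with Fine birds" or "ends with cows ate."
  !(/^Fine birds|cows ate\.$/ =~ sentence).nil?
end

def strict_match?(sentence)
  # Exactly "Fine birds ate." or "Fine cows ate."
  !(/^Fine (birds|cows) ate\.$/ =~ sentence).nil?
end

["Fine birds ate seeds.", "Purple cows ate.", "Fine cows ate."].each do |s|
  puts "#{s.inspect}: loose=#{loose_match?(s)} strict=#{strict_match?(s)}"
end
```

Only "Fine cows ate." passes the strict version; the loose version accepts all three sentences.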
10.3 Taking Strings Apart
Like the =~ operator, match returns nil if there’s no match. If there is, it returns a MatchData object. You can pull information out of that object.

re = /(\w+), (\w+), or (\w+)/
s = 'Without a Bob, ox, or bin!'
match = re.match(s)
→ #<MatchData:0x323c44>

A MatchData is indexable. Its zeroth element is the entire match.

match[0]
→ "Bob, ox, or bin"

Each following element stores the result of what a group matched, counting from left to right.

match[1]
→ "Bob"

Groups are often used to pull apart strings and construct new ones.

"#{match[3]} and #{match[1]}"
→ "bin and Bob"

pre_match returns any portion of the string before the part that matched.

match.pre_match
→ "Without a "

post_match returns any portion of the string after the part that matched. match.pre_match, match[0], and match.post_match can be added together to reconstruct the original string.

match.post_match
→ "!"

The plus and star special characters are greedy: they match as many characters as they can. Expect that to catch you by surprise sometimes.

str = "a bee in my bonnet"
/a.*b/.match(str)[0]
→ "a bee in my b"

You can make plus and star match as few characters as they can by suffixing them with a question mark.

/a.*?b/.match(str)[0]
→ "a b"

You can use a regular expression to slice a string. The result is the first substring that matches the regular expression.

"has 5 and 3"[/\d+/]
→ "5"
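The pieces of a MatchData can be gathered in one place. The sketch below wraps the section’s regular expression in a hypothetical method, `parse_list`, that returns a hash (the hash keys are also made up) and then checks that the three parts reassemble into the original string.

```ruby
# Hypothetical helper: splits a string around the first "x, y, or z" list.
def parse_list(text)
  match = /(\w+), (\w+), or (\w+)/.match(text)
  return nil unless match   # match returns nil when nothing matches

  {
    before: match.pre_match,
    whole:  match[0],
    items:  [match[1], match[2], match[3]],
    after:  match.post_match
  }
end

parts = parse_list('Without a Bob, ox, or bin!')
p parts[:items]
# The three pieces add back up to the original string.
p parts[:before] + parts[:whole] + parts[:after]

# Greedy versus non-greedy on the same input, using string slicing.
str = "a bee in my bonnet"
p str[/a.*b/]    # "a bee in my b"
p str[/a.*?b/]   # "a b"
```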
10.4 Variables Behind the Scenes
Both =~ and match set some variables. All begin with $. Each parenthesized group gets its own number, from $1 up through $9. You might expect $0 to name the entire string that matched, but it’s already used for something else: the name of the program being executed.

re = /(\w+), (\w+), or (\w+)/
s = 'Without a Bob, ox, or bin!'
re =~ s
[$1, $2, $3]
→ ["Bob", "ox", "bin"]

$& is the equivalent of match[0].

$&
→ "Bob, ox, or bin"

$` and $' store the string before the match and the string after the match. (The first is a backward quote / backtick; the second a normal quote.)

$` + $'
→ "Without a !"

These variables are probably most often used to immediately do something with a string that’s “equal enough” to some pattern. Like this:

if name =~ /(.+), (.+)/
  name = "#{$2} #{$1}"
end
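Wrapped in a method, the name-swapping snippet above might look like the following sketch. The method name `first_name_first` and the sample names are assumptions for illustration.

```ruby
# Hypothetical helper: turns "Last, First" into "First Last".
# Strings without a ", " separator are returned unchanged.
def first_name_first(name)
  if name =~ /(.+), (.+)/
    "#{$2} #{$1}"   # $1 and $2 hold what the two groups matched
  else
    name
  end
end

p first_name_first("Lovelace, Ada")   # "Ada Lovelace"
p first_name_first("Cher")            # "Cher"

# A failed match resets the group variables to nil.
"no comma here" =~ /(.+), (.+)/
p $1                                  # nil
```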
10.5 Regular Expression Options
Normally, the period in a regular expression does not match the end-of-line character. Therefore, .* or .+ matches won’t span lines.

/a.*b/ =~ "az\nzb"
→ nil

Adding the m (multiline) option makes a period match end-of-line characters, so the regular expression match can span lines.

/a.*b/m =~ "az\nzb"
→ 0

This is a far too annoying way to do a case-insensitive match.

/[cC][aA][tT]/ =~ "Cat"
→ 0

The i (insensitive) option is a better way.

/cat/i =~ "Cat"
→ 0
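Options can also be combined by writing both letters after the closing slash. The following is a minimal sketch; the method names and the `multiline:` keyword are hypothetical.

```ruby
# Hypothetical helper: case-insensitive search for "cat".
def mentions_cat?(text)
  !(/cat/i =~ text).nil?
end

# Hypothetical helper: index of an "a...b" run, optionally spanning lines.
def a_to_b_index(text, multiline: false)
  pattern = multiline ? /a.*b/m : /a.*b/
  pattern =~ text
end

p mentions_cat?("My CAT sleeps")              # true
p mentions_cat?("My dog barks")               # false
p a_to_b_index("az\nzb")                      # nil: the dot stops at "\n"
p a_to_b_index("az\nzb", multiline: true)     # 0

# Both options at once: ignore case and let the dot cross lines.
p(/A.*B/mi =~ "az\nzb")                       # 0
```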