
A Collection of Regular Expression Syntax Rules

turnmissile's Blog http://blog.csdn.net/turnmissile/

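Microsoft has already documented the regular expression rules in MSDN, and interested readers can study them there (ms-help://MS.MSDNQTR.2003OCT.1033/cpgenref/html/cpconRegularExpressionsLanguageElements.htm). Below are some of the syntax element tables I found; explore them yourself!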


Character Escapes

| Escaped character | Description |
| --- | --- |
| ordinary characters | Characters other than . $ ^ { [ ( \| ) * + ? \ match themselves. |
| \a | Matches a bell (alarm) \u0007. |
| \b | Matches a backspace \u0008 if in a [] character class; otherwise, see the note following this table. |
| \t | Matches a tab \u0009. |
| \r | Matches a carriage return \u000D. |
| \v | Matches a vertical tab \u000B. |
| \f | Matches a form feed \u000C. |
| \n | Matches a new line \u000A. |
| \e | Matches an escape \u001B. |
| \040 | Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character \040 represents a space. |
| \x20 | Matches an ASCII character using hexadecimal representation (exactly two digits). |
| \cC | Matches an ASCII control character; for example, \cC is control-C. |
| \u0020 | Matches a Unicode character using hexadecimal representation (exactly four digits). |
| \ | When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A. |

Note   The escaped character \b is a special case. In a regular expression, \b denotes a word boundary (between \w and \W characters) except within a [] character class, where \b refers to the backspace character. In a replacement pattern, \b always denotes a backspace.
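
The table above describes the .NET engine, and this page has no code of its own, so the following is only a small sketch in Python, whose `re` module is a different engine. It assumes the overlap that Python happens to share with the table: the octal, `\x`, and `\u` spellings of a space, the `\*` literal, and the two meanings of `\b`. Escapes such as `\e` and `\cC` are left out because Python's `re` does not accept them. The helper names are made up for illustration.

```python
import re

# Three spellings of the space character: octal, two-digit hex, four-digit Unicode.
SPACE_PATTERNS = [r"\040", r"\x20", r"\u0020"]


def all_match_space():
    # True only if every spelling matches exactly one space.
    return all(re.fullmatch(p, " ") is not None for p in SPACE_PATTERNS)


def whole_word_hits(text):
    # Outside a character class, \b is a zero-width word boundary.
    return re.findall(r"\bcat\b", text)


def backspace_hits(text):
    # Inside a character class, \b is the backspace character (U+0008).
    return re.findall(r"[\b]", text)


def star_spellings_agree():
    # An escaped * and its hex spelling \x2A both match a literal asterisk.
    return all(re.fullmatch(p, "*") is not None for p in (r"\*", r"\x2A"))


print(all_match_space())
print(whole_word_hits("cat concat cat."))
print(backspace_hits("a\x08b"))
print(star_spellings_agree())
```

Raw string literals (`r"..."`) are used so that the backslashes reach the regex engine instead of being interpreted by Python first.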


Character Classes

A character class is a set of characters that will find a match if any one of the characters included in the set matches. The following table summarizes character matching syntax.

| Character class | Description |
| --- | --- |
| . | Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options. |
| [aeiou] | Matches any single character included in the specified set of characters. |
| [^aeiou] | Matches any single character not in the specified set of characters. |
| [0-9a-fA-F] | Use of a hyphen (-) allows specification of contiguous character ranges. |
| \p{name} | Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing. |
| \P{name} | Matches text not included in groups and block ranges specified in {name}. |
| \w | Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9]. |
| \W | Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9]. |
| \s | Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v]. |
| \d | Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior. |
| \D | Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior. |

You can find the Unicode category a character belongs to with the method
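
As a rough illustration of the character classes listed above, here is a sketch in Python rather than .NET. Two assumptions are worth stating loudly: Python's standard `re` module does not support `\p{name}`, so that row is not shown, and the `re.ASCII` flag is used only as a loose analogue of the ECMAScript option, not as the same feature. `unicodedata.category` is Python's own way to look up a character's Unicode category; it is not the .NET method the text refers to.

```python
import re
import unicodedata

# Positive set, negated set, and a range-based set.
vowels = re.findall(r"[aeiou]", "regex")
non_vowels = re.findall(r"[^aeiou]", "regex")
is_hex = re.fullmatch(r"[0-9a-fA-F]+", "1aF9") is not None

# \d is Unicode-aware by default in Python, so it accepts
# ARABIC-INDIC DIGIT THREE (U+0663); with re.ASCII it only accepts 0-9.
arabic_indic_three = "\u0663"
unicode_digit = re.fullmatch(r"\d", arabic_indic_three) is not None
ascii_digit = re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None

# Unicode general categories for a letter, a digit, and the underscore,
# three of the categories that make up \w in the table.
categories = [unicodedata.category(ch) for ch in "a3_"]

print(vowels, non_vowels, is_hex)
print(unicode_digit, ascii_digit)
print(categories)
```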


Regular Expression Options

and ECMAScript are not allowed inline.

| RegexOption member | Inline character | Description |
| --- | --- | --- |
| None | N/A | Specifies that no options are set. |
| IgnoreCase | i | Specifies case-insensitive matching. |
| Multiline | m | Specifies multiline mode. Changes the meaning of ^ and $ so that they match at the beginning and end, respectively, of any line, not just the beginning and end of the whole string. |
| ExplicitCapture | n | Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>…). This allows parentheses to act as noncapturing groups without the syntactic clumsiness of (?:…). |
| Compiled | N/A | Specifies that the regular expression will be compiled to an assembly. Generates Microsoft intermediate language (MSIL) code for the regular expression; yields faster execution at the expense of startup time. |
| Singleline | s | Specifies single-line mode. Changes the meaning of the period character (.) so that it matches every character (instead of every character except \n). |
| IgnorePatternWhitespace | x | Specifies that unescaped white space is excluded from the pattern and enables comments following a number sign (#). (For a list of escaped white-space characters, see Character Escapes.) Note that white space is never eliminated from within a character class. |
| RightToLeft | N/A | Specifies that the search moves from right to left instead of from left to right. A regular expression with this option moves to the left of the starting position instead of to the right. (Therefore, the starting position should be specified as the end of the string instead of the beginning.) This option cannot be specified in midstream, to prevent the possibility of crafting regular expressions with infinite loops. However, the (?<) lookbehind constructs provide something similar that can be used as a subexpression. RightToLeft changes the search direction only. It does not reverse the substring that is searched for. The lookahead and lookbehind assertions do not change: lookahead looks to the right; lookbehind looks to the left. |
| ECMAScript | N/A | Specifies that ECMAScript-compliant behavior is enabled for the expression. This option can be used only in conjunction with the IgnoreCase and Multiline flags. Use of ECMAScript with any other flags results in an exception. |
| CultureInvariant | N/A | Specifies that cultural differences in language are ignored. See Performing Culture-Insensitive Operations in the RegularExpressions Namespace for more information. |
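
To make the inline characters i, m, s, and x more concrete, here is a hedged sketch in Python. It assumes that Python's `re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`, and `re.VERBOSE` flags behave closely enough to IgnoreCase, Multiline, Singleline, and IgnorePatternWhitespace for illustration; they are Python names, not members of the .NET enumeration. The other options in the table (ExplicitCapture, Compiled, RightToLeft, ECMAScript, CultureInvariant) are not shown because Python's `re` has no direct counterpart. The sample text and the date pattern are invented for the example.

```python
import re

text = "first line\nSecond line"

# i: case-insensitive matching, written inline.
ignore_case = re.fullmatch(r"(?i)regex", "RegEx") is not None

# m: ^ matches at the start of every line instead of only the whole string.
first_words_default = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"(?m)^\w+", text)

# s: the period also matches \n.
dot_default = re.search(r"line.Second", text) is not None
dot_singleline = re.search(r"(?s)line.Second", text) is not None

# x: unescaped white space is ignored and # starts a comment.
DATE = re.compile(
    r"""
    (?P<year>\d{4})   # four-digit year
    -                 # literal hyphen
    (?P<month>\d{2})  # two-digit month
    """,
    re.VERBOSE,
)
date_parts = DATE.fullmatch("2003-10").groupdict()

print(ignore_case)
print(first_words_default, first_words_multiline)
print(dot_default, dot_singleline)
print(date_parts)
```

In Python, inline flags that apply to the whole pattern are placed at the very start of the pattern, which is how they are written here.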


Atomic Zero-Width Assertions

| Assertion | Description |
| --- | --- |
| ^ | Specifies that the match must occur at the beginning of the string or the beginning of the line. For more information, see the Multiline option in Regular Expression Options. |
| $ | Specifies that the match must occur at the end of the string, before \n at the end of the string, or at the end of the line. For more information, see the Multiline option in Regular Expression Options. |
| \A | Specifies that the match must occur at the beginning of the string (ignores the Multiline option). |
| \Z | Specifies that the match must occur at the end of the string or before \n at the end of the string (ignores the Multiline option). |
| \z | Specifies that the match must occur at the end of the string (ignores the Multiline option). |
| \G | Specifies that the match must occur at the point where the previous match ended. When used with Match.NextMatch(), this ensures that matches are all contiguous. |
| \b | Specifies that the match must occur on a boundary between \w (alphanumeric) and \W (nonalphanumeric) characters. The match must occur on word boundaries, that is, at the first or last characters in words separated by any nonalphanumeric characters. |
| \B | Specifies that the match must not occur on a \b boundary. |
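
The anchors above can be tried out with a short Python sketch, with one important caveat: Python's `re` is not the .NET engine and disagrees with the table in places. In particular, Python's `\Z` matches only at the very end of the string, which corresponds to the table's `\z`, and Python's `re` has no `\G`. The example therefore sticks to `^`, `$`, `\A`, and `\B`, and shows the Python `\Z` behavior only to highlight the difference. The sample strings are invented.

```python
import re

text = "one\ntwo\n"

# $ (without multiline mode) matches at the end of the string
# or just before a final \n, as the table describes.
dollar_before_final_newline = re.search(r"two$", text) is not None

# Python's \Z is stricter than the table's \Z: it behaves like the
# table's \z and refuses to match before the trailing \n.
python_Z_before_final_newline = re.search(r"two\Z", text) is not None

# In multiline mode ^ matches at every line start, while \A still
# matches only at the start of the whole string.
caret_multiline = re.findall(r"^\w+", text, re.MULTILINE)
A_multiline = re.findall(r"\A\w+", text, re.MULTILINE)

# \B requires that there is no word boundary, so only the "cat"
# inside "concat" (starting at index 7) qualifies.
inner_cat_positions = [m.start() for m in re.finditer(r"\Bcat", "cat concat")]

print(dollar_before_final_newline, python_Z_before_final_newline)
print(caret_multiline, A_multiline)
print(inner_cat_positions)
```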


Quantifiers

| Quantifier | Description |
| --- | --- |
| * | Specifies zero or more matches; for example, \w* or (abc)*. Equivalent to {0,}. |
| + | Specifies one or more matches; for example, \w+ or (abc)+. Equivalent to {1,}. |
| ? | Specifies zero or one matches; for example, \w? or (abc)?. Equivalent to {0,1}. |
| {n} | Specifies exactly n matches; for example, (pizza){2}. |
| {n,} | Specifies at least n matches; for example, (abc){2,}. |
| {n,m} | Specifies at least n, but no more than m, matches. |
| *? | Specifies the first match that consumes as few repeats as possible (equivalent to lazy *). |
| +? | Specifies as few repeats as possible, but at least one (equivalent to lazy +). |
| ?? | Specifies zero repeats if possible, or one (lazy ?). |
| {n}? | Equivalent to {n} (lazy {n}). |
| {n,}? | Specifies as few repeats as possible, but at least n (lazy {n,}). |
| {n,m}? | Specifies as few repeats as possible between n and m (lazy {n,m}). |
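
The difference between the greedy and lazy forms is easiest to see side by side. The following is a minimal sketch in Python, on the assumption that its `re` module treats these quantifiers the same way as the table describes for .NET. The sample inputs are invented, apart from `(pizza){2}`, which comes from the table.

```python
import re

markup = "<b>bold</b> and <i>italic</i>"

# Greedy +: runs from the first < to the last > in one match.
greedy_tags = re.findall(r"<.+>", markup)

# Lazy +?: stops at the first > it can, yielding each tag separately.
lazy_tags = re.findall(r"<.+?>", markup)

# {n}: exactly n repeats of the group.
two_pizzas = re.fullmatch(r"(pizza){2}", "pizzapizza") is not None

# {n,m} takes as many as allowed; {n,m}? takes as few as allowed.
greedy_range = re.match(r"a{2,4}", "aaaaa").group()
lazy_range = re.match(r"a{2,4}?", "aaaaa").group()

# ?? prefers zero repeats, so it matches the empty string here.
lazy_optional = re.match(r"a??", "a").group()

print(greedy_tags)
print(lazy_tags)
print(two_pizzas, greedy_range, lazy_range, repr(lazy_optional))
```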


Grouping Constructs

Grouping constructs allow you to capture groups of subexpression
