
Java (java.util.regex)


1.4 Java (java.util.regex)

Java 1.4 supports regular expressions through Sun's java.util.regex package. Although competing packages are available for earlier versions of Java, Sun's package is poised to become the standard. It uses a Traditional NFA match engine. For an explanation of the rules behind a Traditional NFA engine, see Section 1.2.

1.4.1 Supported Metacharacters

java.util.regex supports the metacharacters and metasequences listed in Table 1-10 through Table 1-14. For expanded definitions of each metacharacter, see Section 1.2.1.

Table 1-10. Character representations

Sequence    Meaning
\a          Alert (bell).
\b          Backspace, x08, supported only in character class.
\e          ESC character, x1B.
\n          Newline, x0A.
\r          Carriage return, x0D.
\f          Form feed, x0C.
\t          Horizontal tab, x09.
\0octal     Character specified by a one-, two-, or three-digit octal code.
\xhex       Character specified by a two-digit hexadecimal code.
\uhex       Unicode character specified by a four-digit hexadecimal code.
\cchar      Named control character.

Table 1-11. Character classes and class-like constructs

Class       Meaning
[...]       A single character listed or contained in a listed range.
[^...]      A single character not listed and not contained within a listed range.
.           Any character, except a line terminator (unless DOTALL mode).
\w          Word character, [a-zA-Z0-9_].
\W          Non-word character, [^a-zA-Z0-9_].
\d          Digit, [0-9].
\D          Non-digit, [^0-9].
\s          Whitespace character, [ \t\n\f\r\x0B].
\S          Non-whitespace character, [^ \t\n\f\r\x0B].
\p{prop}    Character contained by given POSIX character class, Unicode property, or Unicode block.
\P{prop}    Character not contained by given POSIX character class, Unicode property, or Unicode block.
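As a rough illustration of the classes in Table 1-11, the sketch below deletes every character that matches a class and prints what is left. The class name CharClassDemo and the helper strip are made up for this example; \p{Punct} and \p{Upper} are two of the POSIX-style class names that java.util.regex accepts inside \p{...}.

```java
class CharClassDemo {
  // Remove every character matched by regex (hypothetical helper)
  static String strip(String regex, String input) {
    return input.replaceAll(regex, "");
  }

  public static void main(String[] args) {
    // Negated class: drop anything that is not a lowercase hex digit
    System.out.println(strip("[^a-f0-9]", "0xBEEF-cafe"));      // 0cafe
    // \D: drop every non-digit
    System.out.println(strip("\\D", "(555) 010-9999"));         // 5550109999
    // POSIX class: drop ASCII punctuation
    System.out.println(strip("\\p{Punct}", "Hello, world!"));   // Hello world
    // \P{...} negates the class: keep only uppercase ASCII letters
    System.out.println(
        strip("\\P{Upper}", "Regular Expression Pocket Reference")); // REPR
  }
}
```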

Table 1-12. Anchors and other zero-width tests

Sequence    Meaning
^           Start of string, or after any newline if in MULTILINE mode.
\A          Beginning of string, in any match mode.
$           End of string, or before any newline if in MULTILINE mode.
\Z          End of string but before any final line terminator, in any match mode.
\z          End of string, in any match mode.
\b          Word boundary.
\B          Not-word-boundary.
\G          Beginning of current search.
(?=...)     Positive lookahead.
(?!...)     Negative lookahead.
(?<=...)    Positive lookbehind.
(?<!...)    Negative lookbehind.
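A short sketch of a few of the zero-width tests in Table 1-12 follows. The class LookaroundDemo and its helper first are hypothetical names for this example; first returns the text of the first match, or null when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
  // Text of the first match of regex in input, or null (hypothetical helper)
  static String first(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    // Positive lookahead: digits only if " dollars" follows
    System.out.println(first("\\d+(?= dollars)", "cost 100 dollars")); // 100
    // Positive lookbehind: digits only if "$" precedes
    System.out.println(first("(?<=\\$)\\d+", "price $42"));            // 42
    // Negative lookbehind: a number that is not preceded by "$"
    System.out.println(first("(?<!\\$)\\b\\d+", "$42 or 17"));         // 17
    // \Z tolerates a final line terminator, \z does not
    System.out.println(first("end\\Z", "the end\n"));                  // end
    System.out.println(first("end\\z", "the end\n"));                  // null
    // ^ matches after an embedded newline only in MULTILINE mode
    System.out.println(first("^b", "a\nb"));                           // null
    System.out.println(first("(?m)^b", "a\nb"));                       // b
  }
}
```

The lookaround constructs consume no text, so the matched string never includes the "$" or " dollars" that they test for.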

Table 1-13. Comments and mode modifiers

Modifier/sequence           Mode character   Meaning
Pattern.UNIX_LINES          d                Treat \n as the only line terminator.
Pattern.DOTALL              s                Dot (.) matches any character, including a line terminator.
Pattern.MULTILINE           m                ^ and $ match next to embedded line terminators.
Pattern.COMMENTS            x                Ignore whitespace and allow embedded comments starting with #.
Pattern.CASE_INSENSITIVE    i                Case-insensitive match for ASCII characters.
Pattern.UNICODE_CASE        u                Case-insensitive match for Unicode characters.
Pattern.CANON_EQ                             Unicode "canonical equivalence" mode where characters or sequences of a base character and combining characters with identical visual representations are treated as equals.
(?mode)                                      Turn listed modes (idmsux) on for the rest of the subexpression.
(?-mode)                                     Turn listed modes (idmsux) off for the rest of the subexpression.
(?mode:...)                                  Turn listed modes (idmsux) on within parentheses.
(?-mode:...)                                 Turn listed modes (idmsux) off within parentheses.
#...                                         Treat rest of line as a comment in COMMENTS (x) mode.
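The modes in Table 1-13 can be set either with a Pattern constant or with an embedded mode character. The sketch below compares the two forms on small inputs; the class name ModeDemo and the helper full are made up for illustration.

```java
import java.util.regex.Pattern;

class ModeDemo {
  // True if regex, compiled with flags, matches all of input (hypothetical helper)
  static boolean full(String regex, int flags, String input) {
    return Pattern.compile(regex, flags).matcher(input).matches();
  }

  public static void main(String[] args) {
    // Flag constant and embedded (?i) have the same effect
    System.out.println(full("spider", Pattern.CASE_INSENSITIVE, "SPIDER")); // true
    System.out.println(full("(?i)spider", 0, "SPIDER"));                    // true
    // (?i:...) limits the mode to the parenthesized part
    System.out.println(full("(?i:spider)-Man", 0, "SPIDER-Man"));           // true
    System.out.println(full("(?i:spider)-Man", 0, "SPIDER-MAN"));           // false
    // Dot skips line terminators unless DOTALL (s) is on
    System.out.println(full("a.b", 0, "a\nb"));                             // false
    System.out.println(full("(?s)a.b", 0, "a\nb"));                         // true
    // COMMENTS (x) ignores whitespace and a trailing # comment
    System.out.println(full("(?x) \\d+  # one or more digits", 0, "123"));  // true
  }
}
```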

Table 1-14. Grouping, capturing, conditional, and control

Sequence    Meaning
(...)       Group subpattern and capture submatch into \1, \2, ... and $1, $2, ....
\n          Backreference: contains text matched by the nth capture group (for example, \1).
$n          In a replacement string, contains text matched by the nth capture group (for example, $1).
(?:...)     Groups subpattern, but does not capture submatch.
(?>...)     Disallow backtracking for text matched by subpattern.
...|...     Try subpatterns in alternation.
*           Match 0 or more times.
+           Match 1 or more times.
?           Match 1 or 0 times.
{n}         Match exactly n times.
{n,}        Match at least n times.
{x,y}       Match at least x times, but no more than y times.
*?          Match 0 or more times, but as few times as possible.
+?          Match 1 or more times, but as few times as possible.
??          Match 0 or 1 times, but as few times as possible.
{n,}?       Match at least n times, but as few times as possible.
{x,y}?      Match at least x times, no more than y times, and as few times as possible.
*+          Match 0 or more times, and never backtrack.
++          Match 1 or more times, and never backtrack.
?+          Match 0 or 1 times, and never backtrack.
{n}+        Match exactly n times, and never backtrack.
{n,}+       Match at least n times, and never backtrack.
{x,y}+      Match at least x times, no more than y times, and never backtrack.
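The difference between the greedy, minimal ("as few times as possible"), and never-backtrack quantifiers in Table 1-14 is easiest to see on one input. The following is a minimal sketch; QuantifierDemo and first are hypothetical names, and first returns null when there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
  // Text of the first match of regex in input, or null (hypothetical helper)
  static String first(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    String html = "<b>bold</b>";
    // Greedy: .+ grabs everything, then backs off to the last '>'
    System.out.println(first("<.+>", html));        // <b>bold</b>
    // Minimal: .+? stops at the first '>'
    System.out.println(first("<.+?>", html));       // <b>
    // Never backtrack: .++ keeps the final '>', so the match fails
    System.out.println(first("<.++>", html));       // null
    // Bounded forms follow the same rule
    System.out.println(first("\\d{2,3}", "12345"));  // 123
    System.out.println(first("\\d{2,3}?", "12345")); // 12
    // (?>...) commits to the first alternative that matched
    System.out.println(first("(?:a|ab)c", "abc"));   // abc
    System.out.println(first("(?>a|ab)c", "abc"));   // null
  }
}
```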

1.4.2 Regular Expression Classes and Interfaces

Java 1.4 introduces two main classes, java.util.regex.Pattern and java.util.regex.Matcher; an exception, java.util.regex.PatternSyntaxException; and a new interface, CharSequence. Additionally, Sun upgraded the String class to implement the CharSequence interface and to provide basic pattern-matching methods. Pattern objects are compiled regular expressions that can be applied to many strings. A Matcher object is a match of one Pattern applied to one string (or any object implementing CharSequence).

Backslashes in regular expression String literals need to be escaped. So \n (newline) becomes \\n when used in a Java String literal that is to be used as a regular expression.
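The doubling rule can be checked with a small sketch. The class name EscapeDemo and its two helper methods are invented for this example.

```java
import java.util.regex.Pattern;

class EscapeDemo {
  // The Java literal "\\d+" holds the three-character regex \d+
  static boolean isDigits(String s) {
    return Pattern.matches("\\d+", s);
  }

  // A regex that matches one literal backslash is \\ ,
  // so the Java literal needs four backslashes
  static boolean hasBackslash(String s) {
    return Pattern.compile("\\\\").matcher(s).find();
  }

  public static void main(String[] args) {
    System.out.println("\\d+".length());            // 3
    System.out.println(isDigits("2024"));           // true
    System.out.println(hasBackslash("C:\\temp"));   // true
    System.out.println(hasBackslash("C:/temp"));    // false
  }
}
```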

java.lang.String

Description

New methods for pattern matching.

Methods

boolean matches(String regex)

Return true if regex matches the entire String.

String[] split(String regex)

Return an array of the substrings surrounding matches of regex.

String[] split(String regex, int limit)

Return an array of the substrings surrounding the first limit-1 matches of regex.

String replaceFirst(String regex, String replacement)

Replace the first substring matched by regex with replacement.

String replaceAll(String regex, String replacement)

Replace all substrings matched by regex with replacement.
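A brief sketch of these String methods in use follows. The class StringRegexMethods and its helpers are hypothetical; the input strings are arbitrary.

```java
import java.util.Arrays;

class StringRegexMethods {
  // matches() succeeds only if the regex covers the whole string
  static boolean isIsoDate(String s) {
    return s.matches("\\d{4}-\\d\\d-\\d\\d");
  }

  // Split on commas, swallowing whitespace around them
  static String[] fields(String line) {
    return line.split("\\s*,\\s*");
  }

  // limit 2: split at the first match only
  static String[] headAndRest(String line) {
    return line.split("\\s*,\\s*", 2);
  }

  public static void main(String[] args) {
    System.out.println(isIsoDate("2024-01-15"));       // true
    System.out.println(isIsoDate("on 2024-01-15"));    // false
    System.out.println(Arrays.toString(fields("a, b,c ,  d")));      // [a, b, c, d]
    System.out.println(Arrays.toString(headAndRest("a, b,c ,  d"))); // [a, b,c ,  d]
    System.out.println("a-b-c".replaceFirst("-", "+")); // a+b-c
    System.out.println("a-b-c".replaceAll("-", "+"));   // a+b+c
  }
}
```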

java.util.regex.Pattern

extends Object and implements Serializable

Description

Models a regular expression pattern.

Methods

static Pattern compile(String regex)

Construct a Pattern object from regex.

static Pattern compile(String regex, int flags)

Construct a new Pattern object out of regex and the OR'd mode-modifier constants flags.

int flags()

Return the Pattern's mode modifiers.

Matcher matcher(CharSequence input)

Construct a Matcher object that will match this Pattern against input.

static boolean matches(String regex, CharSequence input)

Return true if regex matches the entire string input.

String pattern()

Return the regular expression used to create this Pattern.

String[] split(CharSequence input)

Return an array of the substrings surrounding matches of this Pattern in input.

String[] split(CharSequence input, int limit)

Return an array of the substrings surrounding the first limit-1 matches of this Pattern in input.
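Because a Pattern is a compiled regular expression, it is usually built once and reused. The sketch below shows that pattern with the methods listed above; PatternDemo, SEP, and HERO are names chosen for this example only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternDemo {
  // Compile once, reuse for many inputs
  static final Pattern SEP = Pattern.compile("\\s*;\\s*");
  // Mode-modifier constants are combined with bitwise OR
  static final Pattern HERO = Pattern.compile(
      "spider[- ]?man", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

  public static void main(String[] args) {
    System.out.println(Arrays.toString(SEP.split("a ; b;c")));    // [a, b, c]
    System.out.println(Arrays.toString(SEP.split("a ; b;c", 2))); // [a, b;c]
    System.out.println(HERO.matcher("SPIDER-MAN").matches());     // true
    System.out.println(HERO.pattern());                           // spider[- ]?man
    // flags() returns the OR'd constants; test a bit with &
    System.out.println((HERO.flags() & Pattern.CASE_INSENSITIVE) != 0); // true
    // One-off convenience form: compiles on every call
    System.out.println(Pattern.matches("\\d+", "123"));           // true
  }
}
```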

java.util.regex.Matcher

extends Object

Description

Models a regular expression pattern matcher and pattern matching results.

Methods

Matcher appendReplacement(StringBuffer sb, String replacement)

Append substring preceding match and replacement to sb.

StringBuffer appendTail(StringBuffer sb)

Append substring following end of match to sb.

int end()

Index of the first character after the end of the match.

int end(int group)

Index of the first character after the text captured by group.

boolean find()

Find the next match in the input string.

boolean find(int start)

Reset this matcher and find the next match starting at character position start.

String group()

Text matched by this Pattern.

String group(int group)

Text captured by capture group, group.

int groupCount()

Number of capturing groups in Pattern.

boolean lookingAt()

Return true if Pattern matches at the beginning of the input.

boolean matches()

Return true if Pattern matches entire input string.

Pattern pattern()

Return Pattern object used by this Matcher.

String replaceAll(String replacement)

Replace every match with replacement.

String replaceFirst(String replacement)

Replace first match with replacement.

Matcher reset()

Reset this matcher so that the next match starts at the beginning of the input string.

Matcher reset(CharSequence input)

Reset this matcher with new input.

int start()

Index of first character matched.

int start(int group)

Index of first character matched in captured substring, group.
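Two common Matcher idioms are a find() loop that reports each match with its offsets, and an appendReplacement/appendTail loop for replacements that must be computed per match. The sketch below shows one possible shape for each; MatcherDemo, locate, and doubleNumbers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherDemo {
  static final Pattern NUMBER = Pattern.compile("\\d+");

  // List each match as text@start-end, where end is exclusive
  static String locate(String input) {
    Matcher m = NUMBER.matcher(input);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      out.append(m.group()).append('@')
         .append(m.start()).append('-').append(m.end()).append(' ');
    }
    return out.toString().trim();
  }

  // Replace each number with twice its value
  static String doubleNumbers(String input) {
    Matcher m = NUMBER.matcher(input);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      int doubled = Integer.parseInt(m.group()) * 2;
      // Copies the text before the match, then the replacement
      m.appendReplacement(sb, Integer.toString(doubled));
    }
    m.appendTail(sb); // copies whatever follows the last match
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(locate("7 apples, 12 pears"));        // 7@0-1 12@10-12
    System.out.println(doubleNumbers("7 apples, 12 pears")); // 14 apples, 24 pears
  }
}
```

A replacement string passed to appendReplacement is still scanned for $n references, so text containing a literal $ or backslash would need escaping; the digits used here do not.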

java.util.regex.PatternSyntaxException

implements Serializable

Description

Thrown to indicate a syntax error in a regular expression pattern.

Methods

PatternSyntaxException(String desc, String regex, int index)

Construct an instance of this class.

String getDescription()

Return error description.

int getIndex()

Return error index.

String getMessage()

Return a multiline error message containing error description, index, regular expression pattern, and indication of the position of the error within the pattern.

String getPattern()

Return the regular expression pattern that threw the exception.
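One way to use these accessors is to validate a regular expression that comes from user input before applying it. The following sketch assumes a made-up class SyntaxCheck with a helper check that returns null for a valid pattern and a one-line report otherwise; the exact wording of the description depends on the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxCheck {
  // null if regex compiles, otherwise a short error report (hypothetical helper)
  static String check(String regex) {
    try {
      Pattern.compile(regex);
      return null;
    } catch (PatternSyntaxException e) {
      return e.getDescription() + " at index " + e.getIndex()
          + " in " + e.getPattern();
    }
  }

  public static void main(String[] args) {
    System.out.println(check("\\d+"));   // valid pattern, prints null
    System.out.println(check("(\\d+"));  // unclosed group, prints a report
    System.out.println(check("*a"));     // quantifier with nothing to repeat
  }
}
```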

java.lang.CharSequence

implemented by CharBuffer, String, StringBuffer

Description

Defines an interface for read-only access so that regular expression patterns may be applied to a sequence of characters.

Methods

char charAt(int index)

Return the character at the zero-based position, index.

int length()

Return the number of characters in the sequence.

CharSequence subSequence(int start, int end)

Return a subsequence including the start index and excluding the end index.

String toString()

Return a String representation of the sequence.
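Because Pattern.matcher accepts any CharSequence, the same code can search a String, a StringBuffer, or a CharBuffer without copying it to a String first. A minimal sketch, with CharSequenceDemo and countWords as invented names:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceDemo {
  static final Pattern WORD = Pattern.compile("\\w+");

  // Count runs of word characters in any CharSequence
  static int countWords(CharSequence text) {
    Matcher m = WORD.matcher(text);
    int n = 0;
    while (m.find()) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    System.out.println(countWords("one two three"));             // 3
    System.out.println(countWords(new StringBuffer("one two"))); // 2
    System.out.println(countWords(CharBuffer.wrap("a b c d")));  // 4
  }
}
```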

1.4.3 Unicode Support

This package supports Unicode 3.0, although \w, \W, \d, \D, \s, and \S support only ASCII. You can use the approximately equivalent Unicode properties \p{L}, \P{L}, \p{Nd}, \P{Nd}, \p{Z}, and \P{Z}. The word boundary sequences, \b and \B, do understand Unicode.

For supported Unicode properties and blocks, see Table 1-2. This package supports only the short property names, such as \p{Lu}, and not \p{Uppercase_Letter}. Block names require the In prefix and support only the name form without spaces or underscores; for example, \p{InGreekExtended}, not \p{In_Greek_Extended} or \p{In Greek Extended}.
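The ASCII-only behavior of \w and the Unicode property and block syntax can be compared directly. In this sketch the class UnicodeDemo and its helpers are hypothetical, and non-ASCII characters are written as \u escapes so the source file stays plain ASCII.

```java
class UnicodeDemo {
  // \w is ASCII-only by default
  static boolean isAsciiWord(String s) {
    return s.matches("\\w+");
  }

  // \p{L} matches any Unicode letter
  static boolean isLetters(String s) {
    return s.matches("\\p{L}+");
  }

  // Block name with the In prefix, no spaces or underscores
  static boolean isGreekExtended(String s) {
    return s.matches("\\p{InGreekExtended}+");
  }

  public static void main(String[] args) {
    String cafe = "caf\u00e9";   // "cafe" with e-acute
    System.out.println(isAsciiWord(cafe));          // false
    System.out.println(isLetters(cafe));            // true
    // U+1F00 lies in the Greek Extended block; U+03B1 (alpha) does not
    System.out.println(isGreekExtended("\u1f00"));  // true
    System.out.println(isGreekExtended("\u03b1"));  // false
  }
}
```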

1.4.4 Examples

Example 1-5. Simple match
//Match Spider-Man, Spiderman, SPIDER-MAN, etc.
public class StringRegexTest {
  public static void main(String[] args) throws Exception {
    String dailybugle = "Spider-Man Menaces City!";

    //regex must match entire string
    String regex = "(?i).*spider[- ]?man.*";

    if (dailybugle.matches(regex)) {
      //do something
    }
  }
}

Example 1-6. Match and capture group
//Match dates formatted like MM/DD/YYYY, MM-DD-YY,...
import java.util.regex.*;

public class MatchTest {
  public static void main(String[] args) throws Exception {
    String date = "12/30/1969";
    Pattern p =
      Pattern.compile("(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)");

    Matcher m = p.matcher(date);

    if (m.find()) {
      String month = m.group(1);
      String day   = m.group(2);
      String year  = m.group(3);
      //show the captured pieces
      System.out.println(month + " " + day + " " + year);
    }
  }
}

Example 1-7. Simple substitution
//Convert <br> to <br /> for XHTML compliance
import java.util.regex.*;

public class SimpleSubstitutionTest {
  public static void main(String[] args) {
    String text = "Hello world. <br>";

    try {
      Pattern p = Pattern.compile("<br>", Pattern.CASE_INSENSITIVE);
      Matcher m = p.matcher(text);

      String result = m.replaceAll("<br />");
      System.out.println(result);
    }
    catch (PatternSyntaxException e) {
      System.out.println(e.getMessage());
    }
    //System.exit requires an int status code
    catch (Exception e) { System.exit(1); }
  }
}

Example 1-8. Harder substitution
//urlify - turn URLs into HTML links
import java.util.regex.*;

public class Urlify {
  public static void main(String[] args) throws Exception {
    String text = "Check the website, http://www.oreilly.com/catalog/repr.";
    String regex =
        "\\b                         # start at word\n"
     +  "                            # boundary\n"
     +  "(                           # capture to $1\n"
     +  "(https?|telnet|gopher|file|wais|ftp) : \n"
     +  "                            # resource and colon\n"
     +  "[\\w/\\#~:.?+=&%@!\\-] +?   # one or more valid\n"
     +  "                            # characters\n"
     +  "                            # but take as little\n"
     +  "                            # as possible\n"
     +  ")\n"
     +  "(?=                         # lookahead\n"
     +  "[.:?\\-] *                  # for possible punc\n"
     +  "(?: [^\\w/\\#~:.?+=&%@!\\-] # invalid character\n"
     +  "| $ )                       # or end of string\n"
     +  ")";

    //combine mode-modifier constants with bitwise OR
    Pattern p = Pattern.compile(regex,
        Pattern.CASE_INSENSITIVE | Pattern.COMMENTS);
    Matcher m = p.matcher(text);
    String result = m.replaceAll("<a href=\"$1\">$1</a>");
    System.out.println(result);
  }
}

1.4.5 Other Resources
