
Java Regex - Matcher (java.util.regex.Matcher)


The Java Matcher class (java.util.regex.Matcher) is used to search through a text for multiple occurrences of a regular expression. You can also use a Matcher to search for the same regular expression in different texts.

 

The Java Matcher class has a lot of useful methods. I will cover the core methods of the Java Matcher class in this tutorial. For a full list, see the official JavaDoc for the Matcher class.

Java Matcher Example

 

Here is a quick Java Matcher example so you can get an idea of how the Matcher class works:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherExample {

    public static void main(String[] args) {

        String text    =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

        String patternString = ".*http://.*";

        Pattern pattern = Pattern.compile(patternString);

        Matcher matcher = pattern.matcher(text);
        boolean matches = matcher.matches();
    }
}

 

First a Pattern instance is created from a regular expression, and from the Pattern instance a Matcher instance is created. Then the matches() method is called on the Matcher instance. The matches() method returns true if the regular expression matches the whole text, and false if not.

 

You can do a whole lot more with the Matcher class. The remaining methods are covered in the following sections of this tutorial. The Pattern class is covered separately in my Java Regex Pattern tutorial.

Creating a Matcher

 

Creating a Matcher is done via the matcher() method in the Pattern class. Here is an example:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class CreateMatcherExample {

    public static void main(String[] args) {

        String text    =
            "This is the text to be searched " +
            "for occurrences of the http:// pattern.";

        String patternString = ".*http://.*";

        Pattern pattern = Pattern.compile(patternString);

        Matcher matcher = pattern.matcher(text);
    }
}

 

At the end of this example the matcher variable will contain a Matcher instance which can be used to match the regular expression used to create it against different text input.

matches()

 

The matches() method in the Matcher class matches the regular expression against the whole text passed to the Pattern.matcher() method, when the Matcher was created. Here is a Matcher.matches() example:

 

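String text = "This is the text to be searched for occurrences of the http:// pattern.";

String patternString = ".*http://.*";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);

// true only if the regular expression matches the whole text
boolean matches = matcher.matches();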

 

If the regular expression matches the whole text, then the matches() method returns true. If not, the matches() method returns false.

 

You cannot use the matches() method to search for multiple occurrences of a regular expression in a text. For that, you need to use the find(), start() and end() methods.
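
To make the "whole text" rule concrete, here is a minimal sketch. The class name MatchesWholeTextSketch, the helper matchesWhole() and the sample text are my own illustrative assumptions, not part of the Matcher API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class MatchesWholeTextSketch {

    // Returns true only if the regular expression matches the entire text.
    static boolean matchesWhole(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.matches();
    }

    public static void main(String[] args) {
        String text = "See http://example.com for details";

        // "http://" covers only a part of the text, so matches() returns false.
        System.out.println(matchesWhole("http://", text));      // false

        // Surrounding it with .* lets the expression cover the whole text.
        System.out.println(matchesWhole(".*http://.*", text));  // true
    }
}
```

If you only need to know whether the expression occurs somewhere in the text, find() is usually the more direct choice than padding the expression with .* on both sides.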

lookingAt()

 

The Matcher lookingAt() method works like the matches() method with one major difference. The lookingAt() method only matches the regular expression against the beginning of the text, whereas matches() matches the regular expression against the whole text. In other words, if the regular expression matches the beginning of a text but not the whole text, lookingAt() will return true, whereas matches() will return false.

 

Here is a Matcher.lookingAt() example:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherLookingAtExample {

    public static void main(String[] args) {

        String text    =
                "This is the text to be searched " +
                "for occurrences of the http:// pattern.";

        String patternString = "This is the";

        Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        System.out.println("lookingAt = " + matcher.lookingAt());
        System.out.println("matches   = " + matcher.matches());
    }
}

 

This example matches the regular expression "This is the" (compiled as case insensitive) against both the beginning of the text and the whole text. Matching the regular expression against the beginning of the text (lookingAt()) will return true.

 

Matching the regular expression against the whole text (matches()) will return false, because the text has more characters than the regular expression. The regular expression says that the text must match the text "This is the" exactly, with no extra characters before or after the expression.
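
The difference can be sketched with a few more expressions. This is only an illustration; the class LookingAtSketch and its check() helper are names I made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class LookingAtSketch {

    // Returns { lookingAt(), matches() } for the given expression and text.
    static boolean[] check(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return new boolean[] { matcher.lookingAt(), matcher.matches() };
    }

    public static void main(String[] args) {
        String text = "This is the text to be searched";

        // Matches the beginning, but not the whole text.
        boolean[] prefix = check("This is the", text);
        System.out.println(prefix[0] + " " + prefix[1]);   // true false

        // Adding .* lets the expression cover the rest of the text too.
        boolean[] whole = check("This is the.*", text);
        System.out.println(whole[0] + " " + whole[1]);     // true true

        // Occurs in the middle only, so neither method succeeds.
        boolean[] middle = check("text", text);
        System.out.println(middle[0] + " " + middle[1]);   // false false
    }
}
```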

find() + start() + end()

 

The Matcher find() method searches for occurrences of the regular expression in the text passed to the Pattern.matcher(text) method when the Matcher was created. If multiple matches can be found in the text, the find() method will find the first, and then for each subsequent call to find() it will move to the next match.

 

The methods start() and end() will give the indexes into the text where the found match starts and ends. Actually end() returns the index of the character just after the end of the matching section. Thus, you can use the return values of start() and end() inside a String.substring() call.

 

Here is a Java Matcher find(), start() and end() example:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherFindStartEndExample {

    public static void main(String[] args) {

        String text    =
                "This is the text which is to be searched " +
                "for occurrences of the word 'is'.";

        String patternString = "is";

        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(text);

        int count = 0;
        while(matcher.find()) {
            count++;
            System.out.println("found: " + count + " : "
                    + matcher.start() + " - " + matcher.end());
        }
    }
}

 

This example will find the pattern "is" four times in the searched string. The output printed will be this:

 

found: 1 : 2 - 4
found: 2 : 5 - 7
found: 3 : 23 - 25
found: 4 : 70 - 72
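
Because end() is exclusive, the start() and end() values can be passed straight to String.substring(). Here is a small sketch of that idea. The class name SubstringFromMatchSketch, the extract() helper and the sample expression are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class SubstringFromMatchSketch {

    // Collects the text of every match by cutting it out with substring().
    static List<String> extract(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            // end() points just past the match, which is what substring() expects.
            found.add(text.substring(matcher.start(), matcher.end()));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "This is the text which is to be searched";

        // Words that start with a lowercase t.
        System.out.println(extract("\\bt\\w+", text));   // [the, text, to]
    }
}
```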

reset()

 

The Matcher reset() method resets the matching state internally in the Matcher. In case you have started matching occurrences in a string via the find() method, the Matcher will internally keep a state about how far it has searched through the input text. By calling reset() the matching will start from the beginning of the text again.

 

There is also a reset(CharSequence) method. This method resets the Matcher, and makes the Matcher search through the CharSequence passed as parameter, instead of the CharSequence the Matcher was originally created with.
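
Both variants of reset() can be sketched as follows. The class MatcherResetSketch, the allStarts() helper and the sample texts are illustrative assumptions only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class MatcherResetSketch {

    // Runs find() until it fails and records where each match starts.
    static List<Integer> allStarts(Matcher matcher) {
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        Matcher matcher = Pattern.compile("is").matcher("This is it");

        matcher.find();    // consume the first match
        matcher.reset();   // start over from the beginning of the text
        System.out.println(allStarts(matcher));   // [2, 5]

        // Reuse the same Matcher (and Pattern) for a different text.
        matcher.reset("Miss Ellis");
        System.out.println(allStarts(matcher));   // [1, 8]
    }
}
```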

group()

 

Imagine you are searching through a text for URLs, and you would like to extract the found URLs from the text. Of course you could do this with the start() and end() methods, but it is easier to do so with the group methods.

 

Groups are marked with parentheses in the regular expression. For instance:

 

(John)

 

This regular expression matches the text John. The parentheses are not part of the text that is matched. The parentheses mark a group. When a match is found in a text, you can get access to the part of the text that was matched by the expression inside the group.

 

You access a group using the group(int groupNo) method. A regular expression can have more than one group. Each group is thus marked with a separate set of parentheses. To get access to the text that matched the subpart of the expression in a specific group, pass the number of the group to the group(int groupNo) method.

 

The group with number 0 is always the whole regular expression. To get access to a group marked by parentheses you should start with group number 1.

 

Here is a Matcher group() example:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupExample {

    public static void main(String[] args) {

        String text    =
                  "John writes about this, and John writes about that," +
                          " and John writes about everything. "
                ;

        String patternString1 = "(John)";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while(matcher.find()) {
            System.out.println("found: " + matcher.group(1));
        }
    }
}

 

This example searches the text for occurrences of the word John. For each match found, group number 1 is extracted, which is what matched the group marked with parentheses. The output of the example is:

 

found: John
found: John
found: John
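
To show how group 0 relates to the numbered groups, here is a minimal sketch. The class GroupZeroSketch and the describe() helper are hypothetical names, and the helper assumes the expression contains at least one group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class GroupZeroSketch {

    // Describes the first match. Assumes the expression has at least one group,
    // otherwise group(1) would throw an IndexOutOfBoundsException.
    static String describe(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (!matcher.find()) {
            return "no match";
        }
        return "groups=" + matcher.groupCount()      // group 0 is not counted
             + " whole=<" + matcher.group(0) + ">"   // same as matcher.group()
             + " first=<" + matcher.group(1) + ">";
    }

    public static void main(String[] args) {
        System.out.println(describe("(John) writes", "John writes about this"));
        // groups=1 whole=<John writes> first=<John>
    }
}
```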

Multiple Groups

 

As mentioned earlier, a regular expression can have multiple groups. Here is a regular expression illustrating that:

 

(John) (.+?)

 

This expression matches the text "John" followed by a space, and then one or more characters. You cannot see it in the example above, but there is a space after the last group too.

 

This expression contains a few characters with special meanings in a regular expression. The . means "any character". The + means "one or more times", and relates to the . (any character, one or more times). The ? means "match as small a number of characters as possible".

 

Here is a full code example:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupExample {

    public static void main(String[] args) {

        String text    =
                  "John writes about this, and John Doe writes about that," +
                          " and John Wayne writes about everything."
                ;

        String patternString1 = "(John) (.+?) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while(matcher.find()) {
            System.out.println("found: " + matcher.group(1) +
                               " "       + matcher.group(2));
        }
    }
}

 

Notice the references to the two groups in the calls to group(1) and group(2). The characters matched by those groups are printed to System.out. Here is what the example prints out:

 

found: John writes
found: John Doe
found: John Wayne
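
The effect of the ? after the + is easier to see when the same expression is run with and without it. The following is only a sketch; the class ReluctantQuantifierSketch, the secondGroup() helper and the sample text are assumptions made for the comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class ReluctantQuantifierSketch {

    // Returns what group 2 matched in the first match, or null if nothing matched.
    static String secondGroup(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "John Doe writes about that";

        // .+? stops at the first space it can.
        System.out.println(secondGroup("(John) (.+?) ", text));   // Doe

        // .+ grabs as much as possible, so it runs up to the last space.
        System.out.println(secondGroup("(John) (.+) ", text));    // Doe writes about
    }
}
```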

Groups Inside Groups

 

It is possible to have groups inside groups in a regular expression. Here is an example:

 

((John) (.+?))

 

Notice how the two groups from the examples earlier are now nested inside a larger group. (Again, you cannot see the space at the end of the expression, but it is there.)

 

When groups are nested inside each other, they are numbered based on when the left parenthesis of the group is met. Thus, group 1 is the big group. Group 2 is the group with the expression John inside. Group 3 is the group with the expression .+? inside. This is important to know when you need to reference the groups via the group(int groupNo) method.

 

Here is an example that uses the above nested groups:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherGroupsExample {

    public static void main(String[] args) {

        String text    =
                  "John writes about this, and John Doe writes about that," +
                          " and John Wayne writes about everything."
                ;

        String patternString1 = "((John) (.+?)) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        while(matcher.find()) {
            System.out.println("found: <"  + matcher.group(1) +
                               "> <"       + matcher.group(2) +
                               "> <"       + matcher.group(3) + ">");
        }
    }
}

 

Here is the output from the above example:

 

found: <John writes> <John> <writes>
found: <John Doe> <John> <Doe>
found: <John Wayne> <John> <Wayne>

 

Notice how the value matched by the first group (the outer group) contains the values matched by both of the inner groups.

replaceAll() + replaceFirst()

 

The Matcher replaceAll() and replaceFirst() methods can be used to replace parts of the string the Matcher is searching through. The replaceAll() method replaces all matches of the regular expression. The replaceFirst() method replaces only the first match.

 

Before any matching is carried out, the Matcher is reset, so that matching starts from the beginning of the input text.

 

Here are two examples:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherReplaceExample {

    public static void main(String[] args) {

        String text    =
                  "John writes about this, and John Doe writes about that," +
                          " and John Wayne writes about everything."
                ;

        String patternString1 = "((John) (.+?)) ";

        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        String replaceAll = matcher.replaceAll("Joe Blocks ");
        System.out.println("replaceAll   = " + replaceAll);

        String replaceFirst = matcher.replaceFirst("Joe Blocks ");
        System.out.println("replaceFirst = " + replaceFirst);
    }
}

 

And here is what the example outputs:

 

replaceAll   = Joe Blocks about this, and Joe Blocks writes about that,
    and Joe Blocks writes about everything.
replaceFirst = Joe Blocks about this, and John Doe writes about that,
    and John Wayne writes about everything.

 

The line breaks and indentation of the continuation lines are not really part of the output. I added them to make the output easier to read.

 

Notice how the first string printed has all occurrences of John with a word after, replaced with the string Joe Blocks. The second string only has the first occurrence replaced.
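
The reset that happens before replacing means it does not matter how far find() has already moved through the text. Here is a small sketch of that behavior; the class ReplaceAfterFindSketch and its replaceAfterFind() helper are hypothetical names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class ReplaceAfterFindSketch {

    // Moves past the first match with find(), then calls replaceAll().
    static String replaceAfterFind(String regex, String text, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        matcher.find();                           // advance past the first match
        return matcher.replaceAll(replacement);   // resets the Matcher first
    }

    public static void main(String[] args) {
        // The first John is replaced too, even though find() already passed it.
        System.out.println(replaceAfterFind("John", "John and John", "Joe"));
        // Joe and Joe
    }
}
```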

appendReplacement() + appendTail()

 

The Matcher appendReplacement() and appendTail() methods are used to replace string tokens in an input text, and append the resulting string to a StringBuffer.

 

When you have found a match using the find() method, you can call the appendReplacement() method. Doing so results in the characters from the input text being appended to the StringBuffer, and the matched text being replaced. Only the characters starting from the end of the last match, and until just before the matched characters, are copied.

 

The appendReplacement() method keeps track of what has been copied into the StringBuffer, so you can continue searching for matches using find() until no more matches are found in the input text.

 

Once the last match has been found, a part of the input text will still not have been copied into the StringBuffer. These are the characters from the end of the last match until the end of the input text. By calling appendTail() you can append these last characters to the StringBuffer too.

 

Here is an example:

 

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class MatcherReplaceExample {

    public static void main(String[] args) {

        String text    =
                  "John writes about this, and John Doe writes about that," +
                          " and John Wayne writes about everything."
                ;

        String patternString1 = "((John) (.+?)) ";

        Pattern      pattern      = Pattern.compile(patternString1);
        Matcher      matcher      = pattern.matcher(text);
        StringBuffer stringBuffer = new StringBuffer();

        while(matcher.find()){
            matcher.appendReplacement(stringBuffer, "Joe Blocks ");
            System.out.println(stringBuffer.toString());
        }
        matcher.appendTail(stringBuffer);

        System.out.println(stringBuffer.toString());
    }
}

 

Notice how appendReplacement() is called inside the while(matcher.find()) loop, and appendTail() is called just after the loop.

 

The output from this example is:

 

Joe Blocks
Joe Blocks about this, and Joe Blocks
Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks
Joe Blocks about this, and Joe Blocks writes about that, and Joe Blocks
    writes about everything.

 

The line break in the last line is inserted by me, to make the text more readable. In the real output there would be no line break.

As you can see, the StringBuffer is built up from characters and replacements from the input text, one match at a time.
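
One reason to use appendReplacement() rather than replaceAll() is that the replacement can be computed per match. The sketch below upper-cases each match. The class AppendReplacementSketch, the upperCaseMatches() helper and the sample text are my own assumptions for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class AppendReplacementSketch {

    // Replaces every match with its own upper-case version.
    static String upperCaseMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group().toUpperCase();
            // quoteReplacement() keeps any $ or \ in the computed text literal.
            matcher.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseMatches("John \\w+", "John Doe and John Wayne write"));
        // JOHN DOE and JOHN WAYNE write
    }
}
```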

 

Reposted from: http://tutorials.jenkov.com/java-regex/matcher.html
