Often unknown, or heralded as confusing, regular expressions have defined the standard for powerful text manipulation and search. Without them, many of the applications we know today would not function. This two-part series explores the basics of regular expressions in Java, and provides tutorial examples in the hopes of spreading love for our pattern-matching friends. (Read part one.)
Part 2: Look-ahead & Configuration flags
Have you ever wanted to find something in a string, but only when it came before another pattern in that string? Or maybe you wanted to find a piece of text that was not followed by another piece of text? With standard string searching, you would have to write a somewhat complex function to perform the exact operation you wanted. With regular expressions, this can be done on one line. This part also covers the configuration flags that modify the behavior of the regular expression pattern language.
This article is part two in the series: “Regular Expressions.” Read part one for more information on basic matching, grouping, extracting, and substitution.
1. Look-ahead & Look-behind
Look-ahead and look-behind operations use syntax that could be confused with grouping (see Ch. 1 – Basic Grouping), but these constructs do not capture values: nothing is stored for later retrieval, and they do not affect group numbering. They are also zero-width, meaning they assert something about the text around the current position without consuming any characters. A look-ahead examines the input that begins at its location in the pattern and extends to the right. A look-behind examines the input that ends at its location, that is, the text immediately to its left. E.g.: the pattern "my dog is (?!(green|red))\\w+" asserts that neither 'green' nor 'red' is the word to the look-ahead's direct right. In other words: my dog is not green or red, but my dog is blue.
Look-ahead/behind constructs (non-capturing)
(?:X) X, as a non-capturing group
(?=X) X, via zero-width positive look-ahead
(?!X) X, via zero-width negative look-ahead
(?<=X) X, via zero-width positive look-behind
(?<!X) X, via zero-width negative look-behind
(?>X) X, as an independent, non-capturing group
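To make the zero-width behavior concrete, here is a minimal sketch that extracts numbers based on what surrounds them. The class name, the helper `findAll`, and the sample input are made up for illustration; only `java.util.regex.Pattern` and `Matcher` come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookAroundSketch {
    // Hypothetical helper: collects every substring matched by regex via Matcher.find().
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String input = "cost: $42, weight: 17kg, id: 99";

        // Positive look-behind: digits preceded by '$'; the '$' is not part of the match.
        System.out.println(findAll("(?<=\\$)\\d+", input));              // [42]

        // Positive look-ahead: digits followed by "kg"; the "kg" is not part of the match.
        System.out.println(findAll("\\d+(?=kg)", input));                // [17]

        // Negative look-behind and look-ahead: a whole number that is neither
        // preceded by '$' nor followed by "kg".
        System.out.println(findAll("(?<![$\\d])\\d+(?!kg|\\d)", input)); // [99]
    }
}
```

Note that the surrounding text ('$' or "kg") never shows up in the results, which is what distinguishes these constructs from ordinary groups.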
So what does this all mean? What does a look-ahead really do for me? Say, for example, we wanted to know if our input string contains the word “incident” but that the word “theft” should not be found anywhere. We can use a negative look-ahead to ensure that there are no occurrences.
"(?!.*theft).*incident.*"
This expression exhibits the following behavior:
"There was a crime incident" matches
"The incident involved a theft" does not match
"The theft was a serious incident" does not match
A more complex example is password validation. Let’s say we want to ensure that a password is made up of at least 8, but at most 12, alphanumeric characters, and contains at least two digits, in any position. We will need a look-ahead expression to enforce the requirement of two digits. This look-ahead expression requires any number of characters to be followed by a single digit, followed by any number of characters, and another single digit. E.g.: …4…2, or …42, or 42…, or 4…2.
1.1. Example
Sample code
import java.util.ArrayList;
import java.util.List;

public class LookaheadDemo {
    public static void main(String[] args) {
        List<String> input = new ArrayList<String>();
        input.add("password");
        input.add("p4ssword");
        input.add("p4ssw0rd");
        input.add("p45sword");

        // 8 to 12 alphanumeric characters, at least two of which are digits
        for (String password : input) {
            if (password.matches("^(?=.*[0-9].*[0-9])[0-9a-zA-Z]{8,12}$")) {
                System.out.println(password + ": matches");
            } else {
                System.out.println(password + ": does not match");
            }
        }
    }
}
This produces the following output:
password: does not match
p4ssword: does not match
p4ssw0rd: matches
p45sword: matches
Dissecting the pattern:
"^(?=.*[0-9].*[0-9])[0-9a-zA-Z]{8,12}$"
^ match the beginning of the input (redundant with String.matches(…), which always tests the whole input)
(?=.*[0-9].*[0-9]) a look-ahead expression, requires 2 digits to be present
.* match n characters, where n >= 0
[0-9] match a digit from 0 to 9
[0-9a-zA-Z] match any digit or letter
{8,12} match 8 to 12 of the preceding element, here the character class [0-9a-zA-Z]
$ match the end of the input
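Multiple look-aheads placed at the same position are evaluated left to right, but because each one is zero-width and leaves the match position unchanged, their order does not affect the result. They must all be satisfied, and if they logically contradict each other, the pattern will never match.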
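The following sketch illustrates both points with a hypothetical password policy (8 to 12 alphanumeric characters, at least one digit, at least one uppercase letter). The policy, the class name, and the constant names are assumptions made up for this example.

```java
class MultiLookaheadSketch {
    // Same two look-aheads, written in both orders (hypothetical policy for illustration).
    static final String DIGIT_FIRST = "^(?=.*[0-9])(?=.*[A-Z])[0-9a-zA-Z]{8,12}$";
    static final String UPPER_FIRST = "^(?=.*[A-Z])(?=.*[0-9])[0-9a-zA-Z]{8,12}$";

    // Contradictory look-aheads: the first character cannot be both
    // a digit and a lowercase letter, so this pattern never matches.
    static final String CONTRADICTION = "(?=[0-9])(?=[a-z]).+";

    public static void main(String[] args) {
        String[] candidates = { "passw0rd", "Password", "Passw0rd", "Pw0" };
        for (String candidate : candidates) {
            System.out.println(candidate
                    + ": digit-first=" + candidate.matches(DIGIT_FIRST)
                    + ", upper-first=" + candidate.matches(UPPER_FIRST)
                    + ", contradiction=" + candidate.matches(CONTRADICTION));
        }
    }
}
```

Only "Passw0rd" satisfies the policy, and it does so regardless of which look-ahead is written first; no candidate matches the contradictory pattern.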
2. Configuring the Matching Engine
Pattern configuration flags for Java look very similar to look-ahead operations. Flags are used to configure case sensitivity, multi-line matching, and more. Several flags can be combined in a single expression, such as (?si), or given as individual expressions, such as (?s)(?i). Again, these expressions do not match literal text and do not capture values.
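Embedded flags are not the only way to configure a pattern: `Pattern.compile` also accepts flag constants as a second argument. The sketch below, with a made-up class name and sample pattern, shows an embedded `(?si)` next to the equivalent `Pattern.CASE_INSENSITIVE | Pattern.DOTALL` combination.

```java
import java.util.regex.Pattern;

class FlagConstantsSketch {
    // Flags embedded in the pattern text.
    static boolean matchesInline(String input) {
        return Pattern.compile("(?si).*blue.*green.*").matcher(input).matches();
    }

    // The same flags passed as constants to Pattern.compile.
    static boolean matchesWithConstants(String input) {
        int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        return Pattern.compile(".*blue.*green.*", flags).matcher(input).matches();
    }

    // No flags at all, for comparison.
    static boolean matchesPlain(String input) {
        return Pattern.compile(".*blue.*green.*").matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "My dog is Blue.\nHe is not red or green.";
        System.out.println("inline flags:   " + matchesInline(input));        // true
        System.out.println("flag constants: " + matchesWithConstants(input)); // true
        System.out.println("no flags:       " + matchesPlain(input));         // false
    }
}
```

Embedded flags are handy when the pattern comes from configuration or is passed to `String.matches(…)`, which offers no flags parameter; constants keep the pattern text itself shorter.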
2.1. Configuration flags
(?idmsux-idmsux) Turns match flags on - off from this point onward (for the entire expression when placed at its head)
(?idmsux-idmsux:X) X, as a non-capturing group with the given flags on – off
2.2. Case insensitivity mode
(?i) Enables case-insensitive matching (default: off, (?-i)) from this point onward; use (?i:X) to limit it to a single group
2.3. UNIX lines mode
(?d) Enables UNIX line mode (default: off, (?-d))
In this mode, only the '\n' line terminator is recognized in the behavior of ., ^, and $
2.4. Multi-line mode
(?m) Enables multi-line mode (default: off, (?-m))
In this mode, the ^ and $ expressions match at the beginning and end of each line,
respectively. By default, they are anchored to the beginning and end of the entire input sequence/string rather than to each line.
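A quick way to see what (?m) changes is to count how often an anchored pattern can be found in a three-line input. This is a minimal sketch; the class name, the helper `countMatches`, and the sample input are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiLineSketch {
    // Hypothetical helper: counts how many times regex is found in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "one\ntwo\nthree";

        // Default: ^ is anchored to the start of the whole input.
        System.out.println(countMatches("^\\w+", input));     // 1

        // Multi-line mode: ^ also matches right after each line terminator.
        System.out.println(countMatches("(?m)^\\w+", input)); // 3

        // Multi-line mode: $ also matches right before each line terminator.
        System.out.println(countMatches("(?m)\\w+$", input)); // 3
    }
}
```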
2.5. Dot-all mode
(?s) Toggle dot ‘.’ matches any character (default: off, (?-s))
Normally, the dot character will match everything except newline characters.
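As a small sketch of the difference, the same pattern can be tested against a two-line input with and without (?s). The class and method names are made up for this example.

```java
class DotAllSketch {
    // Without (?s), '.' cannot cross the line terminator between the two lines.
    static boolean withoutDotAll(String input) {
        return input.matches("line.*two");
    }

    // With (?s), '.' matches any character, line terminators included.
    static boolean withDotAll(String input) {
        return input.matches("(?s)line.*two");
    }

    public static void main(String[] args) {
        String input = "line one\nline two";
        System.out.println("default: " + withoutDotAll(input)); // false
        System.out.println("dot-all: " + withDotAll(input));    // true
    }
}
```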
2.6. Unicode-case mode
(?u) Enables Unicode-aware case folding (default: off, (?-u))
By default, case-insensitive matching assumes that only characters
in the US-ASCII charset are being matched. Combined with (?i), this flag makes case-insensitive matching follow the Unicode standard.
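The sketch below shows the effect on a word with an accented letter. The sample word and the class and method names are assumptions for illustration; the accented characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
class UnicodeCaseSketch {
    // "\u00E9" is a lowercase e with acute accent.
    // (?i) alone folds case for US-ASCII letters only.
    static boolean asciiOnlyFolding(String input) {
        return input.matches("(?i)\u00E9cole");
    }

    // (?iu) folds case according to the Unicode standard.
    static boolean unicodeFolding(String input) {
        return input.matches("(?iu)\u00E9cole");
    }

    public static void main(String[] args) {
        String upper = "\u00C9COLE"; // uppercase E with acute accent, then "COLE"
        System.out.println("(?i):  " + asciiOnlyFolding(upper)); // false
        System.out.println("(?iu): " + unicodeFolding(upper));   // true
    }
}
```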
2.7. Comments mode
(?x) Allow comments in pattern (default: off, (?-x))
In this mode, whitespace is ignored, and embedded comments starting with '#'
are ignored until the end of a line.
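Comments mode pays off most when a pattern is spread over several lines of a Java string. The following is a minimal sketch using a hypothetical date pattern; the class name, the method `parseYear`, and the date format are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsModeSketch {
    // Each source line ends with "\n" so that a '#' comment stops at the end of its line.
    static final Pattern ISO_DATE = Pattern.compile(
            "(?x)          # enable comments mode\n" +
            "^(\\d{4})     # group 1: four-digit year\n" +
            "-(\\d{2})     # group 2: two-digit month\n" +
            "-(\\d{2})$    # group 3: two-digit day\n");

    // Returns the year part, or null when the input is not in yyyy-MM-dd form.
    static String parseYear(String input) {
        Matcher m = ISO_DATE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(parseYear("2013-12-24")); // 2013
        System.out.println(parseYear("24/12/2013")); // null
    }
}
```

Because unescaped whitespace is ignored in this mode, a literal space has to be written as `\\s`, `\\ ` or `[ ]` inside the pattern.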
2.8. Examples
2.8.1. Global toggle
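In order to toggle flags for the entire expression, the statement must be at the head of the expression.
"(?idx)^I\s lost\s my\s .+ #this comment and all spaces will be ignored"
The above expression will ignore case, recognize only '\n' as a line terminator, and ignore the unescaped whitespace and the trailing comment. It does not make the dot ‘.’ match newlines; that would require the (?s) flag. In a Java string literal, each backslash must be doubled, so \s is written as "\\s".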
2.8.2. Local toggle
In order to toggle flags for a single non-capturing group, the group must adhere to the following syntax:
"(?idx:Cars)[a-z]+"
The above expression will ignore case within the group, but adhere to case beyond.
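A short sketch of this behavior, reduced to the case-insensitivity flag only. The class name, method name, and sample inputs are made up for illustration.

```java
class LocalToggleSketch {
    // Case is ignored inside the group only; [a-z]+ remains case-sensitive.
    static boolean matchesLocal(String input) {
        return input.matches("(?i:Cars)[a-z]+");
    }

    public static void main(String[] args) {
        System.out.println(matchesLocal("Carsrace")); // true
        System.out.println(matchesLocal("CARSrace")); // true: the group ignores case
        System.out.println(matchesLocal("CARSRACE")); // false: [a-z]+ rejects uppercase
    }
}
```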
2.8.3. Applied in Java
Sample code
public class ConfigurationDemo {
    public static void main(String[] args) {
        String input = "My dog is Blue.\n" +
                       "He is not red or green.";

        // No flags: '.' stops at the newline and 'Green' is case-sensitive
        boolean controlResult = input.matches("(?=.*Green.*).*Blue.*");
        // (?i) only: 'Green' may match 'green', but '.' still stops at the newline
        boolean caseInsensitiveResult = input.matches("(?i)(?=.*Green.*).*Blue.*");
        // (?s) only: '.' crosses the newline, but 'Green' is still case-sensitive
        boolean dotallResult = input.matches("(?s)(?=.*Green.*).*Blue.*");
        // (?si): both flags enabled
        boolean configuredResult = input.matches("(?si)(?=.*Green.*).*Blue.*");

        System.out.println("Control result was: " + controlResult);
        System.out.println("Case insensitive result was: " + caseInsensitiveResult);
        System.out.println("Dot-all result was: " + dotallResult);
        System.out.println("Configured result was: " + configuredResult);
    }
}
This produces the following output:
Control result was: false
Case insensitive result was: false
Dot-all result was: false
Configured result was: true
Dissecting the pattern:
"(?si)(?=.*Green.*).*Blue.*"
(?si) turn on case insensitivity and dotall modes
(?=.*Green.*) ‘Green’ must be found somewhere to the right of this look-ahead
.*Blue.* ‘Blue’ must be found somewhere in the input
We had to enable dot-all and case-insensitive modes for our pattern to match. The look-ahead in this example is very similar to the pattern itself. Because we don’t care in which order we find these two items, “.*Blue.*” could be replaced by a second look-ahead, “(?=.*Blue.*)”, provided a trailing “.*” is kept so that String.matches(…), which must match the entire input, still has something to consume the text. If we did care about the order in which we find these colors, we would need to be more precise. To ensure that ‘Green’ comes after ‘Blue’, we would move the look-ahead so that it sits directly after ‘Blue’, as seen below. A look-ahead placed at the very end of the pattern would not work with String.matches(…), because by then the whole input has been consumed and nothing remains to its right.
"(?si).*Blue(?=.*Green).*"
3. Conclusion
Regular expressions provide an extremely flexible and powerful text processing system. Try to imagine doing this work using String.substring(…) or String.indexOf(…), with loops, nested loops, and dozens of if statements. I don’t even want to try… so play around! Think about using regular expressions next time you find yourself doing text or pattern manipulation with looping and other painful methods. Let us know how you do.
This article is part two in the series: “Guide to Regular Expressions in Java.” Read part one for more information on basic matching, grouping, extracting, and substitution.