codingstandards

Bash String Processing (Compared with Java) - 21. Regular Expression Matching

Blog category: Bash Basics

Bash Java String regex


In Java

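Searching with a regular expression

The String.matches method

boolean matches(String regex)

Tells whether or not this string matches the given regular expression. The whole string must match, not just a part of it.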

String str = "123456";
String re = "\\d+";
if (str.matches(re)) {
    // do something
}

The Pattern and Matcher classes

String str = "abc efg ABC";
String re = "a|f"; // matches a or f
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(str);
boolean rs = m.find();

If str contains a match for re, rs is true; otherwise it is false. To ignore case while searching, write Pattern p = Pattern.compile(re, Pattern.CASE_INSENSITIVE);

Extracting with a regular expression

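String re = ".+\\\\(.+)$"; // regex .+\\(.+)$ : capture everything after the last backslash
String str = "c:\\dir1\\dir2\\name.txt";
Pattern p = Pattern.compile(re);
Matcher m = p.matcher(str);
if (m.find()) { // groups are only available after a successful match
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}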

The code above prints name.txt. The captured substrings are available through m.group(i), where i runs from 1 up to m.groupCount().

Splitting with a regular expression

String re = "::";
Pattern p = Pattern.compile(re);
String[] r = p.split("xd::abc::cde");

After this runs, r is {"xd", "abc", "cde"}. There is an even simpler way to split:

String str="xd::abc::cde";
String[] r = str.split("::");

Replacing (deleting) with a regular expression

String re = "a+"; // one or more a
Pattern p = Pattern.compile(re);
Matcher m = p.matcher("aaabbced a ccdeaa");
String s = m.replaceAll("A");

The result is "Abbced A ccdeA".

Replacing with an empty string deletes the matches, for example:

String re = "a+"; // one or more a
Pattern p = Pattern.compile(re);
Matcher m = p.matcher("aaabbced a ccdeaa");
String s = m.replaceAll("");

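The result is "bbced  ccde" (the two spaces that surrounded the single a both remain).

String.replaceAll and String.replaceFirst are shortcuts for regex-based replacement (and deletion). String.replace, however, does not treat its argument as a regular expression.

From the JavaDoc for class String: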

String replace(char oldChar, char newChar)
    Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
String replace(CharSequence target, CharSequence replacement)
    Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
String replaceAll(String regex, String replacement)
    Replaces each substring of this string that matches the given regular expression with the given replacement.
String replaceFirst(String regex, String replacement)
    Replaces the first substring of this string that matches the given regular expression with the given replacement.

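Commonly used regular expression metacharacters in Java

. matches any character (by default, any character except a line terminator)
? the preceding element occurs 0 or 1 times
+ the preceding element occurs 1 or more times
* the preceding element occurs 0 or more times
{n} the preceding element occurs exactly n times
{n,} the preceding element occurs n or more times
{n,m} the preceding element occurs between n and m times
\d same as [0-9], a digit
\D same as [^0-9], a non-digit
\s same as [ \t\n\x0B\f\r], a whitespace character
\S same as [^ \t\n\x0B\f\r], a non-whitespace character
\w same as [a-zA-Z_0-9], a word character (letter, digit, or underscore)
\W same as [^a-zA-Z_0-9], a non-word character
^ matches the beginning of the input (the beginning of each line only with Pattern.MULTILINE)
$ matches the end of the input (the end of each line only with Pattern.MULTILINE)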
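For comparison on the Bash side: the shorthand classes \d, \s and \w are Java (Perl-style) syntax and are not part of POSIX extended regular expressions, so a Bash pattern usually spells them with bracket expressions. The following is a minimal sketch, assuming a Bash version with =~ support; the helper name matches and the sample strings are made up for illustration.

```shell
#!/bin/bash
# POSIX bracket expressions standing in for Java's shorthand classes.
digits='^[[:digit:]]+$'                                 # roughly Java \d+
word='^[[:alnum:]_]+$'                                  # roughly Java \w+
two_fields='^[^[:space:]]+[[:space:]]+[^[:space:]]+$'   # roughly Java \S+\s+\S+

# matches STRING REGEX: exit status 0 if STRING matches REGEX (hypothetical helper).
matches() {
    [[ $1 =~ $2 ]]
}

matches "123456" "$digits"      && echo "123456 is all digits"
matches "foo_42" "$word"        && echo "foo_42 is a word"
matches "a  b"   "$two_fields"  && echo "a  b has two fields"
matches "12a"    "$digits"      || echo "12a is not all digits"
```

The ^ and $ anchors play the role of String.matches, which always tests the whole string.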

In Bash

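For notes on regular expressions under Linux, see http://codingstandards.iteye.com/blog/1195592

Bash's support for regular expressions

Since version 3, Bash has built-in support for regex matching through the =~ operator.

[[ "$STR" =~ $REGEX ]]

Leave $REGEX unquoted: since Bash 3.2, a quoted right-hand side is matched as a literal string rather than as a regular expression.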
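The effect of quoting the right-hand side can be checked with a minimal sketch like the following. It assumes Bash 3.2 or later, where a quoted pattern is compared as literal text; the sample values are illustrative.

```shell
#!/bin/bash
# Assumption: Bash 3.2+ with default compatibility settings.
STR="123456"
REGEX='^[0-9]+$'

if [[ $STR =~ $REGEX ]]; then      # unquoted: interpreted as a regex
    unquoted="match"
else
    unquoted="no match"
fi

if [[ $STR =~ "$REGEX" ]]; then    # quoted: compared as a literal substring
    quoted="match"
else
    quoted="no match"
fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With these values the unquoted form reports a match, while the quoted form does not, because 123456 does not contain the literal text ^[0-9]+$.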

From man bash:

[[ expression ]]
       An additional binary operator, =~, is available, with the same precedence as == and !=. When it is
       used, the string to the right of the operator is considered an extended regular expression and matched
       accordingly (as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise.
       If the regular expression is syntactically incorrect, the conditional expression’s return value is 2.
       If the shell option nocasematch is enabled, the match is performed without regard to the case of
       alphabetic characters. Substrings matched by parenthesized subexpressions within the regular expression
       are saved in the array variable BASH_REMATCH. The element of BASH_REMATCH with index 0 is the portion
       of the string matching the entire regular expression. The element of BASH_REMATCH with index n is the
       portion of the string matching the nth parenthesized subexpression.

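In Bash, the binary operator =~ performs extended regular expression matching. The return value is 0 if the string matches, 1 if it does not, and 2 if the regular expression is invalid. Matching is case-sensitive unless the shell option nocasematch is enabled. Substrings matched by parenthesized subexpressions are saved in the array BASH_REMATCH: ${BASH_REMATCH[0]} is the text matched by the whole expression, ${BASH_REMATCH[1]} is the text matched by the first subexpression, and so on.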
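The three return values and the nocasematch option can be observed with a short sketch. The patterns and variable names here are made up for illustration, and the deliberately broken pattern a(b is assumed to be rejected by the system regex library as an unbalanced parenthesis.

```shell
#!/bin/bash
re='^a(b)c$'
bad='a(b'          # unbalanced parenthesis: not a valid ERE

[[ "abc" =~ $re ]];  rc_match=$?
first_group=${BASH_REMATCH[1]}      # text captured by (b)

[[ "xyz" =~ $re ]];  rc_nomatch=$?

[[ "abc" =~ $bad ]] 2>/dev/null; rc_bad=$?

shopt -s nocasematch                # case-insensitive matching on
[[ "ABC" =~ $re ]];  rc_nocase=$?
shopt -u nocasematch                # back to the default
[[ "ABC" =~ $re ]];  rc_case=$?

echo "match=$rc_match group=$first_group nomatch=$rc_nomatch bad=$rc_bad"
echo "nocasematch on: $rc_nocase, off: $rc_case"
```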

The following script, from http://www.linuxjournal.com/content/bash-regular-expressions, nicely demonstrates the built-in regex matching in Bash 3.0.

#!/bin/bash

if [[ $# -lt 2 ]]; then
    echo "Usage: $0 PATTERN STRINGS..."
    exit 1
fi
regex=$1
shift
echo "regex: $regex"
echo

while [[ $1 ]]
do
    if [[ $1 =~ $regex ]]; then
        echo "$1 matches"
        i=1
        n=${#BASH_REMATCH[*]}
        while [[ $i -lt $n ]]
        do
            echo "  capture[$i]: ${BASH_REMATCH[$i]}"
            let i++
        done
    else
        echo "$1 does not match"
    fi
    shift
done

[root@jfht ~]# ./bashre.sh 'aa(b{2,3}[xyz])cc' aabbxcc aabbcc
regex: aa(b{2,3}[xyz])cc

aabbxcc matches
  capture[1]: bbx
aabbcc does not match
[root@jfht ~]#
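The Java extraction example from earlier (taking the file name out of a Windows-style path) can be mirrored with BASH_REMATCH. This is a sketch under the assumption that the pattern is kept in a single-quoted variable, so that the backslashes reach the regex engine unchanged; the variable names are illustrative.

```shell
#!/bin/bash
path='c:\dir1\dir2\name.txt'
re='.+\\(.+)$'      # .+ , then a literal backslash, then the captured tail

name=""
if [[ $path =~ $re ]]; then
    name=${BASH_REMATCH[1]}   # first parenthesized subexpression
fi
echo "$name"
```

Because .+ is greedy, the literal backslash in the pattern lines up with the last backslash in the path, so the capture holds only the final path component.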

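Regex matching with the grep/egrep commands

Using basic regular expressions (BRE)

Form 1: echo "$STR" | grep -q "$REGEX"

Form 2: grep -q "$REGEX" <<<"$STR"

Using extended regular expressions (ERE)

Form 3: echo "$STR" | egrep -q "$REGEX"

Form 4: egrep -q "$REGEX" <<<"$STR"

Note: the -q option suppresses the output of grep/egrep, so the exit status tells whether there was a match. An exit status of 0 means a match.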
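The exit-status idiom can be wrapped in a small function. This is only a sketch: the name grep_matches is hypothetical, grep -E is used in place of egrep (the two are equivalent), and -e is added so that a pattern beginning with - is not mistaken for an option.

```shell
#!/bin/bash
# grep_matches STRING REGEX: exit status 0 if STRING contains a match
# for the extended regular expression REGEX (hypothetical helper).
grep_matches() {
    grep -E -q -e "$2" <<<"$1"
}

if grep_matches "13012345678" '1[3458][0-9]{9}'; then
    echo "matched"
else
    echo "not matched"
fi
```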

From man grep:

Egrep is the same as grep -E.

       -E, --extended-regexp
              Interpret PATTERN as an extended regular expression (see below).

       -e PATTERN, --regexp=PATTERN
              Use PATTERN as the pattern; useful to protect patterns beginning with -.

       -q, --quiet, --silent
              Quiet; do not write anything to standard output. Exit immediately with zero status if any match is
              found, even if an error was detected. Also see the -s or --no-messages option.

Matching a mobile phone number. The pattern is 1[3458][0-9]{9} as an ERE, or 1[3458][0-9]\{9\} as a BRE:

[root@jfht ~]# echo "13012345678" | egrep '1[3458][0-9]{9}'
13012345678
[root@jfht ~]# echo "13012345678" | grep '1[3458][0-9]{9}'
[root@jfht ~]# echo "13012345678" | grep '1[3458][0-9]\{9\}'
13012345678
[root@jfht ~]#

STR="13024184301"
REGEX="1[3458][0-9]{9}"
if echo "$STR" | egrep -q "$REGEX"; then
    echo "matched"
else
    echo "not matched"
fi

[root@jfht ~]# STR="13024184301"
[root@jfht ~]# REGEX="1[3458][0-9]{9}"
[root@jfht ~]# if echo "$STR" | egrep -q "$REGEX"; then
> echo "matched"
> else
> echo "not matched"
> fi
matched
[root@jfht ~]#
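The pattern above is not anchored, so it also accepts longer strings that merely contain eleven suitable digits. One possible way to require that the whole string is a mobile number is grep's -x (--line-regexp) option, sketched below; the function name is_mobile and the test strings are made up for illustration.

```shell
#!/bin/bash
REGEX='1[3458][0-9]{9}'

# is_mobile STRING: succeeds only if the whole string matches REGEX.
is_mobile() {
    grep -E -q -x -e "$REGEX" <<<"$1"
}

for s in 13024184301 130241843019999 x13024184301; do
    if is_mobile "$s"; then
        echo "$s: matched"
    else
        echo "$s: not matched"
    fi
done
```

Writing the pattern as ^1[3458][0-9]{9}$ would have the same effect without -x.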

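Regex matching with expr match

expr match "$STR" "$REGEX"

expr "$STR" : "$REGEX"

Both forms print the number of characters matched. The pattern is a basic regular expression anchored at the start of the string. The match keyword is a GNU extension; the : form is the portable one.

From man expr: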

STRING : REGEXP
anchored pattern match of REGEXP in STRING
match STRING REGEXP
same as STRING : REGEXP

[root@jfht ~]# STR=Hello
[root@jfht ~]# REGEX=He
[root@jfht ~]# expr "$STR" : "$REGEX"
2

[root@jfht ~]# REGEX=".*[aeiou]"
[root@jfht ~]# expr "$STR" : "$REGEX"
5

Note: matching is greedy!

[root@jfht ~]# REGEX=ll
[root@jfht ~]# expr "$STR" : "$REGEX"
0
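The 0 above comes from the anchoring described in the man page excerpt: the pattern must match from the first character of the string. A small sketch of the difference, using the same Hello string (the variable names are illustrative):

```shell
#!/bin/bash
STR=Hello

# Anchored at the start: "ll" is not a prefix of "Hello", so the length is 0.
anchored=$(expr "$STR" : 'll')

# A leading .* lets the interesting part begin anywhere; the count then
# includes the characters consumed by .* ("He") plus "ll".
prefixed=$(expr "$STR" : '.*ll')

echo "anchored: $anchored"
echo "prefixed: $prefixed"
```

expr also exits with status 1 when it prints 0 or an empty string, so the same call can serve directly as an if condition when the output is discarded.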

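The result here is 0 because the pattern is anchored at the start of the string, and Hello does not start with ll.

expr match can also extract a substring by regular expression.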

expr match "$STR" ".*\($SUB\).*"

expr "$STR" : ".*\($SUB\).*"

Note that, unlike the forms above, the result is the substring captured by \( \), not the length of the match.

[root@jfht ~]# STR="某某是2009年进公司的"

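We want to extract the number from this string (it reads "so-and-so joined the company in 2009"). Here are the attempts.
[root@jfht ~]# SUB="[0-9]+"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"

[root@jfht ~]# SUB="[0-9]\+"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"
9
[root@jfht ~]# SUB="[0-9]*"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"

[root@jfht ~]# SUB="[0-9]\*"
[root@jfht ~]# expr "$STR" : ".*\($SUB\).*"

None of these extracts the full year, because matching is greedy: the leading .* has already consumed everything it can. Replacing it with [^0-9]* stops before the first digit:
[root@jfht ~]# expr "$STR" : "[^0-9]*\([0-9]\+\).*"
2009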
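For comparison, the built-in =~ operator is not anchored, so the greedy-prefix problem does not come up: the leftmost run of digits is matched directly. A minimal sketch, assuming Bash 3 or later and reusing the same sample string:

```shell
#!/bin/bash
STR="某某是2009年进公司的"
re='[0-9]+'        # ERE: + needs no backslash here

year=""
if [[ $STR =~ $re ]]; then
    year=${BASH_REMATCH[0]}   # text matched by the whole pattern
fi
echo "$year"
```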

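A question found online: file names have the form "someletters_12345_moreleters.ext", that is, some letters, an underscore, five digits, another underscore, then more letters and an extension. The task is to extract the digits and store them in a variable.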

[root@jfht ~]# echo someletters_12345_moreleters.ext | cut -d'_' -f 2
12345

[root@jfht ~]# expr match 'someletters_12345_moreleters.ext' '.\+_\(.\+\)_.*'
12345

[root@jfht ~]# FILE=someletters_12345_moreleters.ext
[root@jfht ~]# NUM=$(expr match "$FILE" '.\+_\(.\+\)_.*')
[root@jfht ~]# echo $NUM
12345
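The same extraction can be sketched with =~ and a stricter pattern that follows the problem statement literally (letters, underscore, exactly five digits, underscore, letters, extension). The function name extract_num is hypothetical, and the exact character classes are an assumption about what "letters" and "extension" mean here.

```shell
#!/bin/bash
# extract_num FILE: print the five digits between the underscores,
# or return 1 if the name does not have the expected shape.
extract_num() {
    local re='^[[:alpha:]]+_([0-9]{5})_[[:alpha:]]+\.[[:alnum:]]+$'
    [[ $1 =~ $re ]] || return 1
    echo "${BASH_REMATCH[1]}"
}

NUM=$(extract_num someletters_12345_moreleters.ext)
echo "$NUM"
```

Unlike the cut and expr versions, this one fails with a non-zero status on names that do not fit the pattern instead of returning an arbitrary field.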

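Permalink: http://codingstandards.iteye.com/blog/1208526 (please credit the source when reposting)

Back to the index: A Practical Bash Guide for Java Programmers, String Processing (index)

Previous section: Bash String Processing (Compared with Java) - 20. Finding the Position of a Substring

Next section: Bash String Processing (Compared with Java) - 22. Checking Whether a String Is Numeric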

Posted: 2011-10-24 09:07