grep and Regular Expressions
Pattern Matching: grep
● grep searches files for lines matching a given pattern
● Normally each line that contains an instance of the pattern is copied to the standard output
● Usage: grep [options] pattern files
● If no file is specified, stdin is taken as the input
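Introduction to grep:
grep (global search regular expression (RE) and print out the line) is a powerful text-search tool: it searches text using regular expressions and prints the lines that match. The Unix grep family consists of grep, egrep, and fgrep; egrep and fgrep differ only slightly from grep. egrep is an extension of grep that supports more RE metacharacters. fgrep stands for fixed grep or fast grep: it treats the pattern as a fixed string, so regular-expression metacharacters stand for their literal selves and have no special meaning. Linux uses the GNU version of grep, which is more capable; its -G, -E, and -F command-line options select basic grep, egrep, and fgrep behaviour respectively.
grep works as follows: it searches one or more files for a pattern. If the pattern contains spaces, it must be quoted; every argument after the pattern is treated as a file name. The results are written to the screen (standard output), and the original files are not modified. grep is useful in shell scripts because it returns an exit status that reports the outcome of the search: 0 if the pattern was found, 1 if it was not found, and 2 if a file to be searched does not exist. These return values can be used to automate text processing.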
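The exit status described above can be tested directly in a shell script. The following is a minimal sketch; the file name hosts.sample and its contents are made up for illustration.

```shell
# Sketch: branch on grep's exit status (0 = match, 1 = no match, 2 = error).
# The sample file and its contents are assumptions for illustration.
tmpdir=$(mktemp -d)
hosts="$tmpdir/hosts.sample"
printf '%s\n' 'genesis1 192.168.100.134' 'liu cai lin' > "$hosts"

# -q suppresses output; only the exit status matters here.
if grep -q 'genesis' "$hosts"; then
    found=yes
else
    found=no
fi

# Capture the status of a search that finds nothing.
status_nomatch=0
grep -q 'nomatch' "$hosts" || status_nomatch=$?

# Capture the status of a search in a file that does not exist.
status_missing=0
grep -q 'genesis' "$tmpdir/does-not-exist" 2>/dev/null || status_missing=$?

echo "found=$found nomatch=$status_nomatch missing=$status_missing"
rm -rf "$tmpdir"
```

Using -q keeps the script quiet and lets the if statement consume the status without any output to parse.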
grep command format:
grep [options] PATTERN [FILE...]
grep [options] [-e PATTERN | -f FILE] [FILE...]
grep options:
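-c  Print only a count of the matching lines
-i  Ignore case when matching
-h  Do not print file names when searching multiple files
-l  When searching multiple files, print only the names of files that contain a match
-n  Print each matching line together with its line number
-s  Suppress error messages about nonexistent or unreadable files
-v  Print all lines that do not contain a match
-f  Read patterns from a file; an empty file contains zero patterns and therefore matches nothing
-L  Print only the names of files that contain no match
-q  Print nothing and return only the exit status; 0 means a matching line was found
-w  Match the pattern only as a whole word, as if it were enclosed in \< and \>
-b  Print the byte offset of each matching line (traditional Unix grep prints the block number instead)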
Examples:
(1) Sample text: to keep the examples easy to follow, most of them use a text file named hosts with the following contents:
genesis1 192.168.100.134
genesis2 192.168.100.137
genesis3 192.168.100.138
genesis /home/app_gen
liu cai lin
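(2) Searching multiple files:
● Search for the string "m48" in all .txt files in the current directory:
$ grep "m48" *.txt
(3) Line matching:
● Print the number of lines that contain the string:
$ grep -c "genesis" hosts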
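grep regular-expression metacharacters:
^        Anchors the start of a line; '^grep' matches all lines that begin with grep.
$        Anchors the end of a line; 'grep$' matches all lines that end with grep.
.        Matches any single character other than a newline; 'gr.p' matches gr, then any one character, then p.
*        Matches zero or more of the preceding character; ' *grep' matches zero or more spaces immediately followed by grep. Used together, .* matches any sequence of characters.
[]       Matches one character from the given set; '[Gg]rep' matches Grep and grep.
[^]      Matches one character that is not in the given set; '[^A-FH-Z]rep' matches rep preceded by one character that is not in A-F or H-Z.
\(..\)   Tags the matched characters; in '\(love\)', love is tagged as 1 and can be referred to as \1.
\<       Anchors the start of a word; '\<grep' matches lines containing a word that begins with grep.
\>       Anchors the end of a word; 'grep\>' matches lines containing a word that ends with grep.
x\{m\}   Repeats the character x m times; 'o\{5\}' matches lines containing five consecutive o's.
x\{m,\}  Repeats the character x at least m times; 'o\{5,\}' matches lines with at least five consecutive o's.
x\{m,n\} Repeats the character x at least m and at most n times; 'o\{5,10\}' matches runs of 5 to 10 o's.
\w       Matches a word character, that is [A-Za-z0-9_]; 'G\w*p' matches G followed by zero or more word characters, then p.
\b       Word boundary; '\bgrep\b' matches grep only as a whole word.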
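A short experiment makes the anchors, bracket sets, and interval expressions above concrete. This is a sketch only; the sample lines are invented for illustration, and only metacharacters from the basic set are used.

```shell
# Sketch: a few basic-regular-expression metacharacters in action.
# The sample lines are invented for illustration.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf '%s\n' 'grep is here' 'use egrep' 'Grep' 'gooooogle' 'google' > "$sample"

starts=$(grep -c '^grep' "$sample")       # lines beginning with "grep"
ends=$(grep -c 'grep$' "$sample")         # lines ending with "grep"
either=$(grep -c '[Gg]rep' "$sample")     # "Grep" or "grep" anywhere
five_o=$(grep -c 'o\{5\}' "$sample")      # five consecutive "o" characters
two_plus=$(grep -c 'o\{2,\}' "$sample")   # two or more consecutive "o"

echo "starts=$starts ends=$ends either=$either five_o=$five_o two_plus=$two_plus"
rm -rf "$tmpdir"
```

With this input the counts are 1, 1, 3, 1, and 2 respectively.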
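Commonly used grep options:
-c  Print only the number of matching lines.
-i  Ignore case.
-h  Do not print file names when searching multiple files.
-l  When searching multiple files, print only the names of files that contain a match.
-n  Print matching lines with their line numbers.
-s  Suppress error messages about nonexistent or unreadable files.
-v  Print all lines that do not contain a match.
-V  Print version information.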
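When matching with grep, it is best to quote the pattern so that the shell does not interpret it as an option or a special command; quoting also allows a pattern to contain several words.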
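Matching examples:
grep -c "48" test.txt          Count the lines that contain "48"
grep -i "May" test.txt         Find all lines that contain "May", ignoring case
grep -n "48" test.txt          Show the lines that contain "48" with their line numbers (similar to nl test.txt | grep 48)
grep -v "48" test.txt          Show all lines that do not contain "48"
grep "471" test.txt            Show the lines that contain "471"
grep "48<Tab>" test.txt        Show the lines that contain "48" followed by a tab (<Tab> stands for a literal tab character)
grep "48[34]" test.txt         Show the lines that contain "48" followed by "3" or "4"
grep "^[^48]" test.txt         Show the lines whose first character is neither "4" nor "8"
grep "[Mm]ay" test.txt         Case alternatives: show the lines that contain "May" or "may"
grep "K...D" test.txt          Show the lines that contain "K", then any three characters, then "D"
grep "[A-Z][9]D" test.txt      Show the lines that contain a letter in the range A-Z, then "9", then "D"
grep "[35]..1998" test.txt     Show the lines that contain "3" or "5", then any two characters, then "1998"
grep "4\{2,\}" test.txt        Repetition: show the lines in which "4" occurs at least twice in a row
grep "9\{3,\}" test.txt        Repetition: show the lines in which "9" occurs at least three times in a row
grep "9\{2,3\}" test.txt       Repetition: show the lines that contain a run of two or three "9"s (longer runs also match, because the pattern is not anchored)
grep -n "^$" test.txt          Show the line numbers of empty lines
ls -l | grep "^d"              Show only the directories in a directory listing (similar to ls -d */)
ls -l | grep "^[^d]"           Show the entries in a directory that are not directories
ls -l | grep "^d.....x..x"     Show the directories that group members and other users may execute (search)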
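The difference between matching a string anywhere on a line and matching it at the start of a line is easy to verify. The sketch below assumes a small, invented test.txt; its contents are not taken from the original examples.

```shell
# Sketch: anchored versus unanchored patterns on invented data.
tmpdir=$(mktemp -d)
data="$tmpdir/test.txt"
printf '%s\n' '48 Dec' '483 Sept' '219 dec 48' '' '84 May' > "$data"

contains=$(grep -c '48' "$data")         # "48" anywhere on the line
begins=$(grep -c '^48' "$data")          # "48" at the start of the line
not_4_or_8=$(grep -c '^[^48]' "$data")   # first character is neither 4 nor 8
blank=$(grep -n '^$' "$data")            # line numbers of empty lines

echo "contains=$contains begins=$begins not_4_or_8=$not_4_or_8 blank=$blank"
rm -rf "$tmpdir"
```

Note that '^[^48]' does not match the empty line: a bracket expression always needs one character to consume.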
file name : cars
Toyota Camry 88 Red 50000 15000
Chevy nova 79 Green 63000 5000
ford escort 81 Blue 80000 5000
Honda Civic 83 red 45000 8000
toyota tercel 86 Yellow 140000 9500
Pfatt Credo 58 Black 215000 600
Ford Bronco 87 Pink 99000 9800
Chevy Nomad 83 blue 118000 6000
ford Mustang 67 White 58000 12000
honda Accord 85 red 40000 3000
Toyota corona 71 Blue 180000 2500
Ford Futura 95 White 50 35000
Toyota Camry 94 Red 10000 26000
Holden Apollo 80 Brown 20000 5000
ford Mustang 67 White 58000 12000
Holden Apollo 80 Brown 20000 5000
Ford Bronco 87 Pink 99000 9800
Holden Apollo 80 Brown 20000 5000
Holden Apollo 80 Brown 20000 5000
honda Accord 85 red 40000 3000
Honda Civic 83 red 45000 8000
Holden Apollo 80 Brown 20000 5000
toyota tercel 86 Yellow 140000 9500
Holden Apollo 80 Brown 20000 5000
Holden Apollo 80 Brown 20000 5000
Chevy Nomad 83 blue 118000 6000
ford Mustang 67 White 58000 12000
honda Accord 85 red 40000 3000
Holden Apollo 80 Brown 20000 5000
Example of grep
$ grep Ford cars
Ford Bronco 87 Pink 99000 9800
Ford Futura 95 White 50 35000
Ford Bronco 87 Pink 99000 9800
Regular Expressions
● Regular expressions are character sequences that describe a family of matching strings.
● They are used in many Unix tools: grep, awk, sed, vi, and lex are examples.
● Regular expressions are formed out of a sequence of normal and special (meta) characters.
RE rules
● Any character other than a metacharacter matches itself.
● Any character (including a metacharacter) preceded by a backslash matches that character literally (remember how to escape metacharacters?).
Metacharacters
● . matches any single character (compare ? in sh)
● ^ matches the beginning of a line
● $ matches the end of a line
● A regular expression followed by an asterisk (*) matches zero or more occurrences of the preceding regular expression
● [ ] defines a set of characters
    - It matches any single character in the set.
    - Examples: [A-Z], [aeiou]
        - A range of values is specified with a dash, as in A-Z.
    - Note: The characters *, $, and \ lose their special meaning inside the square brackets, as does ^ unless it is the first character.
    - If the first character in the bracket is ^, then it matches any single character not in the set. [^aeiou] means match any character other than a, e, i, o, and u.
Metacharacters for Extended REs
● Available for egrep
● A regular expression (RE) followed by a + matches one or more occurrences of the RE.
● An RE followed by a ? matches zero or one occurrence of the RE.
● r1|r2 matches if there is a match for r1 or for r2.
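The three extended operators can be tried with grep -E, which selects the same extended syntax as egrep. The word list in this sketch is invented for illustration.

```shell
# Sketch: the extended-RE operators ?, + and | with grep -E.
# The word list is invented for illustration.
tmpdir=$(mktemp -d)
words="$tmpdir/words.txt"
printf '%s\n' 'color' 'colour' 'colr' 'Ford Bronco' 'Honda Civic' \
    'Chevy nova' 'gooogle' 'ggle' > "$words"

optional=$(grep -E -c 'colou?r' "$words")     # ? : zero or one "u"
one_or_more=$(grep -E -c 'go+gle' "$words")   # + : one or more "o"
alternation=$(grep -E -c 'Ford|Honda' "$words")  # | : either alternative

echo "optional=$optional one_or_more=$one_or_more alternation=$alternation"
rm -rf "$tmpdir"
```

Here 'colou?r' matches color and colour but not colr, and 'go+gle' rejects ggle because + requires at least one o.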
Rules of matching
● r1r2
    - Two regular expressions concatenated match a match of the first followed by a match of the second.
● An RE matches the longest possible string starting as far towards the beginning as possible.
Rules (continued)
● If an RE is composed of two REs, the first will match as long a string as possible, but will not exclude a match of the second.
● Quoted parentheses \( and \) can be used for grouping REs.
Examples:
RE              Matches
thing           thing anywhere in the line
^thing          thing at the beginning of the line
thing$          thing at the end of the line
^thing$         a line that contains only thing
[tT]hing        thing or Thing anywhere in the line
thing[0-9]      thing followed by a digit anywhere
thing[^0-9]     thing followed by any character other than a digit
thing.*thing    thing followed by any number of characters followed by thing
● List the names of all subdirectories in the current directory
    ls -l | grep '^d'
● List the files others can read
    ls -l | grep '^.......r..'
● List all words in /usr/dict/words that have all the vowels in order (e.g. the word facetious)
    grep '[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*' /usr/dict/words
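The vowel pattern is unanchored, so it accepts any word that merely contains such a sequence. The sketch below compares it with an anchored variant that requires the whole word to fit; the anchored form is an illustration added here rather than part of the original exercise, and the four-entry word list is test input invented for the purpose, because /usr/dict/words is not present on every system.

```shell
# Sketch: unanchored versus anchored "vowels in order" patterns.
# The word list is invented; "unabstemious" is included only as test input.
tmpdir=$(mktemp -d)
wordlist="$tmpdir/words"
printf '%s\n' 'facetious' 'abstemious' 'education' 'unabstemious' > "$wordlist"

v='[^aeiou]*'   # any run of non-vowel characters
unanchored=$(grep -c "${v}a${v}e${v}i${v}o${v}u${v}" "$wordlist")
anchored=$(grep -c "^${v}a${v}e${v}i${v}o${v}u${v}\$" "$wordlist")

echo "unanchored=$unanchored anchored=$anchored"
rm -rf "$tmpdir"
```

The unanchored pattern counts three words because the leading u of the last entry lies outside the matched part; anchoring with ^ and $ leaves only the two words in which each vowel appears exactly once, in order.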
Major options for grep
● -c displays only a count of matching lines
● -i ignores upper and lower case distinctions in pattern matching
● -l lists only the names of files containing matching lines
● -n precedes each matching line with its line number in the file
● -v displays all lines that do not match
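These options can be compared side by side on a small excerpt of the cars file. The sketch below copies four lines of that file into a temporary directory; the choice of those four lines is an assumption made to keep the output short.

```shell
# Sketch: the major grep options on a four-line excerpt of the cars file.
tmpdir=$(mktemp -d)
cars="$tmpdir/cars"
printf '%s\n' \
    'Toyota Camry 88 Red 50000 15000' \
    'ford escort 81 Blue 80000 5000' \
    'Ford Bronco 87 Pink 99000 9800' \
    'Ford Futura 95 White 50 35000' > "$cars"

exact=$(grep -c 'Ford' "$cars")         # -c: case-sensitive count
anycase=$(grep -c -i 'ford' "$cars")    # -i: ignore case
numbered=$(grep -n 'Futura' "$cars")    # -n: prefix the line number
others=$(grep -v -i 'ford' "$cars")     # -v: lines that do not match
files=$(cd "$tmpdir" && grep -l 'Toyota' cars)   # -l: file names only

printf '%s\n' "$exact" "$anycase" "$numbered" "$others" "$files"
rm -rf "$tmpdir"
```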
More Examples
● Check whether srini is logged on
    who | grep srini
● List all filenames that do not end in h
    ls | grep -v 'h$' (think of another way without using the -v option)
● Print message headers of all messages in the mailbox
    grep From $MAIL (what is the value of the MAIL variable?)
sed: stream editing
sed [-n] [-e script] [-f scriptfile] {inputfile}*
● Edits an input stream according to a given set of editing commands (called sed scripts).
● The editing commands may be given on the command line (default)
    - or in a scriptfile (using the -f option)
sed options (continued)
● For multiple editing commands on the command line
    - precede each editing command with the -e flag
● The -n option suppresses the default output, so only lines printed explicitly (for example with p) appear
    - the default is to output every line
sed commands
address a\
text
    Append text after the line specified by address.
addressRange c\
text
    Replace the text specified by addressRange with text.
addressRange d
    Delete the text specified by addressRange.
address i\
text
    Insert text before the line specified by address.
address r fileName
    Append the contents of fileName after the line specified by address.
addressRange s/expr/str/
    Substitute the first occurrence of the RE expr on each line by the string str.
addressRange s/expr/str/g
    Substitute every occurrence of the RE expr by the string str.
Commands (continued)
● addressRange p
    - Print the specified line(s); usually used with the -n option
● q
    - Quit after printing the current line
● !
    - Negate the address: apply the command to the lines that the address does not select
● There is more; refer to the man pages
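The editing commands can be tried on a three-line input. The sketch below uses an invented file containing the lines one, two, and three; note that the text for i\ and a\ goes on the line after the backslash.

```shell
# Sketch: the d, i\, a\ and ! commands on an invented three-line file.
tmpdir=$(mktemp -d)
f="$tmpdir/lines"
printf '%s\n' one two three > "$f"

# d: delete line 2
deleted=$(sed '2d' "$f")

# i\ : insert a line before line 2 (the text follows on the next line)
inserted=$(sed '2i\
NEW' "$f")

# a\ : append a line after line 2
appended=$(sed '2a\
NEW' "$f")

# ! : negate the address, so every line except line 2 is deleted
only_two=$(sed '2!d' "$f")

printf '%s\n---\n' "$deleted" "$inserted" "$appended" "$only_two"
rm -rf "$tmpdir"
```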
Address specification
● An address is
    - a line number, or
    - a regular expression enclosed within two slashes, or
    - a $ indicating the last line
● If no address is specified
    - the command is applied to all the lines in the input
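Each address form can be checked with the p command under -n. The sketch below recreates the six-line sfile data used in the sed examples inside a temporary directory.

```shell
# Sketch: the three address forms, plus the no-address case.
tmpdir=$(mktemp -d)
sfile="$tmpdir/sfile"
printf '%s\n' \
    '2 25 114 register' '5 20 188 sphere' '12 29 176 trapeg' \
    '1 25 110 sphere' '10 40 193 whereis' '29 114 671 total' > "$sfile"

by_number=$(sed -n '3p' "$sfile")           # a line number
by_regex=$(sed -n '/whereis/p' "$sfile")    # a regular expression in slashes
last=$(sed -n '$p' "$sfile")                # $ selects the last line
all=$(sed -n 'p' "$sfile" | wc -l)          # no address: every line

printf '%s\n' "$by_number" "$by_regex" "$last" "$all"
rm -rf "$tmpdir"
```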
Data file for sed examples
2 25 114 register
5 20 188 sphere
12 29 176 trapeg
1 25 110 sphere
10 40 193 whereis
29 114 671 total
Substitution
$ sed 's/sphere/SPHERE/' sfile
2 25 114 register
5 20 188 SPHERE
12 29 176 trapeg
1 25 110 SPHERE
10 40 193 whereis
29 114 671 total
Substitution within an address range
$ sed '1,3s/sphere/SPHERE/' sfile
2 25 114 register
5 20 188 SPHERE
12 29 176 trapeg
1 25 110 sphere
10 40 193 whereis
29 114 671 total
Specifying address using RE
$ sed '/sphere/s/1/ONE/' sfile
2 25 114 register
5 20 ONE88 sphere
12 29 176 trapeg
ONE 25 110 sphere
10 40 193 whereis
29 114 671 total
Global substitution
$ sed '/sphere/s/1/ONE/g' sfile
2 25 114 register
5 20 ONE88 sphere
12 29 176 trapeg
ONE 25 ONEONE0 sphere
10 40 193 whereis
29 114 671 total
Specifying address range using RE
$ sed '/r.*g/,/r.*g/s/1/ONE/g' sfile
2 25 ONEONE4 register
5 20 ONE88 sphere
ONE2 29 ONE76 trapeg
1 25 110 sphere
10 40 193 whereis
29 114 671 total
Specifying address range using RE
$ sed '/sphere/,/trapeg/s/1/ONE/g' sfile
2 25 114 register
5 20 ONE88 sphere
ONE2 29 ONE76 trapeg
ONE 25 ONEONE0 sphere
ONE0 40 ONE93 whereis
29 ONEONE4 67ONE total
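The output above changes lines 4 to 6 as well, which can look surprising. The range /sphere/,/trapeg/ closes at line 3, but line 4 matches /sphere/ again and opens a second range; since no later line matches /trapeg/, that range runs to the end of the input. A sketch using the = command, which prints line numbers, shows which lines are selected (the file is recreated in a temporary directory).

```shell
# Sketch: which lines does the range /sphere/,/trapeg/ select?
tmpdir=$(mktemp -d)
sfile="$tmpdir/sfile"
printf '%s\n' \
    '2 25 114 register' '5 20 188 sphere' '12 29 176 trapeg' \
    '1 25 110 sphere' '10 40 193 whereis' '29 114 671 total' > "$sfile"

# = prints the number of each selected line; -n suppresses the lines themselves.
selected=$(sed -n '/sphere/,/trapeg/=' "$sfile")

echo "$selected"
rm -rf "$tmpdir"
```

Only line 1 falls outside both ranges, which matches the unchanged first line in the output above.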
Substitution using a pattern
$ sed '/sp.*e/s//CONE/' sfile
2 25 114 register
5 20 188 CONE
12 29 176 trapeg
1 25 110 CONE
10 40 193 whereis
29 114 671 total
Do default action and quit early
$ sed 2q sfile
2 25 114 register
5 20 188 sphere
Printing only certain lines
$ sed -n 1,3p sfile
2 25 114 register
5 20 188 sphere
12 29 176 trapeg
Don't print lines matching the address
$ sed -n '/9/!p' sfile
2 25 114 register
5 20 188 sphere
1 25 110 sphere
using brackets
$ sed 's/\(sphere\)/Shake \1/' sfile
2 25 114 register
5 20 188 Shake sphere
12 29 176 trapeg
1 25 110 Shake sphere
10 40 193 whereis
29 114 671 total
● The brackets \( and \) can be used to tag regular expressions. The tagged parts can be referred to later as \1, \2, etc.
● Replace all occurrences of "Fred and George" in a file data with "George and Fred"
    sed 's/\(Fred\) and \(George\)/\2 and \1/g' data
● Using multiple commands on the command line
    - Capitalise all occurrences of a and e
      sed -e s/a/A/g -e s/e/E/g data
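Both commands can be checked without a data file by piping a line of text into sed. The input strings in this sketch are invented for illustration.

```shell
# Sketch: back-references and multiple -e commands on invented input.

# \1 and \2 refer to the first and second tagged groups.
swapped=$(printf '%s\n' 'Fred and George met Fred and George' |
    sed 's/\(Fred\) and \(George\)/\2 and \1/g')

# The -e commands run in order on every line.
capitalised=$(printf '%s\n' 'a green tea' | sed -e 's/a/A/g' -e 's/e/E/g')

printf '%s\n' "$swapped" "$capitalised"
```

Quoting each script, as done here, is a cautious habit: it has no effect for these two commands but protects scripts that contain spaces or shell metacharacters.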
tr: Character translation
(Glass, p. 280)
● Translates characters from one character set to another
    tr string1 string2
● string1 specifies the source character set and string2 the target character set
● tr maps all characters in its standard input from character set string1 to character set string2
tr—Example
$ tr a-z A-Z < trdata
GO CART
RACING
$ tr abc DEF < trdata
go FDrt
rDFing
Varying string lengths
● If the length of string2 is less than the length of string1
    - string2 is padded by repeating its last character
● Example
$ tr a-c DE < trdata
go EDrt
rDEing
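Padding behaviour for a short string2 can differ between tr implementations, so a script that must be portable may prefer to spell out the padded set. The sketch below writes the mapping explicitly as abc to DEE; the two input lines are assumed to be the contents of trdata, inferred from the example output.

```shell
# Sketch: the padded mapping written out explicitly (a->D, b->E, c->E).
# The two input lines are assumed to match the trdata file.
explicit=$(printf '%s\n' 'go cart' 'racing' | tr 'abc' 'DEE')

echo "$explicit"
```

The result is the same two lines shown in the example, without relying on how a particular tr pads string2.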
● -c Complements string1 before performing the mapping
● -d Causes every character in string1 to be deleted from standard input
● -s Causes every repeated output character to be condensed into a single instance
● Replace every character (including new lines) other than a by X
$ tr -c a X < trdata
XXXXaXXXXaXXXXX$
● Delete all a-c characters
$ tr -d a-c < trdata
go rt
ring
● Replace characters other than a to z by a new line (i.e. \012)
$ tr -c a-z '\012' < trdata
go
cart
racing
● Suppress repeated characters
$ tr -s acr efe < trdata
go fet
efing
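The options can also be combined. A common idiom, sketched below, uses -c and -s together to put each word on its own line. The input is assumed to be the two lines of trdata, inferred from the example output; the string aaabbbccc is invented.

```shell
# Sketch: tr options on input assumed to match the trdata file.
input=$(printf '%s\n' 'go cart' 'racing')

# -d: delete every character in the set a-c
deleted=$(printf '%s\n' "$input" | tr -d 'a-c')

# -s with one set: squeeze runs of the listed characters to a single instance
squeezed=$(printf '%s\n' 'aaabbbccc' | tr -s 'a-c')

# -c -s: turn every run of characters outside a-z into one newline
words=$(printf '%s\n' "$input" | tr -cs 'a-z' '\n')

printf '%s\n---\n' "$deleted" "$squeezed" "$words"
```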