awk study notes

ych4865

awk study notes:
------------
·Program model:
BEGIN - main input loop - END

·Main input loop: pattern matching - records - fields

·Referencing a field through a variable expression:
[root@localhost ~]# echo a b c|awk 'BEGIN {one=1;two=2} {print $(one + two)}'
c

·The -f option (read the program from a script file):
[root@localhost ~]# awk -f list.awk name.log
andy,3233
steve,77899
[root@localhost ~]# cat list.awk
BEGIN { FS = " " }
{print $1 "," $2}
[root@localhost ~]# cat name.log
andy 3233
steve 77899

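·Field separators:
When FS is set to an explicit separator, two consecutive separators enclose a field whose value is the empty string. With the default FS (a single space), a run of blanks or tabs counts as one separator.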
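
A minimal sketch of the empty-field rule, using made-up input lines rather than the files from these notes. The variable names explicit and default are only for illustration.

```shell
# Explicit separator ":" -> the two adjacent colons enclose an empty field.
explicit=$(printf 'a::c\n' | awk -F: '{ print NF, "[" $2 "]" }')
# Default separator -> a run of blanks is treated as a single separator.
default=$(printf 'a   c\n' | awk '{ print NF, "[" $2 "]" }')
echo "$explicit"   # 3 []
echo "$default"    # 2 [c]
```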

·Multiple separators (expressed as a regular expression):
[root@localhost ~]# awk -F "[:,]" '{print $1,$3}' name.log
andy xx
steve yy
[root@localhost ~]# cat name.log
andy,3233:xx
steve,77899:yy

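·awk has no explicit string concatenation operator: in an expression such as the right-hand side of an assignment, strings written side by side (separated by a space) are concatenated.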
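
A small sketch of concatenation by juxtaposition. The variable names first, second, s and t are hypothetical.

```shell
joined=$(awk 'BEGIN {
    first = "foo"; second = "bar"
    s = first second          # no operator: adjacent strings are joined
    t = first "-" second      # literals and variables can be mixed
    print s, t
}')
echo "$joined"   # foobar foo-bar
```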

·Pattern matching + arithmetic:
An awk script that counts blank lines:
[root@localhost ~]# awk -f null_row.awk name.log
2
[root@localhost ~]# cat name.log
andy,3233:xx
steve,77899:yy

end,0:z
[root@localhost ~]# cat null_row.awk
/^$/ {
        ++x
}
END {
        print x
}

·Arithmetic:
A script that computes each average:
[root@localhost ~]# awk -f avg_grade.awk grade.log
andy 79.75
lily 62.75
lucy 70
steve 75.75
ann 76
[root@localhost ~]# cat avg_grade.awk
{
total = $2 + $3 + $4 + $5
avg = total / 4
print $1,avg
}
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56

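·System variables
FS: input field separator
OFS: output field separator
RS: input record separator
ORS: output record separator
NF: number of fields in the current record
NR: number of the current record (count of records read so far)
FILENAME: name of the input file currently being processed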
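
A short sketch showing several of these variables together, on two made-up input lines. OFS is placed between the comma-separated print arguments and ORS ends each print, so everything comes out on one line here.

```shell
report=$(printf 'andy 86 78\nlily 66 70\n' |
    awk 'BEGIN { OFS = ","; ORS = ";" } { print NR, NF, $1, $NF }')
echo "$report"   # 1,3,andy,78;2,3,lily,70;
```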

·Processing records that span multiple lines
[root@localhost ~]# awk -f multi_row.awk multi_row.log
andy steve 678-888
lucy lee 890-777
[root@localhost ~]# cat multi_row.log
andy steve
hangzhou
789 street
678-888

lucy lee
hangzhou
301 street
890-777
[root@localhost ~]# cat multi_row.awk
BEGIN {
        FS = "\n"
        RS = ""
}
{ print $1,$NF }

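·Relational operators (mainly used in patterns)
Greater than, equal to, less than, matches, does not match (>, ==, <, ~, !~):
An equality example: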
[root@localhost ~]# awk -F "[:,]" 'NF == 3 {print $1}' name.log
andy
steve
end
[root@localhost ~]# cat name.log
andy,3233:xx
steve,77899:yy

end,0:z

A match example:
[root@localhost ~]# awk -F "[:,]" '$1 ~ /andy/ {print $1,$3}' name.log
andy xx

·Logical (Boolean) operators
Logical OR (||), AND (&&), NOT (!)
An AND example:
[root@localhost ~]# awk -F "[:,]" '$1 ~ /andy/ && $2 == 3233 {print $1,$2}' name.log
andy 3233

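·Formatted printing
print: prints strings and fields and automatically appends a newline at the end;
printf: formatted output; no newline is appended automatically, so "\n" must be given explicitly.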
[root@localhost ~]# awk '{printf "|%-10s|%3d\n",$1,$2}' grade.log
|andy      | 86
|lily      | 66
|lucy      | 77
|steve     | 82
|ann       | 80
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56

·Passing parameters to an awk script (variables one and two)
[root@localhost ~]# awk -f var_input.awk one=1 two=2 grade.log
87 80
67 72
78 80
83 80
81 80
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
[root@localhost ~]# cat var_input.awk
{
$2 += one
$3 += two
print $2,$3
}
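Another way to pass them (the -v option, supported by POSIX awk; unlike the assignments above, -v values are already set when BEGIN runs):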
[root@localhost ~]# awk -v one=1 -v two=2 -f var_input.awk grade.log
87 80
67 72
78 80
83 80
81 80
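
A minimal sketch of how the two ways of passing a variable differ. An assignment given as an operand takes effect only when awk reaches it while processing its arguments, so it is still unset inside BEGIN, whereas a -v assignment is made before BEGIN runs. The shell variable names late and early are only for illustration.

```shell
# Operand assignment: empty inside BEGIN, set once input is being read.
late=$(echo x | awk 'BEGIN { printf "[%s] ", one } { print "[" one "]" }' one=1)
# -v assignment: already set inside BEGIN.
early=$(awk -v one=1 'BEGIN { print "[" one "]" }')
echo "$late"    # [] [1]
echo "$early"   # [1]
```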

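#####Advanced section#####
·Conditional statements (the syntax is largely the same as in C; likewise below)
An example: compute the average score and display it as a grade: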
[root@localhost ~]# awk -f gtoclass.awk grade.log
andy      79.75    C
lily      62.75    D
lucy      70.00    C
steve     75.75    C
ann       76.00    C
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
[root@localhost ~]# cat gtoclass.awk
{
total = $2 + $3 + $4 + $5
avg = total / ( NF -1 )
if (avg >= 90) class = "A"
else if (avg >= 80) class = "B"
else if (avg >= 70) class = "C"
else if (avg >= 60) class = "D"
else class = "E"
printf "%-10s%5.2f%5s\n",$1,avg,class
}

·Loop statements
for, while, do-while
Another way to compute the average:
[root@localhost ~]# awk -f for_avg.awk grade.log
andy 79.75
lily 62.75
lucy 70
steve 75.75
ann 76
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
[root@localhost ~]# cat for_avg.awk
{
total = 0
for ( i =2; i <= NF; ++i )
        total += $i
avg = total / (NF - 1)
print $1,avg
}

A similar approach using while:
[root@localhost ~]# awk -f while_avg.awk grade.log
andy 79.75
lily 62.75
lucy 70
steve 75.75
ann 76
[root@localhost ~]# cat while_avg.awk
{
total = 0
i = 2
while ( i <= NF )
{
        total += $i
        i += 1
}
avg = total / (NF - 1)
print $1,avg
}
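
The loop list also names do-while, which has no example in these notes. The sketch below rewrites the same average with do-while, fed with one line copied from grade.log. Because the body runs before the condition is tested, it assumes every record carries at least one score.

```shell
avg=$(echo 'andy 86 78 99 56' | awk '{
    total = 0
    i = 2
    do {
        total += $i          # body runs once before the test
        i++
    } while (i <= NF)
    print $1, total / (NF - 1)
}')
echo "$avg"   # andy 79.75
```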

A combined example for reference, computing a factorial:
[root@localhost ~]# ./fact
enter number:5
the fact of 5 is 120
[root@localhost ~]# cat fact
awk '
BEGIN {
        printf "enter number:"
}

$1 ~ /^[0-9]+$/ {
        number = $1
        if ( number == 0 )
                fact = 1
        else
                fact = number
        for ( x = number - 1;x > 1;x-- )
                fact *= x
        printf "the fact of %d is %g \n",number,fact
        exit
}

{printf "invalid.enter number again:"}
' -

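·Statements that affect program flow
break: applies to loops; leaves the loop
continue: applies to loops; skips the rest of the current iteration and starts the next one
next: applies to the main input loop; reads the next input line and returns to the top of the script
exit: applies to the main input loop; leaves the main input loop and transfers control to the END rule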
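
A small sketch that exercises all four statements on made-up input. The input lines and the ";" output format are assumptions chosen only to make the effect visible.

```shell
flow=$(printf '# comment\n1 2 3 4 5\nstop\nnever reached\n' | awk '
/^#/    { next }                 # skip this record, restart with the next one
/^stop/ { exit }                 # leave the input loop; END still runs
{
    for (i = 1; i <= NF; i++) {
        if ($i == 2) continue    # skip this field only
        if ($i == 4) break       # abandon the remaining fields
        printf "%s;", $i
    }
    printf "\n"
}
END { print "end rule ran" }')
echo "$flow"
```

The second input line prints 1;3; and the line after "stop" is never processed, yet the END rule still runs.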

·Arrays
An example that uses NR as the array subscript:
[root@localhost ~]# cat array_avg.awk
{
total = 0
for ( i =2; i <= NF; ++i )
        total += $i
avg[NR] = total / (NF - 1)
}

END {
for ( x = 1;x <= NR;x++ )
        class_avg_total += avg[x]
class_avg = class_avg_total / NR
for ( x = 1;x <=NR;x++ )
        if ( avg[x] >= class_avg )
                ++above_avg
        else
                ++below_avg
print "class avg:",class_avg
print "above avg:",above_avg
print "below avg:",below_avg
}
[root@localhost ~]# awk -f array_avg.awk grade.log
class avg: 72.85
above avg: 3
below avg: 2

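·Associative arrays
An array subscript can be a string or a number.
This can be used to build a structure similar to a Python dictionary.
When an associative array is traversed (for ... in), the order in which the elements come out is not defined.
An example that counts how many students reach each grade, combining everything above: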
[root@localhost ~]# awk -f fin_array_avg.awk grade.log
andy 79.75 C
lily 62.75 D
lucy 70 C
steve 75.75 C
ann 76 C
class avg: 72.85
above avg: 3
below avg: 2
C: 4
D: 1
[root@localhost ~]# cat fin_array_avg.awk
{
total = 0
for ( i =2; i <= NF; ++i )
        total += $i

avg_arr[NR] = total / (NF - 1)
avg = avg_arr[NR]

if (avg >= 90) class = "A"
else if (avg >= 80) class = "B"
else if (avg >= 70) class = "C"
else if (avg >= 60) class = "D"
else class = "E"
++class_arr[class]
print $1,avg,class
}

END {
for ( x = 1;x <= NR;x++ )
        class_avg_total += avg_arr[x]
class_avg = class_avg_total / NR

for ( x = 1;x <=NR;x++ )
        if ( avg_arr[x] >= class_avg )
                ++above_avg
        else
                ++below_avg
print "class avg:",class_avg
print "above avg:",above_avg
print "below avg:",below_avg

for ( class_x in class_arr )
        print class_x ":",class_arr[class_x] | "sort"
}
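
A reduced sketch of the same counting idea, with made-up grade letters as input. It also shows the `in` test, which checks for a subscript without creating the element, and pipes the output through sort because the traversal order cannot be relied on. The array name seen is hypothetical.

```shell
counts=$(printf 'C\nD\nC\nC\n' | awk '
{ seen[$1]++ }                        # string subscripts, created on first use
END {
    if (!("A" in seen)) print "A: 0"  # membership test, creates nothing
    for (grade in seen) print grade ": " seen[grade]
}' | sort)
echo "$counts"
```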

·Creating an array with split
[root@localhost ~]# echo 2 |awk -f split_month.awk
2 II
[root@localhost ~]# cat split_month.awk
BEGIN {
split("I,II,III,IV,V",numb,",")
}

$1 > 0 && $1 <= 5 {
print $1,numb[$1]
exit
}

{
        print "invalid number."
        exit
}

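·Multidimensional arrays
awk simulates multidimensional arrays: the subscripts i,j are joined with SUBSEP into a single string subscript.
A simple example: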
BEGIN {
for (i = 1; i <= 2; ++i)
for ( j = 1; j <=2; ++j)
bitmap[i,j] = "O"
}
...
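
A sketch that extends the bitmap fragment to show how the simulation works: a subscript pair is one string built with SUBSEP, it can be tested with `(i, j) in array`, and it can be taken apart again with split. The names found, key, idx and n are hypothetical.

```shell
cells=$(awk 'BEGIN {
    for (i = 1; i <= 2; ++i)
        for (j = 1; j <= 2; ++j)
            bitmap[i, j] = "O"
    bitmap[2, 1] = "X"
    if ((2, 1) in bitmap) found = "yes"    # test a two-part subscript
    key = 1 SUBSEP 2                       # the same key that [1, 2] builds
    print found, bitmap[key], bitmap[2, 1]
    for (k in bitmap) {                    # recover i and j from each key
        split(k, idx, SUBSEP)
        n += idx[1] + idx[2]
    }
    print n
}')
echo "$cells"
```

The first line printed is "yes O X" and the second is 12, the sum of all four index pairs.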

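·System variable arrays
ARGV: an array holding the awk command line: the command name plus its arguments (options such as -f and -v are excluded), with subscripts starting at 0;
ARGC: the number of elements in ARGV.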
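
A minimal sketch that lists ARGV from index 1 (index 0 holds the command name, which varies between awk implementations). The program has only a BEGIN rule, so the file operand is never opened and does not need to exist.

```shell
args=$(awk -v two=2 'BEGIN {
    for (i = 1; i < ARGC; i++) printf "%d=%s ", i, ARGV[i]
    print ARGC
}' one=1 grade.log)
echo "$args"   # 1=one=1 2=grade.log 3
```

The -v option does not appear in ARGV, while the operand assignment one=1 does.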

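·Functions
Arithmetic functions:
trigonometric (sin, cos, atan2), logarithmic and exponential (log, exp), square root (sqrt), random (rand, srand), integer truncation (int)
For example: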
[root@localhost ~]# awk -f int_func.awk
33.3333
33
[root@localhost ~]# cat int_func.awk
BEGIN {
print 100/3
print int(100/3)
exit
}

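String functions:
gsub(r,s,t): replaces every substring of t matched by r with s and returns the number of replacements (t defaults to $0 when omitted)
sub(r,s,t): replaces the first substring of t matched by r with s and returns 1 or 0 (t defaults to $0 when omitted)

index(s,t): returns the position of t in s, or 0 if t does not occur;
match(s,r): returns the position in s where the regular expression r first matches, or 0 if there is no match;

substr(s,p,n): returns the substring of s that starts at position p and is at most n characters long (when n is omitted, the rest of s from p on)
split(s,a,sep): splits s on sep into the array a and returns the number of elements;

length(s): returns the length of s (s defaults to $0 when omitted)
sprintf("fmt",expr): returns expr formatted according to fmt, as a string

tolower(s): converts uppercase to lowercase
toupper(s): converts lowercase to uppercase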
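
A compact sketch that calls most of the functions listed above on a made-up string, one result per output line. The variable names are hypothetical.

```shell
strings=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor")
    print substr(s, 7, 3) "|" substr(s, 7)
    print length(s)
    n = split("a:b:c", parts, ":")
    print n, parts[3]
    t = s
    changed = gsub(/o/, "0", t)     # gsub edits t in place and returns a count
    print changed, t
    print toupper(substr(s, 1, 1)) substr(s, 2)
    print sprintf("%-4s|%3d", "ab", 7)
}')
echo "$strings"
```

The seven lines printed are: 7, wor|world, 11, 3 c, 2 hell0 w0rld, Hello world, and "ab  |  7".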

Some examples:
index and substr combined:
Capitalize the first character of the first word:
[root@localhost ~]# echo abc |awk -f index_substr.awk
Abc
[root@localhost ~]# cat index_substr.awk
BEGIN {
upper="ABCDEFGH"
lower="abcdefgh"
}

{
firstchar=substr($1,1,1)
if (charplace = index(lower,firstchar))
        $1 = substr(upper,charplace,1)substr($1,2)
print $0
}

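The replacement functions gsub and sub:
Regular-expression replacement with gsub; in the replacement string & stands for the matched text and \ is the escape character (the basic rules are similar to sed):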
[root@localhost ~]# echo abc|awk '{gsub(/ab/,"\\start & end\\",$1);print $1}'
\start ab end\c

Another example:
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
[root@localhost ~]# cat sub_gsub.awk
awk '
{
gsub(/"/,"")
if (sub(/an/,"An")) print
if (sub(/li/,"Li")) print
}' "$@"
[root@localhost ~]# ./sub_gsub.awk grade.log
Andy 86 78 99 56
Lily 66 70 59 56
Ann 80 78 90 56

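The match function:
match sets two system variables, RSTART and RLENGTH;
RSTART: start position of the matched substring; this is also the return value, and it is 0 when nothing matches
RLENGTH: number of characters in the matched substring (-1 when nothing matches).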
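
A short sketch of match with RSTART and RLENGTH, using a made-up string. The usual idiom is to pass both values straight to substr to extract the matched text.

```shell
m=$(awk 'BEGIN {
    s = "tel: 678-888"
    if (match(s, /[0-9]+-[0-9]+/))
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    r = match(s, /xyz/)             # no match
    print r, RSTART, RLENGTH
}')
echo "$m"
```

The first line printed is "6 7 678-888" and the second is "0 0 -1".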

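Case-sensitive regular-expression matching (with match or with a pattern) is not implemented consistently across awk versions;
a case where the match turns out to be case-insensitive is shown below (with some awk versions and locales, the range [a-z] follows the locale's collation order and also covers uppercase letters):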
[root@localhost ~]# cat test.awk
/[a-z]/{
print $0
}
[root@localhost ~]# echo ABC|awk -f test.awk
ABC
[root@localhost ~]# echo abc|awk -f test.awk
abc
[root@localhost ~]# echo 123|awk -f test.awk
[root@localhost ~]#
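
A possible workaround, assuming the behaviour above comes from the locale's collation order (the notes do not say which awk version or locale was used). Forcing the C locale for the awk process makes [a-z] cover only the lowercase ASCII letters.

```shell
upper=$(echo ABC | LC_ALL=C awk '/[a-z]/ { print "matched" }')
lower=$(echo abc | LC_ALL=C awk '/[a-z]/ { print "matched" }')
echo "upper: [$upper] lower: [$lower]"   # upper: [] lower: [matched]
```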

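User-defined functions:
They are usually defined before the pattern rules of the script;
A user-defined function can be called anywhere an expression can be used in the script.
When a function is called, the called function works on a copy of each scalar argument, so the caller's variable is unaffected and can still be used afterwards;
A variable created inside the called function is global and is also visible to the caller; to avoid this, declare the variable as an extra formal parameter in the function definition.
A simple example: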
[root@localhost ~]# echo 123 |awk -f fun-test.awk
call fun_test: with test parameter.
name: TEST
[root@localhost ~]# cat fun-test.awk
function fun_test(name)
{
print "name:",name
}

{
print "call fun_test: with test parameter."
fun_test("TEST")
}
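
A sketch of the two scoping rules described above. The function name bump and the variables tmp and leaked are hypothetical; the extra spaces before tmp in the parameter list are only a convention for marking local variables.

```shell
scope=$(awk '
function bump(n,    tmp) {      # n is a copy of the argument; tmp is local
    n = n + 1
    tmp = "local"
    leaked = "global"           # not a parameter, so the caller can see it
    return n
}
BEGIN {
    x = 1
    y = bump(x)
    print x, y, "[" tmp "]", "[" leaked "]"
}')
echo "$scope"   # 1 2 [] [global]
```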

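·Other functions:
The getline function:
getline and next both read the next line, but they differ:
next: reads the next line and returns to the first statement of the input loop;
getline: reads the next line and the program continues from where it is, without returning to the first statement of the loop.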
[root@localhost ~]# awk -f getline-test.awk grade.log
ann 80 78 90 56
[root@localhost ~]# cat getline-test.awk
/steve/{
getline
print $0
}
[root@localhost ~]# cat grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
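
A side-by-side sketch of the difference, on three made-up input lines. The getline call is checked against 0 because it returns 0 at end of input and a negative value on error.

```shell
# getline: $0 is replaced by the following line and the action keeps running.
with_getline=$(printf 'a\nsteve\nb\n' |
    awk '/steve/ { if ((getline) > 0) print "after steve: " $0 }')
# next: the rest of the action is skipped and matching restarts on the next line.
with_next=$(printf 'a\nsteve\nb\n' |
    awk '/steve/ { next; print "unreachable" } { print }')
echo "$with_getline"   # after steve: b
echo "$with_next"
```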

Using getline to read from a file and from a pipe:
[root@localhost ~]# echo |awk -f getline-input.awk
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
root     :0           2012-12-03 22:05
[root@localhost ~]# cat getline-input.awk
{
while ((getline < "grade.log") > 0)
print

"who"|getline me
print me
close("who")
}

The system function:
[root@localhost ~]# awk -f getfile.awk
please input file name:grade.log
andy 86 78 99 56
lily 66 70 59 56
lucy 77 78 69 56
steve 82 78 87 56
ann 80 78 90 56
cat end.
[root@localhost ~]# cat getfile.awk
BEGIN{
printf "please input file name:"
getline < "-"
file=$0
system("cat " file)
print "cat end."
}

Redirecting print output to a file and to a pipe:
A simple example:
[root@localhost ~]# awk -f printfile.awk
please input file name:grade.log
grade.log
cat filename end.
file: grade.log word count:
10
[root@localhost ~]# cat printfile.awk
BEGIN{
printf "please input file name:"
getline < "-"
file=$0
print file >"filename"
system("cat filename")
print "cat filename end."
print "file:",file,"word count:"
print file |"wc -c"
}
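
A further sketch of printing into a pipe, this time closing it explicitly. The command string is kept in a variable (cmd, a hypothetical name) so that print and close refer to exactly the same pipe. close waits for the command to finish, so its output appears before anything printed afterwards.

```shell
sorted=$(awk 'BEGIN {
    cmd = "sort -n"
    print 30 | cmd
    print 4  | cmd
    print 15 | cmd
    close(cmd)                  # sort runs to completion here
    print "done"
}')
echo "$sorted"
```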

References:
sed & awk, Dougherty & Robbins.
