sed study notes

watershitter

Notes on the key points of everyday sed usage.

How sed executes:
In default operation, sed cyclically copies a line of input, less its terminating newline character, into a pattern space (unless there is something left after a D command), applies in sequence all commands whose addresses select that pattern space, and at the end of the script copies the pattern space to standard output (except when -n is specified) and deletes the pattern space. Whenever the pattern space is written to standard output or a named file, sed will immediately follow it with a newline character.
Some of the commands use a hold space to save all or part of the pattern space for subsequent retrieval. The pattern and hold spaces will each be able to hold at least 8192 bytes.
--------------------------------------------------------------------

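Key terms for the spaces involved:
input: the input, usually the file(s) to be processed
pattern space: the buffer in which sed applies its commands to each line it reads
output: where the results are printed (standard output)
hold space: Some of the commands use a hold space to save all or part of the pattern space for subsequent retrieval.
It is a buffer for temporarily storing and swapping text while sed runs. Only five commands use it: g, G, h, H and x.
Judging by the effect of g and G, the hold space starts out empty: sed g file prints one empty line for every line in file.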
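
The hold-space behaviour described above can be checked with a small sketch. It assumes a POSIX shell and a sed that follows the specification quoted earlier; the three-line sample input is invented for illustration.

```shell
# Invented three-line sample input.
input=$(printf 'one\ntwo\nthree\n')

# g: replace the pattern space with the (still empty) hold space,
# so every input line comes out as an empty line. Count the lines,
# because command substitution would strip trailing newlines.
blank_count=$(printf '%s\n' "$input" | sed g | wc -l | tr -d ' ')

# G: append a newline plus the (empty) hold space to the pattern space,
# which double-spaces the text.
double_spaced=$(printf '%s\n' "$input" | sed G)

# x: swap pattern space and hold space on every line,
# so each line is printed one cycle late and the last line is never printed.
delayed=$(printf '%s\n' "$input" | sed x)

echo "$blank_count"
printf '%s\n' "$double_spaced"
printf '%s\n' "$delayed"
```

The x case shows most directly that the hold space is empty at the start: the first line printed is empty, and "three" stays behind in the hold space.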

options:

-n:
Suppress the default output (in which each line, after it is examined for editing, is written to standard output). Only lines explicitly selected for output will be written.
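Explanation: by default sed prints every line to stdout after processing it. With -n, only lines explicitly selected for output are printed.
For example, sed -n '/holy/p' file prints only the lines matching /holy/.
     Without -n, lines matching /holy/ are printed twice: once by the default output and once by p.
     By contrast, sed -n 's/holy/shit/g' file prints nothing. The p command is what makes a line "explicitly selected for output"; without such a command, nothing is printed under -n.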
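
The effect of -n can be compared side by side. This is a minimal sketch assuming a POSIX shell; the two-line input and the replacement word are made up for the example.

```shell
# Invented sample input: one line matches /holy/, one does not.
input=$(printf 'holy cow\nplain\n')

# No -n: the matching line appears twice (once from p, once from the default output).
without_n=$(printf '%s\n' "$input" | sed '/holy/p')

# With -n: only the line explicitly printed by p appears.
with_n=$(printf '%s\n' "$input" | sed -n '/holy/p')

# With -n and no p: the substitution is performed, but nothing is printed.
silent=$(printf '%s\n' "$input" | sed -n 's/holy/wholly/g')

# The p flag of the s command prints only lines where a substitution happened.
with_p_flag=$(printf '%s\n' "$input" | sed -n 's/holy/wholly/gp')

printf 'without -n:\n%s\n' "$without_n"
printf 'with -n:\n%s\n' "$with_n"
printf 'with -n and s///gp:\n%s\n' "$with_p_flag"
```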

-f:
Runs the sed commands stored in a script file.
The script_files named by the -f option will consist of editing commands, one per line.

For example: sed -f xxx file, where the file xxx contains s/holy/shit/g
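
As a sketch of the -f workflow, the snippet below writes a two-command script file and runs it. The file names (rules.sed, input.txt) and their contents are hypothetical, and mktemp -d is assumed to be available.

```shell
# Work in a throwaway directory (assumes mktemp -d is available).
workdir=$(mktemp -d)

# Hypothetical script file: one editing command per line.
cat > "$workdir/rules.sed" <<'EOF'
s/red/blue/g
/^skip/d
EOF

# Hypothetical input file.
printf 'red car, red bike\nskip me\nred hat\n' > "$workdir/input.txt"

# Apply every command in the script file to each input line.
result=$(sed -f "$workdir/rules.sed" "$workdir/input.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```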
Notes on the script format:
The script consists of editing commands, one per line, of the following form:
[address[,address]]command[arguments]
Zero or more blank characters are accepted before the first address and before command.

The word "address" comes up repeatedly; it refers to how a command selects the lines it applies to.
Addresses in sed
An address is either empty, a decimal number that counts input lines cumulatively across files, a "$" character that addresses the last line of input, or a context address (which consists of a regular expression as described in Regular Expressions in sed, preceded and followed by a delimiter, usually a slash). A command line with no addresses selects every pattern space.

A command line with one address selects each pattern space that matches the address.

A command line with two addresses selects the inclusive range from the first pattern space that matches the first address to the next pattern space that matches the second. (If the second address is a number less than or equal to the line number first selected, only one line will be selected.) Starting at the first line following the selected range, sed looks again for the first address. Thereafter the process is repeated.
Editing commands can be applied only to non-selected pattern spaces by use of the negation command "!".
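
The address forms quoted above can be tried out as follows. This is an illustrative sketch, assuming a POSIX shell; the five-line input is invented.

```shell
# Invented five-line sample input.
input=$(printf 'a1\nb2\na3\nb4\na5\n')

# Decimal address: select line 2 only.
by_number=$(printf '%s\n' "$input" | sed -n '2p')

# "$" address: select the last line of input.
last_line=$(printf '%s\n' "$input" | sed -n '$p')

# Context address: select every line matching the regular expression.
by_regex=$(printf '%s\n' "$input" | sed -n '/^a/p')

# Second address is a number not greater than the first selected line:
# only that one line (line 3) is selected.
single=$(printf '%s\n' "$input" | sed -n '3,2p')

# After a range ends, sed looks for the first address again,
# so this selects b2..a3 and then b4..a5.
repeated=$(printf '%s\n' "$input" | sed -n '/^b/,/^a/p')

printf '%s\n' "$by_number" "$last_line" "$by_regex" "$single" "$repeated"
```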

command list:
Two of the commands take a command-list, which is a list of sed commands separated by newline characters, as follows:
{ command
command
. . .
}
The "{" can be preceded with blank characters and can be followed with white space. The commands can be preceded by white space. The terminating "}" must be preceded by a newline character and then zero or more blank characters.
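
A command list lets one address govern several commands. The sketch below, with invented input, applies a substitution and a print only to lines 2 through 3; it assumes a POSIX shell, where a single-quoted string may span several lines.

```shell
# Invented four-line sample input.
input=$(printf 'a=1\nb=2\nc=3\nd=4\n')

# The commands between { and } run only for lines 2 through 3.
# The closing } sits on its own line, as the quoted rule requires.
result=$(printf '%s\n' "$input" | sed -n '2,3{
s/=/ -> /
p
}')

printf '%s\n' "$result"
```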

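Similarly,
a\
text (append text): the command is written with a backslash '\', then a line break, then the text to insert.
(Forms that need a line break like this mostly appear in script files; they are less common directly on the command line and are a somewhat advanced feature.)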
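
The backslash-newline form also works on the command line when the script is single-quoted. Here is a small sketch with invented input, showing a\ together with its counterpart i\ (insert before), which is not covered in the notes above.

```shell
# a\ : append the text after each line matching /header/.
appended=$(printf 'header\nbody\n' | sed '/header/a\
inserted after header')

# i\ : insert the text before each line matching /body/.
inserted=$(printf 'header\nbody\n' | sed '/body/i\
inserted before body')

printf '%s\n' "$appended"
printf '%s\n' "$inserted"
```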

N: reads in the next line (it is joined to the current pattern space contents with \n, and the current line number changes)
Append the next line of input to the pattern space, using an embedded newline character to separate the appended material from the original material. Note that the current line number changes.

D     Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.
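      Deletes everything up to the first \n ("up to the first embedded newline"), including the \n itself. After an N command, this has the effect of joining two lines and then deleting the first one.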
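
Two short sketches of N and D in action, using invented four-line input. The $!N form (skip N on the last line) is used so the result does not depend on how a given sed treats N when no next line exists.

```shell
# Invented four-line sample input.
input=$(printf 'a\nb\nc\nd\n')

# N pulls the next line into the pattern space, separated by an embedded newline;
# the substitution then turns that newline into a space, joining lines in pairs.
paired=$(printf '%s\n' "$input" | sed '$!N;s/\n/ /')

# Sliding two-line window: N appends the next line, D drops the older line and
# restarts the cycle without reading input. On the last line neither $!N nor $!D
# applies, so the final two lines are printed.
last_two=$(printf '%s\n' "$input" | sed '$!N;$!D')

printf '%s\n' "$paired"
printf '%s\n' "$last_two"
```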

To be continued...

------------------------------------------------------------------
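About , and ;
, goes between two addresses, for example:
1,10 lines 1 through 10
/begin/,/end/ from a line matching /begin/ through the next line matching /end/
1,10!    '!' negates the address, i.e. the lines outside 1,10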
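
A quick sketch of the three range forms listed above, on invented five-line input (a shorter numeric range, 1,2, stands in for 1,10).

```shell
# Invented five-line sample input.
input=$(printf 'l1\nbegin\nl3\nend\nl5\n')

# Numeric range: lines 1 through 2.
first_two=$(printf '%s\n' "$input" | sed -n '1,2p')

# Regex range: from the line matching /begin/ through the line matching /end/.
block=$(printf '%s\n' "$input" | sed -n '/begin/,/end/p')

# Negated range: every line outside the /begin/,/end/ block.
outside=$(printf '%s\n' "$input" | sed -n '/begin/,/end/!p')

printf '%s\n' "$first_two" "$block" "$outside"
```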
-----
; separates commands: for each line read into the pattern space, the commands separated by ; are executed in order
For example, sed -n '/aa/,/cc/ n; p' hhh
   where the file hhh contains
   holy
   aa#
   bb#
   cc#
   dd#
   shit
the output is:
holy
bb#
dd#
shit
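Surprising? Here is why.
1 sed first reads the first line, holy, into the pattern space. The first command, /aa/,/cc/ n, is filtered out by its address and does not apply to this line. The second command, p, prints the line to stdout, so holy appears.
2 The second cycle starts by reading aa# into the pattern space. It satisfies the /aa/,/cc/ address, so n takes effect. Effect 1: "Write the pattern space to standard output if the default output has not been suppressed"; n would print the line to stdout, but -n suppresses that. Effect 2: "replace the pattern space with the next line of input"; the pattern space now holds the next line, bb#. Then the second command, p, prints bb#.
3 The third cycle starts. Note that sed does not read bb# here but cc#: effect 2 of n in the second cycle consumed a line of input, which affects the next cycle. Put another way, each line of input can be taken out only once; the cursor always moves forward. cc# is still inside the range (it matches the closing address /cc/), so n replaces it with dd#, and p prints dd#.
4 The fourth cycle prints shit, just like cycle 1.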
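
The walkthrough can be reproduced directly. The sketch below feeds the same six lines through a pipe instead of the file hhh, and adds a second, unaddressed variant to show that n always consumes a line of input.

```shell
# Same six lines as the file hhh above, supplied via a pipe.
input=$(printf 'holy\naa#\nbb#\ncc#\ndd#\nshit\n')

# Inside the /aa/,/cc/ range, n swaps in the next line before p runs,
# so aa# and cc# are skipped and bb# and dd# are printed instead.
ranged=$(printf '%s\n' "$input" | sed -n '/aa/,/cc/ n; p')

# With no address, n consumes a line on every cycle:
# only the even-numbered lines reach p.
every_other=$(printf '%s\n' "$input" | sed -n 'n; p')

printf '%s\n' "$ranged"
printf '%s\n' "$every_other"
```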

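; versus -e: commands separated by ; are applied one after another to the "line" in the pattern space (in quotes because it may actually hold several lines, e.g. after N). -e does not make a separate pass over the text: each -e appends its commands to the end of the script, and the combined script is applied to every pattern space in a single pass over the input, exactly as if the commands were separated by ; or newlines. For example: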
sed -e 's/#.*//' -e '/^$/ d'
Removing comments and blank lines takes two commands. The first removes every character from the "#" to the end of the line, and the second deletes all blank lines.
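
A sketch comparing -e with ;, assuming a POSIX shell and invented input. The first pair shows that both spellings give the same result; the last command is a probe whose output is only consistent with all -e commands running within the same cycle for each line.

```shell
# Invented sample input: a trailing comment, a whole-line comment, a blank line.
input=$(printf 'alpha#note\n#whole-line comment\n\nbeta\n')

# Two -e options: strip comments, then delete lines that are now empty.
with_e=$(printf '%s\n' "$input" | sed -e 's/#.*//' -e '/^$/ d')

# The same two commands in one script, separated by ;.
with_semicolon=$(printf '%s\n' "$input" | sed 's/#.*//; /^$/ d')

# Probe: on line 2, q quits before the substitution from the second -e is reached,
# so line 2 is printed unchanged. Two separate passes over the text
# would have changed both lines.
single_pass=$(printf 'a\na\na\n' | sed -e '2q' -e 's/a/b/')

printf '%s\n' "$with_e"
printf '%s\n' "$single_pass"
```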

(These features do not seem to come up much in practice, and they take some effort to understand...)

------------------------------------------------------------------

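references:
1 http://pubs.opengroup.org/onlinepubs/007908799/xcu/sed.html
Concise and covers the basics. Excellent!
Related reading: the argument syntax conventions for Unix utilities:
http://pubs.opengroup.org/onlinepubs/007908799/xbd/utilconv.html#usg
Reading this may help when reading man pages.

2 Chinese reference manual:
http://www.tsnc.edu.cn/tsnc_wgrj/doc/sed.htm

3 A more detailed tutorial (in English)
http://www.grymoire.com/Unix/Sed.html#uh-64
Detailed and easy to follow!

4 http://sed.sourceforge.net/sed1line_zh-CN.html
It offers a collection of examples whose explanations are terse (but correct): a good place to test your own understanding!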
