hillmover

Sed


Understanding the difference between current-line addressing in ed and global-line addressing in sed is very important. In ed you use addressing to expand the number of lines that are the object of a command; in sed, you use addressing to restrict the number of lines affected by a command.

command [options] script filename

sed -f scriptfile inputfile

$ sed '
> s/ MA/, Massachusetts/
> s/ PA/, Pennsylvania/
> s/ CA/, California/' list

The -n option suppresses the automatic output. When specifying this option, each instruction intended to produce output must contain a print command, p.
sed -n -e 's/MA/Massachusetts/p' list
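The effect of -n combined with the p flag can be seen in a small sketch. The two-line sample data below is made up for illustration and only stands in for the list file used above.

```shell
# Hypothetical sample data standing in for the "list" file.
list=$(printf 'John Daggett, Boston MA\nAlice Ford, Richmond VA\n')

# Without -n, every input line is printed, whether it was changed or not.
all=$(printf '%s\n' "$list" | sed 's/ MA/, Massachusetts/')

# With -n, only lines on which the substitution succeeded are printed,
# because the p flag is the only thing producing output.
changed=$(printf '%s\n' "$list" | sed -n 's/ MA/, Massachusetts/p')

printf '%s\n' "$all"
echo ---
printf '%s\n' "$changed"
```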

awk -v var=value 'instruction' inputfile

.
Matches any single character except newline. In awk, dot can match newline also.
*
Matches any number (including zero) of the single character (including a character specified by a regular expression) that immediately precedes it.
[...]
Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as first character inside brackets reverses the match to all characters except newline and those listed in the class. In awk, newline will also match. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in class is a member of the class. All other metacharacters lose their meaning when specified as members of a class.
^
First character of regular expression, matches the beginning of the line. Matches the beginning of a string in awk, even if the string contains embedded newlines.
$
As last character of regular expression, matches the end of the line. Matches the end of a string in awk, even if the string contains embedded newlines.
\{n,m\}
Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. \{n\} will match exactly n occurrences, \{n,\} will match at least n occurrences, and \{n,m\} will match any number of occurrences between n and m. (sed and grep only, may not be in some very old versions.)
\
Escapes the special character that follows.
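A few of the basic metacharacters above can be tried with grep, which uses the same basic syntax as sed. This is only a sketch; the sample strings are invented for the purpose.

```shell
# \{3\} and \{4\} require exactly three and four digits; in basic
# regular expressions the braces must be written with backslashes.
phones=$(printf '555-1234\n55-1234\n555-12345\n' | grep '^[0-9]\{3\}-[0-9]\{4\}$')

# * matches zero or more of the preceding character, so ab*c
# matches ac, abc and abbc, but not adc.
stars=$(printf 'ac\nabc\nabbc\nadc\n' | grep -c '^ab*c$')

# An unescaped dot matches any character; \. matches a literal dot only.
dots=$(printf 'a.c\nabc\n' | grep -c '^a\.c$')

echo "$phones" "$stars" "$dots"
```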

Extended Metacharacters (egrep and awk):
+
Matches one or more occurrences of the preceding regular expression.
?
Matches zero or one occurrences of the preceding regular expression.
|
Specifies that either the preceding or following regular expression can be matched (alternation).
()
Groups regular expressions.
{n,m}
Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. {n} will match exactly n occurrences, {n,} will match at least n occurrences, and {n,m} will match any number of occurrences between n and m. (POSIX egrep and POSIX awk, not in traditional egrep or awk.)
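The extended metacharacters can be sketched with grep -E, the POSIX spelling of egrep. The word list is made up for the example.

```shell
words=$(printf 'color\ncolour\ncolouur\nclr\n')

# ? makes the preceding "u" optional: color and colour match.
opt=$(printf '%s\n' "$words" | grep -E -c '^colou?r$')

# + requires at least one "u": colour and colouur match.
plus=$(printf '%s\n' "$words" | grep -E -c '^colou+r$')

# {2} is written without backslashes in the extended syntax.
exact=$(printf '%s\n' "$words" | grep -E '^colou{2}r$')

echo "$opt" "$plus" "$exact"
```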

Inside square brackets, the standard metacharacters lose their meaning.

Special Characters in Character Classes
\ Escapes any special character (awk only)
- Indicates a range when not in the first or last position.
^ Indicates a reverse match only when in the first position.

The close bracket (]) is interpreted as a member of the class if it occurs as the first character in the class (or as the first character after a circumflex). The hyphen loses its special meaning within a class if it is the first or last character.
In awk, you could also use the backslash to escape the hyphen or close bracket wherever either one occurs in the range, but the syntax is messier.
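The positional rules for ], - and ^ inside a class can be checked with a short sketch. The test strings are arbitrary.

```shell
# ] placed first in the class is a literal member: the class is "] or x".
close=$(printf 'a]b\nabc\n' | grep -c '[]x]')

# - placed last is a literal hyphen: the class is "a or -".
hyphen=$(printf 'q-q\nqqq\n' | grep -c '[a-]')

# ^ not in first position is a literal circumflex: the class is "a or ^".
caret=$(printf 'x^y\nxyz\n' | grep -c '[a^]')

# ^ in first position negates: lines made up only of non-digits.
negated=$(printf 'abc\na1c\n' | grep -c '^[^0-9]*$')

echo "$close" "$hyphen" "$caret" "$negated"
```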

POSIX distinguishes two kinds of regular expressions: Basic Regular Expressions (BREs), which are the kind used by grep and sed, and Extended Regular Expressions (EREs), which are the kind used by egrep and awk.

Character classes. A POSIX character class consists of keywords bracketed by [: and :]. The keywords describe different classes of characters such as alphabetic characters, control characters, and so on (see Table 3.3).
[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and tab characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Printable and visible (non-space) characters
[:lower:] Lowercase characters
[:print:] Printable characters (includes whitespace)
[:punct:] Punctuation characters
[:space:] Whitespace characters
[:upper:] Uppercase characters
[:xdigit:] Hexadecimal digits
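A point that is easy to miss is that a class name is only valid inside a bracket expression, so the usual spelling is [[:digit:]] rather than [:digit:]. The following is a minimal sketch with invented input.

```shell
# Remove everything that is not a digit.
digits=$(printf 'abc 123 def\n' | sed 's/[^[:digit:]]//g')

# Replace each uppercase letter with an underscore.
upper=$(printf 'Hello World\n' | sed 's/[[:upper:]]/_/g')

# Squeeze each run of spaces and tabs into one space.
squeezed=$(printf 'a \t b\n' | sed 's/[[:blank:]][[:blank:]]*/ /g')

printf '%s\n' "$digits" "$upper" "$squeezed"
```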

Collating symbols. A collating symbol is a multicharacter sequence that should be treated as a unit. It consists of the characters bracketed by [. and .].

Equivalence classes. An equivalence class lists a set of characters that should be considered equivalent, such as e and è. It consists of a named element from the locale, bracketed by [= and =].

The vertical bar (|) metacharacter, part of the extended set of metacharacters, allows you to specify a union of regular expressions.
compan(y|ies)
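The alternation above can be tried as follows. The three input words are made up; "companion" is included to show a near miss.

```shell
# "compan" must be followed by either "y" or "ies".
matched=$(printf 'company\ncompanies\ncompanion\n' | grep -E 'compan(y|ies)')

# awk uses the same extended syntax.
count=$(printf 'company\ncompanies\ncompanion\n' | awk '/compan(y|ies)/ { n++ } END { print n }')

printf '%s\n' "$matched" "$count"
```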

$ egrep "(^| )[\"[{(]*book[]})\"?\!.,;:'s]*( |$)" bookwords
This file tests for book in various places, such as
book at the beginning of a line or
at the end of a line book
as well as the plural books and
"book of the year award"
to look for a line with the word "book"
A GREAT book!
A great book? No.
told them about (the books) until it
Here are the books that you requested
Yes, it is a good book for children
amazing that it was called a "harmful book" when
once you get to the end of the book, you can't believe

Some programs support a special metacharacter for matching a string at the beginning of a word, \<, and one for matching a string at the end of a word, \>. Used as a pair, they can match a string only when it is a complete word.
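A rough illustration of the word-boundary pair follows. It assumes a grep that supports \< and \> (GNU grep and BSD grep do, but these are not part of the POSIX basic syntax), and the sample lines are invented.

```shell
text=$(printf 'a book here\nbooks galore\nnotebook\n')

# \<book\> matches "book" only as a complete word.
whole=$(printf '%s\n' "$text" | grep '\<book\>')

# \<book alone matches words that merely begin with "book".
starts=$(printf '%s\n' "$text" | grep -c '\<book')

printf '%s\n' "$whole" "$starts"
```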

$ gres '"[^"]*"' '00' sampleLine
.Se 00 "Appendix"

1........5
5........10
10.......20
100......200
$ sed 's/\([0-9][0-9]*\)\.\{5,\}\([0-9][0-9]*\)/\1-\2/' sample
1-5
5-10
10-20
100-200

This mistake is simply a problem of the order of the commands in the script.

Sed also maintains a second temporary buffer called the hold space. You can copy the contents of the pattern space to the hold space and retrieve them later.
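One well-known use of the hold space, shown here only as a sketch, is printing the input in reverse line order. It relies on the h command (copy pattern space to hold space) and the G command (append hold space to pattern space), which are not described in these notes.

```shell
# 1!G  on every line except the first, append the hold space to the pattern space
# h    copy the accumulated pattern space back to the hold space
# $p   on the last line, print what has been accumulated
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```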

A sed command can specify zero, one, or two addresses. An address can be a regular expression describing a pattern, a line number, or a line addressing symbol.
If no address is specified, then the command is applied to each line.
If there is only one address, the command is applied to any line matching the address.
If two comma-separated addresses are specified, the command is performed on the first line matching the first address and all succeeding lines up to and including a line matching the second address.
If an address is followed by an exclamation mark (!), the command is applied to all lines that do not match the address.

The line number refers to an internal line count maintained by sed. This counter is not reset for multiple input files.
Similarly, the input stream has only one last line. It can be specified using the addressing symbol $.
E.g.,
d
1d
$d
/^$/d
/^\.TS/,/^\.TE/d
50,$d
1,/^$/d #This example deletes from the first line up to the first blank line.
An exclamation mark (!) following an address reverses the sense of the match. For instance, the following script deletes all lines except those inside tbl input:
/^\.TS/,/^\.TE/!d
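The addressing forms above can be exercised on a small made-up document. The second half of the sketch uses two temporary files (hypothetical names f1 and f2) to show that the line counter runs on across input files and that $ refers to the last line of the last file.

```shell
doc=$(printf 'intro\n.TS\nrow\n.TE\noutro\n')

# Range address: delete the tbl block.
no_table=$(printf '%s\n' "$doc" | sed '/^\.TS/,/^\.TE/d')

# Negated range: delete everything except the tbl block.
only_table=$(printf '%s\n' "$doc" | sed '/^\.TS/,/^\.TE/!d')

# The line counter is not reset between files.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/f1"
printf 'c\nd\n' > "$tmp/f2"
third=$(sed -n '3p' "$tmp/f1" "$tmp/f2")   # line 3 is the first line of f2
last=$(sed -n '$p' "$tmp/f1" "$tmp/f2")    # $ is the last line of f2 only
rm -r "$tmp"

printf '%s\n' "$no_table" "$only_table" "$third" "$last"
```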

Braces ({}) are used in sed to nest one address inside another or to apply multiple commands at the same address.
/^\.TS/,/^\.TE/{
    /^$/d #to delete blank lines only inside blocks of tbl input
    s/^\.ps 10/.ps 8/
    s/^\.vs 12/.vs 10/
}
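A shortened version of the grouped script can be run against invented input to confirm that blank lines are removed only inside the tbl block:

```shell
out=$(printf 'text\n\n.TS\n\n.ps 10\n.TE\n' | sed '/^\.TS/,/^\.TE/{
/^$/d
s/^\.ps 10/.ps 8/
}')

# The blank line after "text" survives; the one inside .TS/.TE does not.
printf '%s\n' "$out"
```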

/---/!s/--/\\(em/g
If you find a line containing three consecutive hyphens, don't apply the edit. On all other lines, the substitute command will be applied.
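A quick check of this command on two made-up lines:

```shell
# The first line has only two hyphens, so they become \(em.
# The second line contains three consecutive hyphens and is left alone.
out=$(printf 'wait--what\n---\n' | sed '/---/!s/--/\\(em/g')

printf '%s\n' "$out"
```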

Substitution
[address]s/pattern/replacement/flags
n A number (1 to 512) indicating that a replacement should be made for only the nth occurrence of the pattern.
g Make changes globally on all occurrences in the pattern space. Normally only the first occurrence is replaced.
p Print the contents of the pattern space.
w file
Write the contents of the pattern space to file.
The substitute command is applied to the lines matching the address. If no address is specified, it is applied to all lines that match the pattern. If a regular expression is supplied as an address, and no pattern is specified, the substitute command matches what is matched by the address.
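The numeric and g flags, and the empty-pattern shorthand, can be sketched as follows with arbitrary input strings:

```shell
# A numeric flag replaces only that occurrence.
second=$(printf 'a-a-a\n' | sed 's/a/X/2')

# g replaces every occurrence in the pattern space.
global=$(printf 'a-a-a\n' | sed 's/a/X/g')

# An empty pattern reuses the regular expression given as the address.
reuse=$(printf 'foo bar\nbaz\n' | sed '/foo/s//qux/')

printf '%s\n' "$second" "$global" "$reuse"
```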

In the replacement section, only the following characters have special meaning:
& Replaced by the string matched by the regular expression.
\n Stands for the nth substring (n is a single digit) previously specified in the pattern using "\(" and "\)".
\ Used to escape the ampersand, the backslash, and the substitution command's delimiter. In addition, it can be used to escape the newline and create a multiline replacement string.
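Each of the replacement metacharacters can be seen in a small sketch. The input strings are invented.

```shell
# & stands for the whole matched string.
amp=$(printf 'UNIX\n' | sed 's/UNIX/[&]/')

# \& is a literal ampersand.
lit=$(printf 'AT\n' | sed 's/AT/AT\&T/')

# \1 and \2 recall the saved substrings, here swapping two fields.
swap=$(printf 'first:second\n' | sed 's/\(.*\):\(.*\)/\2:\1/')

# A backslash followed by a newline puts a line break in the replacement.
multi=$(printf 'a b\n' | sed 's/ /\
/')

printf '%s\n' "$amp" "$lit" "$swap" "$multi"
```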

#! /bin/sh
grep "^\.XX" "$@" | sort -u |
sed '
s/^\.XX \(.*\)$/\/^\\.XX \/s\/\1\/\1\//'

Delete
The delete command is also a command that can change the flow of control in a script. That is because once it is executed, no further commands are executed on the "empty" pattern space.
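The change in control flow can be observed by placing another command after the delete. In this sketch with made-up input, the second command never sees the deleted line:

```shell
# Lines matching /drop/ are deleted; sed starts the next cycle at once,
# so the substitution that follows is applied only to surviving lines.
out=$(printf 'keep\ndrop\nkeep too\n' | sed -e '/drop/d' -e 's/^/> /')

printf '%s\n' "$out"
```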

Append, Insert, and Change
[line-address]a\
text


[line-address]i\
text

[address]c\
text

The insert command places the supplied text before the current line in the pattern space.
The append command places it after the current line.
The change command replaces the content of the pattern space with the supplied text.
The text must begin on the next line. To input multiple lines of text, each successive line must end with a backslash, with the exception of the very last line.
E.g.,
/<Larry's Address>/i\
4600 Cross Court \
French Lick, IN
The append and insert commands can be applied only to a single line address, not a range of lines. The change command, however, can address a range of lines. In this case, it replaces all addressed lines with a single copy of the text. In other words, it deletes each line in the range but the supplied text is output only once.
E.g.,
/^From /,/^$/c\
<Mail Header Removed>

The insert and append commands do not affect the contents of the pattern space. The supplied text will not match any address in subsequent commands in the script, nor can those commands affect the text (unlike text produced by the s command). No matter what changes occur to alter the pattern space, the supplied text will still be output appropriately. Also, the supplied text does not affect sed's internal line counter (nor do the s and d commands).

$ cat data
line1

$ sed '1{
i\
before line1
a\
after line1
s/line1/subline1\nsubline2/g
s/line/ /g}' data

before line1
sub 1
sub 2
after line1
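The insert, append, and change commands described above can also be tried one at a time. This is a sketch on invented input, written in the traditional form with the text on the line after the backslash. The last command shows change applied to a range, where the text is output once.

```shell
# Insert a line before line 1.
ins=$(printf 'body\n' | sed '1i\
HEADER')

# Append a line after the last line.
app=$(printf 'body\n' | sed '$a\
FOOTER')

# Change a whole range (mail header up to the first blank line).
mail=$(printf 'From bob\nSubject: hi\n\nbody text\n' | sed '/^From /,/^$/c\
<Mail Header Removed>')

printf '%s\n' "$ins" "$app" "$mail"
```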

Print line number
[address]=
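Two common uses of the = command, sketched on made-up input: listing the line numbers of matching lines, and counting the lines of the input.

```shell
# Print the line number of every line containing TODO.
nums=$(printf 'a\nTODO x\nb\nTODO y\n' | sed -n '/TODO/=')

# Print the number of the last line, i.e. the line count.
count=$(printf 'a\nb\nc\n' | sed -n '$=')

printf '%s\n' "$nums" "$count"
```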

The next command (n) outputs the contents of the pattern space and then reads the next line of input without returning to the top of the script.
[address]n
E.g.,
/^\.H1/{
n
/^$/d
}
Match any line beginning with the string '.H1', then print that line and read in the next line. If that line is blank, delete it.
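Running that script on a small invented input shows that only the blank line directly after the heading is removed:

```shell
out=$(printf '.H1 "Title"\n\ntext\n\nmore\n' | sed '/^\.H1/{
n
/^$/d
}')

# The blank line after .H1 is gone; the one after "text" remains.
printf '%s\n' "$out"
```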

The quit command (q) causes sed to stop reading new input lines (and stop sending them to the output). It is a time-saver when only the first part of a large file needs to be processed.
[address]q
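Two small sketches of q, one with a line-number address and one with a pattern address. The input is invented.

```shell
# Print the first two lines, then quit without reading the rest.
head2=$(printf '1\n2\n3\n4\n' | sed 2q)

# Print up to and including the first line containing STOP.
upto=$(printf 'a\nSTOP\nb\n' | sed '/STOP/q')

printf '%s\n' "$head2" "$upto"
```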

 
