Global Regular Expression Search: grep and Regular Expressions

universsky

grep and Regular Expressions

Pattern Matching: grep
● grep searches files for lines matching a given pattern
● Normally each line that contains an instance of the pattern is copied to the standard output
● Usage: grep [options] pattern files
● If no file is specified, stdin is taken as the input

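Introduction to grep:

grep (global search regular expression (RE) and print out the line) is a powerful text search tool: it searches text using regular expressions and prints the lines that match. The Unix grep family consists of grep, egrep, and fgrep; the latter two differ only slightly from grep. egrep is an extension of grep that supports more RE metacharacters. fgrep (fixed grep or fast grep) treats every character in the pattern literally: metacharacters stand for themselves and have no special meaning. Linux uses GNU grep, which is more capable and provides the basic, egrep, and fgrep behavior through the -G, -E, and -F command-line options respectively.

grep searches one or more files for a pattern. If the pattern contains spaces, it must be quoted; every argument after the pattern is treated as a file name. Results are written to standard output, and the original files are not modified. grep is well suited to shell scripts because it reports the outcome through its exit status: 0 if a match was found, 1 if no match was found, and 2 if an error occurred (for example, a file to be searched does not exist). These exit statuses can be used to automate text processing.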
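The exit statuses described above can drive a script directly. The following is a minimal sketch, assuming a throwaway hosts file in a temporary directory; the helper name check_host and the sample data are illustrative and not part of the original text.

```shell
#!/bin/sh
# Create a small sample file in a temporary directory (illustrative data).
tmpdir=$(mktemp -d)
printf 'genesis1 192.168.100.134\ngenesis2 192.168.100.137\n' > "$tmpdir/hosts"

# check_host is a hypothetical helper that turns grep's exit status into a message.
# -q suppresses normal output; stderr is discarded so that only the status matters.
check_host() {
    grep -q "$1" "$2" 2>/dev/null
    case $? in
        0) echo "found" ;;
        1) echo "not found" ;;
        *) echo "error" ;;
    esac
}

check_host genesis1 "$tmpdir/hosts"      # a line matches
check_host nosuchhost "$tmpdir/hosts"    # no line matches
check_host genesis1 "$tmpdir/missing"    # the file does not exist
```

Branching on `$?` with `case` keeps the three outcomes separate, which a plain `if grep ...` test would not do: it treats "no match" and "error" alike.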

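grep command syntax:

grep [options] PATTERN [FILE...]

grep [options] [-e PATTERN | -f FILE] [FILE...]

grep options:
-c Print only a count of matching lines
-i Ignore case distinctions in the pattern and the input
-h Do not print file names when searching multiple files
-l When searching multiple files, print only the names of files that contain a match
-n Print each matching line together with its line number
-s Suppress error messages about nonexistent or unreadable files
-v Print all lines that do not contain a match
-f Read patterns from a file; an empty file contains zero patterns and therefore matches nothing
-L Print only the names of files that contain no match
-q Quiet: print nothing and only return the exit status; 0 means a matching line was found
-w Match the pattern only as a whole word, as if it were enclosed in \< and \>
-b Print the byte offset of each matching line (GNU grep; some traditional Unix versions print the block number instead)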

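Examples:

(1) Sample text: to keep the examples easy to follow, most of them use a text file named hosts with the following contents:
genesis1 192.168.100.134
genesis2 192.168.100.137
genesis3 192.168.100.138
genesis /home/app_gen
liu cai lin

(2) Searching multiple files:
● Search for the string "m48" in all .txt files in the current directory:
$ grep "m48" *.txt

(3) Line matching:
● Print the number of lines that contain the string:
$ grep -c "genesis" hosts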
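To make the effect of a few options concrete, here is a short sketch that recreates the hosts sample in a temporary directory and runs -c, -n and -v against it. The temporary path and variable names are assumptions made for illustration.

```shell
#!/bin/sh
# Recreate the hosts sample file from the text in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/hosts" <<'EOF'
genesis1 192.168.100.134
genesis2 192.168.100.137
genesis3 192.168.100.138
genesis /home/app_gen
liu cai lin
EOF

count=$(grep -c "genesis" "$tmpdir/hosts")     # number of lines containing genesis
numbered=$(grep -n "138" "$tmpdir/hosts")      # matching line prefixed with its line number
others=$(grep -v "genesis" "$tmpdir/hosts")    # lines that do NOT contain genesis

echo "$count"
echo "$numbered"
echo "$others"
```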

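grep regular expression metacharacters:
^ Anchors the start of a line; '^grep' matches all lines that begin with grep.
$ Anchors the end of a line; 'grep$' matches all lines that end with grep.
. Matches any single character other than a newline; 'gr.p' matches gr, then any one character, then p.
* Matches zero or more of the preceding character; ' *grep' matches zero or more spaces followed by grep. Combined as .*, it matches any string of characters.
[] Matches one character from the given set; '[Gg]rep' matches Grep and grep.
[^] Matches one character not in the given set; '[^A-FH-Z]rep' matches rep preceded by one character that is not in A-F or H-Z.
\(..\) Tags the matched characters; in '\(love\)', love is tagged as group 1 and can be referred to later as \1.
\< Anchors the start of a word; '\<grep' matches lines containing a word that begins with grep.
\> Anchors the end of a word; 'grep\>' matches lines containing a word that ends with grep.
x\{m\} Repeats the character x exactly m times; 'o\{5\}' matches lines containing 5 consecutive o characters.
x\{m,\} Repeats the character x at least m times; 'o\{5,\}' matches lines containing at least 5 consecutive o characters.
x\{m,n\} Repeats the character x at least m and at most n times; 'o\{5,10\}' matches a run of 5 to 10 o characters.
\w Matches a word character, that is [A-Za-z0-9_]; 'G\w*p' matches G followed by zero or more word characters and then p.
\b Word boundary; '\bgrep\b' matches only grep as a whole word.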
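A few of these metacharacters can be tried on a small word list. This is a sketch under stated assumptions: the word list is invented for illustration, and \< and \> are extensions supported by GNU and BSD grep rather than part of POSIX basic regular expressions.

```shell
#!/bin/sh
# Invented sample data: one word per line.
tmpdir=$(mktemp -d)
printf 'grep\negrep\ngrepping\nfoooood\nlovelove\n' > "$tmpdir/words"

starts=$(grep '^grep' "$tmpdir/words")          # lines beginning with grep
whole=$(grep '\<grep\>' "$tmpdir/words")        # grep as a complete word (GNU/BSD extension)
five_o=$(grep 'o\{5\}' "$tmpdir/words")         # five consecutive o characters
doubled=$(grep '\(love\)\1' "$tmpdir/words")    # tagged group followed by a copy of itself

echo "$starts"
echo "$whole"
echo "$five_o"
echo "$doubled"
```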

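Commonly used grep options:
-c Print only a count of matching lines.
-i Ignore case distinctions.
-h Do not print file names when searching multiple files.
-l When searching multiple files, print only the names of files that contain a match.
-n Print each matching line together with its line number.
-s Suppress error messages about nonexistent or unreadable files.
-v Print all lines that do not contain a match.
-V Print version information.
When using grep, quote the pattern (single quotes are safest) so that the shell does not interpret it as an option or expand its special characters; quoting also lets the pattern contain several words.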

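Matching examples:
grep -c "48" test.txt    Count the lines that contain "48"
grep -i "May" test.txt    Find all lines containing "May", ignoring case
grep -n "48" test.txt    Print the lines containing "48" with their line numbers (roughly like nl test.txt | grep 48)
grep -v "48" test.txt    Print all lines that do not contain "48"
grep "471" test.txt    Print the lines containing "471"
grep "48<tab>" test.txt    Print the lines containing "48" followed by a tab (<tab> stands for a literal tab character)
grep "48[34]" test.txt    Print the lines containing "48" followed by "3" or "4"
grep "^[^48]" test.txt    Print the lines whose first character is neither "4" nor "8"
grep "[Mm]ay" test.txt    Case handling with a set: print the lines containing "May" or "may"
grep "K...D" test.txt    Print the lines containing "K", then any three characters, then "D"
grep "[A-Z][9]D" test.txt    Print the lines containing an uppercase letter A-Z, then "9", then "D"
grep "[35]..1998" test.txt    Print the lines containing 3 or 5, then any two characters, then 1998
grep "4\{2,\}" test.txt    Repetition: print the lines in which "4" appears at least twice in a row
grep "9\{3,\}" test.txt    Repetition: print the lines in which "9" appears at least three times in a row
grep "9\{2,3\}" test.txt    Repetition: print the lines containing a run of two or three "9" characters
grep -n "^$" test.txt    Print the line numbers of empty lines
ls -l | grep "^d"    List only the directories in a long listing (similar to ls -d */)
ls -l | grep "^[^d]"    List the entries in a directory that are not directories
ls -l | grep "^d.....x..x"    List the directories that group members and other users may enter (execute permission)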

file name : cars

Toyota Camry 88 Red 50000 15000
Chevy nova 79 Green 63000 5000
ford escort 81 Blue 80000 5000
Honda Civic 83 red 45000 8000
toyota tercel 86 Yellow 140000 9500
Pfatt Credo 58 Black 215000 600
Ford Bronco 87 Pink 99000 9800
Chevy Nomad 83 blue 118000 6000
ford Mustang 67 White 58000 12000
honda Accord 85 red 40000 3000
Toyota corona 71 Blue 180000 2500
Ford Futura 95 White 50 35000
Toyota Camry 94 Red 10000 26000
Holden Apollo 80 Brown 20000 5000
ford Mustang 67 White 58000 12000
Holden Apollo 80 Brown 20000 5000
Ford Bronco 87 Pink 99000 9800
Holden Apollo 80 Brown 20000 5000
Holden Apollo 80 Brown 20000 5000
honda Accord 85 red 40000 3000
Honda Civic 83 red 45000 8000
Holden Apollo 80 Brown 20000 5000
toyota tercel 86 Yellow 140000 9500
Holden Apollo 80 Brown 20000 5000
Holden Apollo 80 Brown 20000 5000
Chevy Nomad 83 blue 118000 6000
ford Mustang 67 White 58000 12000
honda Accord 85 red 40000 3000
Holden Apollo 80 Brown 20000 5000

Example of grep
$ grep Ford cars
Ford Bronco 87 Pink 99000 9800
Ford Futura 95 White 50 35000
Ford Bronco 87 Pink 99000 9800

Regular Expressions
● Regular expressions are character sequences that describe a family of matching strings.
● They are used in many Unix tools: grep, awk, sed, vi, lex are examples.
● Regular expressions are formed out of a sequence of normal and special (meta) characters.

RE rules
● Any character other than a metacharacter matches itself
● Any character (including metacharacters) preceded by a backslash matches the character (remember how to escape metacharacters?)

Metacharacters
● . matches any single character (the shell equivalent is ?)
● ^ matches the beginning of a line
● $ matches the end of a line
● A regular expression followed by an asterisk (*) matches zero or more occurrences of the preceding regular expression
● [ ] defines a set of characters
  - It matches any single character in the set.
  - Examples: [A-Z], [aeiou]
    - A range of values is specified with a dash (-), as in A-Z.
  - Note: the characters *, ^, $, and \ lose their special meaning inside the square brackets (except ^ in the first position, described next).
  - If the first character in the bracket is ^, then it matches any single character not in the set. [^aeiou] means match any character other than a, e, i, o and u.

Metacharacters for Extended REs
● Available for egrep
● A regular expression (RE) followed by a + matches one or more matches of the regular expression.
● An RE followed by a ? matches zero or one match of the regular expression.
● r1|r2 will match if there is a match for r1 or for r2
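The three extended operators can be tried with grep -E, the option form of egrep mentioned earlier. A minimal sketch follows; the sample words are invented for illustration.

```shell
#!/bin/sh
# Invented sample data: one word per line.
tmpdir=$(mktemp -d)
printf 'color\ncolour\ncolouur\nford\nFord\ngoood\n' > "$tmpdir/sample"

optional=$(grep -E '^colou?r$' "$tmpdir/sample")      # ? : zero or one u
one_plus=$(grep -E '^colou+r$' "$tmpdir/sample")      # + : one or more u
either=$(grep -E '^(ford|Ford)$' "$tmpdir/sample")    # | : either alternative

echo "$optional"
echo "$one_plus"
echo "$either"
```

In extended syntax the parentheses and the bar are written without backslashes, unlike the \( \) grouping of basic regular expressions.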

Rules of matching
● r1r2
  - Two concatenated regular expressions match a match of the first followed by a match of the second
● An RE matches the longest possible string starting as far towards the beginning as possible

Rules (continued)
● If an RE is composed of two REs, the first will match as long a string as possible, but will not exclude a match of the second.
● Quoted parentheses \( and \) can be used for grouping REs.
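These matching rules are easiest to see with a substitution, because the replacement shows exactly which text was matched. The sketch below uses sed with invented input strings; the variable names are illustrative only.

```shell
#!/bin/sh
# Longest match: .* extends to the LAST "thing" on the line.
longest=$(echo 'thing one thing two thing three' | sed 's/thing.*thing/X/')

# Leftmost match: x* matches the empty string at the very start of the line,
# so the run of x characters further to the right is left untouched.
leftmost=$(echo 'abxxx' | sed 's/x*/-/')

# Concatenation: a* takes as many a's as it can while still leaving "ab"
# for the second group to match.
split=$(echo 'aaab' | sed 's/\(a*\)\(ab\)/[\1][\2]/')

echo "$longest"
echo "$leftmost"
echo "$split"
```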

Examples:
RE              matches
thing           thing anywhere in the line
^thing          thing at the beginning of the line
thing$          thing at the end of the line
^thing$         Line that contains only thing
[tT]hing        thing or Thing anywhere in the line
thing[0-9]      thing followed by a digit anywhere
thing[^0-9]     thing followed by any character other than a digit
thing.*thing    thing followed by any number of characters followed by thing

● List the names of all subdirectories in the current directory
ls -l | grep '^d'
● List the files others can read
ls -l | grep '^.......r..'
● List all words in /usr/dict/words that have all the vowels in order (e.g. the word facetious)
grep '^[^aeiou]*a[^aeiou]*e[^aeiou]*i[^aeiou]*o[^aeiou]*u[^aeiou]*$' /usr/dict/words

Major options for grep
● -c displays only a count of matching lines
● -i ignores the upper and lower case distinctions in pattern matching
● -l lists only the names of files containing matching lines
● -n precedes each matching line with its line number in the file
● -v displays all lines that do not match

More Examples
● Check whether srini is logged on
  - who | grep srini
● List all filenames that do not end in h
  - ls | grep -v 'h$' (think of another way without using the -v option)
● Print message headers of all messages in the mailbox
  - grep From $MAIL (what is the value of the MAIL variable?)
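One possible answer to the exercise about avoiding -v is to match a final character that is not h. The sketch below compares both forms on an invented list of file names standing in for the output of ls.

```shell
#!/bin/sh
# Invented file names standing in for the output of ls.
names=$(printf 'main.c\nutil.h\nREADME\nh\n')

with_v=$(printf '%s\n' "$names" | grep -v 'h$')       # drop names ending in h
without_v=$(printf '%s\n' "$names" | grep '[^h]$')    # keep names whose last character is not h

echo "$with_v"
echo "$without_v"
```

The two forms differ only on empty lines, which -v would print and the bracket expression would not; file names are never empty, so the results agree here.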

sed: stream editing
sed [-n] [-e script] [-f scriptfile] [inputfile ...]
● Edits an input stream according to a given set of editing commands (called sed scripts).
● The editing commands may be given on the command line (default)
  - or in a scriptfile (using the -f option)

sed options (continued)
● For multiple editing commands on the command line
  - precede each editing command with the -e flag
● The -n option suppresses the default output, so only lines printed explicitly (for example with the p command) appear
  - the default is to output every line
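The difference between the default output and -n, and the use of several -e scripts, can be seen on a three-line file. This is a minimal sketch; the file contents and variable names are invented for illustration.

```shell
#!/bin/sh
# Invented three-line input file.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/lines"

# Default: every line is printed, edited or not.
default_out=$(sed 's/two/TWO/' "$tmpdir/lines")

# -n plus the p flag: only lines where the substitution happened are printed.
quiet_out=$(sed -n 's/two/TWO/p' "$tmpdir/lines")

# Two editing commands on the command line, each introduced by -e.
multi_out=$(sed -e 's/one/1/' -e 's/three/3/' "$tmpdir/lines")

echo "$default_out"
echo "$quiet_out"
echo "$multi_out"
```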

sed commands
address a\
text                         Append text after the line specified by the address
addressRange c\
text                         Replace the text specified by addressRange with text
addressRange d               Delete the text specified by addressRange
address i\
text                         Insert text before the line specified by address
address r fileName           Append the contents of fileName after the line specified by address
addressRange s/expr/str/     Substitute the first occurrence of RE expr on each line by the string str
addressRange s/expr/str/g    Substitute every occurrence of the RE expr by the string str

Commands (continued)
● addressRange p
  - Print the specified line(s), usually used with the -n option
● q
  - Quit after printing the current line
● !
  - Negation: placed after an address, it applies the command to the lines that do not match that address
● There is more; refer to the man pages
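The editing commands listed above can be compared side by side on one small file. This is a sketch with invented input; it uses the portable form of a, i and c, in which the command is followed by a backslash and the text sits on the next line.

```shell
#!/bin/sh
# Invented three-line input file.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/f"

# a: append text after line 2
appended=$(sed '2a\
after beta' "$tmpdir/f")

# i: insert text before line 2
inserted=$(sed '2i\
before beta' "$tmpdir/f")

# c: replace line 2 with new text
changed=$(sed '2c\
BETA' "$tmpdir/f")

# d: delete line 2
deleted=$(sed '2d' "$tmpdir/f")

# !: print every line EXCEPT line 2 (with -n, only p produces output)
negated=$(sed -n '2!p' "$tmpdir/f")

echo "$appended"; echo "$inserted"; echo "$changed"; echo "$deleted"; echo "$negated"
```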

Address specification
● An address is
  - a line number, or
  - a regular expression enclosed within two slashes, or
  - a $ indicating the last line
● If no address is specified
  - the command is applied to all the lines in the input
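The three address forms can each be combined with the p command. A minimal sketch follows, assuming a shortened three-line data file created in a temporary directory; the file and variable names are illustrative.

```shell
#!/bin/sh
# Shortened, illustrative data file.
tmpdir=$(mktemp -d)
printf '2 25 114 register\n5 20 188 sphere\n29 114 671 total\n' > "$tmpdir/sfile"

by_number=$(sed -n '2p' "$tmpdir/sfile")          # address is a line number
by_regex=$(sed -n '/register/p' "$tmpdir/sfile")  # address is a regular expression between slashes
last_line=$(sed -n '$p' "$tmpdir/sfile")          # $ addresses the last line

echo "$by_number"
echo "$by_regex"
echo "$last_line"
```

The single quotes matter for the last command: they keep the shell from treating $p as a variable reference.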

Data file for sed examples (sfile)
2 25 114 register
5 20 188 sphere
12 29 176 trapeg
1 25 110 sphere
10 40 193 whereis
29 114 671 total

Substitution
$ sed 's/sphere/SPHERE/' sfile
2 25 114 register
5 20 188 SPHERE
12 29 176 trapeg
1 25 110 SPHERE
10 40 193 whereis
29 114 671 total

Substitution within an address range
$ sed '1,3s/sphere/SPHERE/' sfile
2 25 114 register
5 20 188 SPHERE
12 29 176 trapeg
1 25 110 sphere
10 40 193 whereis
29 114 671 total

Specifying address using RE
$ sed '/sphere/s/1/ONE/' sfile
2 25 114 register
5 20 ONE88 sphere
12 29 176 trapeg
ONE 25 110 sphere
10 40 193 whereis
29 114 671 total

Global substitution
$ sed '/sphere/s/1/ONE/g' sfile
2 25 114 register
5 20 ONE88 sphere
12 29 176 trapeg
ONE 25 ONEONE0 sphere
10 40 193 whereis
29 114 671 total

Specifying address range using RE
$ sed '/r.*g/,/r.*g/s/1/ONE/g' sfile
2 25 ONEONE4 register
5 20 ONE88 sphere
ONE2 29 ONE76 trapeg
1 25 110 sphere
10 40 193 whereis
29 114 671 total

Specifying address range using RE
$ sed '/sphere/,/trapeg/s/1/ONE/g' sfile
2 25 114 register
5 20 ONE88 sphere
ONE2 29 ONE76 trapeg
ONE 25 ONEONE0 sphere
ONE0 40 ONE93 whereis
29 ONEONE4 67ONE total

Substitution using a pattern
$ sed '/sp.*e/s//CONE/' sfile
2 25 114 register
5 20 188 CONE
12 29 176 trapeg
1 25 110 CONE
10 40 193 whereis
29 114 671 total

Do default action and quit early
$ sed 2q sfile
2 25 114 register
5 20 188 sphere

Printing only certain lines
$ sed -n '1,3p' sfile
2 25 114 register
5 20 188 sphere
12 29 176 trapeg

Don't print lines matching the
address
$ sed -n '/9/!p' sfile
2 25 114 register
5 20 188 sphere
1 25 110 sphere

Using brackets \( and \)
$ sed 's/\(sphere\)/Shake \1/' sfile
2 25 114 register
5 20 188 Shake sphere
12 29 176 trapeg
1 25 110 Shake sphere
10 40 193 whereis
29 114 671 total
● The escaped brackets \( and \) can be used to tag regular expressions. They can be referred to later as \1, \2 etc.

● Replace all occurrences of "Fred and George" in a file data with "George and Fred"
sed 's/\(Fred\) and \(George\)/\2 and \1/g' data
● Using multiple commands on the command line
  - Capitalise all occurrences of a and e
sed -e 's/a/A/g' -e 's/e/E/g' data
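The same editing commands can also be kept in a script file and applied with -f, as mentioned in the sed options. The sketch below assumes a one-line data file and a script file named edit.sed; both names and the contents are illustrative.

```shell
#!/bin/sh
# Illustrative input and script files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'Fred and George\n' > "$tmpdir/data"

# One editing command per line; they are applied to each input line in order.
cat > "$tmpdir/edit.sed" <<'EOF'
s/\(Fred\) and \(George\)/\2 and \1/g
s/a/A/g
s/e/E/g
EOF

result=$(sed -f "$tmpdir/edit.sed" "$tmpdir/data")
echo "$result"
```

Because the commands run in sequence, the capitalisation is applied to the already swapped names.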

tr: Character translation
(Glass, p. 280)
● Translates characters from one character set to another
tr string1 string2
● string1 specifies the source character set and string2 the target character set
● tr maps all characters in its standard input from character set string1 to character set string2

tr—Example
$ tr a-z A-Z < trdata
GO CART
RACING
$ tr abc DEF < trdata
go FDrt
rDFing

Varying string lengths
● If the length of string2 is less than the length of string1
  - string2 is padded by repeating its last character (this is the behavior of GNU and BSD tr; POSIX leaves the case unspecified)
● Example
$ tr a-c DE < trdata
go EDrt
rDEing

tr options
● -c Complements string1 before performing the mapping
● -d Causes every character in string1 to be deleted from standard input
● -s Causes every repeated output character to be condensed into a single instance
● Replace every character (including newlines) other than a by X
$ tr -c a X < trdata
XXXXaXXXXaXXXXX$
● Delete all a-c characters
$ tr -d a-c < trdata
go rt
ring

● Replace characters other than a to z by a newline (i.e. \012)
$ tr -c a-z '\012' < trdata
go
cart
racing
● Suppress repeated characters
$ tr -s acr efe < trdata
go fet
efing
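Combining -c and -s gives a common idiom for splitting text into one word per line. A minimal sketch follows, assuming an invented input string; LC_ALL=C is set so that the a-z range means the ASCII lowercase letters.

```shell
#!/bin/sh
# Fix the locale so that a-z and A-Z are plain ASCII ranges.
LC_ALL=C
export LC_ALL

text='go cart, racing go!'

# -c complements a-z, -s squeezes each run of replacement newlines into one,
# so every run of non-letters becomes a single line break.
words=$(printf '%s\n' "$text" | tr -cs 'a-z' '\012')

upper=$(printf 'go cart\n' | tr 'a-z' 'A-Z')    # map lowercase to uppercase
deleted=$(printf 'go cart\n' | tr -d 'a-c')     # delete the characters a, b and c

echo "$words"
echo "$upper"
echo "$deleted"
```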
