Regex Study Notes --- 12

java-mans


4.5. More About Greediness and Backtracking

4.5.1 Problems of Greediness

We know that 「.*」 matches all the way to the end of a line of text.

So what happens if we use 「".*"」 to match the following text?

The name "McDonald's" is said "makudonarudo" in Japanese.

The program:

#! /usr/bin/perl -w
$str1 = "The name \"McDonald's\" is said \"makudonarudo\" in Japanese.";
$str1 =~ /(".*")/;    # greedy .* runs to the end, then backs up to the LAST double quote
print $1;

Output:

$ perl mre45_1

"McDonald's" is said "makudonarudo"

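This is clearly not the result we wanted. So how can we get just '"McDonald's"'? The key is to realize that what we want to match between the quotes is not "any text" but "any text except a double quote". Using 「[^"]*」 instead of 「.*」 avoids the problem.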

The program:

#! /usr/bin/perl -w
$str1 = "The name \"McDonald's\" is said \"makudonarudo\" in Japanese.";
# $str1 =~ /(".*")/;
$str1 =~ /("[^"]*")/;    # [^"]* can never run past a double quote
print $1;

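After the first double quote matches, 「[^"]*」 matches as many characters as it can. Because 「[^"]」 cannot match the following double quote, it stops after 'McDonald's'. Control then passes to the 「"」 at the end of the regex, which does match, so the overall match succeeds.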
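A minimal sketch for comparison, assuming a Perl 5 interpreter; the names @greedy and @negated are illustrative only. With the /g modifier in list context, Perl returns every non-overlapping match, which makes the difference visible: the greedy version yields one long match spanning both quoted strings, while the negated class yields each quoted string on its own.

```perl
use strict;
use warnings;

my $str1 = q{The name "McDonald's" is said "makudonarudo" in Japanese.};

# /g in list context returns the capture from every non-overlapping match.
my @greedy  = $str1 =~ /(".*")/g;     # one match, from the first to the last quote
my @negated = $str1 =~ /("[^"]*")/g;  # one match per quoted string

print scalar(@greedy),  ": @greedy\n";
print scalar(@negated), ": @negated\n";
```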

4.5.2 Multi-Character “Quotes”

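Now let's look at using 「<B>.*</B>」 to match multi-character "quotes" in the following text:

…<B>Billions</B> and <B>Zillions</B> of suns…

In the regex 「<B>.*</B>」, the greedy 「.*」 runs all the way to the end of the line, and backtracking proceeds only until 「</B>」 can match, that is, at the last '</B>' in the line rather than at the 「</B>」 that pairs with the opening 「<B>」.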

The program:

#! /usr/bin/perl -w
$str2 = "<B>Billions</B> and <B>Zillions</B> of suns";
$str2 =~ m{(<B>.*</B>)};    # greedy .* backtracks only as far as the LAST </B>
print $1;
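The sketch below, again assuming Perl 5, prints the greedy match for the string above and also tries the tempting 「<B>[^</B>]*</B>」. A character class only ever describes a single character, so 「[^</B>]」 means "any character other than <, /, B or >", not "anything that isn't the string </B>". Because 'Billions' begins with a B, that version silently skips the first pair of tags.

```perl
use strict;
use warnings;

my $str2 = '<B>Billions</B> and <B>Zillions</B> of suns';

# Greedy: .* runs to the end of the string, then backtracks only until
# </B> can match, which first happens at the LAST </B>.
my ($greedy) = $str2 =~ m{(<B>.*</B>)};

# A class excludes single characters, not a multi-character delimiter:
# [^</B>] rejects every '<', '/', 'B' and '>' individually.
my ($by_class) = $str2 =~ m{(<B>[^</B>]*</B>)};

print "greedy:   $greedy\n";     # <B>Billions</B> and <B>Zillions</B>
print "by_class: $by_class\n";   # <B>Zillions</B>
```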

4.5.3 Using Lazy Quantifiers

The problem above arises because the standard quantifiers are greedy. Some NFA engines also support lazy quantifiers; 「*?」 is the lazy counterpart of 「*」.
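Perl's lazy forms are not limited to 「*?」: appending ? to 「+」, 「?」 or 「{n,m}」 makes each of them lazy as well. A quick sketch, assuming Perl 5 and the throwaway test string 'aaa', shows what each greedy/lazy pair matches when nothing later in the regex forces a longer match.

```perl
use strict;
use warnings;

my $s = 'aaa';
my %matched;

# Each greedy quantifier is listed next to its lazy counterpart.
for my $re ('a*', 'a*?', 'a+', 'a+?', 'a?', 'a??', 'a{1,3}', 'a{1,3}?') {
    my ($m) = $s =~ /($re)/;    # the pattern string is interpolated into the regex
    $matched{$re} = $m;
    printf "%-8s => '%s'\n", $re, $m;
}
```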

Let's use 「<B>.*?</B>」 to match the string from the previous example:

…<B>Billions</B> and <B>Zillions</B> of suns…

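After the opening 「<B>」 matches, 「.*?」 first decides not to match anything at all, because it is lazy. Control therefore passes to the following 「<」 (▲ marks the current position):

'…<B>▲Billions…'

「<B>.*?▲</B>」

At this point 「<」 cannot match, so control returns to 「.*?」, which still has untried options (in fact, many of them). It advances begrudgingly: first the dot matches the B at the start of '…<B>Billions…'. Now 「*?」 must choose again, between trying to match one more character and stopping, and being lazy it first chooses to stop. The following 「<」 still cannot match, so 「.*?」 must take its untried option once more. After this cycle repeats eight times, 「.*?」 has matched 'Billions', and the following 「<」 (and indeed the whole 「</B>」) can match: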

…<B>Billions</B> and <B>Zillions</B> of suns…
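To make the count of eight concrete, here is a hand-written simulation, assuming Perl 5. It is not how Perl's regex engine is actually implemented; it only mimics the decision sequence described above: try the rest of the regex (「</B>」) first, and only when that fails let 「.*?」 take one more character. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text  = '<B>Billions</B> and <B>Zillions</B> of suns';
my $start = length '<B>';    # position just after the opening <B>
my $taken = 0;               # characters consumed so far by .*?

# Lazy loop: first try the rest of the regex (</B>); only if it fails
# does .*? reluctantly take one more character.
while (substr($text, $start + $taken, 4) ne '</B>') {
    die "no closing </B> found\n" if $start + $taken >= length $text;
    $taken++;
}

my $lazy_part = substr($text, $start, $taken);
print "expanded $taken times; .*? matched '$lazy_part'\n";
```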

The program:

#! /usr/bin/perl -w
$str2 = "<B>Billions</B> and <B>Zillions</B> of suns";
$str2 =~ m{(<B>.*?</B>)};    # lazy .*? stops at the first </B> that completes a match
print $1;

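What happens if the program above is run against the following string?

…<B>Billions and <B>Zillions</B> of suns…

Here the result is not what the user wants: 「.*?」 is forced to match past the 「<B>」 to the left of 'Zillions', all the way up to the 「</B>」, so the overall match is '<B>Billions and <B>Zillions</B>'.

This example shows why, in general, a lazy quantifier is not a perfect substitute for a negated character class.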
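The sketch below, assuming Perl 5, makes the difference concrete. It runs the lazy regex on the string with the missing </B>, then repeats the comparison with single-character delimiters on the made-up string '"a" "b"!' (the trailing 「!」 is an arbitrary requirement added for illustration). 「"[^"]*"!」 can never cross a double quote, whereas 「".*?"!」 only prefers not to, and crosses one as soon as that is the only way for the overall match to succeed.

```perl
use strict;
use warnings;

# Lazy quantifier on the string whose first </B> is missing.
my $str2 = '<B>Billions and <B>Zillions</B> of suns';
my ($lazy_tags) = $str2 =~ m{(<B>.*?</B>)};

# Same idea with one-character delimiters: the quote must be followed by "!".
my $s = q{"a" "b"!};
my ($lazy)    = $s =~ m{(".*?"!)};    # lazy: crosses a quote if it must
my ($negated) = $s =~ m{("[^"]*"!)};  # negated class: can never cross a quote

print "lazy tags: $lazy_tags\n";   # <B>Billions and <B>Zillions</B>
print "lazy:      $lazy\n";        # "a" "b"!
print "negated:   $negated\n";     # "b"!
```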

With negative lookahead, however, we can get the result we want.

Consider the following expression:

<B>               # Match the opening <B>
(                 # Now, only as many of the following as needed …
   (?! <B> )      #   If not <B> …
   .              #   … any character is okay
)*?               #
</B>              # … until the closing delimiter can match


The program:

#! /usr/bin/perl -w

$str2 = "<B>Billions and <B>Zillions</B> of suns";

$str2 =~ m{
	(
	<B>
	(
		(?! <B> )
		.
	)*?
	</B>
	)
	}x;

print $1;

Output:

$ perl mre45_24.pl

<B>Zillions</B>

Once lookahead is in place, we can go back to using an ordinary greedy quantifier:

<B>                # Match the opening <B>
(                  # Now, as many of the following as possible …
   (?! < /? B> )   #   If not <B>, and not </B> …
   .               #   … any character is okay
)*                 # (now greedy)
</B>               # … until the closing delimiter can match.

The program:

#! /usr/bin/perl -w

$str2 = "<B>Billions and <B>Zillions</B> of suns";

$str2 =~ m{
	(
	<B>
	(
		(?! < /? B> )
		.
	)*
	</B>
	)
	}x;

print $1;

Output:

$ perl mre45_25.pl

<B>Zillions</B>

4.5.4 Greediness and Laziness Always Favor a Match

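Recall the price-display example from Chapter 2, which we will examine closely at several points in this chapter. Because of floating-point representation issues, values such as "1.625" or "3.00" sometimes came out as "1.62500000002828" and "3.0000000002882". To fix this, we used the following regex: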

a. $price =~ s/(\.\d\d[1-9]?)\d*/$1/;

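「\.\d\d」 matches the decimal point and the first two digits, 「[1-9]?」 matches an optional third digit as long as it is not zero, and 「\d*」 consumes any remaining digits so that replacing the match with $1 discards them.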
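A quick check of expression a, assuming Perl 5. The prices are kept as strings, an assumption made here so the digits stay exactly as written (a numeric literal such as 3.00 would print as 3).

```perl
use strict;
use warnings;

my %fixed;
for my $price ('1.62500000002828', '3.0000000002882', '9.436') {
    # Expression a, applied to a copy so the original string is kept.
    (my $p = $price) =~ s/(\.\d\d[1-9]?)\d*/$1/;
    $fixed{$price} = $p;
    print "$price -> $p\n";
}
```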

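So far so good. But what happens if $price is already well formed, say ending in '.625'? Then '.625' is simply replaced by '.625', which is wasted work.

So we might switch to the following expression, which requires 「\d+」 to find at least one digit to remove:

b. $price =~ s/(\.\d\d[1-9]?)\d+/$1/;

But while a merely wastes effort on well-formed data, b actually damages it: because 「\d+」 must match at least one digit, 「[1-9]?」 gives back the third digit it had matched so that the overall match can still succeed, and that digit is then removed.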

For example:

#! /usr/bin/perl -w
$price = 9.436;
$price =~ s/(\.\d\d[1-9]?)\d+/$1/;    # b: \d+ forces [1-9]? to give back the 6
# $price =~ s/(\.\d\d[1-9]?)\d*/$1/;  # a
print $1;                             # the text that was kept

Output:

$ perl mre45_3.pl

.43

That is not the '.436' we wanted!
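The sketch below, assuming Perl 5, runs both expressions over a few sample prices chosen here for illustration (again kept as strings). It shows that b goes wrong when a nonzero third decimal digit is the last digit: 「\d+」 then has nothing left to match unless 「[1-9]?」 gives that digit back.

```perl
use strict;
use warnings;

my (%with_star, %with_plus);
for my $price ('1.62500000002828', '1.625', '9.436', '3.00') {
    (my $x = $price) =~ s/(\.\d\d[1-9]?)\d*/$1/;   # a: \d* may match nothing
    (my $y = $price) =~ s/(\.\d\d[1-9]?)\d+/$1/;   # b: \d+ needs at least one digit
    $with_star{$price} = $x;
    $with_plus{$price} = $y;
    print "$price  a: $x  b: $y\n";
}
```

For '3.00' expression b simply fails to match, so the string is left unchanged.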

4.5.5 The Essence of Greediness, Laziness, and Backtracking

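The preceding sections show that every element of a regex, whether greedy or lazy, works in service of the overall match.

Whenever either kind runs into a local failure, the engine backtracks to a saved state and tries a path it has not yet tried.

If more than one match is possible from the same starting position, a greedy quantifier yields the longest one and a lazy quantifier the shortest.

The following program illustrates this: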

#! /usr/bin/perl -w
$str1 = "The name \"McDonald's\" is said \"makudonarudo\" in Japanese.";
$str1 =~ /(".*")/;     # greedy: longest match from the first quote
print "$1\n";
$str1 =~ /(".*?")/;    # lazy: shortest match from the first quote
print $1;

Output:

$ perl mre45_12.pl

"McDonald's" is said "makudonarudo"

"McDonald's"
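To see that both kinds of quantifier explore the same candidate matches and differ only in the order they try them, the sketch below (assuming Perl 5.10 or later) uses two Perl features not covered in this section: an embedded code block 「(?{ ... })」 that records $^N, the text of the most recently closed capture group, and 「(*FAIL)」, which always fails and so forces the engine to backtrack into every remaining saved state. The first candidate each version records is the one an ordinary match returns.

```perl
use strict;
use warnings;

my $str1 = q{The name "McDonald's" is said "makudonarudo" in Japanese.};

# Record each candidate the engine reaches, then fail on purpose so it
# backtracks into its next saved state (and, eventually, the next start position).
my @greedy_seen;
$str1 =~ /(".*")(?{ push @greedy_seen, $^N })(*FAIL)/;

my @lazy_seen;
$str1 =~ /(".*?")(?{ push @lazy_seen, $^N })(*FAIL)/;

print "greedy tried: $_\n" for @greedy_seen;
print "lazy tried:   $_\n" for @lazy_seen;
```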

2012-02-18 19:46