
Groovy Regular Expressions (repost)


Regular expressions

Groovy In Action.pdf

Symbol                              Meaning

.                                   Any character
^                                   Start of line (or start of document, when in single-line mode)
$                                   End of line (or end of document, when in single-line mode)
\d                                  Digit character
\D                                  Any character except digits
\s                                  Whitespace character
\S                                  Any character except whitespace
\w                                  Word character
\W                                  Any character except word characters
\b                                  Word boundary
()                                  Grouping
(x|y)                               x or y as in (Groovy|Java|Ruby)
\1                                  Backmatch to group one; for example, find doubled characters with (.)\1
x*                                  Zero or more occurrences of x
x+                                  One or more occurrences of x
x?                                  Zero or one occurrence of x
x{m,n}                              At least m and at most n occurrences of x
x{m}                                Exactly m occurrences of x
[a-f]                               Character class containing the characters a, b, c, d, e, f
[^a]                                Character class containing any character except a
[aeiou]                             Character class representing lowercase vowels
[a-z&&[^aeiou]]                     Lowercase consonants
[a-zA-Z0-9]                         Uppercase or lowercase letter or digit
[+|-]?(\d+(\.\d*)?)|(\.\d+)         Positive or negative floating-point number
^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$    Simple email validation
(?is:x)                             Switches mode when evaluating x; i turns on ignoreCase, s is single-line mode
(?=regex)                           Positive lookahead
(?<=text)                           Positive lookbehind

Beginning Groovy and Grails (Jun 2008).pdf

Table 2-1. Summary of Regular-Expression Constructs

Construct         Matches

Characters

x                 The character x
\\                The backslash character
\t                The tab character (\u0009)
\n                The newline (line feed) character (\u000A)
\r                The carriage return character (\u000D)
\f                The form feed character (\u000C)
\e                The escape character (\u001B)

Character Classes

[abc]             a, b, or c (simple class)
[^abc]            Any character except a, b, or c (negation)
[a-zA-Z]          a through z or A through Z, inclusive (range)
[a-d[m-p]]        a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]      d, e, or f (intersection)
[a-z&&[^bc]]      a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]     a through z, and not m through p: [a-lq-z] (subtraction)

Predefined Character Classes

.                 Any character (may or may not match line terminators)
\d                A digit: [0-9]
\D                A nondigit: [^0-9]
\s                A whitespace character: [ \t\n\x0B\f\r]
\S                A non-whitespace character: [^\s]
\w                A word character: [a-zA-Z_0-9]
\W                A nonword character: [^\w]

Boundary Matchers

^                 The beginning of a line
$                 The end of a line
\b                A word boundary
\B                A nonword boundary
\A                The beginning of the input
\G                The end of the previous match
\Z                The end of the input but for the final terminator, if any
\z                The end of the input

Greedy Quantifiers

X?                X, once or not at all
X*                X, zero or more times
X+                X, one or more times
X{n}              X, exactly n times
X{n,}             X, at least n times
X{n,m}            X, at least n but not more than m times

Reluctant Quantifiers

X??               X, once or not at all
X*?               X, zero or more times
X+?               X, one or more times
X{n}?             X, exactly n times
X{n,}?            X, at least n times
X{n,m}?           X, at least n but not more than m times

Possessive Quantifiers

X?+               X, once or not at all
X*+               X, zero or more times
X++               X, one or more times
X{n}+             X, exactly n times
X{n,}+            X, at least n times
X{n,m}+           X, at least n but not more than m times

Logical Operators

XY                X followed by Y
X|Y               Either X or Y
(X)               X, as a capturing group
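The greedy, reluctant and possessive rows in the table read identically, so the difference only shows up when the quantifier is followed by more pattern. A minimal sketch, using a made-up sample string:

```groovy
def html = '<b>bold</b> and <i>italic</i>'

// Greedy: .* grabs as much as possible, then backtracks to the last '>'
assert html.find(/<.*>/) == '<b>bold</b> and <i>italic</i>'

// Reluctant: .*? grabs as little as possible and stops at the first '>'
assert html.find(/<.*?>/) == '<b>'

// Possessive: .*+ grabs everything and never gives characters back,
// so the trailing '>' can no longer match anywhere
assert html.find(/<.*+>/) == null
```

All three variants accept the same repetition counts; they differ only in how willing the engine is to give characters back while looking for an overall match.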

 

 

http://groovy.codehaus.org/Regular+Expressions

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

http://groovy.codehaus.org/Tutorial+4+-+Regular+expressions+basics and

http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups

For a complete list of regular expressions, see http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html .

 


 

 

some text -- Exactly “some text”.

some\s+text -- The word “some” followed by one or more whitespace characters followed by the word “text”.

^\d+(\.\d+)? (.*) -- Our introductory example: headings of level one or two. ^ denotes a line start, \d a digit, \d+ one or more digits. Parentheses are used for grouping. The question mark makes the first group optional. The second group contains the title, made of a dot for any character and a star for any number of such characters.

\d\d/\d\d/\d\d\d\d -- A date formatted as exactly two digits followed by a slash, two more digits followed by a slash, followed by exactly four digits.
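The patterns described above can be checked against sample input. This is a sketch with made-up test strings; note that a literal slash has to be escaped as \/ once the date pattern is written in slashy syntax.

```groovy
assert 'some    text' ==~ /some\s+text/
assert !('sometext' ==~ /some\s+text/)

// Heading pattern: group 1 is the optional second level, group 2 the title
def m = '1.2 Regular expressions' =~ /^\d+(\.\d+)? (.*)/
assert m.find()
assert m.group(1) == '.2'
assert m.group(2) == 'Regular expressions'

// Date pattern: checks the shape only, not whether the date is valid
assert '24/12/2008' ==~ /\d\d\/\d\d\/\d\d\d\d/
assert !('24/12/08' ==~ /\d\d\/\d\d\/\d\d\d\d/)
```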

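The effect of $:

def reg1 = ~/^[A-Z]{1}[a-zA-Z0-9]+$/
assert "Ad1WRldd" =~ reg1          // true: the whole string fits the pattern
assert !("Ad1@#!ldd" =~ reg1)      // false: $ rejects the trailing "@#!ldd"

 

But without the $:

def reg2 = ~/^[A-Z]{1}[a-zA-Z0-9]+/
assert "Ad1@#!ldd" =~ reg2         // true: the prefix "Ad1" is enough for a find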
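A related way to see the same effect is to drop the anchors altogether and compare the find operator with the match operator, which always requires the whole string to fit. A small sketch (the variable name `unanchored` is only illustrative):

```groovy
def unanchored = /[A-Z][a-zA-Z0-9]+/

// find (=~): a matching substring anywhere is enough
assert 'Ad1@#!ldd' =~ unanchored

// match (==~): the entire string must fit, as if ^ and $ were present
assert !('Ad1@#!ldd' ==~ unanchored)
assert 'Ad1WRldd' ==~ unanchored
```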

 

Groovy regular expressions (from Google Notebook…)

Sometimes the slashy syntax interferes with other valid Groovy expressions, such as line comments (//) or numerical expressions with multiple slashes for division (/). When in doubt, put parentheses around your pattern, like (/pattern/). Parentheses force the parser to interpret the content as an expression.
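As a brief illustration of the parenthesized form, here is a sketch with made-up values; it also shows that a literal slash inside a slashy string is written as \/.

```groovy
// Parentheses make it unambiguous that a slashy string starts here
def datePattern = (/\d\d\/\d\d\/\d{4}/)
assert datePattern == '\\d\\d/\\d\\d/\\d{4}'

// Division and a pattern in the same expression, kept apart by parentheses
def half = 10 / 2
assert ('5' ==~ (/\d/)) && half == 5
```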

■ The regex find operator =~ 

■ The regex match operator ==~ 

■ The regex pattern operator ~String

assert "abc" == /abc/

assert "\\d" == /\d/

def reference = "hello"

assert reference == /$reference/

assert "\$" == /$/

twister = 'she sells sea shells at the sea shore of seychelles'

// twister must contain a substring of size 3
// that starts with s and ends with a
assert twister =~ /s.a/                       // Regex find operator as usable in if

finder = (twister =~ /s.a/)                   // Find expression evaluates to a matcher object
assert finder instanceof java.util.regex.Matcher

// twister must contain only words delimited by single spaces
assert twister ==~ /(\w+ \w+)*/               // Regex match operator

WORD = /\w+/
matches = (twister ==~ /($WORD $WORD)*/)      // Match expression evaluates to a Boolean
assert matches instanceof java.lang.Boolean

assert (twister ==~ /s.e/) == false           // Match is full, not partial like find
wordsByX = twister.replaceAll(WORD, 'x')
assert wordsByX == 'x x x x x x x x x x'

words = twister.split(/ /)                    // Split returns an array of words
assert words.size() == 10
assert words[0] == 'she'

 

Usage tips:

■ When things get complex (note, this is when, not if), comment verbosely.

■ Use the slashy syntax instead of the regular string syntax, or you will get lost in a forest of backslashes.

■ Don’t let your pattern look like a toothpick puzzle. Build your pattern from subexpressions like WORD in listing 3.5.

■ Put your assumptions to the test. Write some assertions or unit tests to test your regex against static strings. Please don’t send us any more flowers for this advice; an email with the subject “assertion saved my life today” will suffice.
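The tips above can be combined in a small sketch: a pattern assembled from named subexpressions, commented, and checked with assertions against static strings. The names YEAR, MONTH, DAY and ISO_DATE are hypothetical and chosen only for this illustration.

```groovy
// Named building blocks instead of one long toothpick puzzle
def YEAR  = /\d{4}/
def MONTH = /(?:0[1-9]|1[0-2])/           // 01 to 12
def DAY   = /(?:0[1-9]|[12]\d|3[01])/     // 01 to 31
def ISO_DATE = /($YEAR)-($MONTH)-($DAY)/  // three capturing groups

// Assumptions put to the test against static strings
assert '2008-06-30' ==~ ISO_DATE
assert !('2008-13-01' ==~ ISO_DATE)       // month out of range
assert !('2008-6-30' ==~ ISO_DATE)        // month needs two digits

// The groups of the first match, unpacked by multiple assignment
def (whole, year, month, day) = ('due 2008-06-30' =~ ISO_DATE)[0]
assert [year, month, day] == ['2008', '06', '30']
```

The pattern checks the shape of a date only; it would still accept something like 2008-02-31.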

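"\$abc." =~ /\$(.*)\./

// the same pattern as a regular string needs doubled backslashes:
// "\$abc." =~ "\\\$(.*)\\."

'''${abc.}tttt${abc.}''' =~ /\$\{(.*)\.\}/

ddd = ~/\$\{(\w*)\.\}/
println ddd.class          // class java.util.regex.Pattern: the Pattern object is constructed explicitly
                           // (this also builds the corresponding state machine; see the Performance section)

regex = /\$\{(\w*)\.\}/    // () for grouping?
println regex.class        // class java.lang.String: a Pattern object is built implicitly at match time (=~)

str = '''${abc.}tttt${def.}'''
str.eachMatch(regex) { match ->
    println match[0]       // prints ${abc.} then abc, followed by ${def.} then def
    println match[1]
}

regex2 = /\$\{\w*\.\}/     // with the () removed; this still works
cloze = str.replaceAll(regex2) { ch ->
    println "all match = " + ch
    '44'                   // the closure's return value replaces each match
}
println "cloze = " + cloze // all match = ${abc.}, all match = ${def.}, cloze = 44tttt44

def message = '${Hello}, ${world}'
def reg = '\\$\\{\\w*\\}'  // regular string: every backslash is doubled
def reg2 = /\$\{\w*\}/     // the same pattern in slashy syntax
def rs = message.replaceAll(reg) { ch ->
    println "all match = " + ch
    '11'
}
println "rs = " + rs       // rs = 11, 11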

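matcher = 'a b c' =~ /\S/
assert matcher[0] == 'a'
assert matcher[1..2] == ['b', 'c']            // in current Groovy a range index yields a list of matches
assert matcher.count == 3

matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/      // note the difference between these two examples: with groups, each match is a list
assert matcher.hasGroup()
assert matcher[0] == ['a:1', 'a', '1']

('xy' =~ /(.)(.)/).each { all, x, y ->        // the match and its groups can also be unpacked (and printed)
    assert all == 'xy'
    assert x == 'x'
    assert y == 'y'
}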
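Unpacking groups into closure parameters is handy for turning matches into data. The following sketch builds a map from the 'a:1 b:2 c:3' sample used above; the variable name `pairs` is only illustrative.

```groovy
def pairs = [:]

// With groups, each match arrives as [whole, group1, group2]
('a:1 b:2 c:3' =~ /(\S+):(\S+)/).each { whole, key, value ->
    pairs[key] = value.toInteger()
}
assert pairs == [a: 1, b: 2, c: 3]

// Without groups, each match is just the matched string
('a b c' =~ /\S/).each { match ->
    assert match instanceof String
}
```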

myFairStringy = 'The rain in Spain stays mainly in the plain!'

// words that end with 'ain': \b\w*ain\b
BOUNDS = /\b/
rhyme = /$BOUNDS\w*ain$BOUNDS/

found = ''
myFairStringy.eachMatch(rhyme) { match ->       // string.eachMatch(pattern_string)
    found += match + ' '                        // no groups in the pattern, so current Groovy passes the matched string itself
}
assert found == 'rain Spain plain '

found = ''
(myFairStringy =~ rhyme).each { match ->        // matcher.each(closure)
    found += match + ' '
}
assert found == 'rain Spain plain '

cloze = myFairStringy.replaceAll(rhyme) { it - 'ain' + '___' }    // string.replaceAll(pattern_string, closure)
assert cloze == 'The r___ in Sp___ stays mainly in the pl___!'

 

Performance

Listing 3.7 Increase performance with pattern reuse.

twister = 'she sells sea shells at the sea shore of seychelles'

// some more complicated regex:
// word that starts and ends with same letter
regex = /\b(\w)\w*\1\b/

start = System.currentTimeMillis()
100000.times {
    twister =~ regex              // Find operator with implicit pattern construction
}
first = System.currentTimeMillis() - start

start = System.currentTimeMillis()
pattern = ~regex                  // Explicit pattern construction
100000.times {
    pattern.matcher(twister)      // Apply the pattern on a String
}
second = System.currentTimeMillis() - start

assert first > second * 1.20      // timing-dependent, so the result can vary between runs

 

The pattern operator ~String

The pattern operator transforms a string into a java.util.regex.Pattern object. For a given string, this pattern object can be asked for a matcher object.

The rationale behind this construction is that patterns are internally backed by a so-called finite state machine that does all the high-performance magic. This machine is compiled when the pattern object is created. The more complicated the pattern, the longer the creation takes. In contrast, the matching process as performed by the machine is extremely fast.

The pattern operator separates pattern-creation time from pattern-matching time, which makes it possible to reuse the finite state machine. In the example above, first pays for constructing a pattern on every match, whereas second creates the pattern explicitly once before matching, which saves time.
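Outside of a timing loop, the same idea looks like this: compile the pattern once and apply it to many inputs. A minimal sketch; the name `sameEnds` and the input lines are made up for illustration.

```groovy
import java.util.regex.Pattern

// Compile once (the comparatively expensive step)...
Pattern sameEnds = ~/\b(\w)\w*\1\b/

// ...then reuse the compiled pattern for every input
def inputs = ['she sells sea shells', 'at the sea shore', 'a dad and a mom']
def hits = inputs.collect { line ->
    // the pattern has a group, so each match is [whole, group1]
    sameEnds.matcher(line).collect { it[0] }
}
assert hits == [['sells', 'shells'], [], ['dad', 'mom']]
```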

 

Patterns for classification

A Pattern object has an isCase(String) method.

Listing 3.8 Patterns in grep() and switch()

assert (~/..../).isCase('bear')

switch('bear'){

case ~/..../ : assert true; break

default : assert false

}

beasts = ['bear','wolf','tiger','regex']

assert beasts.grep(~/..../) == ['bear','wolf']

 

TIP   Classifications read nicely in switch and grep. The direct use of classifier.isCase(candidate) happens rarely, but when it does, it is best read from right to left: “candidate is a case of classifier”.
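Classification becomes more useful with several patterns in one switch. The sketch below uses a hypothetical classify method with made-up categories; since isCase on a pattern requires a full match, the order of the cases does not cause '3.14' to be taken for an integer.

```groovy
// Hypothetical classifier: the categories and patterns are illustrative
def classify(String token) {
    switch (token) {
        case ~/\d+/:          return 'integer'
        case ~/\d+\.\d+/:     return 'decimal'
        case ~/[A-Za-z_]\w*/: return 'identifier'
        default:              return 'unknown'
    }
}

def tokens = ['x1', '3.14', '7', '+']
assert tokens.collect { classify(it) } == ['identifier', 'decimal', 'integer', 'unknown']

// grep applies the same isCase check to every element
assert tokens.grep(~/\d+(\.\d+)?/) == ['3.14', '7']
```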
