Regular expressions
Groovy In Action.pdf
Symbol | Meaning
. | Any character
^ | Start of line in multiline mode (?m); start of input otherwise
$ | End of line in multiline mode (?m); end of input otherwise
\d | Digit character
\D | Any character except digits
\s | Whitespace character
\S | Any character except whitespace
\w | Word character
\W | Any character except word characters
\b | Word boundary
() | Grouping
(x|y) | x or y, as in (Groovy|Java|Ruby)
\1 | Backmatch to group one; for example, find doubled characters with (.)\1
x* | Zero or more occurrences of x
x+ | One or more occurrences of x
x? | Zero or one occurrence of x
x{m,n} | At least m and at most n occurrences of x
x{m} | Exactly m occurrences of x
[a-f] | Character class containing the characters a, b, c, d, e, f
[^a] | Character class containing any character except a
[aeiou] | Character class representing lowercase vowels
[a-z&&[^aeiou]] | Lowercase consonants
[a-zA-Z0-9] | Uppercase or lowercase letter or digit
[+-]?(\d+(\.\d*)?|\.\d+) | Positive or negative floating-point number
^[\w-\.]+@([\w-]+\.)+[\w-]{2,4}$ | Simple email validation
(?is:x) | Switches mode when evaluating x; i turns on ignoreCase, s is single-line mode
(?=regex) | Positive lookahead
(?<=text) | Positive lookbehind
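A few of the constructs from the table can be checked with Groovy assertions. This is only a sketch: the sample strings and variable names are made up for illustration, and it assumes a Groovy version that provides String.findAll(regex).

```groovy
// Backmatch: (.)\1 finds a doubled character
def doubled = 'bookkeeper'.findAll(/(.)\1/)
assert doubled == ['oo', 'kk', 'ee']

// Character-class intersection: lowercase letters that are not vowels
def consonants = 'groovy'.findAll(/[a-z&&[^aeiou]]/)
assert consonants == ['g', 'r', 'v', 'y']

// Bounded repetition: between two and four occurrences
assert 'aaa' ==~ /a{2,4}/
assert !('a' ==~ /a{2,4}/)

// Inline flags: i ignores case, s lets . match a line terminator
assert 'GROOVY\nrocks' ==~ /(?is:groovy.rocks)/

// Lookahead and lookbehind test a position without consuming text
def beforeUnit = 'price: 42 EUR'.findAll(/\d+(?= EUR)/)
def afterLabel = 'price: 42 EUR'.findAll(/(?<=price: )\d+/)
assert beforeUnit == ['42']
assert afterLabel == ['42']
```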
Beginning Groovy and Grails (Jun 2008).pdf
Table 2-1. Summary of Regular-Expression Constructs
Construct | Matches
Characters
x | The character x
\\ | The backslash character
\t | The tab character (\u0009)
\n | The newline (line feed) character (\u000A)
\r | The carriage return character (\u000D)
\f | The form feed character (\u000C)
\e | The escape character (\u001B)
Character Classes
[abc] | a, b, or c (simple class)
[^abc] | Any character except a, b, or c (negation)
[a-zA-Z] | a through z or A through Z, inclusive (range)
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] | d, e, or f (intersection)
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction)
Predefined Character Classes
. | Any character (may or may not match line terminators)
\d | A digit: [0-9]
\D | A nondigit: [^0-9]
\s | A whitespace character: [ \t\n\x0B\f\r]
\S | A non-whitespace character: [^\s]
\w | A word character: [a-zA-Z_0-9]
\W | A nonword character: [^\w]
Boundary Matchers
^ | The beginning of a line
$ | The end of a line
\b | A word boundary
\B | A nonword boundary
\A | The beginning of the input
\G | The end of the previous match
\Z | The end of the input but for the final terminator, if any
\z | The end of the input
Greedy Quantifiers
X? | X, once or not at all
X* | X, zero or more times
X+ | X, one or more times
X{n} | X, exactly n times
X{n,} | X, at least n times
X{n,m} | X, at least n but not more than m times
Reluctant Quantifiers
X?? | X, once or not at all
X*? | X, zero or more times
X+? | X, one or more times
X{n}? | X, exactly n times
X{n,}? | X, at least n times
X{n,m}? | X, at least n but not more than m times
Possessive Quantifiers
X?+ | X, once or not at all
X*+ | X, zero or more times
X++ | X, one or more times
X{n}+ | X, exactly n times
X{n,}+ | X, at least n times
X{n,m}+ | X, at least n but not more than m times
Logical Operators
XY | X followed by Y
X|Y | Either X or Y
(X) | X, as a capturing group
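The table describes greedy, reluctant, and possessive quantifiers with the same words, so the difference is easier to see on a concrete string. The following sketch uses a made-up HTML fragment and assumes a Groovy version that provides String.find(regex) and String.findAll(regex); all variable names are illustrative.

```groovy
def html = '<b>bold</b> and <i>italic</i>'

// Greedy: .* takes as much as it can, then backs off to the last '>'
def greedy = html.find(/<.*>/)
assert greedy == '<b>bold</b> and <i>italic</i>'

// Reluctant: .*? takes as little as it can, stopping at the first '>'
def reluctant = html.find(/<.*?>/)
assert reluctant == '<b>'

// Possessive: .*+ takes everything and never gives it back,
// so the trailing '>' has nothing left to match
def possessive = html.find(/<.*+>/)
assert possessive == null

// Union and intersection of character classes
assert 'abcdefmnop'.findAll(/[a-d[m-p]]/).join('') == 'abcdmnop'
assert 'abcdef'.findAll(/[a-z&&[def]]/).join('') == 'def'

// \Z ignores a final line terminator, \z does not
def endsBeforeNewline = ('line\n' =~ /line\Z/).find()
def endsAtVeryEnd = ('line\n' =~ /line\z/).find()
assert endsBeforeNewline
assert !endsAtVeryEnd
```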
http://groovy.codehaus.org/Regular+Expressions
http://groovy.codehaus.org/Tutorial+4+-+Regular+expressions+basics
http://groovy.codehaus.org/Tutorial+5+-+Capturing+regex+groups
For a complete list of regular-expression constructs, see http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html .
some text -- Exactly “some text”.
some\s+text -- The word “some” followed by one or more whitespace characters followed by the word “text”.
^\d+(\.\d+)? (.*) -- Our introductory example: headings of level one or two. ^ denotes a line start, \d a digit, and \d+ one or more digits. Parentheses are used for grouping. The question mark makes the first group optional. The second group contains the title, made of a dot for any character and a star for any number of such characters.
\d\d/\d\d/\d\d\d\d -- A date formatted as exactly two digits followed by a slash, two more digits followed by a slash, followed by exactly four digits.
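The four patterns above can be tried out directly. This is a minimal sketch with made-up sample strings; note that inside a slashy string a literal slash has to be written as \/.

```groovy
// Exact text and flexible whitespace
assert 'some text' ==~ /some text/
assert 'some \t text' ==~ /some\s+text/

// Heading of level one or two: group 1 is the optional sub-number, group 2 the title
def heading = '3.2 Working with strings' =~ /^\d+(\.\d+)? (.*)/
assert heading.find()
assert heading.group(1) == '.2'
assert heading.group(2) == 'Working with strings'

// Date with exactly two, two, and four digits
def datePattern = /\d\d\/\d\d\/\d\d\d\d/
assert '06/15/2008' ==~ datePattern
assert !('6/15/2008' ==~ datePattern)
```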
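The effect of $:
def reg1 = ~/^[A-Z]{1}[a-zA-Z0-9]+$/
assert "Ad1WRldd" =~ reg1        // true: the whole string fits the pattern
assert !("Ad1@#!ldd" =~ reg1)    // false: $ forces the match to reach the end of the string
But without the $:
def reg2 = ~/^[A-Z]{1}[a-zA-Z0-9]+/
assert "Ad1@#!ldd" =~ reg2       // true: the prefix "Ad1" is enough for a find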
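The anchors only matter for the find operator; the match operator ==~ always has to cover the whole string. A small sketch with the same sample strings (the name unanchored is introduced here for illustration):

```groovy
// No ^ or $ in the pattern
def unanchored = /[A-Z][a-zA-Z0-9]+/

// Find is satisfied by any matching substring
def foundPartial = ('Ad1@#!ldd' =~ unanchored).find()
assert foundPartial

// Match requires the entire string to fit, as if ^ and $ were present
assert !('Ad1@#!ldd' ==~ unanchored)
assert 'Ad1WRldd' ==~ unanchored
```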
Groovy regular expressions (in Google Notebook…)
Sometimes the slashy syntax interferes with other valid Groovy expressions, such as line comments (//) or numerical expressions with multiple slashes for division (/). When in doubt, put parentheses around your pattern, like (/pattern/). Parentheses force the parser to interpret the content as an expression.
■ The regex find operator =~
■ The regex match operator ==~
■ The regex pattern operator ~String
assert "abc" == /abc/
assert "\\d" == /\d/
def reference = "hello"
assert reference == /$reference/
assert "\$" == /$/
twister = 'she sells sea shells at the sea shore of seychelles'
// twister must contain a substring of size 3
// that starts with s and ends with a
assert twister =~ /s.a/                    // regex find operator, usable in if
finder = (twister =~ /s.a/)                // find expression evaluates to a matcher object
assert finder instanceof java.util.regex.Matcher
// twister must contain only words delimited by single spaces
assert twister ==~ /(\w+ \w+)*/            // regex match operator
WORD = /\w+/
matches = (twister ==~ /($WORD $WORD)*/)   // match expression evaluates to a Boolean
assert matches instanceof java.lang.Boolean
assert (twister ==~ /s.e/) == false        // match is full, not partial like find
wordsByX = twister.replaceAll(WORD, 'x')
assert wordsByX == 'x x x x x x x x x x'
words = twister.split(/ /)                 // split returns an array of words
assert words.size() == 10
assert words[0] == 'she'
Usage tips:
■ When things get complex (note, this is when, not if), comment verbosely.
■ Use the slashy syntax instead of the regular string syntax, or you will get lost in a forest of backslashes.
■ Don’t let your pattern look like a toothpick puzzle. Build your pattern from subexpressions like WORD in listing 3.5.
■ Put your assumptions to the test. Write some assertions or unit tests to test your regex against static strings. Please don’t send us any more flowers for this advice; an email with the subject “assertion saved my life today” will suffice.
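The tips above can be combined: build the pattern from named subexpressions in slashy syntax, comment each part, and pin the behavior down with assertions. The sketch below is only an illustration; the date format and the names YEAR, MONTH, DAY, and DATE are hypothetical and not taken from the book.

```groovy
// Subexpressions, each small enough to read at a glance
def YEAR  = /\d{4}/                    // four digits
def MONTH = /0[1-9]|1[0-2]/            // 01 to 12
def DAY   = /0[1-9]|[12]\d|3[01]/      // 01 to 31

// The full pattern is assembled by interpolation; each part gets its own group
def DATE = /($YEAR)-($MONTH)-($DAY)/

// Assumptions put to the test against static strings
assert '2008-06-15' ==~ DATE
assert !('2008-13-15' ==~ DATE)        // no month 13
assert !('2008-06-32' ==~ DATE)        // no day 32

// With groups, the first match is a list: whole match, then each group
def parts = ('2008-06-15' =~ DATE)[0]
assert parts == ['2008-06-15', '2008', '06', '15']
```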
"\$abc." =~ /\$(.*)\./
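// with a regular string the same pattern needs doubled backslashes:
// "\$abc." =~ "\\\$(.*)\\."
'''${abc.}tttt${abc.}''' =~ /\$\{(.*)\.\}/
ddd = ~/\$\{(\w*)\.\}/
println ddd.class        // class java.util.regex.Pattern: the Pattern object is constructed explicitly (and its state machine is compiled; see the Performance section)
regex = /\$\{(\w*)\.\}/  // the () form a capturing group
println regex.class      // class java.lang.String: the Pattern object is built implicitly when matching with =~
str = '''${abc.}tttt${def.}'''
str.eachMatch(regex) { match ->
    println match[0]     // prints ${abc.}, then ${def.}
    println match[1]     // prints abc, then def
}
regex2 = /\$\{\w*\.\}/   // the same pattern without the (); still works
cloze = str.replaceAll(regex2) { ch ->
    println "all match = " + ch
    '44'                 // the closure's result replaces the match
}
println "cloze = " + cloze
// output:
// all match = ${abc.}
// all match = ${def.}
// cloze = 44tttt44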
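def message = '${Hello}, ${world}'
def reg = '\\$\\{\\w*\\}'   // regular string: every backslash is doubled
def reg2 = /\$\{\w*\}/      // slashy string: the same pattern without doubling
def rs = message.replaceAll(reg) { ch ->
    println "all match = " + ch
    '11'                    // the closure's result replaces the match
}
println "rs = " + rs        // rs = 11, 11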
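matcher = 'a b c' =~ /\S/
assert matcher[0] == 'a'
assert matcher[1..2] == ['b', 'c']   // a list in current Groovy versions (early versions returned the string 'bc')
assert matcher.count == 3
// Note the difference between the two examples: with groups, each match is a list of the whole match followed by the groups
matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/
assert matcher.hasGroup()
assert matcher[0] == ['a:1', 'a', '1']
('xy' =~ /(.)(.)/).each { all, x, y ->   // the values could also be printed here
    assert all == 'xy'
    assert x == 'x'
    assert y == 'y'
}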
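Because each grouped match arrives as a list, a multi-parameter closure can unpack it directly. A small sketch that turns the key:value string from above into a map (the name pairs is introduced here for illustration):

```groovy
def pairs = [:]

// all receives the whole match, key and value receive the two groups
('a:1 b:2 c:3' =~ /(\S+):(\S+)/).each { all, key, value ->
    pairs[key] = value
}

assert pairs == [a: '1', b: '2', c: '3']
```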
myFairStringy = 'The rain in Spain stays mainly in the plain!'
// words that end with 'ain': \b\w*ain\b
BOUNDS = /\b/
rhyme = /$BOUNDS\w*ain$BOUNDS/
found = ''
myFairStringy.eachMatch(rhyme) { match ->   // string.eachMatch(pattern_string)
    found += match + ' '                    // without groups, each match is a String in current Groovy versions
}
assert found == 'rain Spain plain '
found = ''
(myFairStringy =~ rhyme).each { match ->    // matcher.each(closure)
    found += match + ' '
}
assert found == 'rain Spain plain '
cloze = myFairStringy.replaceAll(rhyme) { it - 'ain' + '___' }   // string.replaceAll(pattern_string, closure)
assert cloze == 'The r___ in Sp___ stays mainly in the pl___!'
Ø Performance
Listing 3.7 Increase performance with pattern reuse.
twister = 'she sells sea shells at the sea shore of seychelles'
// some more complicated regex:
// word that starts and ends with same letter
regex = /\b(\w)\w*\1\b/
start = System.currentTimeMillis()
100000.times {
    twister =~ regex           // find operator with implicit pattern construction
}
first = System.currentTimeMillis() - start
start = System.currentTimeMillis()
pattern = ~regex               // explicit pattern construction
100000.times {
    pattern.matcher(twister)   // apply the pattern on a String
}
second = System.currentTimeMillis() - start
assert first > second * 1.20   // timing-dependent; the margin may vary by machine and JVM
The pattern operator ~String
The pattern operator turns a string into a java.util.regex.Pattern object. For a given string, this pattern object can be asked for a matcher object.
The rationale behind this construction is that patterns are internally backed by a so-called finite state machine that does all the high-performance magic. This machine is compiled when the pattern object is created. The more complicated the pattern, the longer the creation takes. In contrast, the matching process as performed by the machine is extremely fast.
The pattern operator separates pattern-creation time from pattern-matching time, so the finite state machine can be reused. In the example above, first measures a loop that constructs the pattern anew on every match, while second measures a loop that creates the pattern explicitly once before matching, which saves time.
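Outside a benchmark, pattern reuse usually looks like this: compile once with the pattern operator, then apply the same Pattern to many inputs. The input lines and the names sameEnds, lines, and hits are made up for this sketch.

```groovy
// Compile once: a word that starts and ends with the same letter
def sameEnds = ~/\b(\w)\w*\1\b/
assert sameEnds instanceof java.util.regex.Pattern

// Each call only creates a Matcher on the shared, already compiled Pattern
def lines = ['she sells sea shells', 'at the sea shore', 'dad and mom']
def hits = lines.collect { line -> sameEnds.matcher(line).find() }
assert hits == [true, false, true]

// The find operator also accepts a ready-made Pattern on its right-hand side
def viaOperator = ('she sells' =~ sameEnds).find()
assert viaOperator
```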
Ø Patterns for classification
The Pattern object has an isCase(String) method.
Listing 3.8 Patterns in grep() and switch()
assert (~/..../).isCase('bear')
switch('bear'){
case ~/..../ : assert true; break
default : assert false
}
beasts = ['bear','wolf','tiger','regex']
assert beasts.grep(~/..../) == ['bear','wolf']
TIP Classifications read nicely in switch and grep. The direct use of classifier.isCase(candidate) happens rarely, but when it does, it is best read from right to left: “candidate is a case of classifier”.
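As a slightly larger illustration of classification, a switch can sort tokens into categories by pattern. The closure classify and its categories are hypothetical; the sketch relies on Pattern.isCase requiring the pattern to match the whole candidate, so the order of the cases matters.

```groovy
// The first case whose pattern matches the entire token wins
def classify = { String token ->
    switch (token) {
        case ~/\d+/ :      return 'integer'
        case ~/\d+\.\d+/ : return 'decimal'
        case ~/\w+/ :      return 'word'
        default :          return 'other'
    }
}

assert classify('42') == 'integer'
assert classify('3.14') == 'decimal'
assert classify('bear') == 'word'
assert classify('?!') == 'other'

// grep applies the same isCase test to every element of a list
def tokens = ['42', 'bear', '3.14', 'x1']
assert tokens.grep(~/\d+/) == ['42']
assert tokens.grep(~/\w+/) == ['42', 'bear', 'x1']
```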