Regular expressions


Groovy In Action.pdf

Symbol              Meaning
.                   Any character
^                   Start of line (or start of document, when in single-line mode)
$                   End of line (or end of document, when in single-line mode)
\d                  Digit character
\D                  Any character except digits
\s                  Whitespace character
\S                  Any character except whitespace
\w                  Word character
\W                  Any character except word characters
\b                  Word boundary
(x|y)               x or y, as in (Groovy|Java|Ruby)
\1                  Backmatch to group one; for example, find doubled characters with (.)\1
x*                  Zero or more occurrences of x
x+                  One or more occurrences of x
x?                  Zero or one occurrence of x
x{m,n}              At least m and at most n occurrences of x
x{m}                Exactly m occurrences of x
[a-f]               Character class containing the characters a, b, c, d, e, f
[^a]                Character class containing any character except a
[aeiou]             Character class representing lowercase vowels
[a-z&&[^aeiou]]     Lowercase consonants
[a-zA-Z0-9]         Uppercase or lowercase letter or digit
                    Positive or negative floating-point number
                    Simple email validation
(?is:x)             Switches mode when evaluating x; i turns on ignoreCase, s is single-line mode
(?=x)               Positive lookahead
(?<=x)              Positive lookbehind
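
A few of the less obvious constructs described above can be checked with short assertions. This is only a sketch; the sample strings are invented for illustration and do not come from the book.

```groovy
// Backmatch: (.)\1 finds a character immediately followed by itself.
def doubled = ('bookkeeper' =~ /(.)\1/).collect { it[0] }
assert doubled == ['oo', 'kk', 'ee']

// Positive lookahead: digits only when followed by ' USD' (the suffix is not part of the match).
assert ('42 USD, 7 EUR' =~ /\d+(?= USD)/)[0] == '42'

// Positive lookbehind: digits only when preceded by '#'.
assert ('issue #15' =~ /(?<=#)\d+/)[0] == '15'

// Mode switch: (?i:...) ignores case inside the group.
assert 'GROOVY' ==~ /(?i:groovy)/

// Class intersection: lowercase letters that are not vowels.
assert 'rhythm' ==~ /[a-z&&[^aeiou]]+/
assert !('rhyme' ==~ /[a-z&&[^aeiou]]+/)
```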


Beginning Groovy and Grails (Jun 2008).pdf

Table 2-1. Summary of Regular-Expression Constructs

Construct           Matches

x                   The character x
\\                  The backslash character
\t                  The tab character (\u0009)
\n                  The newline (line feed) character (\u000A)
\r                  The carriage return character (\u000D)
\f                  The form feed character (\u000C)
\e                  The escape character (\u001B)



Character Classes

[abc]               a, b, or c (simple class)
[^abc]              Any character except a, b, or c (negation)
[a-zA-Z]            a through z or A through Z, inclusive (range)
[a-d[m-p]]          a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]        d, e, or f (intersection)
[a-z&&[^bc]]        a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]       a through z, and not m through p: [a-lq-z] (subtraction)



Predefined Character Classes

.                   Any character (may or may not match line terminators)
\d                  A digit: [0-9]
\D                  A nondigit: [^0-9]
\s                  A whitespace character: [ \t\n\x0B\f\r]
\S                  A non-whitespace character: [^\s]
\w                  A word character: [a-zA-Z_0-9]
\W                  A nonword character: [^\w]



Boundary Matchers

^                   The beginning of a line
$                   The end of a line
\b                  A word boundary
\B                  A nonword boundary
\A                  The beginning of the input
\G                  The end of the previous match
\Z                  The end of the input but for the final terminator, if any
\z                  The end of the input



Greedy Quantifiers

X?                  X, once or not at all
X*                  X, zero or more times
X+                  X, one or more times
X{n}                X, exactly n times
X{n,}               X, at least n times
X{n,m}              X, at least n but not more than m times



Reluctant Quantifiers

X??                 X, once or not at all
X*?                 X, zero or more times
X+?                 X, one or more times
X{n}?               X, exactly n times
X{n,}?              X, at least n times
X{n,m}?             X, at least n but not more than m times



Possessive Quantifiers

X?+                 X, once or not at all
X*+                 X, zero or more times
X++                 X, one or more times
X{n}+               X, exactly n times
X{n,}+              X, at least n times
X{n,m}+             X, at least n but not more than m times



Logical Operators

XY                  X followed by Y
X|Y                 Either X or Y
(X)                 X, as a capturing group
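
The greedy, reluctant, and possessive rows carry identical descriptions, so the difference is easiest to see by running all three against one input. The sketch below uses an invented sample string; the behaviour shown is that of java.util.regex, which Groovy uses underneath.

```groovy
def text = 'xfooxxxxxxfoo'

// Greedy: .* first consumes everything, then backs off until 'foo' fits.
assert (text =~ /.*foo/)[0] == 'xfooxxxxxxfoo'

// Reluctant: .*? consumes as little as possible, so the first 'foo' ends the match.
assert (text =~ /.*?foo/)[0] == 'xfoo'

// Possessive: .*+ consumes everything and never gives characters back,
// so nothing is left for 'foo' and the find fails.
def possessive = (text =~ /.*+foo/)
assert !possessive.find()

// Subtraction via intersection: a through z, except b and c.
assert 'a' ==~ /[a-z&&[^bc]]/
assert !('b' ==~ /[a-z&&[^bc]]/)
```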





http://groovy.codehaus.org/Tutorial+4+-+Regular+expressions+basics


For a complete list of regular expressions, see http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html .


Groovy regular expressions (repost) - hbluojiahui - hbluojiahui's blog



some text -- Exactly “some text”.

some\s+text -- The word “some” followed by one or more whitespace characters followed by the word “text”.

^\d+(\.\d+)? (.*) -- Our introductory example: headings of level one or two. ^ denotes a line start, \d a digit, \d+ one or more digits. Parentheses are used for grouping. The question mark makes the first group optional. The second group contains the title, made of a dot for any character and a star for any number of such characters.

\d\d/\d\d/\d\d\d\d -- A date formatted as exactly two digits followed by a slash, two more digits followed by a slash, followed by exactly four digits.
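
The four patterns described above can be tried out directly. This is a small sketch with made-up input strings; the only thing to watch is that a slash inside a slashy string has to be written as \/.

```groovy
assert 'some text' ==~ /some text/
assert "some \t  text" ==~ /some\s+text/

// Heading: group 1 is the optional '.n' part, group 2 is the title.
def heading = /^\d+(\.\d+)? (.*)/
def m = ('3.2 Working with strings' =~ heading)
assert m[0] == ['3.2 Working with strings', '.2', 'Working with strings']
assert ('7 Dynamic programming' =~ heading)[0][2] == 'Dynamic programming'

// Date: two digits, slash, two digits, slash, four digits.
def date = /\d\d\/\d\d\/\d\d\d\d/
assert '06/30/2008' ==~ date
assert !('6/30/2008' ==~ date)
```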

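The effect of $:

def reg1 = ~/^[A-Z]{1}[a-zA-Z0-9]+$/
assert "Ad1WRldd" =~ reg1          // true: the whole string fits the pattern
assert !("Ad1@#!ldd" =~ reg1)      // false: "@#!" prevents a match up to the end

But without the $:

def reg2 = ~/^[A-Z]{1}[a-zA-Z0-9]+/
assert "Ad1@#!ldd" =~ reg2         // true: the prefix "Ad1" is enough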
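
A related way to get the same effect, sketched here with the character classes from the example: the match operator ==~ requires the pattern to cover the whole string, so no explicit ^ and $ are needed, whereas the find operator =~ is satisfied by any matching substring.

```groovy
def name = /[A-Z][a-zA-Z0-9]+/

assert 'Ad1@#!ldd' =~ name                  // find: succeeds on the prefix 'Ad1'
assert ('Ad1@#!ldd' =~ name)[0] == 'Ad1'
assert !('Ad1@#!ldd' ==~ name)              // match: must cover the whole string
assert 'Ad1WRldd' ==~ name
```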


Groovy regular expressions (in Google Notebook…)

Sometimes the slashy syntax interferes with other valid Groovy expressions, such as line comments (//) or numerical expressions with multiple slashes for division. When in doubt, put parentheses around your pattern, like (/pattern/). Parentheses force the parser to interpret the content as an expression.

■ The regex find operator =~ 

■ The regex match operator ==~ 

■ The regex pattern operator ~String

assert "abc" == /abc/

assert "\\d" == /\d/

def reference = "hello"

assert reference == /$reference/

assert "\$" == /$/

twister = 'she sells sea shells at the sea shore of seychelles'

// twister must contain a substring of size 3
// that starts with s and ends with a
assert twister =~ /s.a/                      // Regex find operator as usable in if

finder = (twister =~ /s.a/)                  // Find expression evaluates to a matcher object
assert finder instanceof java.util.regex.Matcher

// twister must contain only words delimited by single spaces
assert twister ==~ /(\w+ \w+)*/              // Regex match operator

WORD = /\w+/
matches = (twister ==~ /($WORD $WORD)*/)     // Match expression evaluates to a Boolean
assert matches instanceof java.lang.Boolean

assert (twister ==~ /s.e/) == false          // Match is full, not partial like find

wordsByX = twister.replaceAll(WORD, 'x')
assert wordsByX == 'x x x x x x x x x x'

words = twister.split(/ /)                   // Split returns an array of words
assert words.size() == 10
assert words[0] == 'she'



■ When things get complex (note, this is when, not if), comment verbosely.

■ Use the slashy syntax instead of the regular string syntax, or you will get lost in a forest of backslashes.

■ Don’t let your pattern look like a toothpick puzzle. Build your pattern from subexpressions like WORD in listing 3.5.

■ Put your assumptions to the test. Write some assertions or unit tests to test your regex against static strings. Please don’t send us any more flowers for this advice; an email with the subject “assertion saved my life today” will suffice.

"\$abc." =~ /\$(.*)\./
// the same pattern as a regular string: "\$abc." =~ "\\\$(.*)\\."
'''${abc.}tttt${abc.}''' =~ /\$\{(.*)\.\}/

ddd = ~/\$\{(\w*)\.\}/
println ddd.class            // class java.util.regex.Pattern: the pattern object is constructed explicitly
                             // (its state machine is built at the same time; see the Performance section)

regex = /\$\{(\w*)\.\}/      // () grouping?
println regex.class          // class java.lang.String: a pattern object is built implicitly when matching with =~

str = '''${abc.}tttt${def.}'''
match = (str =~ regex)       // assumed; the assignment is not in these notes
println match[0]             // [${abc.}, abc]
println match[1]             // [${def.}, def]

regex2 = /\$\{\w*\.\}/       // the () removed: still OK
cloze = str.replaceAll(regex2) { ch ->
    println "all match = " + ch
}
println "cloze = " + cloze   // recorded output: all match = ${abc.}    all match = ${def.}    cloze = 44tttt44234


def message = '${Hello}, ${world}'
def reg = '\\$\\{\\w*\\}'        // regular string: every backslash is doubled
def reg2 = /\$\{\w*\}/           // the same pattern in slashy syntax
def rs = message.replaceAll(reg) { ch ->
    println "all match = " + ch
}
println "rs = " + rs

matcher = 'a b c' =~ /\S/
assert matcher[0] == 'a'
assert matcher[1..2] == ['b', 'c']           // current Groovy versions return a list for a range
assert matcher.count == 3

// Note the difference between these two examples: with groups in the pattern, each match is a list
matcher = 'a:1 b:2 c:3' =~ /(\S+):(\S+)/
assert matcher.hasGroup()
assert matcher[0] == ['a:1', 'a', '1']

('xy' =~ /(.)(.)/).each { all, x, y ->       // the values could also be printed here
    assert all == 'xy'
    assert x == 'x'
    assert y == 'y'
}


myFairStringy = 'The rain in Spain stays mainly in the plain!'

// words that end with 'ain': \b\w*ain\b
BOUNDS = /\b/
rhyme = /$BOUNDS\w*ain$BOUNDS/

found = ''
myFairStringy.eachMatch(rhyme) { match ->    // string.eachMatch(pattern_string, closure)
    found += match + ' '                     // without groups, current Groovy passes the match as a String
}
assert found == 'rain Spain plain '

found = ''
(myFairStringy =~ rhyme).each { match ->     // matcher.each(closure)
    found += match + ' '
}
assert found == 'rain Spain plain '

cloze = myFairStringy.replaceAll(rhyme) { it - 'ain' + '___' }    // string.replaceAll(pattern_string, closure)
assert cloze == 'The r___ in Sp___ stays mainly in the pl___!'


Performance

Listing 3.7 Increase performance with pattern reuse.

twister = 'she sells sea shells at the sea shore of seychelles'

// some more complicated regex:
// word that starts and ends with same letter
regex = /\b(\w)\w*\1\b/

start = System.currentTimeMillis()
100000.times {                     // loop restored; the repetition count is an assumption
    twister =~ regex               // Find operator with implicit pattern construction
}
first = System.currentTimeMillis() - start

start = System.currentTimeMillis()
pattern = ~regex                   // Explicit pattern construction
100000.times {
    pattern.matcher(twister)       // Apply the pattern on a String
}
second = System.currentTimeMillis() - start

assert first > second * 1.20


The pattern operator ~String

The pattern operator converts a string into a java.util.regex.Pattern object. For a given string, this pattern object can be asked for a matcher object.

The rationale behind this construction is that patterns are internally backed by a so-called finite state machine that does all the high-performance magic. This machine is compiled when the pattern object is created. The more complicated the pattern, the longer the creation takes. In contrast, the matching process as performed by the machine is extremely fast.

The pattern operator separates pattern-creation time from pattern-matching time, so the finite state machine can be reused. As the example above shows, first constructs a pattern on every match, whereas second creates the pattern explicitly once before matching, which saves time.
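
In practice, the reuse usually looks like the following sketch: the pattern is compiled once with ~ and then applied to many strings. The names sameEnds, lines, and hits are made up for this illustration; the regex is the one from the listing above.

```groovy
import java.util.regex.Pattern

// Compile once: words that start and end with the same letter.
Pattern sameEnds = ~/\b(\w)\w*\1\b/

def lines = ['she sells sea shells', 'at the sea shore', 'of seychelles']

// Reuse the compiled pattern for every line; each match is [wholeMatch, group1].
def hits = lines.collectMany { line ->
    sameEnds.matcher(line).collect { it[0] }
}
assert hits == ['sells', 'shells', 'seychelles']
```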


Patterns for classification

A Pattern object has an isCase(String) method.

Listing 3.8 Patterns in grep() and switch()

assert (~/..../).isCase('bear')

switch ('bear') {
    case ~/..../ : assert true; break
    default : assert false
}

beasts = ['bear','wolf','tiger','regex']
assert beasts.grep(~/..../) == ['bear','wolf']


TIP   Classifications read nicely in switch and grep. The direct use of classifier.isCase(candidate) happens rarely, but when it does, it is best read from right to left: “candidate is a case of classifier”.
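
As a slightly larger sketch of the same idea, a switch can sort tokens into categories by pattern. The classify method and its categories are hypothetical, invented for this illustration. Because isCase on a pattern requires the whole candidate to match, the order of the cases decides which category wins when several patterns fit.

```groovy
// Hypothetical helper: classify a token by the first pattern that matches it completely.
def classify(String token) {
    switch (token) {
        case ~/\d+/ :      return 'integer'
        case ~/\d+\.\d+/ : return 'decimal'
        case ~/\w+/ :      return 'word'
        default :          return 'other'
    }
}

assert classify('42') == 'integer'      // \w+ would also fit, but \d+ is tested first
assert classify('3.14') == 'decimal'
assert classify('groovy') == 'word'
assert classify('?!') == 'other'

// grep uses the same isCase logic to filter a list.
assert ['42', 'x', '7'].grep(~/\d+/) == ['42', '7']
```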




