orcl_zhang

Regular Expressions

    Blog category: ruby
Regular expressions (“regexps”) match strings. When a match is successful, the return value is the position of the first matching character.

    /abc/ =~ "abc"
    → 0

An if construct will count a successful match as true.

    puts 'match' if /abc/ =~ "abc"
    → match

The matching substring can be anywhere in the string.

    /abc/ =~ "cbaabc"
    → 3

When the string doesn’t match, the result is nil.

    /abc/ =~ "ab!c"
    → nil

There may be more than one match in the string. Matching always returns the index of the first match.

    /abc/ =~ "abc and abc"
    → 0

Case matters.

    /cow/ =~ "Cow"
    → nil

The regular expression doesn’t have to be on the left.

    "foofarah" =~ /foo/
    → 0
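The rules above can be collected into a small sketch. The helper name describe_match is made up for illustration and is not part of the original text; only the =~ operator comes from it.

```ruby
# Hypothetical helper (not from the original text): reports where a
# pattern first matches a string, or that it does not match at all.
def describe_match(pattern, string)
  index = pattern =~ string   # Integer index of the first match, or nil
  if index
    "first match at index #{index}"
  else
    "no match"
  end
end

puts describe_match(/abc/, "cbaabc")       # first match at index 3
puts describe_match(/abc/, "abc and abc")  # first match at index 0
puts describe_match(/cow/, "Cow")          # no match
```

Because nil is the only "failed" result and 0 is truthy in Ruby, a match at the very start of the string still counts as true in the if.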

10.1 Special Characters
You can anchor the match to the beginning of the string with ^ (the caret character, sometimes called “hat”).

    /^abc/ =~ "!abc"
    → nil

You can also anchor the match to the end of the string with a dollar sign character, often abbreviated “dollar.”

    /abc$/ =~ "abc!"
    → nil

Strictly speaking, in Ruby ^ and $ anchor to the beginning and end of a line. For a one-line string that is the same as the string itself; \A and \z anchor to the whole string regardless of newlines.

Special characters like the caret and dollar are what make regular expressions more powerful than something like "string".include?("ing").

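As a rough illustration of what anchors add over include?, the sketch below compares the two on the same string. It also shows one caveat: in Ruby, ^ and $ anchor to line boundaries, while \A and \z anchor to the whole string. The variable names and the use of Regexp#match? (available in Ruby 2.4 and later) are choices made for this example, not something taken from the original text.

```ruby
text = "!abc"

# include? only answers "is it in there somewhere?"
contains = text.include?("abc")      # => true

# An anchored regexp also says where the match must sit.
at_start = /^abc/.match?(text)       # => false, "!" comes first
at_end   = /abc$/.match?(text)       # => true, the string ends in "abc"

# ^ works per line; \A anchors to the start of the whole string.
two_lines    = "first\nabc"
line_start   = /^abc/ =~ two_lines   # => 6, "abc" starts the second line
string_start = /\Aabc/ =~ two_lines  # => nil, the string starts with "first"
```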
    \d   Any digit
    \D   Any character except a digit
    \s   “Whitespace”: space, tab, carriage return, newline (line feed), form feed, or vertical tab
    \S   Anything except whitespace
    \w   A “word character”: [A-Za-z0-9_]
    \W   Any character except a word character

    Figure 10.1: Character Classes
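To see the abbreviations in Figure 10.1 at work, here is a minimal sketch that pulls matching characters out of a sample string. String#scan is not covered in the original text; it is used here only because it returns every match rather than just the first. The sample string is invented.

```ruby
sample = "Room 42,\tfloor_3!"

digits     = sample.scan(/\d/)   # => ["4", "2", "3"]
non_words  = sample.scan(/\W/)   # => [" ", ",", "\t", "!"]
whitespace = sample.scan(/\s/)   # => [" ", "\t"]

p digits, non_words, whitespace
```

Note that the underscore in floor_3 does not show up under \W, because \w includes it.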

A period (“dot”) matches any single character.

    /a.c/ =~ "does abc match?"
    → 5

The asterisk character (“star”) matches any number of occurrences of the character preceding it.

    /ab*c/ =~ "does abbbbc match?"
    → 5

“Any number” includes zero.

    /ab*c/ =~ "does ac match?"
    → 5

Frequently, you’ll want to match one or more occurrences but not zero. That’s done with the plus character.

    /ab+c/ =~ "does ac match?"
    → nil

The question mark character matches zero or one occurrence but not more than one.

    /ab?c/ =~ "does ac match?"
    → 5

Special characters can be combined. The combination of a dot and star is used to match any number of any kind of character.

    /a.*b/ =~ "a ! b ! i j k b"
    → 0
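The difference between star, plus, and question mark is easiest to see side by side. The following sketch runs the three patterns from the examples above against the same three inputs; the inputs and the variable names are chosen for illustration.

```ruby
inputs = ["ac", "abc", "abbbc"]

patterns = {
  "ab*c" => /ab*c/,   # zero or more b's
  "ab+c" => /ab+c/,   # one or more b's
  "ab?c" => /ab?c/    # zero or one b
}

# For each pattern, collect the inputs it matches.
results = {}
patterns.each do |label, regexp|
  results[label] = inputs.select { |input| regexp =~ input }
end

results.each { |label, matched| puts "#{label}: #{matched.inspect}" }
# ab*c: ["ac", "abc", "abbbc"]
# ab+c: ["abc", "abbbc"]
# ab?c: ["ac", "abc"]
```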
To match any one of a set of characters, enclose them within square brackets to form a character class.

    /[0123456789]+/ =~ "number 55"
    → 7

Character classes containing alphabetically ordered runs of characters can be abbreviated with the dash.

    /[0-9][a-f]/ =~ "5f"
    → 0

Within brackets, characters like the dot, plus, and star are not special.

    /[.]/ =~ "b"
    → nil

Outside of brackets, special characters can be stripped of their powers by “escaping” them with a backslash.

    /\[a\]\+/ =~ "[a]+"
    → 0

To include open and close brackets inside of brackets, escape them with a backslash. This expression matches any sequence of one or more characters, all of which must be either [, ], or =. (The two anchors ensure that there are no characters before or after the matching characters.)

    /^[\[=\]]+$/ =~ '=]=[='
    → 0
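Escaping by hand gets tedious when the text to match comes from somewhere else. One option, not mentioned in the original text, is Ruby's Regexp.escape, which adds the backslashes for you. A minimal sketch, assuming the literal text to find is "[a]+":

```ruby
literal = "[a]+"

# Regexp.escape backslash-escapes the characters that would otherwise
# be special, so the result matches the text literally.
escaped = Regexp.escape(literal)
pattern = Regexp.new(escaped)

found     = pattern =~ "see [a]+ here"   # => 4
not_found = pattern =~ "aaa"             # => nil

p found, not_found
```

Without the escaping, /[a]+/ would match "aaa", since it means "one or more a's".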

Putting a caret at the beginning of a character class causes the set to contain all characters except the ones listed.

    /[^ab]/ =~ "z"
    → 0

Some character classes are so common they’re given abbreviations. \d is the same character class as [0-9]. Other characters can be added to the abbreviation, in which case brackets are needed. See Figure 10.1 above for a complete list of abbreviations.

    /=\d=[x\d]=/ =~ "=5=x="
    → 0
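Here is a short sketch combining a negated class with an abbreviation inside brackets. The strings and the name hex_string are invented for this example.

```ruby
# [^...] matches any single character NOT listed.
first_non_vowel = /[^aeiou]/ =~ "aeon"   # => 3, the "n"

# \d can be mixed with other characters inside brackets.
hex_string = /^[\da-f]+$/
all_hex  = hex_string =~ "c0ffee"        # => 0
not_hex  = hex_string =~ "coffee"        # => nil, "o" is not a hex digit

p first_non_vowel, all_hex, not_hex
```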
10.2 Grouping and Alternatives

Parentheses can group sequences of characters so that special characters apply to the whole sequence.

    /(ab)+/ =~ "ababab"
    → 0

Special characters can appear within groups. Here, the group containing one a and any number of b’s is repeated one or more times.

    /(ab*)+/ =~ "aababbabbb"
    → 0

The vertical bar character is used to allow alternatives. Here, either a or b matches.

    /a|b/ =~ "a"
    → 0

A vertical bar divides the regular expression into two smaller regular expressions. A match means that either the entire left regexp matches or the entire right one does.

    /^Fine birds|cows ate\.$/ =~ "Fine birds ate seeds."
    → 0

This regular expression does not mean “Match either 'Fine birds ate.' or 'Fine cows ate.'” It actually matches either a string beginning with "Fine birds" or one ending in "cows ate."

This regular expression matches only the two alternate sentences, not the infinite number of possibilities the previous example’s regexp does.

    /^Fine (birds|cows) ate\.$/ =~ "Fine birds ate seeds."
    → nil
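The effect of the parentheses is easier to judge with a few more inputs. This sketch runs both patterns over three sentences; the second and third sentences are invented for the example, and Regexp#match? (Ruby 2.4 and later) is used only to get a plain true or false.

```ruby
ungrouped = /^Fine birds|cows ate\.$/
grouped   = /^Fine (birds|cows) ate\.$/

sentences = [
  "Fine birds ate seeds.",   # begins with "Fine birds"
  "Lazy cows ate.",          # ends with "cows ate."
  "Fine cows ate."           # one of the two intended sentences
]

# Each row is [sentence, ungrouped result, grouped result].
table = sentences.map do |s|
  [s, ungrouped.match?(s), grouped.match?(s)]
end

table.each { |row| p row }
# ["Fine birds ate seeds.", true, false]
# ["Lazy cows ate.", true, false]
# ["Fine cows ate.", true, true]
```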
10.3 Taking Strings Apart
Like the =~ operator, match returns nil if there’s no match. If there is, it returns a MatchData object. You can pull information out of that object.

    re = /(\w+), (\w+), or (\w+)/
    s = 'Without a Bob, ox, or bin!'
    match = re.match(s)
    → #<MatchData:0x323c44>

A MatchData is indexable. Its zeroth element is the entire match.

    match[0]
    → "Bob, ox, or bin"

Each following element stores the result of what a group matched, counting from left to right.

    match[1]
    → "Bob"

Groups are often used to pull apart strings and construct new ones.

    "#{match[3]} and #{match[1]}"
    → "bin and Bob"

pre_match returns any portion of the string before the part that matched.

    match.pre_match
    → "Without a "

post_match returns any portion of the string after the part that matched. match.pre_match, match[0], and match.post_match can be added together to reconstruct the original string.

    match.post_match
    → "!"
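The reconstruction claim can be checked directly. The sketch below also guards against the nil that match returns on failure, and uses MatchData#captures (not mentioned in the original text) to get all the groups at once.

```ruby
re = /(\w+), (\w+), or (\w+)/
s  = 'Without a Bob, ox, or bin!'

match = re.match(s)

if match
  # The three pieces glue back together into the original string.
  rebuilt = match.pre_match + match[0] + match.post_match
  puts rebuilt == s      # true

  # captures returns the groups without the whole-match element.
  groups = match.captures
  p groups               # ["Bob", "ox", "bin"]
else
  puts "no match"
end
```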
The plus and star special characters are greedy: they match as many characters as they can. Expect that to catch you by surprise sometimes.

    str = "a bee in my bonnet"
    /a.*b/.match(str)[0]
    → "a bee in my b"

You can make plus and star match as few characters as they can by suffixing them with a question mark.

    /a.*?b/.match(str)[0]
    → "a b"
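A common place where greediness surprises people is quoted text. The following sketch, using an invented sample line, shows the same pattern with and without the question mark suffix.

```ruby
line = 'say "yes" or "no"'

# Greedy: runs from the first quote to the LAST quote on the line.
greedy = /".*"/.match(line)[0]    # => "\"yes\" or \"no\""

# Non-greedy: stops at the NEXT quote.
lazy = /".*?"/.match(line)[0]     # => "\"yes\""

puts greedy   # "yes" or "no"
puts lazy     # "yes"
```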

You can use a regular expression to slice a string. The result is the first substring that matches the regular expression.

    "has 5 and 3"[/\d+/]
    → "5"
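Slicing also accepts a second argument naming a group, which the original text does not cover. A minimal sketch with an invented string:

```ruby
entry = "width=640 height=480"

first_number = entry[/\d+/]              # => "640", the first run of digits
height       = entry[/height=(\d+)/, 1]  # => "480", only what group 1 matched
depth        = entry[/depth=(\d+)/, 1]   # => nil, the pattern does not match

p first_number, height, depth
```

As with =~ and match, a failed slice gives nil rather than an empty string.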
10.4 Variables Behind the Scenes
Both =~ and match set some variables. All begin with $. Each parenthesized group gets its own number, from $1 upward. You might expect $0 to name the entire string that matched, but it’s already used for something else: the name of the program being executed.

    re = /(\w+), (\w+), or (\w+)/
    s = 'Without a Bob, ox, or bin!'
    re =~ s
    [$1, $2, $3]
    → ["Bob", "ox", "bin"]

$& is the equivalent of match[0].

    $&
    → "Bob, ox, or bin"

The variables $` and $' store the string before the match and the string after the match. (The first uses a backward quote, or backtick; the second a normal single quote.)

    $` + $'
    → "Without a !"


These variables are probably most often used to immediately do something with a string that’s “equal enough” to some pattern. Like this:

    if name =~ /(.+), (.+)/
      name = "#{$2} #{$1}"
    end
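Wrapped in a method, the same idea might look like the sketch below. The method name first_name_first and the sample names are hypothetical; the pattern and the use of $1 and $2 come from the snippet above.

```ruby
# Hypothetical helper: turns "Last, First" into "First Last" and
# returns anything that does not fit the pattern unchanged.
def first_name_first(name)
  if name =~ /(.+), (.+)/
    "#{$2} #{$1}"   # $1 and $2 were set by the =~ on the line above
  else
    name
  end
end

puts first_name_first("Lovelace, Ada")  # Ada Lovelace
puts first_name_first("Ada")            # Ada
```

The $ variables are only meaningful right after a successful match, which is why they are read inside the if branch.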
10.5 Regular Expression Options
Normally, the period in a regular expression does not match the end-of-line character. Therefore, .* or .+ matches won’t span lines.

    /a.*b/ =~ "az\nzb"
    → nil

Adding the m (multiline) option makes a period match end-of-line characters, so the regular expression match can span lines.

    /a.*b/m =~ "az\nzb"
    → 0

This is a far too annoying way to do a case-insensitive match.

    /[cC][aA][tT]/ =~ "Cat"
    → 0

The i (insensitive) option is a better way.

    /cat/i =~ "Cat"
    → 0
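Options can be combined by writing both letters after the closing slash. In this sketch, with an invented two-line string, the match succeeds only when both options are present.

```ruby
text = "Cat\nnap"

plain     = /cat.nap/   =~ text   # => nil, wrong case and . stops at "\n"
with_i    = /cat.nap/i  =~ text   # => nil, case is fine but . stops at "\n"
with_m    = /cat.nap/m  =~ text   # => nil, . matches "\n" but "c" is not "C"
with_both = /cat.nap/mi =~ text   # => 0, both options together

p plain, with_i, with_m, with_both
```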