A Powerful Tool for String Processing: Regular Expressions, Part 1

凯旋人生

Blog category: J2EE

Tags: regular expressions, F#, Unix

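Vocabulary:

regular: [ˈreɡjulə] following a rule; periodic; occurring at fixed times
expression: [iksˈpreʃən] expression, formula
pattern: [ˈpætən] pattern, style, design
matcher: [ˈmætʃə] something that matches (the word can also mean a tenoning machine)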

1. Uses:

Matching strings
Searching within strings
Replacing parts of strings

2. Relevant classes in Java:

java.lang.String
java.util.regex.Pattern
java.util.regex.Matcher
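
The three uses listed above map onto these classes roughly as follows. This is only a minimal sketch: the class name RegexUses, the helper method names, and the sample strings are invented for illustration and do not come from the original post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUses {
    // Matching: does the whole string consist of one or more digits?
    static boolean isAllDigits(String s) {
        return s.matches("\\d+");
    }

    // Searching: return the first run of digits, or null if there is none.
    static String firstNumber(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        return m.find() ? m.group() : null;
    }

    // Replacing: mask every digit with '-'.
    static String maskDigits(String s) {
        return s.replaceAll("\\d", "-");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2008"));           // true
        System.out.println(firstNumber("port 8080 open")); // 8080
        System.out.println(maskDigits("a8729a"));          // a----a
    }
}
```

String.matches and String.replaceAll are convenient for one-off calls; Pattern and Matcher are needed as soon as you want to search for substrings or inspect match positions.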

API:

java.util.regex
Class Pattern

java.lang.Object
  java.util.regex.Pattern

All Implemented Interfaces:
Serializable

public final class Pattern extends Object
implements Serializable

A compiled representation of a regular expression. 
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern. 
A typical invocation sequence is thus 

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();

A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement 

 boolean b = Pattern.matches("a*b", "aaaaab");

is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused. 
Instances of this class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use. 
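
One way to act on the last two paragraphs is to compile the pattern once, share it, and create a fresh Matcher for every input. The sketch below assumes that arrangement; the class name PatternReuse and the method isAStarB are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class PatternReuse {
    // Compiled once. Pattern instances are immutable, so sharing this constant is safe.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Each call creates its own Matcher, because the Matcher holds the match state
    // and is not safe to share between threads.
    static boolean isAStarB(CharSequence input) {
        return A_STAR_B.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaaab", "b", "aaa", "ab "};
        for (String input : inputs) {
            System.out.println("\"" + input + "\" -> " + isAStarB(input));
        }
    }
}
```

Compared with calling Pattern.matches("a*b", input) in a loop, this avoids recompiling the expression for every input.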

Summary of regular-expression constructs

Construct               Matches

Characters
x                       The character x
\\                      The backslash character
\0n                     The character with octal value 0n (0 <= n <= 7)
\0nn                    The character with octal value 0nn (0 <= n <= 7)
\0mnn                   The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh                    The character with hexadecimal value 0xhh
\uhhhh                  The character with hexadecimal value 0xhhhh
\t                      The tab character ('\u0009')
\n                      The newline (line feed) character ('\u000A')
\r                      The carriage-return character ('\u000D')
\f                      The form-feed character ('\u000C')
\a                      The alert (bell) character ('\u0007')
\e                      The escape character ('\u001B')
\cx                     The control character corresponding to x

Character classes
[abc]                   a, b, or c (simple class)
[^abc]                  Any character except a, b, or c (negation)
[a-zA-Z]                a through z or A through Z, inclusive (range)
[a-d[m-p]]              a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]            d, e, or f (intersection)
[a-z&&[^bc]]            a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]           a through z, and not m through p: [a-lq-z] (subtraction)

Predefined character classes
.                       Any character (may or may not match line terminators)
\d                      A digit: [0-9]
\D                      A non-digit: [^0-9]
\s                      A whitespace character: [ \t\n\x0B\f\r]
\S                      A non-whitespace character: [^\s]
\w                      A word character: [a-zA-Z_0-9]
\W                      A non-word character: [^\w]

POSIX character classes (US-ASCII only)
\p{Lower}               A lower-case alphabetic character: [a-z]
\p{Upper}               An upper-case alphabetic character: [A-Z]
\p{ASCII}               All ASCII: [\x00-\x7F]
\p{Alpha}               An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}               A decimal digit: [0-9]
\p{Alnum}               An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}               Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}               A visible character: [\p{Alnum}\p{Punct}]
\p{Print}               A printable character: [\p{Graph}\x20]
\p{Blank}               A space or a tab: [ \t]
\p{Cntrl}               A control character: [\x00-\x1F\x7F]
\p{XDigit}              A hexadecimal digit: [0-9a-fA-F]
\p{Space}               A whitespace character: [ \t\n\x0B\f\r]

java.lang.Character classes (simple java character type)
\p{javaLowerCase}       Equivalent to java.lang.Character.isLowerCase()
\p{javaUpperCase}       Equivalent to java.lang.Character.isUpperCase()
\p{javaWhitespace}      Equivalent to java.lang.Character.isWhitespace()
\p{javaMirrored}        Equivalent to java.lang.Character.isMirrored()

Classes for Unicode blocks and categories
\p{InGreek}             A character in the Greek block (simple block)
\p{Lu}                  An uppercase letter (simple category)
\p{Sc}                  A currency symbol
\P{InGreek}             Any character except one in the Greek block (negation)
[\p{L}&&[^\p{Lu}]]      Any letter except an uppercase letter (subtraction)

Boundary matchers
^                       The beginning of a line
$                       The end of a line
\b                      A word boundary
\B                      A non-word boundary
\A                      The beginning of the input
\G                      The end of the previous match
\Z                      The end of the input but for the final terminator, if any
\z                      The end of the input

Greedy quantifiers
X?                      X, once or not at all
X*                      X, zero or more times
X+                      X, one or more times
X{n}                    X, exactly n times
X{n,}                   X, at least n times
X{n,m}                  X, at least n but not more than m times

Reluctant quantifiers
X??                     X, once or not at all
X*?                     X, zero or more times
X+?                     X, one or more times
X{n}?                   X, exactly n times
X{n,}?                  X, at least n times
X{n,m}?                 X, at least n but not more than m times

Possessive quantifiers
X?+                     X, once or not at all
X*+                     X, zero or more times
X++                     X, one or more times
X{n}+                   X, exactly n times
X{n,}+                  X, at least n times
X{n,m}+                 X, at least n but not more than m times

Logical operators
XY                      X followed by Y
X|Y                     Either X or Y
(X)                     X, as a capturing group

Back references
\n                      Whatever the nth capturing group matched

Quotation
\                       Nothing, but quotes the following character
\Q                      Nothing, but quotes all characters until \E
\E                      Nothing, but ends quoting started by \Q

Special constructs (non-capturing)
(?:X)                   X, as a non-capturing group
(?idmsux-idmsux)        Nothing, but turns match flags on - off
(?idmsux-idmsux:X)      X, as a non-capturing group with the given flags on - off
(?=X)                   X, via zero-width positive lookahead
(?!X)                   X, via zero-width negative lookahead
(?<=X)                  X, via zero-width positive lookbehind
(?<!X)                  X, via zero-width negative lookbehind
(?>X)                   X, as an independent, non-capturing group
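
The summary describes greedy, reluctant, and possessive quantifiers with the same wording, so the difference is easier to see by running them. The following is a small sketch under assumed inputs: the class name QuantifierModes, the helper firstMatch, and the sample string are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModes {
    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>one</b><b>two</b>";

        // Greedy: .+ takes as much as it can, then backs off just far enough to find a '>'.
        System.out.println(firstMatch("<.+>", input));   // <b>one</b><b>two</b>

        // Reluctant: .+? takes as little as it can, so the match stops at the first '>'.
        System.out.println(firstMatch("<.+?>", input));  // <b>

        // Possessive: .++ takes everything and never gives anything back,
        // so the trailing '>' can never be matched.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```

All three forms accept the same number of repetitions; they differ only in how much input is tried first and whether the engine is allowed to backtrack.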




 
Examples:
****************************************************
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
 /**
  * @param args
  */
 public static void main(String[] args) {
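  // Simple regular-expression examples
  // Is the string exactly 3 characters long?
 /* p("abc".matches("..."));
  // Replace every digit in the string with "-"
  p("a8729a".replaceAll("\\d", "-"));
  // Match a 3-character string whose characters are all a-z; compiling the pattern up front makes repeated matching more efficient
  Pattern p = Pattern.compile("[a-z]{3}");
  // public Matcher matcher(CharSequence input); String implements the CharSequence interface
  Matcher m = p.matcher("fgh");
  p(m.matches());
  p("fgh".matches("[a-z]{3}"));*/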
  
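 /* // Getting to know . * + ?
  // . matches any single character
  p("a".matches("."));
  // Matches the literal string "aa"
  p("aa".matches("aa"));
  // * matches zero or more occurrences of a
  p("aaaa".matches("a*"));
  // + matches one or more occurrences
  p("aaaa".matches("a+"));
  // ? matches zero or one occurrence (here: the empty string)
  p("".matches("a?"));
  // ? matches zero or one occurrence (here: a single a)
  p("a".matches("a?"));
  // {n} exactly n times, {n,} at least n times, {n,m} between n and m times
  p("214523145234532".matches("\\d{3,100}"));
  // The last segment is not numeric, so this one does not match
  p("192.168.0.aaa".matches("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"));
  // [] denotes a range of characters
  p("192".matches("[0-2][0-9][0-9]"));*/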
  
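  // Character classes: []
  // [] matches a single character from the set inside the brackets
 /* p("a".matches("[abc]"));
  // [^abc] matches a single character other than a, b, or c
  p("a".matches("[^abc]"));
  // Matches a single letter, a-z or A-Z
  p("a".matches("[a-zA-Z]"));
  // Matches a-z or A-Z (alternation)
  p("A".matches("[a-z]|[A-Z]"));
  // Matches a-z or A-Z (union)
  p("A".matches("[a-z[A-Z]]"));
  // Matches a character that is in A-Z and is also one of R, F, G (intersection)
  p("R".matches("[A-Z&&[RFG]]"));*/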
  
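  /*// Getting to know \s \w \d \
  // Four whitespace characters
  p(" \n\r\t".matches("\\s{4}"));
  // \S is a non-whitespace character, so a space does not match
  p(" ".matches("\\S"));
  // \w is a word character
  p("a_8".matches("\\w{3}"));
  
  p("abc888&^%".matches("[a-z]{1,3}\\d+[&^#%]+"));
  // Match a single backslash
  p("\\".matches("\\\\"));
  */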
  
 /* // POSIX style; POSIX is a UNIX standard
  p("a".matches("\\p{Lower}"));
  
  // Boundary matchers: inside [] the ^ negates the class; outside [] it marks the beginning of a line
  
  p("hello sir".matches("^h.*"));
  p("hello sir".matches(".*ir$"));
  // \b is a word boundary
  p("hello sir".matches("^h[a-z]{1,3}o\\b.*"));
  p("hellosir".matches("^h[a-z]{1,3}o\\b.*"));*/
  // Blank lines: zero or more whitespace characters other than a newline, followed by a newline
  /*p(" \n".matches("^[\\s&&[^\\n]]*\\n$"));
  
  p("aaa 8888c".matches(".*\\d{4}."));
  p("aaa 8888c".matches(".*\\b\\d{4}."));
  p("aaa8888c".matches(".*\\d{4}."));
  p("aaa8888c".matches(".*\\b\\d{4}."));*/
  
  // E-mail: \w is a word character, [a-zA-Z_0-9]
  // [\\w[.-]]+ means one or more characters, each a word character, '.' or '-'
 /* p("asdfasdfsasdfasdf@asdfasdf.com".matches("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+"));
  */
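  // matches, find, lookingAt
  /*
  Pattern p = Pattern.compile("\\d{3,5}");
  String s="123-34345-234-00";
  Matcher m = p.matcher(s);
  // matches() tries to match the entire string "123-34345-234-00"
  p(m.matches());// per the original note, matching would then continue from "34345-234-00" unless the matcher is reset
  // Reset so that matching starts again from the beginning
  m.reset();
  // find() locates the next matching substring; this call finds the first one
  p(m.find());
  p(m.start()+"-"+m.end());
  p(m.find());
  p(m.start()+"-"+m.end());
  p(m.find());
  p(m.start()+"-"+m.end());
  p(m.find());
  // No match was found this time, so calling start() or end() here throws IllegalStateException
  p(m.start()+"-"+m.end());
  // lookingAt() always starts matching from the beginning of the input
  p(m.lookingAt());
  p(m.lookingAt());
  p(m.lookingAt());
  p(m.lookingAt());
  */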
  
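 /* // Replacement: find every occurrence of "java" regardless of case; odd-numbered matches become "JAVA", even-numbered matches become "java"
  Pattern p = Pattern.compile("java",Pattern.CASE_INSENSITIVE);
  Matcher m = p.matcher("java Java JAVa JaVa IloveJAVA you hateJava asdf");
  StringBuffer buf = new StringBuffer();
  int i=0;
  while(m.find()){
   // m.group() returns the matched substring
   //p(m.group());
   i++;
   if(i%2==0){
    // Append the text up to and including this match to buf, with the match replaced by the given string
    m.appendReplacement(buf, "java");
   }else
   {
    m.appendReplacement(buf, "JAVA");
   }
   
  }
  // Append whatever input remains after the last match
  m.appendTail(buf);
  p(buf);*/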
  
  // Groups
  Pattern p = Pattern.compile("(\\d{3,5})([a-z]{2})");
  String s ="123aa-34345bb-234cc-00";
  Matcher m = p.matcher(s);
  while(m.find())
  {
   // The whole match (group 0)
   p(m.group());
  }
  
  m.reset();
  while(m.find())
  {
   // Group 1: the digits
   p(m.group(1));
  }
  m.reset();
  while(m.find())
  {
   // Group 2: the two letters
   p(m.group(2));
  }  
 }
 
 
 
 
 public static void p (Object o)
 {
  System.out.println(o);
 }
 
 
} 
****************************************************
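
Because most of the examples above are commented out, here is a compact runnable variant of the matches / find / lookingAt section. It is a sketch rather than a replacement for the original: the class name MatchModes and the helper findSpans are hypothetical, and a fresh Matcher is created for each call so that no result depends on leftover matcher state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchModes {
    private static final Pattern DIGITS = Pattern.compile("\\d{3,5}");

    // Collects "start-end" for every match that find() reports, scanning left to right.
    static List<String> findSpans(String input) {
        List<String> spans = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            // start() and end() are only called after find() has returned true.
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String s = "123-34345-234-00";
        // matches(): the whole input must match the pattern.
        System.out.println(DIGITS.matcher(s).matches());    // false
        // lookingAt(): only a prefix of the input has to match.
        System.out.println(DIGITS.matcher(s).lookingAt());  // true
        // find(): every matching substring, one after another.
        System.out.println(findSpans(s));                   // [0-3, 4-9, 10-13]
    }
}
```

Guarding start() and end() with the result of find() avoids the IllegalStateException that the commented-out example triggers after its last, unsuccessful find().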
 
 
A program that extracts e-mail addresses from a file
***************************************************************
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailSpider {
 /**
  * @param args
  */
 public static void main(String[] args) {
  // try-with-resources closes the reader even if reading fails
  try (BufferedReader br = new BufferedReader(new FileReader("c:\\email.htm"))) {
   String line;
   while((line=br.readLine())!=null)
   {
     parse(line); 
   }
  } catch (FileNotFoundException e) {
   // The input file does not exist or cannot be opened
   e.printStackTrace();
  } catch (IOException e) {
   // Reading from or closing the file failed
   e.printStackTrace();
  }
 }
 private static void parse(String line) {
  // Print every substring of the line that looks like an e-mail address
  Pattern p =Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
  Matcher m = p.matcher(line);
  while(m.find())
  {
   System.out.println(m.group());
  }
 }
}
 

*****************************************************************
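
To try the same expression without a file on disk, one option is to read from any Reader and return the addresses as a list. The sketch below assumes that design; EmailCollector, collect, and the sample page text are invented for this example, and the regular expression is the one used in the program above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailCollector {
    // Same expression as in the program above, compiled once and reused for every line.
    private static final Pattern EMAIL = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");

    // Reads the source line by line and returns every address-like substring in order.
    static List<String> collect(Reader source) throws IOException {
        List<String> found = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = EMAIL.matcher(line);
                while (m.find()) {
                    found.add(m.group());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page content standing in for the HTML file.
        String page = "<a href=\"mailto:tom@example.com\">Tom</a>\ncontact: jane.doe@mail.example.org";
        System.out.println(collect(new StringReader(page)));
        // prints [tom@example.com, jane.doe@mail.example.org]
    }
}
```

Passing a FileReader instead of the StringReader gives the same behavior as the file-based program, with the results returned rather than printed.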
 

  


  
    
      2008-08-04 11:43

