Special Constructs (Non-Capturing Groups): A Summary

Reference: http://blog.chenlb.com/2008/12/java-regular-expression-special-constructs-ornon-capturing-group.html

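I found the description of the special constructs (non-capturing groups) in the regular-expression section of the Java API documentation hard to understand. For example: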

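(?:X)	X, as a non-capturing group
(?idmsux-idmsux)	Nothing, but turns match flags i d m s u x on - off
(?idmsux-idmsux:X)	X, as a non-capturing group with the given flags i d m s u x on - off
(?=X)	X, via zero-width positive lookahead
(?!X)	X, via zero-width negative lookahead
(?<=X)	X, via zero-width positive lookbehind
(?<!X)	X, via zero-width negative lookbehind
(?>X)	X, as an independent, non-capturing group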

All of this is rather abstract, and I didn't understand it, so I kept searching and found the following explanation by 火龙果:

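Every construct that starts with (? and ends with ) is non-capturing: it does not create a numbered group, so the text it matches is not saved for later retrieval through group(n) or a backreference such as \1. (Java 7 later added named groups, (?<name>X), which are the exception: they do capture.)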
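One way to check this claim is Matcher.groupCount(), which reports how many capturing groups a pattern defines. In this minimal sketch the pattern mixes several (?...) constructs with a single ordinary group, and only the ordinary group is counted. The class and method names are hypothetical, chosen for illustration.

```java
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns the number of capturing groups defined by the given regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (?:a), (?i), (?=b), (?<=b) and (?>d) create no groups;
        // only the plain (c) is numbered, so this prints 1.
        System.out.println(groupCount("(?:a)(?i)(?=b)b(?<=b)(c)(?>d)"));
        // Two plain groups: prints 2.
        System.out.println(groupCount("(a)(b)"));
    }
}
```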

Non-capturing groups can be used in fairly involved ways; here I only briefly explain what each one means.

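1. (?:X) X, as a non-capturing group
Like a capturing group ( ), (?:X) treats X as a single unit. The difference is that it does not capture the matched text; it only groups.
For example: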

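To match 123123 you can write (123)\1 with a backreference. This requires a capturing group: once 123 has been matched, it is kept in memory so that the backreference \1 can refer to it.

(?:123) keeps nothing for a backreference after it matches, and that is the only difference. Not saving the matched text can save a little memory and time.

So (123)\1 matches the same text as (?:123)123. Code example: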

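import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The snippets in this article are statements from one main method; later snippets reuse p and m.
/**
 * outputs
 * Pattern (123)\1
 * 123123
 */
Pattern p = Pattern.compile("(123)\\1");
Matcher m = p.matcher("ad123dfe123123grr");
System.out.println("Pattern (123)\\1");
while (m.find()) {
    System.out.println(m.group());
}
/**
 * outputs
 * Pattern (?:123)123
 * 123123
 */
p = Pattern.compile("(?:123)123");
m = p.matcher("ad123dfe123123grr");
System.out.println("Pattern (?:123)123");
while (m.find()) {
    System.out.println(m.group());
}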
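A practical consequence of not capturing is that group numbers stay stable. In this sketch (hypothetical class and method names), ab must be grouped so that + applies to the pair, but only the digit is wanted; with (?:ab)+ the digit becomes group 1 instead of group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group(1) of the first match of regex in input, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // (ab)+ is captured, so group 1 is the last "ab" repetition and the digit is group 2.
        System.out.println(firstGroup("(ab)+(\\d)", "abab7"));   // ab
        // (?:ab)+ still groups ab for the + quantifier but is not numbered,
        // so the digit becomes group 1.
        System.out.println(firstGroup("(?:ab)+(\\d)", "abab7")); // 7
    }
}
```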

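2. (?idmsux-idmsux) Nothing, but turns match flags i d m s u x on - off
This switches matching flags from inside the pattern. For example, in the expression (?i)abc(?-i)def, (?i) turns case-insensitivity on, so abc is matched regardless of case; (?-i) turns the flag off again and restores case-sensitive matching, so def only matches def.
The letters correspond to the Pattern flags i = CASE_INSENSITIVE, d = UNIX_LINES, m = MULTILINE, s = DOTALL, u = UNICODE_CASE, x = COMMENTS.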
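A minimal sketch of the (?i)abc(?-i)def example, checking whole inputs with matches(); the class name and test inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class InlineFlagDemo {
    // (?i) turns case-insensitivity on for abc; (?-i) turns it off again for def.
    static final Pattern P = Pattern.compile("(?i)abc(?-i)def");

    static boolean matches(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abcdef", "ABCdef", "aBcdef", "ABCDEF", "abcDef"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```

The first three inputs print true; ABCDEF and abcDef print false because def must be lowercase.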

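// (?i) ignores case. Simple enough, but by default it only folds ASCII letters.
// For Russian letters such as lowercase б (uppercase Б) and л (Л), add (?u) as well.
/**
 * outputs
 * sayбл
 * SayБЛ
 */
p = Pattern.compile("(?i)(?u)sayбл");
m = p.matcher("This is a test sayбл hello.\n" + "Wello SayБЛ \nello?");
System.out.println("Flags: (?i)(?u)");
while (m.find()) {
    System.out.println(m.group());
}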
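To see why (?u) is needed, this sketch compares (?i) alone with (?i)(?u), and also shows the equivalent compile-time flags. The strings are written with Unicode escapes so the result does not depend on the source file's encoding; the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    static final String LOWER = "say\u0431\u043b"; // sayбл
    static final String UPPER = "Say\u0411\u041b"; // SayБЛ

    static boolean found(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // (?i) alone: only ASCII letters fold, so S matches s but Б does not match б.
        System.out.println(found(Pattern.compile("(?i)" + LOWER), UPPER));     // false
        // (?i)(?u): Unicode-aware case folding, so the Cyrillic letters fold too.
        System.out.println(found(Pattern.compile("(?i)(?u)" + LOWER), UPPER)); // true
        // The same flags passed to compile() instead of written inline.
        System.out.println(found(
            Pattern.compile(LOWER, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE), UPPER)); // true
    }
}
```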

// In single-line (DOTALL) mode, . matches any character, including \n
/**
 * outputs
 * hello
 * Wello
 * \nello   (printed as a line break followed by "ello")
 */
p = Pattern.compile("(?s).ello");
m = p.matcher("This is a test say hello.\n" + "Wello say \nello?");
System.out.println("Flags: (?s)");
while (m.find()) {
    System.out.println(m.group());
}

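// (?m) is MULTILINE mode: ^ and $ then match at every line break (\n or \r\n),
// not only at the start and end of the input. It does not change what . matches:
// without (?s), . never matches \n, so \nello is not found here, exactly as without (?m).
/**
 * outputs:
 * hello
 * Wello
 */
p = Pattern.compile("(?m).ello");
m = p.matcher("This is a test say hello.\n" + "Wello say \nello?");
System.out.println("Flags: (?m)");
while (m.find()) {
    System.out.println(m.group());
}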
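Since (?m) only affects the anchors, an example with ^ shows what it actually does. This sketch reuses the same input text; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    static final String TEXT = "This is a test say hello.\n" + "Wello say \nello?";

    // Collects every match of regex in TEXT.
    static List<String> findAll(String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Without (?m), ^ matches only at the very start of the input: prints []
        System.out.println(findAll("^.ello"));
        // With (?m), ^ also matches right after each line break: prints [Wello]
        System.out.println(findAll("(?m)^.ello"));
        // (?m) does not change what . matches: both print [hello, Wello]
        System.out.println(findAll(".ello"));
        System.out.println(findAll("(?m).ello"));
    }
}
```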

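// (?d) turns on UNIX_LINES mode, in which only \n is recognized as a line terminator
// UNIX line ending: \n
// Windows line ending: \r\n
/**
 * outputs:
 * ello :start=27,end=32
 * ello?:start=37,end=42
 */
p = Pattern.compile("(?m)ello.");
m = p.matcher("This is a test say hello\r\n" + "Wello say \nello?");
System.out.println("Flags: (?m)");
while (m.find()) {
    // a carriage return is printed as the two characters \r so it stays visible
    System.out.println(m.group().replace("\r", "\\r") + ":start=" + m.start() + ",end=" + m.end());
}
// Without (?d), \r also counts as a line terminator, so . refuses the \r after "hello"
// and that "ello" is skipped. (?m) plays no part here: it only affects ^ and $.
/**
 * outputs:
 * ello\r:start=20,end=25
 * ello :start=27,end=32
 * ello?:start=37,end=42
 */
p = Pattern.compile("(?d)(?m)ello.");
m = p.matcher("This is a test say hello\r\n" + "Wello say \nello?");
System.out.println("Flags: (?d)(?m)");
while (m.find()) {
    System.out.println(m.group().replace("\r", "\\r") + ":start=" + m.start() + ",end=" + m.end());
}
// With (?d), only \n ends a line, so . can match the \r and "ello\r" is found as well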

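3. (?idmsux-idmsux:X) X, as a non-capturing group with the given flags i d m s u x on - off
This works like the previous construct, except that the flags apply only inside the group. The expression above can be rewritten as (?i:abc)def or as (?i)abc(?-i:def).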
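A quick way to convince yourself that the three spellings are equivalent is to run them against the same inputs. This is a sketch with made-up inputs and a hypothetical helper, allAgree.

```java
import java.util.regex.Pattern;

class ScopedFlagDemo {
    static final String[] PATTERNS = {"(?i)abc(?-i)def", "(?i:abc)def", "(?i)abc(?-i:def)"};

    // Returns true if all three spellings give the same answer for input.
    static boolean allAgree(String input) {
        boolean first = Pattern.matches(PATTERNS[0], input);
        for (String p : PATTERNS) {
            if (Pattern.matches(p, input) != first) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"abcdef", "ABCdef", "AbCdef", "ABCDEF", "abcdeF"}) {
            System.out.println(input + " -> " + Pattern.matches(PATTERNS[1], input)
                + ", all spellings agree: " + allAgree(input));
        }
    }
}
```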

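4. (?=X) X, via zero-width positive lookahead
5. (?!X) X, via zero-width negative lookahead
(?=X) asserts that the text right after the current position (think of it as the gap between two characters) matches X. For example, a(?=b) matches the a in the string ab: (?=b) checks that the gap after a is followed by b but consumes nothing, which is why it is called zero-width.
(?!X) asserts that the text right after the current position does not match X.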
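The "gap" idea can be made visible with replaceAll: a lookahead on its own matches an empty position, so replacing it inserts text without deleting anything. A minimal sketch on the same test string used below; the class and method names are hypothetical.

```java
class LookaheadGapDemo {
    static final String S = "aacdabaaeabdaBh";

    // (?=b) matches an empty position whose right-hand neighbour is b;
    // replacing that empty match inserts "|" without removing any character.
    static String markGapsBeforeB(String s) {
        return s.replaceAll("(?=b)", "|");
    }

    // a(?=b) consumes only the a; the b stays in the output untouched.
    static String replaceAOnlyBeforeB(String s) {
        return s.replaceAll("a(?=b)", "X");
    }

    public static void main(String[] args) {
        System.out.println(markGapsBeforeB(S));     // aacda|baaea|bdaBh
        System.out.println(replaceAOnlyBeforeB(S)); // aacdXbaaeXbdaBh
    }
}
```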

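The engine scans from left to right, so lookahead looks to the right of the current position and lookbehind looks to the left.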

// a(?=b): match an a whose lookahead is b, i.e. the a in "ab", without consuming the b
/**
 * outputs
 * a:start=4,end=5
 * a:start=9,end=10
 */
p = Pattern.compile("a(?=b)");
m = p.matcher("aacdabaaeabdaBh");
System.out.println("a(?=b)");
while (m.find()) {
    System.out.println(m.group() + ":start=" + m.start() + ",end=" + m.end());
}

// a(?!b): match an a whose lookahead is not b. This is roughly a[^b], except that
// nothing after the a is consumed and an a at the very end of the input also matches.
/**
 * outputs
 * a:start=0,end=1
 * a:start=1,end=2
 * a:start=6,end=7
 * a:start=7,end=8
 * a:start=12,end=13
 */
p = Pattern.compile("a(?!b)");
m = p.matcher("aacdabaaeabdaBh");
System.out.println("a(?!b)");
while (m.find()) {
    System.out.println(m.group() + ":start=" + m.start() + ",end=" + m.end());
}
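The difference between a(?!b) and a[^b] only shows up when the a is the last character. A minimal sketch, with a hypothetical helper name:

```java
import java.util.regex.Pattern;

class NegativeLookaheadDemo {
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The a in "ba" has nothing after it. a(?!b) only asks
        // "is the next character not b?", which holds at the end of the input.
        System.out.println(found("a(?!b)", "ba")); // true
        // a[^b] needs an actual character after the a, so it fails here.
        System.out.println(found("a[^b]", "ba"));  // false
    }
}
```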

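6. (?<=X) X, via zero-width positive lookbehind
7. (?<!X) X, via zero-width negative lookbehind
These two work like the previous two, except that lookahead checks the text to the right of the current position while lookbehind checks the text to the left.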
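Lookbehind and lookahead can be combined to pick out a gap by what lies on both sides of it. A common illustration is inserting thousands separators: a comma goes into every gap that has a digit on its left and a multiple of three digits up to the end on its right. This sketch assumes the input consists of digits only; the class and method names are hypothetical.

```java
class ThousandsSeparatorDemo {
    // (?<=\d) : the gap has a digit on its left (lookbehind)
    // (?=(?:\d{3})+$) : the gap is followed by 3, 6, 9, ... digits up to the end (lookahead)
    // Both are zero-width, so replaceAll only inserts commas and removes nothing.
    static String group(String digits) {
        return digits.replaceAll("(?<=\\d)(?=(?:\\d{3})+$)", ",");
    }

    public static void main(String[] args) {
        System.out.println(group("1234567")); // 1,234,567
        System.out.println(group("123"));     // 123
        System.out.println(group("1000"));    // 1,000
    }
}
```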

// (?<=b)a: match an a whose lookbehind is b, i.e. the a in "ba", without consuming the b
/**
 * outputs
 * a:start=6,end=7
 */
p = Pattern.compile("(?<=b)a");
m = p.matcher("aacdabaaeabdaBh");
System.out.println("(?<=b)a");
while (m.find()) {
    System.out.println(m.group() + ":start=" + m.start() + ",end=" + m.end());
}

// (?<!b)a: match an a whose lookbehind is not b. This is roughly [^b]a, except that
// nothing before the a is consumed and an a at the very start of the input also matches.
/**
 * outputs
 * a:start=0,end=1
 * a:start=1,end=2
 * a:start=4,end=5
 * a:start=7,end=8
 * a:start=9,end=10
 * a:start=12,end=13
 */
p = Pattern.compile("(?<!b)a");
m = p.matcher("aacdabaaeabdaBh");
System.out.println("(?<!b)a");
while (m.find()) {
    System.out.println(m.group() + ":start=" + m.start() + ",end=" + m.end());
}

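8. (?>X) X, as an independent, non-capturing group
Once the group has matched, the engine never backtracks into it to try a different match. This is the most advanced of the constructs. It is interchangeable with possessive quantifiers (the extra + in \d++): for example, \d++ can be written as (?>\d+).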
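The effect of "no backtracking" is easiest to see when the group swallows a character that the rest of the pattern still needs. A minimal sketch, with a hypothetical helper name:

```java
import java.util.regex.Pattern;

class AtomicGroupDemo {
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String s = "12345";
        // Greedy \d+ first takes "12345", then gives back the 5 so the literal 5 can match.
        System.out.println(found("\\d+5", s));     // true
        // (?>\d+) keeps every digit it took and never gives one back, so the final 5 never matches.
        System.out.println(found("(?>\\d+)5", s)); // false
        // The possessive quantifier \d++ behaves the same way.
        System.out.println(found("\\d++5", s));    // false
    }
}
```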

// (?>X) matches without backtracking into the group, which can help performance
/**
 * outputs
 * integer:start=5,end=12
 * insert:start=17,end=23
 * in:start=27,end=29
 */
p = Pattern.compile("\\b(?>integer|insert|in)\\b");
m = p.matcher("test integer and insert of in it");
System.out.println("\\b(?>integer|insert|in)\\b");
while (m.find()) {
    System.out.println(m.group() + ":start=" + m.start() + ",end=" + m.end());
}

// Swap the order of the alternatives and the result is completely different!

/**
 * outputs
 * in:start=27,end=29
 */
p = Pattern.compile("\\b(?>in|integer|insert)\\b");
m = p.matcher("test integer and insert of in it");
System.out.println("\\b(?>in|integer|insert)\\b");
while (m.find()) {
    System.out.println(m.group() + ":start=" + m.start() + ",end=" + m.end());
}

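Why does the order matter? Alternation tries its alternatives from left to right, and the atomic group commits to the first alternative that succeeds. At "integer", in succeeds first; the following \b then fails, because the next character t is a word character. Since the group is atomic, the engine cannot go back into it to try integer or insert, so the attempt at that position fails, and the same happens at "insert". Only the standalone word "in" survives. Hence the usual advice for alternatives inside an atomic group: put the longer ones first and the shorter ones after them.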
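Comparing with an ordinary (?:...) group confirms that the lost backtracking is the cause: without the atomic group, the engine backtracks into the alternation after \b fails and finds all three words regardless of order. A sketch on the same input; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrderDemo {
    static final String TEXT = "test integer and insert of in it";

    // Collects every match of regex in TEXT.
    static List<String> findAll(String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Ordinary group: when \b fails after "in", the engine backtracks
        // into the group and tries integer, then insert.
        System.out.println(findAll("\\b(?:in|integer|insert)\\b")); // [integer, insert, in]
        // Atomic group, short alternative first: no backtracking into the group.
        System.out.println(findAll("\\b(?>in|integer|insert)\\b")); // [in]
        // Atomic group, long alternatives first.
        System.out.println(findAll("\\b(?>integer|insert|in)\\b")); // [integer, insert, in]
    }
}
```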

Source: http://www.mzph.cn/news/498149.shtml