.NET Regular Expressions: Regex and Balanced Matching (Repost)


One question that comes up a lot is how to match balanced parentheses. Given a string like “(aa (bbb) (bbb) aa)”, the goal is to match from the opening parenthesis to its matching closing parenthesis. Generally this is not possible with regular expressions: a regular language is not expressive enough to describe arbitrarily nested constructs. For the longest time, this is how I answered these questions when they came to me.

However, in .NET this is actually possible with a construct called a balancing group definition, which generally looks like (?<name1-name2>). The following is what MSDN has to say about it:

Balancing group definition. Deletes the definition of the previously defined group name2 and stores in group name1 the interval between the previously defined name2 group and the current group. If no group name2 is defined, the match backtracks. Because deleting the last definition of name2 reveals the previous definition of name2, this construct allows the stack of captures for group name2 to be used as a counter for keeping track of nested constructs such as parentheses. In this construct, name1 is optional. You can use single quotes instead of angle brackets; for example, (?'name1-name2').
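To make the MSDN description concrete, here is a minimal sketch of a single balancing group at work. The class name BalancingGroupSyntaxDemo, the helper Run, and the group name Content are made up for illustration; the pattern uses parentheses, so they have to be escaped. The Open group is marked optional so that the second call can show the "match backtracks" case, where the closing parenthesis finds nothing on the Open stack.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancingGroupSyntaxDemo
{
    // (?<Open>\()?        optionally pushes "(" onto the Open stack
    // [^()]*              skips text that contains no parentheses
    // (?<Content-Open>\)) matches ")", pops Open, and stores in Content
    //                     the text between the popped "(" and this ")"
    private const string Pattern = @"(?<Open>\()?[^()]*(?<Content-Open>\))";

    // Hypothetical helper: runs the illustrative pattern against the input.
    public static Match Run(string input) => Regex.Match(input, Pattern);

    public static void Main()
    {
        Match balanced = Run("f(abc)");
        Console.WriteLine(balanced.Success);                 // a "(" precedes the ")"
        Console.WriteLine(balanced.Groups["Content"].Value); // text between the two

        Match unbalanced = Run("abc)");
        Console.WriteLine(unbalanced.Success);               // no "(" to pop
    }
}
```

In the first call the Content group should hold "abc", the interval between the two parentheses. In the second call there is no earlier Open capture to delete, so the balancing group cannot match and the overall match fails.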

The following expression matches balanced opening and closing angle brackets (<>). Angle brackets are used because, unlike parentheses, they do not require escaping, which makes the expression a little easier to read:

    <
    [^<>]*
    (
        (
            (?<Open><)
            [^<>]*
        )+
        (
            (?<Close-Open>>)
            [^<>]*
        )+
    )*
    (?(Open)(?!))
    >
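The expression can be tried out as written with a short C# sketch. The class name AngleBracketDemo and the helper FirstBalanced are hypothetical; RegexOptions.IgnorePatternWhitespace is used so that the multi-line layout can be kept inside the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AngleBracketDemo
{
    // The expression from the article, kept in its multi-line layout.
    // With IgnorePatternWhitespace the spaces and line breaks are ignored.
    private const string Pattern = @"
        <
        [^<>]*
        (
            ( (?<Open><) [^<>]* )+
            ( (?<Close-Open>>) [^<>]* )+
        )*
        (?(Open)(?!))
        >";

    // Hypothetical helper: returns the first balanced run, or "" if none.
    public static string FirstBalanced(string input)
    {
        Match m = Regex.Match(input, Pattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Value : "";
    }

    public static void Main()
    {
        // The whole bracketed run is returned, nested pairs included.
        Console.WriteLine(FirstBalanced("xx <aa <bbb> <bbb> aa> yy"));

        // The first "<" is never closed, so only the inner pair matches.
        Console.WriteLine(FirstBalanced("<aa <bbb> aa"));
    }
}
```

Because the expression is not anchored, Regex.Match scans for the first position where a balanced run starts. For the second input that is the inner "<bbb>", not the unclosed outer bracket.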

The outermost level of the expression just matches an open angle bracket, followed by anything that is not an angle bracket, followed by a close angle bracket. I will explain “(?(Open)(?!))” later.

The inner groups do all of the interesting angle bracket matching. The Open group matches only the open angle bracket, and the part of the expression that follows it matches anything that is not an angle bracket. So the first inner group will basically match everything up to the first close angle bracket.

It is best to think of a group as a stack of captures, where the top of the stack is the last capture made. (?<Close-Open>>) matches “>” and pops a capture off the Open group’s capture stack. This match can succeed only if the Open group’s capture stack is not empty. This is a fancy way of saying that for every match of this group there must be an earlier match of the Open group.
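The stack behaviour can be observed directly by counting the captures left in the Open group after a match. The sketch below uses a deliberately small pattern that is not the article's full expression; the class name CaptureStackDemo and the helper RemainingOpen are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureStackDemo
{
    // Each "<" pushes a capture onto Open; the single ">" pops one off.
    private const string Pattern = @"(?<Open><)*(?<Close-Open>>)";

    // Hypothetical helper: number of captures still on the Open stack
    // after a successful match, or -1 when the pattern does not match.
    public static int RemainingOpen(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Groups["Open"].Captures.Count : -1;
    }

    public static void Main()
    {
        Console.WriteLine(RemainingOpen("<<<>")); // three pushes, one pop
        Console.WriteLine(RemainingOpen("<>"));   // one push, one pop
        Console.WriteLine(RemainingOpen(">"));    // nothing to pop, no match
    }
}
```

With three pushes and one pop, two captures should remain on the stack. A lone ">" has nothing to pop, so the balancing group fails and the helper reports -1.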

So now we know that for every closing angle bracket there must have been an opening angle bracket. However, we still have done nothing to assert that for every opening angle bracket there is a matching closing angle bracket. That is where the (?(Open)(?!)) part of the expression comes into play. It tells Regex to match (?!) if the Open group still contains a capture (i.e. there were more open angle brackets than close angle brackets). (?!) is an empty negative lookahead, so trying to match it always fails. Basically this is a way of making the expression fail if the Open group still contains a capture.
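The effect of the final check is easiest to see by comparing the expression with and without it. The sketch below is an adaptation, not the article's exact expression: the outer < and > are replaced by the anchors ^ and $ so that the whole input has to be consumed. The names BalanceCheckDemo, MatchesWithoutCheck, and MatchesWithCheck are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalanceCheckDemo
{
    // Shared body: pushes on "<", pops on ">", skips any other text.
    private const string Body =
        @"^[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*";

    // Without the final check: only rejects a ">" that has no earlier "<".
    private static readonly Regex WithoutCheck = new Regex(Body + "$");

    // With the check: also rejects input that leaves a "<" on the Open stack.
    private static readonly Regex WithCheck = new Regex(Body + @"(?(Open)(?!))$");

    public static bool MatchesWithoutCheck(string s) => WithoutCheck.IsMatch(s);
    public static bool MatchesWithCheck(string s) => WithCheck.IsMatch(s);

    public static void Main()
    {
        foreach (string s in new[] { "<<a>>", "<<a>", "<a>>" })
        {
            Console.WriteLine(
                $"{s}: without check = {MatchesWithoutCheck(s)}, with check = {MatchesWithCheck(s)}");
        }
    }
}
```

Only "<<a>" should separate the two variants. It has one open bracket too many, which the balancing group alone cannot detect, while "<a>>" is rejected by both because its second ">" has nothing left to pop.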


This article is reposted from netcorner's blog on cnblogs (博客园). Original link: http://www.cnblogs.com/netcorner/archive/2008/10/21/2912102.html. To republish it, please contact the original author.
