Regular Expression Syntax Explained (Part 1)

Overview:
Important note
      Below is a description of the regular expressions implemented in the freeware library TRegExpr. Please note that the library is widely used in many free and commercial software products. The author of the TRegExpr library cannot answer direct questions from users of those products. Please send your questions to the product's support first.
 
      Introduction
      Regular expressions are a widely used method of specifying patterns of text to search for. Special metacharacters allow you to specify, for instance, that a particular string you are looking for occurs at the beginning or end of a line, or contains n recurrences of a certain character.
 
      Regular expressions look ugly to novices, but they are really a very simple (well, usually simple ;) ), handy and powerful tool.
 
      I recommend that you play with regular expressions using RegExp Studio; it will help you understand the main concepts. Moreover, many predefined examples with comments are included in the repository of the R.e. visual debugger.
 
      Let's start our learning trip!
 
      Simple matches

      Any single character matches itself, unless it is a metacharacter with a special meaning described below.
 
      A series of characters matches that series of characters in the target string, so the pattern "bluh" would match "bluh" in the target string. Quite simple, eh?
 
      Characters that normally function as metacharacters or escape sequences can be interpreted literally by "escaping" them, that is, by preceding them with a backslash "\". For instance, the metacharacter "^" matches the beginning of the string, but "\^" matches the character "^", "\\" matches "\", and so on.
 
      Examples:
 
        foobar           matches string 'foobar'
        \^FooBarPtr      matches '^FooBarPtr'
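
      The two examples above can be tried out with a small sketch. It uses Python's built-in re module purely as a stand-in (an assumption of this sketch: TRegExpr itself is a Delphi library, and the two engines may differ in details), and the sample strings are made up for illustration.

```python
import re

# Python's re module stands in for TRegExpr here (assumption, not the real library).
plain = re.compile(r"foobar")          # ordinary characters match themselves
escaped = re.compile(r"\^FooBarPtr")   # "\^" is a literal caret, not an anchor

found_plain = plain.search("xx foobar yy")
found_caret = escaped.search("var ^FooBarPtr;")

# Without the backslash, "^" anchors at the start of the string,
# so the same text is not found in the middle of the line.
anchored = re.search(r"^FooBarPtr", "var ^FooBarPtr;")

print(found_plain.group())
print(found_caret.group())
print(anchored)
```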
 
      Note for C++ Builder users
      Please read the FAQ answer to the question "Why do many r.e. work wrong in Borland C++ Builder?"
 
      Escape sequences

      Characters may be specified using an escape-sequence syntax much like that used in C and Perl: "\n" matches a newline, "\t" a tab, etc. More generally, \xnn, where nn is a string of hexadecimal digits, matches the character whose ASCII value is nn. If you need a wide (Unicode) character code, you can use '\x{nnnn}', where 'nnnn' is one or more hexadecimal digits.
 
 
        \xnn      char with hex code nn
        \x{nnnn}  char with hex code nnnn (one byte for plain text and two bytes for Unicode)
        \t        tab (HT/TAB), same as \x09
        \n        newline (NL), same as \x0a
        \r        carriage return (CR), same as \x0d
        \f        form feed (FF), same as \x0c
        \a        alarm (bell) (BEL), same as \x07
        \e        escape (ESC), same as \x1b
 
 
      Examples:
 
        foo\x20bar    matches 'foo bar' (note the space in the middle)
        \tfoobar      matches 'foobar' preceded by a tab
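
      A minimal sketch of the two examples above, again assuming Python's re module as a stand-in for TRegExpr. It sticks to \xnn and \t; other entries in the table (for example \x{nnnn} and \e) are not guaranteed to be accepted by other engines, so they are left out.

```python
import re

# \x20 is the hex code for a space, so this pattern is the same as "foo bar".
space_match = re.search(r"foo\x20bar", "say foo bar now")

# \t matches a tab character in front of "foobar".
tab_match = re.search(r"\tfoobar", "x\tfoobar")

# An ordinary space is not a tab, so this search finds nothing.
no_tab = re.search(r"\tfoobar", "x foobar")

print(repr(space_match.group()))
print(repr(tab_match.group()))
print(no_tab)
```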
 
      Character classes
      You can specify a character class by enclosing a list of characters in [], which will match any one character from the list.
 
      If the first character after the "[" is "^", the class matches any character not in the list.
 
      Examples:
        foob[aeiou]r     finds strings 'foobar', 'foober' etc. but not 'foobbr', 'foobcr' etc.
        foob[^aeiou]r    finds strings 'foobbr', 'foobcr' etc. but not 'foobar', 'foober' etc.
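
      The positive and negated classes above can be checked against the four sample words. This is a sketch using Python's re module as a stand-in for TRegExpr (an assumption); the word list is taken from the examples.

```python
import re

vowel = re.compile(r"foob[aeiou]r")       # one vowel between "foob" and "r"
non_vowel = re.compile(r"foob[^aeiou]r")  # any single character except a vowel

words = ["foobar", "foober", "foobbr", "foobcr"]

vowel_hits = [w for w in words if vowel.fullmatch(w)]
non_vowel_hits = [w for w in words if non_vowel.fullmatch(w)]

print(vowel_hits)
print(non_vowel_hits)
```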
 
 
      Within a list, the "-" character is used to specify a range, so that a-z represents all characters between "a" and "z", inclusive.
 
      If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. If you want "]" to be a member, place it at the start of the list or escape it with a backslash.
 
 
      Examples:
        [-az]       matches 'a', 'z' and '-'
        [az-]       matches 'a', 'z' and '-'
        [a\-z]      matches 'a', 'z' and '-'
        [a-z]       matches all twenty-six lowercase characters from 'a' to 'z'
        [\n-\x0D]   matches any of #10, #11, #12, #13
        [\d-t]      matches any digit, '-' or 't'
        []-a]       matches any char from ']'..'a'
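
      The placement rules for "-" and "]" can be sketched as follows, assuming Python's re module as a stand-in for TRegExpr. The helper members and the candidate list are hypothetical names introduced for this illustration. The [\d-t] example is left out because engines disagree on how to treat a class shorthand next to "-".

```python
import re

def members(pattern, candidates):
    """Return the candidates that the one-character class matches."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = ["a", "m", "z", "-", "]", "^", "`"]

dash_first = members(r"[-az]", candidates)     # "-" at the start is literal
dash_last = members(r"[az-]", candidates)      # "-" at the end is literal
dash_escaped = members(r"[a\-z]", candidates)  # escaped "-" is literal
lower_range = members(r"[a-z]", candidates)    # "-" in the middle is a range
bracket_range = members(r"[]-a]", candidates)  # "]" first is literal, then a range to "a"

# Range between two escape sequences: character codes 10 through 13.
control_chars = [chr(code) for code in range(9, 15)]
control_hits = members(r"[\n-\x0D]", control_chars)

print(dash_first, lower_range, bracket_range)
print([ord(c) for c in control_hits])
```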
 
 
      Metacharacters

      Metacharacters are special characters which are the essence of regular expressions. There are different types of metacharacters, described below.
 
      Metacharacters - line separators

        ^       start of line
        $       end of line
        \A      start of text
        \Z      end of text
        .       any character in line
 
      Examples:
        ^foobar      matches string 'foobar' only if it's at the beginning of line
        foobar$      matches string 'foobar' only if it's at the end of line
        ^foobar$     matches string 'foobar' only if it's the only string in line
        foob.r       matches strings like 'foobar', 'foobbr', 'foob1r' and so on
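
      A short sketch of the anchor and "." examples above, assuming Python's re module as a stand-in for TRegExpr. The input strings are invented for the illustration, and all of them are single-line.

```python
import re

at_start = re.search(r"^foobar", "foobar baz")       # found: text starts with it
not_at_start = re.search(r"^foobar", "a foobar")     # not found: something precedes it
at_end = re.search(r"foobar$", "a foobar")           # found: text ends with it
whole = re.search(r"^foobar$", "foobar")             # found: nothing else on the line
not_whole = re.search(r"^foobar$", "foobar baz")     # not found: extra text follows

# "." stands for exactly one character, so "foobr" is too short to match.
samples = ["foobar", "foobbr", "foob1r", "foobr"]
dot_hits = [s for s in samples if re.fullmatch(r"foob.r", s)]

print(bool(at_start), bool(not_at_start), bool(at_end), bool(whole), bool(not_whole))
print(dot_hits)
```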
 
      By default, the "^" metacharacter is only guaranteed to match at the beginning of the input string/text, and the "$" metacharacter only at the end. Embedded line separators will not be matched by "^" or "$".
 
      You may, however, wish to treat a string as a multi-line buffer, such that "^" will match after any line separator within the string, and "$" will match before any line separator. You can do this by switching on the modifier /m.
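
      The effect of the multi-line modifier can be sketched like this. The sketch assumes Python's re module as a stand-in for TRegExpr, where the re.MULTILINE flag plays the role of /m; the sample text is made up.

```python
import re

text = "first line\nfoobar\nlast line"

# Default mode: "^" and "$" refer to the whole text, so a line in the
# middle of the buffer is not matched.
default_hit = re.search(r"^foobar$", text)

# Multi-line mode: "^" also matches after each line separator and "$"
# before each one, so the middle line is found.
multi_hit = re.search(r"^foobar$", text, flags=re.MULTILINE)

# First word of every line in the buffer.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(default_hit)
print(multi_hit.group())
print(line_starts)
```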





This article is reposted from xkdcc's 51CTO blog. Original link: http://blog.51cto.com/brantc/116473. To repost it, please contact the original author.