Python Regular Expressions (regex)

2018-06-19


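Regular expressions

A regular expression is a pattern that describes a set of strings. It is built from ordinary characters and a small set of predefined special characters (metacharacters), and it is used to search, filter, and extract text that matches the pattern.

Regular expressions are not unique to Python. In Python they are provided by the re module in the standard library.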

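Common pattern syntax

Pattern       Description
\w            Matches a word character: a letter, a digit, or an underscore
\W            Matches any character that is not a word character
\s            Matches any whitespace character; for ASCII text, equivalent to [ \t\n\r\f\v]
\S            Matches any non-whitespace character
\d            Matches any digit; for ASCII text, equivalent to [0-9]
\D            Matches any non-digit character
\A            Matches only at the start of the string
\Z            Matches only at the end of the string
\z            Perl-style end-of-string anchor; not part of Python's re syntax at the time of writing (use \Z)
\G            Perl-style "end of previous match" anchor; not supported by Python's re module
\n            Matches a newline character
\t            Matches a tab character
^             Matches at the start of the string (with re.M, also at the start of each line)
$             Matches at the end of the string or just before a trailing newline (with re.M, also at the end of each line)
.             Matches any character except a newline; with re.DOTALL (re.S) it also matches a newline
[...]         Character set: [abc] matches "a", "b", or "c"
[^...]        Negated character set: [^abc] matches any character other than "a", "b", or "c"
*             Matches 0 or more repetitions of the preceding expression (greedy)
+             Matches 1 or more repetitions of the preceding expression (greedy)
?             Matches 0 or 1 repetition of the preceding expression (greedy); placed after another quantifier, as in *?, +?, or {n,m}?, it makes that quantifier non-greedy
{n}           Matches exactly n repetitions of the preceding expression
{n,m}         Matches from n to m repetitions of the preceding expression (greedy)
a|b           Matches a or b
(...)         Groups the enclosed expression and captures the text it matches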
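
A few of the patterns in the table can be tried directly with re.findall. The snippet below is a minimal sketch; the sample text is made up for illustration and does not come from the original article.

```python
import re

# Hypothetical sample text, chosen only to exercise the patterns above.
text = "user_42 paid $5.00\ton 2018-06-19\n"

digits = re.findall(r"\d+", text)              # runs of digits
non_space = re.findall(r"\S+", text)           # runs of non-whitespace characters
four_digits = re.findall(r"\d{4}", text)       # exactly four digits in a row
vowels = re.findall(r"[aeiou]", "regex")       # characters inside the set
non_vowels = re.findall(r"[^aeiou]", "regex")  # characters outside the set
either = re.findall(r"cat|dog", "cat, dog, bird")

print(digits)
print(non_space)
print(four_digits, vowels, non_vowels, either)
```

Raw strings (r"...") are used for the patterns so that backslashes reach the regex engine unchanged.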

re.match

re.match tries to match a pattern at the start of the string. If the pattern does not match at the start, match() returns None.

re.match(pattern, string, flags=0)

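Basic matching

import re

content = 'Hello 111 2222 World hello python'
print(len(content))
# Raw string (r'...') so that \s, \d and \w reach the regex engine unchanged
res = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{5}\s.*python$', content)
print(res)
print(res.group())  # the matched text
print(res.span())   # (start, end) offsets of the match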

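Output:
33
<_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
Hello 111 2222 World hello python
(0, 33)

(Python 3.7 and later print the match object as <re.Match object; ...> instead of <_sre.SRE_Match object; ...>.)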
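
Because match() returns None when the pattern does not match, calling group() on the result without a check raises AttributeError. The following is a small sketch of a guard; the helper name first_number is hypothetical and only serves the example.

```python
import re


def first_number(text):
    """Return the run of digits at the start of text, or None if there is none."""
    res = re.match(r'\d+', text)
    if res is None:
        # No match at the start of the string: avoid calling .group() on None
        return None
    return res.group()


found = first_number('111 2222 World')
missing = first_number('Hello 111')
print(found, missing)
```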

Wildcard matching

import re


content = 'Hello 111 2222 World hello python'
res = re.match('^Hello.*python$', content)
print(res)
print(res.group())
print(res.span())

Output:
<_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
Hello 111 2222 World hello python
(0, 33)

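Capturing target substrings

import re

content = 'Hello 111 2222 World hello python'
# Each pair of parentheses captures the text matched inside it
res = re.match(r'^Hello\s(\d+)\s(\d+)\s.*python$', content)
print(res)
print(res.group(1), res.group(2))  # first and second captured groups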

Output:
<_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
111 2222
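
Captured groups can also be read all at once with groups(), or given names with the (?P<name>...) syntax. This sketch reuses the string from the example above; the group names first and second are arbitrary choices made for illustration.

```python
import re

content = 'Hello 111 2222 World hello python'
# Same pattern as above, with the two groups named "first" and "second"
res = re.match(r'^Hello\s(?P<first>\d+)\s(?P<second>\d+)\s.*python$', content)

all_groups = res.groups()     # tuple of every captured group
by_name = res.group('first')  # look a group up by name
as_dict = res.groupdict()     # mapping of group name to captured text

print(all_groups, by_name, as_dict)
```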

Greedy matching

import re

content = 'Hello 111 2222 World hello python'
# The greedy .* consumes as much as it can, leaving only one digit for the first group
res = re.match(r'^H.*(\d+)\s(\d+).*python$', content)
print(res)
print(res.group(1), res.group(2))

Output:
<_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
1 2222

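Non-greedy matching

import re

content = 'Hello 111222 World hello python'
# The non-greedy .*? consumes as little as it can, so the group gets the whole number
res = re.match(r'^He.*?(\d+).*?python$', content)
print(res)
print(res.group(1))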

Output:
<_sre.SRE_Match object; span=(0, 31), match='Hello 111222 World hello python'>
111222
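
To see the difference in isolation, the two quantifiers can be run against the same string. This is a minimal side-by-side sketch using the string from the non-greedy example; only the quantifier before the group changes.

```python
import re

content = 'Hello 111222 World hello python'

# Greedy: .* backtracks only as far as needed, so the group keeps one digit
greedy = re.match(r'^He.*(\d+).*python$', content)
# Non-greedy: .*? stops as early as possible, so the group keeps every digit
lazy = re.match(r'^He.*?(\d+).*python$', content)

greedy_digits = greedy.group(1)
lazy_digits = lazy.group(1)
print(greedy_digits, lazy_digits)
```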

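Flags

Flag                  Description
re.I                  Case-insensitive matching
re.M                  Multi-line mode: ^ and $ also match at the start and end of each line
re.L                  Locale-dependent matching (bytes patterns only)
re.U                  Unicode matching (the default for str patterns in Python 3)
re.S                  Makes . match any character, including a newline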

import re


content = """Hello 1112222 World 
          hello python"""
# re.S lets . match the newline inside content
res = re.match(r'^H.*?(\d+).*?python$', content, re.S)
print(res)
print(res.group(1))

Output:
<_sre.SRE_Match object; span=(0, 43), match='Hello 1112222 World \n          hello python'>
1112222
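
Flags can be combined with the | operator. The sketch below uses a made-up three-line string to show how re.I and re.M change what ^ and the literal text match.

```python
import re

# Hypothetical multi-line input.
text = "Python first\npython second\nPYTHON third"

# No flags: ^ matches only at the start of the string, and case must match
case_sensitive = re.findall(r'^python', text)
# re.I: case is ignored, but ^ still matches only at the start of the string
ignore_case = re.findall(r'^python', text, re.I)
# re.I | re.M: ^ and $ also match at the start and end of every line
per_line = re.findall(r'^python \w+$', text, re.I | re.M)

print(case_sensitive, ignore_case, per_line)
```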

Escaping

import re

content = """The apple's price is $5.00"""
# $ and . are metacharacters, so both are escaped to match them literally
res = re.match(r"The apple's price is \$5\.00", content)
print(res)
print(res.group())

Output:
<_sre.SRE_Match object; span=(0, 26), match="The apple's price is $5.00">
The apple's price is $5.00
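
When the literal text is not known in advance, re.escape can add the backslashes. The snippet below is a sketch with made-up inputs; it also shows why an unescaped . is looser than it looks.

```python
import re

literal = "$5.00"

# Unescaped, "$" is an end-of-string anchor, so this pattern cannot match the text
unescaped = re.search(literal, "price is $5.00")
# re.escape backslash-escapes the metacharacters in the literal
escaped = re.search(re.escape(literal), "price is $5.00")

# An unescaped "." matches any character, so this also accepts "$5x00"
loose = re.search(r"\$5.00", "price is $5x00")
strict = re.search(r"\$5\.00", "price is $5x00")

print(unescaped, escaped.group(), loose.group(), strict)
```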

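Summary: prefer wildcard matching where the exact text does not matter, use parentheses to capture the target, prefer non-greedy quantifiers, and pass re.S when the text contains newlines.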

re.search

re.search scans the whole string and returns the first successful match.

# Using re.match()
import re


content = """This is a string"""
res = re.match('a', content, re.S)
print(res)

Output:
None

# Using re.search()
import re


content = """This is a string"""
res = re.search(r'a\s\w*', content, re.S)
print(res)
print(res.group())

Output:
<_sre.SRE_Match object; span=(8, 16), match='a string'>
a string

Summary: for convenience, prefer search over match unless the match has to begin at the start of the string.
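
The relationship between the two functions can be sketched as follows. re.fullmatch is not covered in the original text; it is included here only as a related standard-library call that requires the whole string to match.

```python
import re

content = 'This is a string'

at_start = re.match(r'is', content)             # None: "is" is not at index 0
anywhere = re.search(r'is', content)            # finds the "is" inside "This"
anchored = re.search(r'^This', content)         # ^ restricts search to the start
whole = re.fullmatch(r'This.*string', content)  # the pattern must cover the whole string
partial = re.fullmatch(r'This', content)        # None: only a prefix matches

print(at_start, anywhere.span(), anchored.group(), whole.group(), partial)
```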

re.findall

Searches the string and returns all non-overlapping matches as a list.

import re


content = """This is a string"""
res = re.findall(r'a\s\w*', content, re.S)
print(res)

Output:
['a string']
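
The example above yields a single match. With several matches, and with capture groups in the pattern, the shape of the result changes. The input below is made up for illustration; re.finditer is shown as the standard-library counterpart that yields match objects instead of strings.

```python
import re

# Hypothetical input with several key=value pairs.
text = 'a=1, b=22, c=333'

# No groups: a list of the matched substrings
whole_matches = re.findall(r'\w=\d+', text)
# Two groups: a list of tuples, one tuple per match
pairs = re.findall(r'(\w)=(\d+)', text)
# finditer yields match objects, which also carry positions
spans = [m.span() for m in re.finditer(r'\d+', text)]

print(whole_matches, pairs, spans)
```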

re.sub

Replaces every match in the string and returns the resulting string.

import re


content = """This is 222211111 string"""
res = re.sub(r'\d+', 'a', content)
print(res)

Output:
This is a string
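
re.sub also accepts a count limit, backreferences in the replacement, and a function as the replacement. The following sketch uses a made-up input string to show each variant, plus re.subn, which additionally reports how many replacements were made.

```python
import re

# Hypothetical input with two numbers.
content = 'This is 2222 and 11111 string'

# count=1 replaces only the first match
first_only = re.sub(r'\d+', 'N', content, count=1)
# \1 in the replacement refers to the first captured group
wrapped = re.sub(r'(\d+)', r'<\1>', content)
# A function receives each match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), content)
# subn returns the new string together with the number of replacements
replaced, n = re.subn(r'\d+', 'N', content)

print(first_only, wrapped, doubled, replaced, n, sep='\n')
```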

re.compile

Compiles a regular expression string into a pattern object so that the pattern can be reused.

import re


content = """This is 222211111 string"""
pattern = re.compile(r'\d+')
res = re.search(pattern, content)
print(res)
print(res.group())

Output:
<_sre.SRE_Match object; span=(8, 17), match='222211111'>
222211111
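
A compiled pattern object also exposes search, findall, sub and the other functions as methods, which is convenient when one pattern is applied to many strings. This is a small sketch; the date pattern and the sample lines are assumptions made for the example.

```python
import re

# Hypothetical date pattern, compiled once and reused below.
date_re = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

lines = ['posted 2018-06-19', 'no date here', 'updated 2018-07-01']

# The same compiled object is applied to every line
found = [date_re.search(line) for line in lines]
years = [m.group(1) for m in found if m]
cleaned = date_re.sub('<date>', lines[0])

print(years, cleaned)
```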

Feel free to visit

Personal blog: www.limiao.tech
