14、Linux Shell 笔记(7),正则表达式

简介: 正则表达式 A regular expression is a pattern template you define that a Linux utility(sed,gawk等) uses to filter text. 正则表达式由正则表达式引擎来实现(regular expression engine)。

正则表达式

A regular expression is a pattern template you define that a Linux utility(sed,gawk) uses to filter text.

wps_clip_image-16400

正则表达式由正则表达式引擎来实现(regular expression engine)

In the Linux world, there are two popular regular expression engines:

The POSIX Basic Regular Expression (BRE) engine

The POSIX Extended Regular Expression (ERE) engine

定义BRE模式

1、纯文本

$ echo "This is a test" | sed -n /test/p

This is a test

正则表达式并不关心模式出现在数据流中的位置,关键是匹配正则表达式模式与数据流文本。正则表达式模式区分大小写。空格像其它字符一样处理。

2、特殊字符

The special characters recognized by regular expressions are:

.*[]^${}\+?|()

不要在文本模式中单独使用这些字符。可以用转义字符(\)把这些字符当作普通字符。

3、定位符

1The caret character (^) defines a pattern that starts at the beginning of a line of text in the data stream. If the pattern is located any place other than the start of the line of text, the regular expression pattern fails.

$ echo "Books are great" | sed -n /^Book/p

Books are great

2The opposite of looking for a pattern at the start of a line is looking for it at the end of a line. The dollar sign ($) special character defines the end anchor. Add this special character after a text pattern to indicate that the line of data must end with the text pattern:

$ echo "This is a good book" | sed -n /book$/p

This is a good book

3The dot special character is used to match any single character except a newline character. The dot character must match a character though; if theres no character in the place of the dot, then the pattern will fail.

4)字符类

用方括号来定义字符类。

$ sed -n /[ch]at/pdata6

The cat is sleeping.

That is a very nice hat.

5)否定字符类

$ sed -n /[^ch]at/pdata6

This test is at line two.

6)使用范围

You can use a range of characters within a character class by using the dash symbol

Just specify the first character in the range, a dash, then the last character in the range. The regular expression includes any character thats within the specified character range

$ sed -n /^[0-9][0-9][0-9][0-9][0-9]$/pdata8

60633

46201

45902

7)特殊字符类

BRE Special Character Classes

Class

Description

[[:alpha:]]

 Match any alphabetical character, either upper or lower case.

[[:alnum:]]

Match any alphanumeric character 0–9, A–Z, or a–z.

[[:blank:]]

 Match a space or Tab character.

[[:digit:]]

Match a numerical digit from 0 through 9.

[[:lower:]]

Match any lower-case alphabetical character a–z.

[[:print:]]

Match any printable character.

[[:punct:]]

 Match a punctuation character.

[[:space:]]

 Match any whitespace character: space, Tab, NL, FF, VT, CR.

[[:upper:]]

Match any upper-case alphabetical character A–Z.

8)星号

Placing an asterisk after a character signifies that the character must appear zero or more times in the text to match the pattern:

$ echo "ik" | sed -n /ie*k/p

ik

扩展正则表达式

gawk支持,而sed不支持。

1)问号

The question mark is similar to the asterisk, but with a slight twist. The question mark indicates that the preceding character can appear zero or one time, but thats all. It doesnt match repeating occurrences of the character:

$ echo "bt" | gawk /be?t/{print $0}

Bt

2)加号

The plus sign is another pattern symbol thats similar to the asterisk, but with a different twist than the question mark. The plus sign indicates that the preceding character can appear one or more times, but must be present at least once. The pattern doesnt match if the character is not present:

$ echo "beeet" | gawk /be+t/{print $0}

beeet

3)大括号

Curly braces are available in ERE to allow you to specify a limit on a repeatable regular expression. This is often referred to as an interval. You can express the interval in two formats:

m: The regular expression appears exactly m times.

m,n: The regular expression appears at least m times, but no more than n times.

This feature allows you to fine-tune exactly how many times you allow a character (or character class) to appear in a pattern.

4)管道符号

The pipe symbol allows to you to specify two or more patterns that the regular expression engine uses in a logical OR formula when examining the data stream. If any of the patterns match the data stream text, the text passes. If none of the patterns match, the data stream text fails.

The format for using the pipe symbol is:

expr1|expr2|...

$ echo "The cat is asleep" | gawk /cat|dog/{print $0}

The cat is asleep

5)将表达式分组

Regular expression patterns can also be grouped by using parentheses. When you group a regular expression pattern, the group is treated like a standard character. You can apply a special character to the group just as you would to a regular character.

$ echo "Sat" | gawk /Sat(urday)?/{print $0}

Sat

$ echo "Saturday" | gawk /Sat(urday)?/{print $0}

Saturday

$

几个例子

1)计算文件目录

$ cat countfiles

#!/bin/bash

# count number of files in your PATH

mypath=`echo $PATH | sed s/:/ /g`

count=0

for directory in $mypath

do

check=`ls $directory`

for item in $check

do

count=$[ $count + 1 ]

done

echo "$directory - $count"

count=0

done

$ ./countfiles

/usr/local/bin - 79

/bin - 86

/usr/bin - 1502

/usr/X11R6/bin - 175

/usr/games - 2

/usr/java/j2sdk1.4.1 01/bin - 27

$

/usr/local/bin - 79

/bin - 86

/usr/bin - 1502

/usr/X11R6/bin - 175

2)验证电话号码

$ cat isphone

#!/bin/bash

# script to filter out bad phone numbers

gawk --re-interval /^\(?[2-9][0-9]{2}\)?(| |-|\.)

[0-9]{3}( |-|\.)[0-9]{4}/{print $0}

$

By default, the gawk program doesnt recognize regular expression intervals. You must specify the --re-interval command line option for the gawk program to recognize

regular expression intervals.

(123)456-7890

(123) 456-7890

123-456-7890

123.456.7890

3)解析电子邮件地址

^([a-zA-Z0-9 \-\.\+]+)@([a-zA-Z0-9 \-\.]+)\.([a-zA-Z]{2,5})$

参考:

1Linux命令行和SHELL脚本编程

目录
相关文章
|
2月前
|
存储 安全 Unix
七、Linux Shell 与脚本基础
别再一遍遍地敲重复的命令了,把它们写进Shell脚本,就能一键搞定。脚本本质上就是个存着一堆命令的文本文件,但要让它“活”起来,有几个关键点:文件开头最好用#!/usr/bin/env bash来指定解释器,并用chmod +x给它执行权限。执行时也有讲究:./script.sh是在一个新“房间”(子Shell)里跑,不影响你;而source script.sh是在当前“房间”里跑,适合用来加载环境变量和配置文件。
402 10
|
2月前
|
存储 Shell Linux
八、Linux Shell 脚本:变量与字符串
Shell脚本里的变量就像一个个贴着标签的“箱子”。装东西(赋值)时,=两边千万不能有空格。用单引号''装进去的东西会原封不动,用双引号""则会让里面的$变量先“变身”再装箱。默认箱子只能在当前“房间”(Shell进程)用,想让隔壁房间(子进程)也能看到,就得给箱子盖个export的“出口”戳。此外,Shell还自带了$?(上条命令的成绩单)和$1(别人递进来的第一个包裹)等许多特殊箱子,非常有用。
287 3
|
2月前
|
算法 Linux Shell
Linux实用技能:打包压缩、热键、Shell与权限管理
本文详解Linux打包压缩技巧、常用命令与原理,涵盖.zip与.tgz格式操作、跨系统传文件方法、Shell运行机制及权限管理,助你高效使用Linux系统。
Linux实用技能:打包压缩、热键、Shell与权限管理
|
4月前
|
Web App开发 缓存 安全
Linux一键清理系统垃圾:释放30GB空间的Shell脚本实战​
这篇博客介绍了一个实用的Linux系统盘清理脚本,主要功能包括: 安全权限检查和旧内核清理,保留当前使用内核 7天以上日志文件清理和系统日志压缩 浏览器缓存(Chrome/Firefox)、APT缓存、临时文件清理 智能清理Snap旧版本和Docker无用数据 提供磁盘空间使用前后对比和大文件查找功能 脚本采用交互式设计确保安全性,适合定期维护开发环境、服务器和个人电脑。文章详细解析了脚本的关键功能代码,并给出了使用建议。完整脚本已开源,用户可根据需求自定义调整清理策略。
499 1
|
6月前
|
Linux Shell
Centos或Linux编写一键式Shell脚本删除用户、组指导手册
Centos或Linux编写一键式Shell脚本删除用户、组指导手册
191 4
|
6月前
|
Linux Shell 数据安全/隐私保护
Centos或Linux编写一键式Shell脚本创建用户、组、目录分配权限指导手册
Centos或Linux编写一键式Shell脚本创建用户、组、目录分配权限指导手册
386 3
|
7月前
|
Linux Shell
在Linux、CentOS7中设置shell脚本开机自启动服务
以上就是在CentOS 7中设置shell脚本开机自启动服务的全部步骤。希望这个指南能帮助你更好地管理你的Linux系统。
606 25
|
7月前
|
Linux Shell
shell_42:Linux参数移动
总的来说,参数移动是Linux shell脚本中的一个重要概念,掌握它可以帮助我们更好地处理和管理脚本中的参数。希望这个解释能帮助你理解和使用参数移动。
154 18
|
6月前
|
运维 监控 中间件
Linux运维笔记 - 如何使用WGCLOUD监控交换机的流量
WGCLOUD是一款开源免费的通用主机监控工具,安装使用都非常简单,它可以监控主机、服务器的cpu、内存、磁盘、流量等数据,也可以监控数据库、中间件、网络设备
|
Ubuntu Linux Python
Tkinter错误笔记(一):tkinter.Button在linux下出现乱码
在Linux系统中,使用Tkinter库时可能会遇到中文显示乱码的问题,这通常是由于字体支持问题导致的,可以通过更换支持中文的字体来解决。
684 0
Tkinter错误笔记(一):tkinter.Button在linux下出现乱码