Getting Started with Go Is Easy: Regular Expressions (Part 1)

2022-10-21

Foreword

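In programming, we often need to check whether a string contains text that follows a particular pattern, such as an email address, a mobile phone number, or an ID card number. This is done by describing the pattern in a special syntax and searching the target string for it.

If the target string contains text that fits the pattern, the search succeeds; otherwise it fails.

This is the job regular expressions are designed for.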

What Is a Regular Expression

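A regular expression (or regex) is a special sequence of characters that defines a search pattern for matching text. Go ships with a regular expression package in its standard library, regexp, which provides operations for filtering, modifying, replacing, validating, and extracting text.

Regular expressions are used for text search and for more advanced text manipulation. They are built into tools such as grep and sed, text editors such as vi and emacs, and are supported by programming languages such as Go, Java, and Python.

Go's regexp package accepts the RE2 syntax, which is largely the same general syntax used by Perl, Python, and other popular languages. RE2 is roughly a subset of PCRE, with some caveats: features that cannot be matched in guaranteed linear time, such as backreferences and lookaround assertions, are not supported.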
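To make the RE2 caveats concrete, here is a minimal sketch that checks which patterns the regexp package accepts. The helper name compiles and the sample patterns are made up for illustration; only regexp.Compile comes from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper that reports whether Go's regexp
// package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(\w+) (\w+)`, // capture groups: supported
		`(?i)go`,      // inline flags: supported
		`(\w+) \1`,    // backreference: rejected by RE2 syntax
		`go(?=lang)`,  // lookahead: rejected by RE2 syntax
	}
	for _, p := range patterns {
		fmt.Printf("%q compiles: %t\n", p, compiles(p))
	}
}
```

Patterns ported from PCRE-based tools may therefore need to be rewritten before they compile in Go.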

[Table of regular expression patterns: image not available]

The regexp package offers several commonly used functions and methods:

MatchString()
Compile()
FindString()
FindAllString()
FindStringIndex()
FindAllStringIndex()
FindStringSubmatch()
Split()
ReplaceAllString()
ReplaceAllStringFunc()

Let's look at how to use them, one by one.

The MatchString Function

MatchString() reports whether the string passed as an argument contains any match of the regular expression pattern.

 package main
 
 import (
   "fmt"
   "log"
   "regexp"
 )
 
 func main() {
 
   words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
 
   for _, word := range words {
 
     found, err := regexp.MatchString(".even", word)
 
     if err != nil {
       log.Fatal(err)
     }
 
     if found {
 
       fmt.Printf("%s matches\n", word)
     } else {
 
       fmt.Printf("%s does not match\n", word)
     }
   }
 }

Running the code prints:

 Seven matches
 even does not match
 Maven does not match
 Amen does not match
 eleven matches
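The output follows from two properties of the pattern: the dot requires one character before "even", so the bare word "even" fails, and the pattern is unanchored, so "eleven" succeeds because "leven" occurs inside it. As a minimal sketch (the helper matches is a made-up name for this illustration), the difference between an unanchored and an anchored pattern can be compared like this:

```go
package main

import (
	"fmt"
	"log"
	"regexp"
)

// matches is a hypothetical wrapper around regexp.MatchString that
// aborts the program if the pattern itself is invalid.
func matches(pattern, s string) bool {
	ok, err := regexp.MatchString(pattern, s)
	if err != nil {
		log.Fatal(err)
	}
	return ok
}

func main() {
	for _, word := range []string{"Seven", "even", "eleven"} {
		// `.even` may match anywhere in word; `^.even$` must match all of it.
		fmt.Printf("%-7s anywhere=%t whole=%t\n",
			word, matches(`.even`, word), matches(`^.even$`, word))
	}
}
```

With ^ and $ the pattern has to cover the entire input, so "eleven" no longer matches while "Seven" still does.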

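However, an editor with linting enabled flags the regexp.MatchString call inside the loop over words. The warning comes from the linter, not the compiler: the package-level MatchString has to compile the pattern again on every call, which performs poorly in a loop, so the hint recommends regexp.Compile instead.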

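The Compile Function

Compile parses a regular expression and, if successful, returns a Regexp object that can be used to match text. Compiling the pattern once and reusing the resulting object is faster than parsing the pattern again for every match.

MustCompile is a convenience function that compiles a regular expression and panics if the expression cannot be parsed.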

 package main
 
 import (
   "fmt"
   "log"
   "regexp"
 )
 
 func main() {
 
   words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
 
   re, err := regexp.Compile(".even")
 
   if err != nil {
     log.Fatal(err)
   }
 
   for _, word := range words {
 
     found := re.MatchString(word)
 
     if found {
 
       fmt.Printf("%s matches\n", word)
     } else {
 
       fmt.Printf("%s does not match\n", word)
     }
   }
 }

This example uses a compiled regular expression.

 re, err := regexp.Compile(".even")

The pattern is compiled with Compile. We then call the MatchString method on the returned Regexp object:

 found := re.MatchString(word)

Running the program produces the same output as before:

 Seven matches
 even does not match
 Maven does not match
 Amen does not match
 eleven matches
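The error returned by Compile matters most when the pattern is not a fixed literal, for example when it comes from user input. The following is a minimal sketch under that assumption; countMatches is a hypothetical helper, and `a(b` is simply an intentionally malformed pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches is a hypothetical helper: it compiles pattern once,
// reuses the compiled object for every word, and reports how many match.
func countMatches(pattern string, words []string) (int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	n := 0
	for _, w := range words {
		if re.MatchString(w) {
			n++
		}
	}
	return n, nil
}

func main() {
	words := []string{"Seven", "even", "Maven", "Amen", "eleven"}
	// The second pattern has an unclosed group and fails to compile.
	for _, pattern := range []string{`.even`, `a(b`} {
		n, err := countMatches(pattern, words)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q matched %d words\n", pattern, n)
	}
}
```

Returning the error lets the caller decide how to react to a bad pattern instead of terminating the program.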

The MustCompile Function

 package main
 
 import (
   "fmt"
   "regexp"
 )
 
 func main() {
 
   words := [...]string{"Seven", "even", "Maven", "Amen", "eleven"}
 
   re := regexp.MustCompile(".even")
 
   for _, word := range words {
 
     found := re.MatchString(word)
 
     if found {
 
       fmt.Printf("%s matches\n", word)
     } else {
 
       fmt.Printf("%s does not match\n", word)
     }
   }
 }
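Because MustCompile returns a single value, it is often used to initialize a package-level variable so that the pattern is compiled exactly once at start-up. The sketch below shows that arrangement; the names evenRe and hasEven are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// evenRe is compiled once when the program starts. The pattern is a
// constant, so a panic here would point to a programming error rather
// than to bad input at run time.
var evenRe = regexp.MustCompile(`.even`)

// hasEven reports whether s contains any character followed by "even".
func hasEven(s string) bool {
	return evenRe.MatchString(s)
}

func main() {
	fmt.Println(hasEven("Seven"), hasEven("Amen"))
}
```

For patterns that are only known at run time, Compile with explicit error handling is usually the safer choice, since a panic would crash the program on invalid input.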

The FindAllString Function

FindAllString returns a slice of all successive matches of the regular expression.

 package main
 
 import (
     "fmt"
     "os"
     "regexp"
 )
 
 func main() {
 
     var content = `Foxes are omnivorous mammals belonging to several genera 
 of the family Canidae. Foxes have a flattened skull, upright triangular ears, 
 a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every 
 continent except Antarctica. By far the most common and widespread species of 
 fox is the red fox.`
 
     re := regexp.MustCompile("(?i)fox(es)?")
 
     found := re.FindAllString(content, -1)
 
     fmt.Printf("%q\n", found)
 
     if found == nil {
         fmt.Printf("no match found\n")
         os.Exit(1)
     }
 
     for _, word := range found {
         fmt.Printf("%s\n", word)
     }
 
 }

In this example, we find every occurrence of the word fox, including its plural form.

 re := regexp.MustCompile("(?i)fox(es)?")

The (?i) flag makes the regular expression case-insensitive. (es)? means that the characters "es" may appear zero times or once.

 found := re.FindAllString(content, -1)

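We use FindAllString to find all occurrences of the regular expression. The second argument is the maximum number of matches to return; -1 means return every match. If there is no match, the result is nil.

Output: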

 ["Foxes" "Foxes" "Foxes" "fox" "fox"]
 Foxes
 Foxes
 Foxes
 fox
 fox
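The effect of the second argument can be seen with a shorter input. This is a small sketch: the sample sentences and the names foxRe and firstMatches are invented for the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// firstMatches is a hypothetical helper that returns at most n matches
// of foxRe in s, or all matches when n is negative.
func firstMatches(s string, n int) []string {
	return foxRe.FindAllString(s, n)
}

func main() {
	s := "Foxes, a fox, and more foxes"
	fmt.Printf("%q\n", firstMatches(s, 2))  // stops after two matches
	fmt.Printf("%q\n", firstMatches(s, -1)) // returns every match
	// With no match at all, the returned slice is nil.
	fmt.Println(firstMatches("no canids here", -1) == nil)
}
```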

The FindAllStringIndex Function

 package main
 
 import (
     "fmt"
     "regexp"
 )
 
 func main() {
 
     var content = `Foxes are omnivorous mammals belonging to several genera 
 of the family Canidae. Foxes have a flattened skull, upright triangular ears, 
 a pointed, slightly upturned snout, and a long bushy tail. Foxes live on every 
 continent except Antarctica. By far the most common and widespread species of 
 fox is the red fox.`
 
     re := regexp.MustCompile("(?i)fox(es)?")
 
     idx := re.FindAllStringIndex(content, -1)
 
     for _, j := range idx {
         match := content[j[0]:j[1]]
         fmt.Printf("%s at %d:%d\n", match, j[0], j[1])
     }
 }

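In this example, we find every occurrence of the word fox in the text together with its position. Each index pair holds the byte offset where the match starts and the offset just past its end, so content[j[0]:j[1]] yields the matched text. The program prints: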

 Foxes at 0:5
 Foxes at 81:86
 Foxes at 196:201
 fox at 296:299
 fox at 311:314
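The offsets are counted in bytes, not characters, which only becomes visible when the text contains non-ASCII characters. The following sketch converts byte offsets into rune offsets; the helper runeOffsets and the sample string are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

var foxRe = regexp.MustCompile(`(?i)fox(es)?`)

// runeOffsets is a hypothetical helper that converts the byte offsets
// reported by FindAllStringIndex into rune (character) offsets.
func runeOffsets(s string) [][2]int {
	var out [][2]int
	for _, loc := range foxRe.FindAllStringIndex(s, -1) {
		start := utf8.RuneCountInString(s[:loc[0]])
		end := start + utf8.RuneCountInString(s[loc[0]:loc[1]])
		out = append(out, [2]int{start, end})
	}
	return out
}

func main() {
	s := "h\u00e9llo fox" // the accented e occupies two bytes in UTF-8
	fmt.Println(foxRe.FindAllStringIndex(s, -1)) // byte offsets
	fmt.Println(runeOffsets(s))                  // rune offsets
}
```

Slicing the original string must always use the byte offsets; the rune offsets are only useful for display, such as reporting a column number.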
