MATLAB Regular Expressions

Introduction:

regexprep

Replace text using regular expression


Syntax

  • newStr = regexprep(str,expression,replace)
  • newStr = regexprep(str,expression,replace,option1,...optionM)

Description

newStr = regexprep(str,expression,replace) replaces the text in str that matches expression with the text described by replace. The regexprep function returns the updated text in newStr.

  • If str is a single piece of text (either a character vector or a string scalar), then newStr is also a single piece of text of the same type. newStr is a single piece of text even when expression or replace is a cell array of character vectors or a string array. When expression is a cell array or a string array, regexprep applies the first expression to str, and then applies each subsequent expression to the preceding result.

  • If str is a cell array or a string array, then newStr is a cell array or string array with the same dimensions as str. For each element of str, the regexprep function applies each expression in sequence.

  • If there are no matches to expression, then newStr is equivalent to str.
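
The sequencing rules above can be sketched as follows. The sample text and patterns are made up for illustration; the point is that each expression operates on the result of the previous one, so the order of the expressions matters.

```matlab
% Single piece of text, two expressions applied in sequence.
str = 'cat dog';
expression = {'cat', 'dog'};
replace = {'dog', 'bird'};
newStr = regexprep(str, expression, replace);
% Step 1: 'cat dog' -> 'dog dog'; step 2: 'dog dog' -> 'bird bird'

% Cell array input: the output has the same dimensions as the input.
strs = {'cat'; 'cow'; 'catdog'};
newStrs = regexprep(strs, expression, replace);
% {'bird'; 'cow'; 'birdbird'}

% No match: the text comes back unchanged.
same = regexprep('fish', expression, replace);
```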

newStr = regexprep(str,expression,replace,option1,...optionM) modifies the search using the specified options. For example, specify 'ignorecase' to perform a case-insensitive match.

Examples

Update Text

Replace words that begin with M, end with y, and have at least one character between them.

str = 'My flowers may bloom in May';
expression = 'M(\w+)y';
replace = 'April';

newStr = regexprep(str,expression,replace)
newStr =

My flowers may bloom in April

Include Tokens in Replacement Text

Replace variations of the phrase 'walk up' by capturing the letters that follow 'walk' in a token.

str = 'I walk up, they walked up, we are walking up.';
expression = 'walk(\w*) up';
replace = 'ascend$1';

newStr = regexprep(str,expression,replace)
newStr =

I ascend, they ascended, we are ascending.

Include Dynamic Expression in Replacement Text

Replace lowercase letters at the beginning of sentences with their uppercase equivalents using the upper function.

str = 'here are two sentences. neither is capitalized.';
expression = '(^|\.)\s*.';
replace = '${upper($0)}';

newStr = regexprep(str,expression,replace)
newStr =

Here are two sentences. Neither is capitalized.

The regular expression matches single characters (.) that follow the beginning of the character vector (^) or a period (\.) and any whitespace (\s*). The replace expression calls the upper function for the currently matching character ($0).

Update Multiple Pieces of Text

Replace each occurrence of a double letter in a set of character vectors with the symbols '--'.

str = {                                 ...
'Whose woods these are I think I know.' ; ...
'His house is in the village though;'   ; ...
'He will not see me stopping here'      ; ...
'To watch his woods fill up with snow.'};

expression = '(.)\1';
replace = '--';
newStr = regexprep(str,expression,replace)
newStr =

  4×1 cell array

    'Whose w--ds these are I think I know.'
    'His house is in the vi--age though;'
    'He wi-- not s-- me sto--ing here'
    'To watch his w--ds fi-- up with snow.'

Preserve Case in Original Text

Ignore letter case in the regular expression when finding matches, but mimic the letter case of the original text when updating.

str = 'My flowers may bloom in May';
expression = 'M(\w+)y';
replace = 'April';

newStr = regexprep(str,expression,replace,'preservecase')
newStr =

My flowers april bloom in April

Replace Zero-Length Matches

Insert text at the beginning of a character vector using the '^' operator, which returns a zero-length match, and the 'emptymatch' keyword.

str = 'abc';
expression = '^';
replace = '__';

newStr = regexprep(str,expression,replace,'emptymatch')
newStr =

__abc

Input Arguments

str — Text to update
character vector | cell array of character vectors | string array

Text to update, specified as a character vector, a cell array of character vectors, or a string array.

Data Types: char | cell | string

expression — Regular expression
character vector | cell array of character vectors | string array

Regular expression, specified as a character vector, a cell array of character vectors, or a string array. Each expression can contain characters, metacharacters, operators, tokens, and flags that specify patterns to match in str.

The following tables describe the elements of regular expressions.

 

Metacharacters

Metacharacters represent letters, letter ranges, digits, and space characters. Use them to construct a generalized pattern of characters.

  • .: Any single character, including white space. Example: '..ain' matches sequences of five consecutive characters that end with 'ain'.

  • [c1c2c3]: Any character contained within the brackets. The following characters are treated literally: $ | . * + ? and - when not used to indicate a range. Example: '[rp.]ain' matches 'rain' or 'pain' or '.ain'.

  • [^c1c2c3]: Any character not contained within the brackets. The following characters are treated literally: $ | . * + ? and - when not used to indicate a range. Example: '[^*rp]ain' matches all four-letter sequences that end in 'ain', except 'rain' and 'pain' and '*ain'. For example, it matches 'gain', 'lain', or 'vain'.

  • [c1-c2]: Any character in the range of c1 through c2. Example: '[A-G]' matches a single character in the range of A through G.

  • \w: Any alphabetic, numeric, or underscore character. For English character sets, \w is equivalent to [a-zA-Z_0-9]. Example: '\w*' identifies a word.

  • \W: Any character that is not alphabetic, numeric, or underscore. For English character sets, \W is equivalent to [^a-zA-Z_0-9]. Example: '\W*' identifies a term that is not a word.

  • \s: Any white-space character; equivalent to [ \f\n\r\t\v]. Example: '\w*n\s' matches words that end with the letter n, followed by a white-space character.

  • \S: Any non-white-space character; equivalent to [^ \f\n\r\t\v]. Example: '\d\S' matches a numeric digit followed by any non-white-space character.

  • \d: Any numeric digit; equivalent to [0-9]. Example: '\d*' matches any number of consecutive digits.

  • \D: Any nondigit character; equivalent to [^0-9]. Example: '\w*\D\>' matches words that do not end with a numeric digit.

  • \oN or \o{N}: Character of octal value N. Example: '\o{40}' matches the space character, defined by octal 40.

  • \xN or \x{N}: Character of hexadecimal value N. Example: '\x2C' matches the comma character, defined by hex 2C.
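
A few of these metacharacters in use, as a small sketch with made-up input text:

```matlab
% \d+ matches each run of consecutive digits.
masked = regexprep('a1b22c333', '\d+', '#');
% masked is 'a#b#c#'

% \s+ matches each run of white space.
squeezed = regexprep('too   many    spaces', '\s+', ' ');
% squeezed is 'too many spaces'

% Inside brackets the dot is literal, so '.ain' matches but 'gain' does not.
hits = regexp('rain pain .ain gain', '[rp.]ain', 'match');
% hits is {'rain', 'pain', '.ain'}
```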

 

Character Representation

  • \a: Alarm (beep)

  • \b: Backspace

  • \f: Form feed

  • \n: New line

  • \r: Carriage return

  • \t: Horizontal tab

  • \v: Vertical tab

  • \char: Any character with special meaning in regular expressions that you want to match literally (for example, use \\ to match a single backslash)
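
As an illustration of escaping, the sketch below matches a literal backslash, a literal period, and a tab. The path and version strings are invented for the example.

```matlab
% A single-quoted MATLAB character vector holds backslashes literally.
winPath = 'C:\temp\data';

% '\\' in the expression matches one literal backslash.
unixStyle = regexprep(winPath, '\\', '/');
% unixStyle is 'C:/temp/data'

% '\.' matches a literal period rather than any character.
noDots = regexprep('v1.2.3', '\.', '_');
% noDots is 'v1_2_3'

% '\t' in the expression matches a tab character.
tabbed = sprintf('a\tb');
spaced = regexprep(tabbed, '\t', ' ');
% spaced is 'a b'
```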

Quantifiers

Quantifiers specify the number of times a pattern must occur in the matching text.

Each quantifier matches the expression when it occurs:

  • expr*: 0 or more times consecutively. Example: '\w*' matches a word of any length.

  • expr?: 0 times or 1 time. Example: '\w*(\.m)?' matches words that optionally end with the extension .m.

  • expr+: 1 or more times consecutively. Example: '<img src="\w+\.gif">' matches an <img> HTML tag when the file name contains one or more characters.

  • expr{m,n}: At least m times, but no more than n times consecutively. {0,1} is equivalent to ?. Example: '\S{4,8}' matches between four and eight non-white-space characters.

  • expr{m,}: At least m times consecutively. {0,} and {1,} are equivalent to * and +, respectively. Example: '<a href="\w{1,}\.html">' matches an <a> HTML tag when the file name contains one or more characters.

  • expr{n}: Exactly n times consecutively. Equivalent to {n,n}. Example: '\d{4}' matches four consecutive digits.

Quantifiers can appear in three modes, described in the following table. q represents any of the quantifiers in the previous table.

  • exprq: Greedy expression. Match as many characters as possible. Example: given the text '<tr><td><p>text</p></td>', the expression '</?t.*>' matches all characters between <tr and /td>:

    '<tr><td><p>text</p></td>'

  • exprq?: Lazy expression. Match as few characters as necessary. Example: given the text '<tr><td><p>text</p></td>', the expression '</?t.*?>' ends each match at the first occurrence of the closing bracket (>):

    '<tr>'   '<td>'   '</td>'

  • exprq+: Possessive expression. Match as much as possible, but do not rescan any portions of the text. Example: given the text '<tr><td><p>text</p></td>', the expression '</?t.*+>' does not return any matches, because the closing bracket is captured using .*, and is not rescanned.
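
The three quantifier modes can be compared side by side on the same text. This sketch uses regexp with the 'match' option (rather than regexprep) so that the matched pieces are visible.

```matlab
str = '<tr><td><p>text</p></td>';

% Greedy: .* runs to the end, then backs off to the last '>'.
greedy = regexp(str, '</?t.*>', 'match');
% One match: the whole text.

% Lazy: .*? stops at the first '>' after each opening.
lazy = regexp(str, '</?t.*?>', 'match');
% Three matches: '<tr>', '<td>', '</td>'

% Possessive: .*+ consumes the final '>' and never gives it back.
possessive = regexp(str, '</?t.*+>', 'match');
% No matches.
```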

Grouping Operators

Grouping operators allow you to capture tokens, apply one operator to multiple elements, or disable backtracking in a specific group.

  • (expr): Group elements of the expression and capture tokens. Example: 'Joh?n\s(\w*)' captures a token that contains the last name of any person with the first name John or Jon.

  • (?:expr): Group, but do not capture tokens. Example: '(?:[aeiou][^aeiou]){2}' matches two consecutive patterns of a vowel followed by a nonvowel, such as 'anon'. Without grouping, '[aeiou][^aeiou]{2}' matches a vowel followed by two nonvowels.

  • (?>expr): Group atomically. Do not backtrack within the group to complete the match, and do not capture tokens. Example: 'A(?>.*)Z' does not match 'AtoZ', although 'A(?:.*)Z' does. Using the atomic group, Z is captured using .* and is not rescanned.

  • (expr1|expr2): Match expression expr1 or expression expr2. If there is a match with expr1, then expr2 is ignored. You can include ?: or ?> after the opening parenthesis to suppress tokens or group atomically. Example: '(let|tel)\w+' matches words that start with let or tel.
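
To see how grouping changes what a quantifier applies to, consider this small sketch (the input words are chosen only for illustration):

```matlab
% With a group, {2} repeats the whole sequence 'an'.
grouped = regexprep('banana', '(?:an){2}', 'X');
% grouped is 'bXa'

% Without a group, {2} applies only to 'n', so the pattern looks for 'ann'.
ungrouped = regexprep('banana', 'an{2}', 'X');
% ungrouped is 'banana' (no match, text unchanged)

% Alternation inside a group.
alt = regexp('letter telephone table', '(let|tel)\w+', 'match');
% alt is {'letter', 'telephone'}
```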

 

Anchors

Anchors in the expression match the beginning or end of the input text or word.

  • ^expr: Matches the beginning of the input text. Example: '^M\w*' matches a word starting with M at the beginning of the text.

  • expr$: Matches the end of the input text. Example: '\w*m$' matches words ending with m at the end of the text.

  • \<expr: Matches the beginning of a word. Example: '\<n\w*' matches any words starting with n.

  • expr\>: Matches the end of a word. Example: '\w*e\>' matches any words ending with e.
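
A short sketch of the four anchors, using invented sample text:

```matlab
% ^ and $ anchor to the beginning and end of the whole text.
atStart = regexprep('aaa', '^a', 'b');   % 'baa'
atEnd = regexprep('aaa', 'a$', 'b');     % 'aab'

% \< anchors to the beginning of each word.
capN = regexprep('no nonsense now', '\<n', 'N');
% capN is 'No Nonsense Now'

% \> anchors to the end of each word.
endE = regexp('the late tide ebbed', '\w*e\>', 'match');
% endE is {'the', 'late', 'tide'}
```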

 

Lookaround Assertions

Lookaround assertions look for patterns that immediately precede or follow the intended match, but are not part of the match.

The pointer remains at the current location, and characters that correspond to the test expression are not captured or discarded. Therefore, lookahead assertions can match overlapping character groups.

  • expr(?=test): Look ahead for characters that match test. Example: '\w*(?=ing)' matches terms that are followed by ing, such as 'Fly' and 'fall' in the input text 'Flying, not falling.'

  • expr(?!test): Look ahead for characters that do not match test. Example: 'i(?!ng)' matches instances of the letter i that are not followed by ng.

  • (?<=test)expr: Look behind for characters that match test. Example: '(?<=re)\w*' matches terms that follow 're', such as 'new', 'use', and 'cycle' in the input text 'renew, reuse, recycle'

  • (?<!test)expr: Look behind for characters that do not match test. Example: '(?<!\d)(\d)(?!\d)' matches single-digit numbers (digits that do not precede or follow other digits).

If you specify a lookahead assertion before an expression, the operation is equivalent to a logical AND.

  • (?=test)expr: Match both test and expr. Example: '(?=[a-z])[^aeiou]' matches consonants.

  • (?!test)expr: Match expr and do not match test. Example: '(?![aeiou])[a-z]' matches consonants.
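
Because lookaround assertions are not part of the match, only the text outside the assertion is replaced. The following sketch uses made-up input to show each form:

```matlab
% Lookbehind and negative lookahead together: isolated digits only.
singles = regexprep('1 22 3', '(?<!\d)\d(?!\d)', '#');
% singles is '# 22 #'

% Lookbehind: replace what follows 're', keeping 're' itself.
afterRe = regexprep('renew, reuse, recycle', '(?<=re)\w+', 'X');
% afterRe is 'reX, reX, reX'

% Negative lookahead: an i that is not followed by ng.
notIng = regexprep('sing in', 'i(?!ng)', 'I');
% notIng is 'sing In'

% Lookahead used as a logical AND: lowercase letters that are not vowels.
consonants = regexprep('banana', '(?![aeiou])[a-z]', '_');
% consonants is '_a_a_a'
```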

 

Logical and Conditional Operators

Logical and conditional operators allow you to test the state of a given condition, and then use the outcome to determine which pattern, if any, to match next. These operators support logical OR, and if or if/else conditions.

Conditions can be tokens, lookaround operators, or dynamic expressions of the form (?@cmd). Dynamic expressions must return a logical or numeric value.

  • expr1|expr2: Match expression expr1 or expression expr2. If there is a match with expr1, then expr2 is ignored. Example: '(let|tel)\w+' matches words that start with let or tel.

  • (?(cond)expr): If condition cond is true, then match expr. Example: '(?(?@ispc)[A-Z]:\\)' matches a drive name, such as C:\, when run on a Windows® system.

  • (?(cond)expr1|expr2): If condition cond is true, then match expr1. Otherwise, match expr2. Example: 'Mr(s?)\..*?(?(1)her|his) \w*' matches text that includes her when the text begins with Mrs, or that includes his when the text begins with Mr.

Token Operators

Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses. You can refer to a token by its sequence in the text (an ordinal token), or assign names to tokens for easier code maintenance and readable output.

Ordinal token operators:

  • (expr): Capture in a token the characters that match the enclosed expression. Example: 'Joh?n\s(\w*)' captures a token that contains the last name of any person with the first name John or Jon.

  • \N: Match the Nth token. Example: '<(\w+).*>.*</\1>' captures tokens for HTML tags, such as 'title' from the text '<title>Some text</title>'.

  • (?(N)expr1|expr2): If the Nth token is found, then match expr1. Otherwise, match expr2. Example: 'Mr(s?)\..*?(?(1)her|his) \w*' matches text that includes her when the text begins with Mrs, or that includes his when the text begins with Mr.

Named token operators:

  • (?<name>expr): Capture in a named token the characters that match the enclosed expression. Example: '(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)' creates named tokens for the month, day, and year in an input date of the form mm-dd-yy.

  • \k<name>: Match the token referred to by name. Example: '<(?<tag>\w+).*>.*</\k<tag>>' captures tokens for HTML tags, such as 'title' from the text '<title>Some text</title>'.

  • (?(name)expr1|expr2): If the named token is found, then match expr1. Otherwise, match expr2. Example: 'Mr(?<sex>s?)\..*?(?(sex)her|his) \w*' matches text that includes her when the text begins with Mrs, or that includes his when the text begins with Mr.

Note:   If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.
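
Tokens become most useful in regexprep when the replacement text refers back to them. A brief sketch, with invented names and dates, of ordinal tokens ($N), named tokens ($<name>), and a backreference (\1) inside the expression:

```matlab
% Ordinal tokens: swap two captured words.
swapped = regexprep('John Smith', '(\w+) (\w+)', '$2, $1');
% swapped is 'Smith, John'

% Named tokens: reorder the parts of a date of the form mm-dd-yy.
expression = '(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)';
dateStr = regexprep('12-25-99', expression, '$<day>/$<month>/$<yr>');
% dateStr is '25/12/99'

% Backreference in the expression: \1 must repeat what token 1 captured.
doubled = regexp('hello book', '(\w)\1', 'match');
% doubled is {'ll', 'oo'}
```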

 

Dynamic Regular Expressions

Dynamic expressions allow you to execute a MATLAB command or a regular expression to determine the text to match.

The parentheses that enclose dynamic expressions do not create a capturing group.

  • (??expr): Parse expr and include the resulting term in the match expression. When parsed, expr must correspond to a complete, valid regular expression. Dynamic expressions that use the backslash escape character (\) require two backslashes: one for the initial parsing of expr, and one for the complete match. Example: '^(\d+)((??\\w{$1}))' determines how many characters to match by reading a digit at the beginning of the match. The dynamic expression is enclosed in a second set of parentheses so that the resulting match is captured in a token. For instance, matching '5XXXXX' captures tokens for '5' and 'XXXXX'.

  • (??@cmd): Execute the MATLAB command represented by cmd, and include the output returned by the command in the match expression. Example: '(.{2,}).?(??@fliplr($1))' finds palindromes that are at least four characters long, such as 'abba'.

  • (?@cmd): Execute the MATLAB command represented by cmd, but discard any output the command returns. (Helpful for diagnosing regular expressions.) Example: '\w*?(\w)(?@disp($1))\1\w*' matches words that include double letters (such as pp), and displays intermediate results.

Within dynamic expressions, use the following operators to define replacement text.

  • $& or $0: Portion of the input text that is currently a match

  • $`: Portion of the input text that precedes the current match

  • $': Portion of the input text that follows the current match (use $'' to represent $')

  • $N: Nth token

  • $<name>: Named token

  • ${cmd}: Output returned when MATLAB executes the command, cmd
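
As a sketch of the ${cmd} operator, the replacement below passes each captured number to MATLAB functions and inserts the result. The input text is made up; note that the token arrives as text, so it is converted with str2double before arithmetic and back with num2str afterward.

```matlab
% Double every integer in the text.
str = 'x=2, y=10';
newStr = regexprep(str, '(\d+)', '${num2str(2*str2double($1))}');
% newStr is 'x=4, y=20'
```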

Comments

  • (?#comment): Insert a comment in the regular expression. The comment text is ignored when matching the input. Example: '(?# Initial digit)\<\d\w+' includes a comment, and matches words that begin with a number.

 

Search Flags

Search flags modify the behavior for matching expressions. An alternative to using a search flag within an expression is to pass an option input argument.

  • (?-i): Match letter case (default for regexp and regexprep).

  • (?i): Do not match letter case (default for regexpi).

  • (?s): Match dot (.) in the pattern with any character (default).

  • (?-s): Match dot in the pattern with any character that is not a newline character.

  • (?-m): Match the ^ and $ metacharacters at the beginning and end of text (default).

  • (?m): Match the ^ and $ metacharacters at the beginning and end of a line.

  • (?-x): Include space characters and comments when matching (default).

  • (?x): Ignore space characters and comments when matching. Use '\ ' and '\#' to match space and # characters.

The expression that the flag modifies can appear either after the parentheses, such as

(?i)\w*

or inside the parentheses and separated from the flag with a colon (:), such as

(?i:\w*)

The latter syntax allows you to change the behavior for part of a larger expression.
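
The difference between the two flag placements shows up in a small sketch (sample text invented for the example):

```matlab
% Flag before the expression: the whole pattern ignores case.
whole = regexprep('Abc abc ABC', '(?i)abc', 'x');
% whole is 'x x x'

% Flag inside parentheses: only 'a' ignores case, 'bc' must be lowercase.
part = regexprep('Abc aBc', '(?i:a)bc', 'x');
% part is 'x aBc'
```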

Data Types: char | cell | string

replace — Replacement text
character vector | cell array of character vectors | string array

Replacement text, specified as a character vector, a cell array of character vectors, or a string array, as follows:

  • If replace is a single character vector and expression is a cell array of character vectors, then regexprep uses the same replacement text for each expression.

  • If replace is a cell array of N character vectors and expression is a single character vector, then regexprep attempts N matches and replacements.

  • If both replace and expression are cell arrays of character vectors, then they must contain the same number of elements. regexprep pairs each replace element with its matching element in expression.
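
Two of the pairing rules above, sketched with made-up input: one replacement shared by several expressions, and replacements paired element by element with expressions.

```matlab
% One replacement text used for every expression: delete digits, then hyphens.
stripped = regexprep('a1-b2', {'\d', '-'}, '');
% stripped is 'ab'

% Paired cell arrays: 'red' -> 'R', then 'green' -> 'G'.
paired = regexprep('red green', {'red', 'green'}, {'R', 'G'});
% paired is 'R G'
```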

The replacement text can include regular characters, special characters (such as tabs or new lines), or replacement operators, as shown in the following tables.

Replacement operators:

  • $& or $0: Portion of the input text that is currently a match

  • $`: Portion of the input text that precedes the current match

  • $': Portion of the input text that follows the current match (use $'' to represent $')

  • $N: Nth token

  • $<name>: Named token

  • ${cmd}: Output returned when MATLAB executes the command, cmd

Special characters:

  • \a: Alarm (beep)

  • \b: Backspace

  • \f: Form feed

  • \n: New line

  • \r: Carriage return

  • \t: Horizontal tab

  • \v: Vertical tab

  • \char: Any character with special meaning in regular expressions that you want to match literally (for example, use \\ to match a single backslash)

Data Types: char | cell | string

option — Search or replacement option
'once' | N | 'warnings' | 'ignorecase' | 'preservecase' | 'emptymatch' | 'dotexceptnewline' | 'lineanchors' | ...

Search or replacement option, specified as a character vector or an integer value, as shown in the following table.

Options come in sets: one option that corresponds to the default behavior, and one or two options that allow you to override the default. Specify only one option from a set. Options can appear in any order.

  • 'all' (default) or 'once': Match and replace the expression as many times as possible (default), or only once.

  • N (override for 'all'): Replace only the Nth occurrence of the match, where N is an integer value.

  • 'nowarnings' (default) or 'warnings': Suppress warnings (default), or display them.

  • 'matchcase' (default) or 'ignorecase': Match letter case (default), or ignore case while matching and replacing.

  • 'preservecase' (override for 'matchcase'): Ignore case while matching, but preserve the case of corresponding characters in the original text while replacing.

  • 'noemptymatch' (default) or 'emptymatch': Ignore zero length matches (default), or include them.

  • 'dotall' (default) or 'dotexceptnewline': Match dot with any character (default), or all except newline (\n).

  • 'stringanchors' (default) or 'lineanchors': Apply ^ and $ metacharacters to the beginning and end of a character vector (default), or to the beginning and end of a line.

  • 'literalspacing' (default) or 'freespacing': Include space characters and comments when matching (default), or ignore them. With freespacing, use '\ ' and '\#' to match space and # characters.

Data Types: char | string
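
A minimal sketch of several options on invented input. Each call overrides one default from the sets described above.

```matlab
% 'once' replaces only the first match; an integer N replaces only the Nth.
first = regexprep('a a a', 'a', 'X', 'once');    % 'X a a'
second = regexprep('a a a', 'a', 'X', 2);        % 'a X a'

% 'ignorecase' matches regardless of letter case.
anyCase = regexprep('Hello hello', 'hello', 'bye', 'ignorecase');
% anyCase is 'bye bye'

% 'lineanchors' lets ^ match at the start of every line.
twoLines = sprintf('ab\nab');
perLine = regexprep(twoLines, '^a', 'X', 'lineanchors');
% perLine is 'Xb', a newline, then 'Xb'
```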

 

Output Arguments

newStr — Updated text
character vector | cell array of character vectors | string array

Updated text, returned as a character vector, a cell array of character vectors, or a string array. The data type of newStr is the same as the data type of str.

More About

Tall Array Support

This function fully supports tall arrays. For more information, see Tall Arrays.

See Also

contains | regexp | replace | strcmp | strfind | strrep






    This article is reposted from the wenglabs blog on cnblogs (博客园). Original link: http://www.cnblogs.com/arxive/p/6298470.html. To republish it, please contact the original author.
