Python标准库-string模块《未完待续》

简介:

>>> import string

>>> s='hello rollen , how are you '
>>> string.capwords(s)
'Hello Rollen , How Are You'          #每个单词的首字母大写
>>> string.split(s)
['hello', 'rollen', ',', 'how', 'are', 'you']     #划分为列表 默认是以空格划分
>>> s='1+2+3'
>>> string.split(s,'+')                 #以‘+’号进行划分
['1', '2', '3']

maketrans()方法会创建一个能够被translate()使用的翻译表,可以用来改变一些列的字符,这个方法比调用replace()更加的高效。

例如下面的例子将字符串s中的‘a,改为1,‘b’改为2,‘c’改为3’:

>>> leet=string.maketrans('abc','123')
>>> s='abcdef'
>>> s.translate(leet)
'123def'
>>> leet=string.maketrans('abc','123')
>>> s='aAaBcC'
>>> s.translate(leet)
'1A1B3C'

string中的template的小例子:

import string
values = { 'var':'foo' }
t = string.Template("""
Variable : $var
Escape : $$
Variable in text: ${var}iable
""")
print 'TEMPLATE:', t.substitute(values)
s = """
Variable : %(var)s
Escape : %%
Variable in text: %(var)siable
"""
print 'INTERPOLATION:', s % values

上面的例子的输出为:

TEMPLATE: 
Variable : foo 
Escape : $ 
Variable in text: fooiable

INTERPOLATION: 
Variable : foo 
Escape : % 
Variable in text: fooiable

但是上面的substitute如果提供的参数不足的时候,会出现异常,我们可以使用更加安全的办法,如下:

import string
values = { 'var':'foo' }
t = string.Template("$var is here but $missing is not provided")
try:
    print 'substitute() :', t.substitute(values)
except KeyError, err:
    print 'ERROR:', str(err)
print 'safe_substitute():', t.safe_substitute(values)

上面例子的输出为:

substitute() : ERROR: 'missing' 
safe_substitute(): foo is here but $missing is not provided

 

下面来看一些template的高级用法:

import string
template_text = '''
Delimiter : %%
Replaced : %with_underscore
Ignored : %notunderscored
'''
d = { 'with_underscore':'replaced',
'notunderscored':'not replaced',
}
class MyTemplate(string.Template):
    delimiter = '%'
    idpattern = '[a-z]+_[a-z]+'
t = MyTemplate(template_text)
print 'Modified ID pattern:'
print t.safe_substitute(d)

输出为:

Modified ID pattern:

Delimiter : % 
Replaced : replaced 
Ignored : %notunderscored

在这个例子中,我们通过自定义属性delimiter 和 idpattern自定了规则,我们使用%替代了美元符号$,而且我们定义的替换规则是被替换的变量名要包含下环线,所以在上面的例子中,只替换了一个。

 

import textwrap

sample_text = '''
The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.
'''

print 'No dedent:\n'
print textwrap.fill(sample_text, width=50)



输出为:

No dedent:

The textwrap module can be used to format text 
for output in situations where pretty-printing is 
desired. It offers programmatic functionality 
similar to the paragraph wrapping or filling 
features found in many text editors.

上面的例子设置宽度为50,下面的例子我们来移除缩进

import textwrap

sample_text = '''
The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.
'''

dedented_text = textwrap.dedent(sample_text)
print 'Dedented:'
print dedented_text




Dedented:

The textwrap module can be used to format text for output in 
situations where pretty-printing is desired. It offers 
programmatic functionality similar to the paragraph wrapping 
or filling features found in many text editors.

Hit any key to close this window...

下面来一个对比:

import textwrap

sample_text = '''
The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.
'''

dedented_text = textwrap.dedent(sample_text).strip()
for width in [ 45, 70 ]:
    print '%d Columns:\n' % width
    print textwrap.fill(dedented_text, width=width)
    print





上面的例子的输出如下:

45 Columns:

The textwrap module can be used to format
text for output in situations where pretty-
printing is desired. It offers programmatic
functionality similar to the paragraph
wrapping or filling features found in many
text editors.

70 Columns:

The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers programmatic
functionality similar to the paragraph wrapping or filling features
found in many text editors.

Hit any key to close this window...

我们也可以设置首行和剩余的行:

import textwrap

sample_text = '''
The textwrap module can be used to format text for output in
situations where pretty-printing is desired. It offers
programmatic functionality similar to the paragraph wrapping
or filling features found in many text editors.
'''

dedented_text = textwrap.dedent(sample_text).strip()
print textwrap.fill(dedented_text,
initial_indent=' ',
subsequent_indent=' ' * 4,
width=50,
)

输出为:

The textwrap module can be used to format text 
    for output in situations where pretty-printing 
    is desired. It offers programmatic 
    functionality similar to the paragraph 
    wrapping or filling features found in many 
    text editors.

上面的例子设置首行缩进1个空格,其余行缩进4个空格

 

在文本中查找:

import re
pattern = 'this'
text = 'Does this text match the pattern?'
match = re.search(pattern, text)
s = match.start()
e = match.end()
print 'Found "%s"\nin "%s"\nfrom %d to %d ("%s")' % \
(match.re.pattern, match.string, s, e, text[s:e])

start和end返回匹配的位置

输出如下:

Found "this" 
in "Does this text match the pattern?" 
from 5 to 9 ("this")

 

re includes module-level functions for working with regular expressions as text strings, 
but it is more efficient to compile the expressions a program uses frequently. The compile() function converts an expression string into a RegexObject.

import re
# Precompile the patterns
regexes = [ re.compile(p) for p in [ 'this', 'that' ]]
text = 'Does this text match the pattern?'
print 'Text: %r\n' % text
for regex in regexes:
    print 'Seeking "%s" ->' % regex.pattern,
    if regex.search(text):
        print 'match!'
    else:
       print 'no match'

Text: 'Does this text match the pattern?'

Seeking "this" -> match! 
Seeking "that" -> no match 

The module-level functions maintain a cache of compiled expressions. However, 
the size of the cache is limited, and using compiled expressions directly avoids the 
cache lookup overhead. Another advantage of using compiled expressions is that by 
precompiling all expressions when the module is loaded, the compilation work is shifted 
to application start time, instead of to a point when the program may be responding to 
a user action.

 

So far, the example patterns have all used search() to look for single instances of 
literal text strings. The findall() function returns all substrings of the input that 
match the pattern without overlapping.

import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.findall(pattern, text):
    print 'Found "%s"' % match

Found "ab" 
Found "ab" 

finditer() returns an iterator that produces Match instances instead of the 
strings returned by findall().

import re
text = 'abbaaabbbbaaaaa'
pattern = 'ab'
for match in re.finditer(pattern, text):
    s = match.start()
    e = match.end()
    print 'Found "%s" at %d:%d' % (text[s:e], s, e)

Found "ab" at 0:2 
Found "ab" at 5:7


==============================================================================
本文转自被遗忘的博客园博客,原文链接:http://www.cnblogs.com/rollenholt/archive/2012/06/05/2537348.html,如需转载请自行联系原作者

相关文章
|
2天前
|
算法 Python
请解释Python中的关联规则挖掘以及如何使用Sklearn库实现它。
使用Python的mlxtend库,可以通过Apriori算法进行关联规则挖掘。首先导入TransactionEncoder和apriori等模块,然后准备数据集(如购买行为列表)。对数据集编码并转换后,应用Apriori算法找到频繁项集(设置最小支持度)。最后,生成关联规则并计算置信度(设定最小置信度阈值)。通过调整这些参数可以优化结果。
25 9
|
2天前
|
索引 Python
如何在Python中使用Pandas库进行季节性调整?
在Python中使用Pandas和Statsmodels进行季节性调整的步骤包括:导入pandas和seasonal_decompose模块,准备时间序列DataFrame,调用`seasonal_decompose()`函数分解数据为趋势、季节性和残差,可选地绘制图表分析,以及根据需求去除季节性影响(如将原始数据减去季节性成分)。这是对时间序列数据进行季节性分析的基础流程。
19 2
|
1天前
|
数据挖掘 数据处理 索引
如何使用Python的Pandas库进行数据筛选和过滤?
Pandas是Python数据分析的核心库,提供DataFrame数据结构。基本步骤包括导入库、创建DataFrame及进行数据筛选。示例代码展示了如何通过布尔索引、`query()`和`loc[]`方法筛选`Age`大于19的记录。
5 0
|
2天前
|
数据处理 Python
如何使用Python的Pandas库进行数据排序和排名
【4月更文挑战第22天】Pandas Python库提供数据排序和排名功能。使用`sort_values()`按列进行升序或降序排序,如`df.sort_values(by='A', ascending=False)`。`rank()`函数用于计算排名,如`df['A'].rank(ascending=False)`。多列操作可传入列名列表,如`df.sort_values(by=['A', 'B'], ascending=[True, False])`和分别对'A'、'B'列排名。
13 2
|
2天前
|
Python
如何使用Python的Pandas库进行数据缺失值处理?
Pandas在Python中提供多种处理缺失值的方法:1) 使用`isnull()`检查;2) `dropna()`删除含缺失值的行或列;3) `fillna()`用常数、前后值填充;4) `interpolate()`进行插值填充。根据需求选择合适的方法处理数据缺失。
29 9
|
4天前
|
数据挖掘 API 数据安全/隐私保护
python请求模块requests如何添加代理ip
python请求模块requests如何添加代理ip
|
4天前
|
缓存 自然语言处理 数据处理
Python自然语言处理面试:NLTK、SpaCy与Hugging Face库详解
【4月更文挑战第16天】本文介绍了Python NLP面试中NLTK、SpaCy和Hugging Face库的常见问题和易错点。通过示例代码展示了如何进行分词、词性标注、命名实体识别、相似度计算、依存关系分析、文本分类及预训练模型调用等任务。重点强调了理解库功能、预处理、模型选择、性能优化和模型解释性的重要性,帮助面试者提升NLP技术展示。
22 5
|
5天前
|
Python
如何使用Python的Plotly库创建交互式图表?
Plotly是Python的交互式图表库,支持多种图表类型,如折线图、散点图、柱状图。使用步骤包括安装库、导入模块、准备数据、创建图表对象、添加数据和设置属性,最后显示或保存图表。
16 6
|
5天前
|
机器学习/深度学习 数据采集 算法
请解释Python中的Sklearn库以及它的主要用途。
Sklearn是Python的机器学习库,提供数据预处理、特征选择、分类回归、聚类、模型评估和参数调优等工具。包含监督和无监督学习算法,如SVM、决策树、K-means等,并提供样例数据集便于实践。它是进行机器学习项目的重要资源。
13 1
|
5天前
|
XML 数据采集 自然语言处理
请解释Python中的BeautifulSoup库以及它的主要用途。
BeautifulSoup是Python的HTML/XML解析库,用于数据提取和网页抓取。它提供树形结构解析文档,支持查找、访问和修改元素。主要用途包括网页抓取、数据清洗、自动化测试、内容生成、网站开发及与其他库集成,如Requests和Scrapy。适用于各种数据处理场景。
9 1