[Introduction to Data Science] Lab 6: Strings and Dictionaries - Alibaba Cloud Developer Community

2023-10-12

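Strings and Dictionaries

0. What is the length of each of the following strings? (25 points total)

For each of the five strings below, predict what len() will return when that string is passed to it. Record your answer in the variable length, then run the cell to check whether you were right.

0a. (5 points)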

a = ""
length = 0

0b. (5 points)

b = "it's ok"
length = 7

0c. (5 points)

c = 'it\'s ok'
length = 7

0d. (5 points)

d = """hey"""
length = 3

0e. (5 points)

e = '\n'
length = 1
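
The five predictions above can be checked in one pass. This is a minimal sketch; the `cases` and `lengths` dictionaries are illustrative containers and are not part of the original lab.

```python
# Each value is one of the strings from 0a-0e; the key is only a label.
cases = {
    "a": "",
    "b": "it's ok",
    "c": 'it\'s ok',   # \' is an escaped quote and counts as one character
    "d": """hey""",    # triple quotes delimit the string; they are not part of it
    "e": '\n',         # \n is a single newline character
}

# Map each label to the length of its string.
lengths = {label: len(text) for label, text in cases.items()}
print(lengths)  # {'a': 0, 'b': 7, 'c': 7, 'd': 3, 'e': 1}
```

Strings b and c are the same seven characters; only the quoting style in the source code differs.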

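1. (25 points)

There is a saying that "data scientists spend 80% of their time cleaning data, and the other 20% complaining about cleaning data." Let's see whether you can write a function to help clean Chinese postal code data. Given a string, it should return whether that string represents a valid postal code. For our purposes, a valid postal code is any string consisting of exactly 6 digits.

Hint: str has a method that is useful here (str.isdigit). Use help(str) to see the list of string methods.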

def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (6 digit) zip code
    Example:
    >>> SZ_ZIP = "215000"
    >>> is_valid_zip(SZ_ZIP)
    True
    """
    return len(zip_code) == 6 and zip_code.isdigit()

Test

is_valid_zip('215000')

True
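
One caveat: `str.isdigit` also returns True for some non-ASCII digit characters, such as full-width digits, so the function above accepts them. If the data may contain such characters, a stricter check is possible. The sketch below assumes that "digit" should mean ASCII 0-9 only; `is_valid_zip_ascii` is a hypothetical name and is not part of the lab.

```python
import re

def is_valid_zip_ascii(zip_code):
    """Hypothetical stricter variant: exactly six ASCII digits 0-9."""
    # [0-9] is used instead of \d because \d also matches non-ASCII digits
    # when the pattern is a str.
    return re.fullmatch(r"[0-9]{6}", zip_code) is not None

print(is_valid_zip_ascii("215000"))        # True
print(is_valid_zip_ascii("21500"))         # False: only five digits
print(is_valid_zip_ascii("21500a"))        # False: contains a letter
print(is_valid_zip_ascii("２１５０００"))  # False: full-width digits
print("２１５０００".isdigit())            # True: isdigit accepts them
```

Whether the stricter behavior is wanted depends on the data source; for the lab's own definition, the `isdigit` solution is sufficient.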

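2. (25 points)

A researcher has collected thousands of news articles, but she wants to focus on the articles that contain a specific word. Complete the function below to help her filter her list of articles.

Your function should meet the following criteria:

Do not include documents where the keyword string appears only as part of a larger word. For example, if she is looking for the keyword "closed", you would not include the string "enclosed".

She does not want you to distinguish between uppercase and lowercase letters. So the phrase "Closed the case." would be included when the keyword is "closed".

Do not let periods or commas affect what is matched. "It is closed." would be included when the keyword is "closed". You can assume there are no other types of punctuation.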

def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.
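    Example:
    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    [0]
    """
    # List that holds the indices of matching documents
    indices = []
    # Iterate over the index (i) and element (doc) of each document
    for i, doc in enumerate(doc_list):
        # Split the string doc into a list of words (on whitespace)
        tokens = doc.split()
        # Build a transformed list in which each word is "normalized" to make matching easier:
        # strip periods and commas from the end of each word and convert it to lowercase.
        normalized = [token.rstrip('.,').lower() for token in tokens]
        # Is there a match? If so, add this index to the list of matches.
        if keyword.lower() in normalized: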
            indices.append(i)
    return indices

Test

doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
word_search(doc_list, 'casino')

[0]
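
The three criteria listed for this exercise can also be exercised directly. The sketch below restates the function in compact form so that it runs on its own; the sample documents in `docs` are made up for illustration and do not come from the lab.

```python
def word_search(doc_list, keyword):
    """Compact restatement of the solution: same normalization, same result."""
    keyword = keyword.lower()
    return [
        i for i, doc in enumerate(doc_list)
        if keyword in [token.rstrip('.,').lower() for token in doc.split()]
    ]

docs = [
    "The letter was enclosed",    # 'closed' appears only inside a larger word
    "Closed the case",            # different capitalization
    "It is closed.",              # trailing period
    "The shop is closed, sadly",  # trailing comma
]
print(word_search(docs, 'closed'))  # [1, 2, 3]
```

Because matching is done on whole normalized tokens rather than with a substring test, "enclosed" is excluded without any extra logic.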

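3. (25 points)

Now the researcher wants to supply multiple keywords to search for. Complete the function below to help her.

(You are encouraged to use the word_search function you just wrote when implementing this function. Reusing code in this way makes your programs more robust and readable, and it saves typing!)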

def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword
    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    keyword_to_indices = {}
    for keyword in keywords:
        keyword_to_indices[keyword] = word_search(doc_list, keyword)
    return keyword_to_indices

Test

doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
keywords = ['casino', 'they']
multi_word_search(doc_list, keywords)

{'casino': [0, 1], 'they': [1]}
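
The returned dictionary can then be consumed like any other. Below is a small sketch of iterating over it, using the output shown above as a literal; the variable names `result` and `common` are illustrative and not part of the lab.

```python
# Output of multi_word_search from the test above, written as a literal
result = {'casino': [0, 1], 'they': [1]}

# .items() yields (key, value) pairs in insertion order
for keyword, indices in result.items():
    print(f"{keyword!r} appears in {len(indices)} document(s): {indices}")

# Documents that contain every keyword: intersect the index lists
common = set.intersection(*(set(ix) for ix in result.values()))
print(sorted(common))  # [1]
```

Here only document 1 ("They bought a car and a casino") contains both keywords.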
