Processing Word documents with Python: how to extract the questions and answers from a document - Alibaba Cloud Developer Community


2024-01-17


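Requirements analysis

The document and question format are shown below. The goal is to write a class that extracts each question, its answer, and its group number from the document and wraps them in an object. In the sample, the line starting with "答案：" holds the answer and the line starting with "组号：" holds the group number.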

在古代中国社会中，私学是与官学相对而存在的，并且在中国教育上占有重要的地位。我国历史上_______私学规模最大、影响最深远。
a、孔子
b、孟子
c、老子
d、墨子
答案：A 
组号：1
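
The format above suggests that every non-empty line can be classified by how it starts. The sketch below illustrates that idea, assuming that option lines always begin with a single Latin letter followed by "、" and that the answer and group lines use the full-width colon "：". The names `classify_line` and `OPTION_RE` are hypothetical and do not come from the original post.

```python
import re

# Assumption: an option line starts with one Latin letter followed by "、".
OPTION_RE = re.compile(r'^[A-Za-z]\s*、')


def classify_line(line):
    """Return 'answer', 'group', 'option', 'text', or None for a blank line."""
    line = line.strip()
    if not line:
        return None
    if line.startswith('答案：'):
        return 'answer'
    if line.startswith('组号：'):
        return 'group'
    if OPTION_RE.match(line):
        return 'option'
    # Anything else is treated as part of the question text.
    return 'text'


sample = ['我国历史上_______私学规模最大、影响最深远。', 'a、孔子', '答案：A ', '组号：1']
kinds = [classify_line(line) for line in sample]
print(kinds)
```

Note that the question text itself contains "、", so testing only for the presence of that character would misclassify it as an option; anchoring the pattern at the start of the line avoids this.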

Code

import re

from docx import Document  # third-party package: pip install python-docx

# An option line starts with a single letter followed by '、', e.g. 'a、孔子'
OPTION_PATTERN = re.compile(r'^[A-Za-z]\s*、')


class Question:
    def __init__(self, id, text, options, answer, group_id=None):
        self.id = id
        self.text = text
        self.options = options
        self.answer = answer
        self.group_id = group_id

    def __str__(self):
        return (f'{self.id}. {self.text} Options: {self.options} '
                f'Answer: {self.answer} Group: {self.group_id}')


def read_word_document(file_path):
    # The original post calls this helper without showing it; this version
    # assumes python-docx and joins the text of all paragraphs with newlines.
    document = Document(file_path)
    return '\n'.join(paragraph.text for paragraph in document.paragraphs)


def extract_question_option(text):
    questions = []
    # State of the question currently being read
    text_lines, options, answer = [], [], None

    for raw_line in text.split('\n'):
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('答案：'):
            # Answer line, e.g. '答案：A'
            answer = line.split('：', 1)[1].strip()
        elif line.startswith('组号：'):
            # The group-number line closes the current question
            group_id = line.split('：', 1)[1].strip()
            questions.append(Question(len(questions) + 1, ''.join(text_lines),
                                      options, answer, group_id))
            text_lines, options, answer = [], [], None
        elif OPTION_PATTERN.match(line):
            # Option line, e.g. 'a、孔子'
            options.append(line)
        else:
            # Any other line belongs to the question text (which may span lines)
            text_lines.append(line)

    # Keep a trailing question that has no group-number line
    if text_lines or options or answer is not None:
        questions.append(Question(len(questions) + 1, ''.join(text_lines),
                                  options, answer))
    return questions


if __name__ == '__main__':
    # Read the Word document and parse the questions
    file_path = "D:\\系统默认\\桌面\\测试题目-三组.docx"
    text = read_word_document(file_path)
    questions = extract_question_option(text)
    # Print every parsed question
    for question in questions:
        print(question)
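
Once each question carries a group number, a common next step is to collect the questions per group. The following is a minimal sketch of that step and is not part of the original post: `ParsedQuestion` is a hypothetical stand-in for the `Question` objects, and the sketch assumes each one exposes its group number as a `group_id` string.

```python
from collections import defaultdict
from typing import NamedTuple


class ParsedQuestion(NamedTuple):
    # Hypothetical record type used only for this illustration.
    text: str
    answer: str
    group_id: str


def group_by_group_id(questions):
    """Map each group number to the list of questions that belong to it."""
    groups = defaultdict(list)
    for q in questions:
        groups[q.group_id].append(q)
    return dict(groups)


parsed = [
    ParsedQuestion('question 1', 'A', '1'),
    ParsedQuestion('question 2', 'C', '2'),
    ParsedQuestion('question 3', 'B', '1'),
]
grouped = group_by_group_id(parsed)
for group_id, items in sorted(grouped.items()):
    print(group_id, [q.answer for q in items])
```

Group numbers are kept as strings here because they are read straight from the text; converting them with `int()` would be a reasonable alternative if numeric sorting matters.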

Run result of extract_question_option on the test document: [screenshot not available]
