[Super Simple] A Speech Dictation Desktop App Based on PaddleSpeech

2022-12-30

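I. [Super Simple] A Speech Dictation Desktop App Based on PaddleSpeech

Previously in this project: [Super Simple] Building a Personal Speech Dictation Service with PaddleSpeech aistudio.baidu.com/aistudio/pr…

This time the main work was:

Using QGUI to turn the speech dictation service into a desktop application, so that non-programmers can use it.

1. Write the QGUI interface
2. Visualize the results
3. Save the dictation result to a txt file

The code has been packaged and uploaded to the project root directory.
Project address: aistudio.baidu.com/aistudio/pr…
Forks and likes are warmly welcome.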

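II. Environment setup

1. PaddleSpeech overview

PaddleSpeech is an open-source speech model library built on PaddlePaddle. It supports development of many key speech and audio tasks and includes a large number of cutting-edge and influential deep learning models. Some typical applications are:

Speech recognition
Speech translation
Speech synthesis

2. Installing PaddleSpeech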

pip install paddlespeech

2.1 Dependencies

gcc >= 4.8.5
paddlepaddle >= 2.3.1
python >= 3.7
linux (recommended), mac, windows
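
The Python-side requirements above can be checked before running the application. The following is a minimal sketch, not part of the original project: `check_environment` is a hypothetical helper that only verifies the Python version and whether the `paddle`, `paddlespeech`, and `qgui` modules can be found. It does not check gcc or package versions.

```python
import importlib.util
import sys

# Module names assumed from the packages used in this article.
REQUIRED_MODULES = ("paddle", "paddlespeech", "qgui")


def check_environment(min_python=(3, 7), modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append("module not found: " + name)
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks fine"]:
        print(line)
```

Because `find_spec` only locates a module without importing it, the check stays fast even though importing PaddleSpeech itself can take a while.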

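2.2 Notes for installing on Windows

1. On Windows you must install the Microsoft C++ Build Tools - Visual Studio (visualstudio.microsoft.com/zh-hans/vis…), because they are needed to install non-pure-Python packages and to compile Cython or Pyrex files.
2. Reference: WindowsCompilers - Python Wiki (wiki.python.org/moin/Window…)

3. QGUI overview

QGUI is an ultra-lightweight desktop GUI framework of under 100k. With a few lines of code and its templates you can quickly build your own graphical interface.

4. Installing QGUI

General: python -m pip install qgui
Recommended in mainland China: python -m pip install qgui -i https://mirrors.bfsu.edu.cn/pypi/web/simple
Run the demo / test the installation: python -m qgui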

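III. Application design

1. Interface design

The interface mainly covers choosing the wav file to read and setting where the result is saved.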

from qgui import CreateQGUI
from qgui.notebook_tools import ChooseFileTextButton, RunButton, ChooseDirTextButton
import listen
import warnings
warnings.filterwarnings('ignore')
def click(args):
    # args maps each tool's name to an object whose .get() returns its current value
    wav_path = args["文件选择"].get()
    target_dir = args["保存位置"].get()
    print("Selected file:", wav_path)
    print("Save directory:", target_dir)
    listen.run(wav_path, target_dir)
# Create the main window
main_gui = CreateQGUI(title="PaddleSpeech语音自助听写")
# File chooser for the recording ("文件选择" = file selection);
# filetypes is the custom option described in the QGUI customization section
main_gui.add_notebook_tool(ChooseFileTextButton(name="文件选择", filetypes="wav"))
# Directory chooser for the save location ("保存位置" = save location)
main_gui.add_notebook_tool(ChooseDirTextButton(name="保存位置"))
# Run button bound to the click function defined above
main_gui.add_notebook_tool(RunButton(click))
# Short "about" panel
main_gui.set_navigation_about(author="JavaRoom",
                              version="0.0.1",
                              github_url="https://aistudio.baidu.com/aistudio/personalcenter/thirdview/89263",
                              other_info=["PaddleSpeech语音自助听写"])
# Start the application
main_gui.run()
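
The callback receives a dict keyed by each tool's `name`, and each value exposes `.get()`, as the `click` function above shows. A minimal sketch for exercising that logic without opening a window follows. `FakeVar`, `handle_click`, and the injected `run_fn` are hypothetical names introduced here for illustration; they are not part of QGUI or the original project.

```python
class FakeVar:
    """Stand-in for the value objects passed to the callback (assumption:
    only the .get() method is needed, as used in click above)."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


def handle_click(args, run_fn):
    """Validate the two inputs, then hand them to the dictation function."""
    wav_path = args["文件选择"].get().strip()
    target_dir = args["保存位置"].get().strip()
    if not wav_path or not target_dir:
        # Nothing selected yet: report instead of failing inside run_fn.
        return "Please choose a wav file and a save directory first."
    run_fn(wav_path, target_dir)
    return "done"


if __name__ == "__main__":
    calls = []
    fake_args = {"文件选择": FakeVar("demo.wav"), "保存位置": FakeVar("out")}
    print(handle_click(fake_args, lambda w, d: calls.append((w, d))))
    print(calls)
```

Passing the dictation function in as `run_fn` keeps the input checks testable without loading PaddleSpeech; in the real application it would be `listen.run`.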

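2. Wrapping the dictation service in functions

The main functions are:

qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55) splits a long recording into segments
audio2txt(path) converts speech to text
txt2csv(txt_all) saves the text
correct_punctuation(source_path='听写录音.csv') restores punctuation
run(wav_file_path, target_dir) transcribes the given audio file and saves the result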

import csv
import datetime
import os
import shutil
import warnings
import auditok
import paddle
from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.text.infer import TextExecutor
warnings.filterwarnings('ignore')
# Split a long recording into segments at silences; ty names the output subfolder
def qiefen(path, ty='audio', mmin_dur=1, mmax_dur=100000, mmax_silence=1, menergy_threshold=55):
    audio_regions = auditok.split(
        path,
        min_dur=mmin_dur,  # minimum duration of a valid audio event in seconds
        max_dur=mmax_dur,  # maximum duration of an event
        # maximum duration of tolerated continuous silence within an event
        max_silence=mmax_silence,
        energy_threshold=menergy_threshold  # threshold of detection
    )
    # Segments are written to change/<ty>/<recording name without extension>
    file_pre = os.path.splitext(os.path.basename(path))[0]
    out_dir = 'change' + '/' + ty + '/' + file_pre
    os.makedirs(out_dir, exist_ok=True)
    for i, r in enumerate(audio_regions):
        # Regions returned by `split` have 'start' and 'end' metadata fields
        print(
            "Region {i}: {r.meta.start:.3f}s -- {r.meta.end:.3f}s".format(i=i, r=r))
        # Zero-padded index prefix so the segments can be sorted later;
        # the {meta...} placeholders are filled in by r.save
        file_save = out_dir + '/' + '{:03d}'.format(i) + \
                    '-{meta.start:.3f}-{meta.end:.3f}.wav'
        filename = r.save(file_save)
        print("region saved as: {}".format(filename))
    return out_dir
# Speech-to-text executor, created once and reused for every segment
asr_executor = ASRExecutor()
# Transcribe every segment in a directory
def audio2txt(path):
    print(f"path: {path}")
    filelist = os.listdir(path)
    # Sort by the numeric prefix so segments are read in recording order
    filelist.sort(key=lambda x: int(x.split('-')[0]))
    words = []
    for file in filelist:
        print(path + '/' + file)
        text = asr_executor(
            audio_file=path + '/' + file,
            device=paddle.get_device(),
            # force_yes=True accepts prompts such as sample-rate conversion without asking
            force_yes=True)
        words.append(text)
    return words
# Save the recognized text, one segment per row
def txt2csv(txt_all):
    with open('听写录音.csv', 'w', encoding='utf-8', newline='') as f:
        f_csv = csv.writer(f)
        for row in txt_all:
            f_csv.writerow([row])
# Restore punctuation
def correct_punctuation(source_path='听写录音.csv'):
    # Join the recognized segments from the intermediate csv into one string;
    # read with the same encoding that txt2csv writes
    with open(source_path, 'r', encoding='utf-8', newline='') as f:
        texts = ''.join(row[0] for row in csv.reader(f) if row)
    if not texts:
        # Nothing was recognized, so there is nothing to punctuate
        return ''
    text_executor = TextExecutor()
    result = text_executor(
        text=texts,
        task='punc',
        model='ernie_linear_p3_wudao',
        device=paddle.get_device(),
    )
    return result
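def run(wav_file_path, target_dir):
    # Split the recording into segments
    path = qiefen(path=wav_file_path, ty='audio',
                  mmin_dur=0.5, mmax_dur=100000, mmax_silence=0.5, menergy_threshold=55)
    # Speech to text (a GPU is recommended)
    txt_all = audio2txt(path)
    # Store the raw text in the intermediate csv
    txt2csv(txt_all)
    # Restore punctuation
    correct_txt = correct_punctuation()
    print(correct_txt)
    print(len(correct_txt))
    # Save into the chosen directory; the name carries the date and the first segment
    now_date = str(datetime.datetime.now())
    first_text = txt_all[0] if txt_all else ''
    source_path = os.path.join(target_dir, f'听写转录音_{now_date[:10]}_{first_text}.txt')
    with open(source_path, 'w', encoding='utf-8') as f:
        f.write(correct_txt)
    # Remove intermediate files; the 'exp' directory is removed only if it exists
    os.remove("听写录音.csv")
    shutil.rmtree('exp', ignore_errors=True)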
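
The split step gives each segment a three-digit index prefix so that `audio2txt` can restore the recording order with a numeric sort. The sketch below shows that naming and sorting convention on its own, independent of auditok and PaddleSpeech. `segment_name` and `sort_segments` are hypothetical helpers, and the timestamps are made up for illustration.

```python
import os


def segment_name(index, start, end):
    """Build a name like '007-1.500-3.250.wav' (index, start, end)."""
    return f"{index:03d}-{start:.3f}-{end:.3f}.wav"


def sort_segments(filenames):
    """Sort by the integer before the first '-', so any index width works."""
    def key(name):
        stem = os.path.splitext(name)[0]
        return int(stem.split("-", 1)[0])
    return sorted(filenames, key=key)


if __name__ == "__main__":
    names = [segment_name(i, i * 2.0, i * 2.0 + 1.5) for i in (10, 2, 0)]
    print(sort_segments(names))
```

Sorting on the whole prefix rather than a fixed three characters keeps the order correct even if a recording yields more than 999 segments.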

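IV. Final form

The final form is an application with a graphical interface that takes a given audio file, transcribes it automatically, and saves the result to a text file.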
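
Saving the result means building a file name from the date and the first recognized sentence. The following is a hedged sketch of one way to keep that name safe on common file systems. `build_output_path` is a hypothetical helper, and the 20-character limit and the set of replaced characters are assumptions, not values from the original project.

```python
import datetime
import os
import re

# Characters that Windows does not allow in file names, plus whitespace runs.
_UNSAFE = re.compile(r'[\\/:*?"<>|\s]+')


def build_output_path(target_dir, first_text, today=None, max_len=20):
    """Return '<target_dir>/听写转录音_<date>_<snippet>.txt'."""
    today = today or datetime.date.today()
    # Replace unsafe runs with '_', trim, and fall back to a fixed word.
    snippet = _UNSAFE.sub("_", first_text).strip("_")[:max_len] or "untitled"
    return os.path.join(target_dir, f"听写转录音_{today.isoformat()}_{snippet}.txt")


if __name__ == "__main__":
    print(build_output_path("out", "今天 天气/不错", datetime.date(2022, 12, 30)))
```

Truncating the snippet also avoids overly long file names when the first recognized segment is a long sentence.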

V. Customizing QGUI

1. Choosing files by extension

Modify line 136 of qgui/notebook_tools.py:

        # Check the requested extension; only wav is handled so far
        if filetypes == 'wav':
            self.filetypes = [('WAV Files', '*.wav')]
        else:
            self.filetypes = [('All Files', '*')]

Pass the filetypes option when calling:

# Choose the recording file
main_gui.add_notebook_tool(ChooseFileTextButton(name="文件选择", filetypes="wav"))
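
The change above hard-codes a single extension. As a sketch of how the same idea could cover more formats, the hypothetical helper `build_filetypes` below maps an extension string to the list of `(label, pattern)` pairs that tkinter file dialogs accept. It is not part of QGUI, and the fallback to all files mirrors the `else` branch above.

```python
def build_filetypes(ext=None):
    """Map an extension such as 'wav' or '.WAV' to tkinter filetypes."""
    if not ext:
        return [("All Files", "*")]
    # Drop a leading '*' or '.', so 'wav', '.wav' and '*.wav' are all accepted.
    clean = ext.strip().lstrip("*.").lower()
    if not clean:
        return [("All Files", "*")]
    return [(f"{clean.upper()} Files", f"*.{clean}")]


if __name__ == "__main__":
    print(build_filetypes("wav"))
    print(build_filetypes(None))
```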

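2. Changing the GitHub link in the interface

Change line 80 of qgui/base_frame.py (in BaseNavigation) to aistudio, so that the link label reads "> 进入Aistudio" (Enter AI Studio):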

class BaseNavigation(_Backbone):
    """
    Basic framework for the left navigation bar
    """
    def __init__(self, style="primary"):
        super(BaseNavigation, self).__init__(f_style=style)
        self.tabs = dict()
    def add_about(self,
                  author: str = "未知作者",
                  version: str = "0.0.1",
                  github_url: str = None,
                  other_info: List[str] = None):
        bus_cf = CollapsingFrame(self.frame)
        bus_cf.pack(fill='x', pady=0)
        bus_frm = ttk.Frame(bus_cf, padding=5)
        bus_frm.columnconfigure(1, weight=1)
        bus_cf.add(bus_frm, title="相关信息", style='secondary.TButton')
        ttk.Label(bus_frm, text=f"作者:\t{author}", style="TLabel", justify="left", wraplength=160).pack(anchor="nw")
        ttk.Label(bus_frm, text=f"版本:\t{version}", style="TLabel", justify="left", wraplength=160).pack(anchor="nw")
        if other_info:
            for line in other_info:
                ttk.Label(bus_frm, text=line, style="TLabel").pack(anchor="nw")
        if github_url:
            def github_callback(event):
                webbrowser.open_new(github_url)
            github_label = ttk.Label(bus_frm, text=f"> 进入Aistudio", style="info.TLabel", justify="left")
            github_label.pack(anchor="nw")
            github_label.bind("<Button-1>", github_callback)

VI. Future plans

1. Add audio transcoding
2. Add extraction of speech from video
3. Add an offline deployment option
4. Package the application
