(Repost) Speech and Natural Language Processing

Speech and Natural Language Processing

Source: https://github.com/edobashira/speech-language-processing


A curated list of speech and natural language processing resources. Other lists can be found in this list. If you want to contribute to this list (please do), send me a pull request. All sub-categories are listed in alphabetical order.

Finite State Toolkits and Regular Expressions

  • AT&T FSM Library The AT&T FSM Library™ is a set of general-purpose software tools available for Unix, for building, combining, optimizing, and searching weighted finite-state acceptors and transducers.
  • Carmel Finite-state toolkit, EM and Bayesian (Gibbs sampling) training for FST and context-free derivation forests.
  • Categorial semiring Categorial semiring as described in Sproat et al. 2014
  • dk.brics.automaton Java toolkit for FSAs and regular expressions.
  • Fare Fare is a finite state and regular expression library for the .NET framework written in C#.
  • Foma Finite-state compiler and C library
  • fsa Toolkit used in RWTH ASR engine
  • fsm2.0 Thomas Hanneforth's fsm 2.0 library, written in C++, has a few nice operations such as three-way composition.
  • fstrain A toolkit for training finite-state models
  • jopenfst Java port of the C++ OpenFst library; originally forked from the CMU Sphinx project
  • Kleene programming language High level finite state programming language built on top of OpenFst.
  • MIT FST Toolkit WFST toolkit, no longer maintained, but features a few commands not found in other toolkits.
  • MoMs-for-StochasticLanguages Spectral and other training algorithms for WFSAs.
  • n Shortest Path for PDT n Shortest Path for PDT
  • Noam "Noam is a JavaScript library for working with automata and formal grammars for regular and context-free languages". Also has pretty cool examples using viz.js
  • OpenFst OpenFst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs).
  • openfst-utils Nice set of utilities for OpenFst; includes an implementation of categorial semirings.
  • openlat Toolkit for manipulating word lattices built on top of OpenFst. Includes support for reading and writing HTK compatible lattices.
  • PyFst Python interface to OpenFst
  • SFST - Stuttgart Finite State Transducer Tools "SFST is a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology."
  • Treba "Treba is a basic command-line tool for training, decoding, and calculating with weighted (probabilistic) finite state automata (PFSA) and Hidden Markov Models (HMMs)."

Many of the tools in the machine translation section also implement interesting graph and semiring operations.
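
As a rough illustration of what "searching" a weighted finite-state machine means in the toolkits above, here is a minimal pure-Python sketch of a shortest-path search over a toy weighted acceptor in the tropical semiring (path weights are added, alternatives are combined by taking the minimum). It does not use the API of OpenFst or any other listed toolkit; the arc table, state numbers and the function name `shortest_path` are made up for the example, and the Dijkstra-style search assumes non-negative weights.

```python
import heapq

# Hypothetical toy acceptor: state -> list of (label, weight, next_state).
# Weights are costs in the tropical semiring (times = +, plus = min).
ARCS = {
    0: [("a", 1.0, 1), ("b", 4.0, 2)],
    1: [("c", 1.5, 2), ("d", 5.0, 3)],
    2: [("e", 1.0, 3)],
    3: [],
}
START = 0
FINAL = {3: 0.0}  # final state -> final weight


def shortest_path(arcs, start, final):
    """Return (cost, labels) of the cheapest accepting path, or None.

    Dijkstra-style search; assumes all arc weights are non-negative.
    """
    best = {start: (0.0, [])}  # state -> (cost so far, labels so far)
    queue = [(0.0, start)]
    settled = set()
    while queue:
        cost, state = heapq.heappop(queue)
        if state in settled:
            continue  # stale queue entry for an already finished state
        settled.add(state)
        for label, weight, nxt in arcs.get(state, []):
            new_cost = cost + weight
            if nxt not in best or new_cost < best[nxt][0]:
                best[nxt] = (new_cost, best[state][1] + [label])
                heapq.heappush(queue, (new_cost, nxt))
    # Combine each reachable final state's path cost with its final weight.
    candidates = [
        (best[s][0] + w, best[s][1]) for s, w in final.items() if s in best
    ]
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print(shortest_path(ARCS, START, FINAL))
```

Real toolkits generalize this idea to arbitrary semirings and to transducers with input and output labels; the sketch only covers the simplest acceptor case.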

Language Modelling Toolkits

  • Bayesian Recurrent Neural Network for Language Modeling This is a C/C++ implementation for Bayesian recurrent neural network for language modeling (BRNNLM)
  • Berkeley LM
  • Bigfatlm Provides Hadoop training of Kneser-Ney language models, written in Java.
  • CSLM "Continuous Space Language Model toolkit. CSLM toolkit is open-source software which implements the so-called continuous space language model."
  • DALM Double array language model.
  • KenLM Kenneth Heafield's language model toolkit, uses a very fast and low memory representation.
  • lwlm lwlm is an exact, full Bayesian implementation of the Latent Words Language Model (Deschacht and Moens, 2009).
  • Maximum Entropy Modeling Le Zhang has a comprehensive set of links related to MaxEnt models.
  • Maximum entropy language models: SRILM extension "This patch adds the functionality to train and apply maximum entropy (MaxEnt) language models to the SRILM toolkit. Currently, only N-gram features are supported"
  • mitlm My personal favourite LM toolkit, super fast and seems to get slightly higher accuracy.
  • MSRLM "This scalable language-model tool is used to build language models from large amounts of data. It supports modified absolute discounting and Kneser-Ney smoothing."
  • OpenGrm Language modelling toolkit for use with OpenFst.
  • cpyp C++ library for modeling with Pitman-Yor processes
  • RandLM Bloom filter based random language models
  • RNNLM Recurrent neural network language model toolkit.
  • Refr Re-ranking framework from the Johns Hopkins workshop on confusion language modelling.
  • rwthlm A toolkit for training neural network language models (feedforward, recurrent, and long short-term memory neural networks). The software was written by Martin Sundermeyer.
  • SRILM Very popular toolkit; source code is available, but it is not free for commercial use.
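
Several of the toolkits above mention Kneser-Ney smoothing. As a minimal sketch of the idea (not the implementation used by any listed toolkit), the following pure-Python code estimates an interpolated Kneser-Ney bigram model with a single fixed discount. The function name `train_kn_bigram`, the toy corpus, the `<s>`/`</s>` markers and the discount value 0.75 are all assumptions made for the example, and the training corpus is assumed to be non-empty.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sentences, discount=0.75):
    """Return prob(w, v) = P(w | v) under interpolated Kneser-Ney (bigrams only)."""
    bigrams = Counter()
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        bigrams.update(zip(tokens, tokens[1:]))

    context_count = Counter()     # c(v): how often v occurs as a context
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct contexts seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    n_types = len(bigrams)        # number of distinct bigram types

    def prob(w, v):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(preceders.get(w, ())) / n_types
        total = context_count.get(v, 0)
        if total == 0:
            return p_cont  # unseen context: fall back to the lower-order model
        discounted = max(bigrams.get((v, w), 0) - discount, 0.0) / total
        backoff_mass = discount * len(followers[v]) / total
        return discounted + backoff_mass * p_cont

    return prob


# Hypothetical toy corpus, already tokenized.
SENTENCES = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat", "ran"],
]
prob = train_kn_bigram(SENTENCES)

if __name__ == "__main__":
    print(prob("cat", "the"), prob("ran", "the"))
```

The mass removed from seen bigrams by the discount is redistributed according to how many distinct contexts a word follows, which is why the probabilities for a seen context still sum to one over the vocabulary. Production toolkits typically add higher orders, several discounts (modified Kneser-Ney) and compact storage.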

Speech Recognition

  • AaltoASR Aalto Automatic Speech Recognition tools
  • Barista Barista is an open-source framework for concurrent speech processing.
  • Bavieca New open source toolkit featuring static and dynamic decoders.
  • kaldi-nnet-dur-model Neural network phone duration model on top of the Kaldi speech recognition framework (Interspeech paper)
  • CMU Sphinx Open Source Toolkit For Speech Recognition Project by Carnegie Mellon University
  • HTK "The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models."
  • Juicer Juicer is a Weighted Finite State Transducer (WFST) based decoder for Automatic Speech Recognition (ASR).
  • Julius "Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder software for speech-related researchers and developers."
  • Kaldi Modern open source toolkit led by Dan Povey featuring many state-of-the-art techniques.
  • OpenDcd An Open Source WFST based Speech Recognition Decoder.
  • Phonetisaurus Josef Novak's super fast WFST based Phoneticizer; the site also has some really nice tutorial slides.
  • Sail Align SailAlign is an open-source software toolkit for robust long speech-text alignment implementing an adaptive, iterative speech recognition and text alignment scheme that allows for the processing of very long (and possibly noisy) audio and is robust to transcription errors. It is mainly written as a perl library but its functionality also depends…
  • SCARF: A Segmental CRF Toolkit for Speech Recognition "SCARF is a toolkit for doing speech recognition with segmental conditional random fields."
  • trainc David Rybach and Michael Riley's tool for direct construction of context-dependency transducers (Interspeech best paper).
  • RASR RWTH ASR - The RWTH Aachen University Speech Recognition System
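
Most of the recognizers above are built around hidden Markov models and some form of Viterbi search. As a minimal sketch of that search (not the decoder of any listed toolkit), the following pure-Python code finds the most likely state sequence of a tiny HMM. The state names, observation symbols and all probabilities are hypothetical, and plain probabilities are used for readability where real decoders would work with log-probabilities.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, best state sequence) for a list of observations."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {
        s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor for state s.
            prob, path = max(
                (best[prev][0] * trans_p[prev][s], best[prev][1])
                for prev in states
            )
            new_best[s] = (prob * emit_p[s][obs], path + [s])
        best = new_best
    return max(best.values())


# Hypothetical two-state model: silence vs. speech, observed via energy level.
STATES = ["sil", "speech"]
START_P = {"sil": 0.6, "speech": 0.4}
TRANS_P = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.4, "speech": 0.6},
}
EMIT_P = {
    "sil": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

if __name__ == "__main__":
    print(viterbi(["low", "mid", "high"], STATES, START_P, TRANS_P, EMIT_P))
```

Large-vocabulary decoders apply the same recursion over much larger search graphs, usually with pruning.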

Signal Processing


  • HTS HMM-based speech synthesis
  • RusPhonetizer Grammar rules and dictionaries for the phonetic transcription of Russian sentences

Speech Data

  • cmudict CMUdict (the Carnegie Mellon Pronouncing Dictionary) is a free pronouncing dictionary of English.
  • LibriSpeech ASR corpus LibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.
  • TED-LIUM Corpus The TED-LIUM corpus was made from audio talks and their transcriptions available on the TED website.

Machine Translation

  • Berkeley Aligner "...a word alignment software package that implements recent innovations in unsupervised word alignment."
  • cdec "Decoder, aligner, and model optimizer for statistical machine translation and other structured prediction models based on (mostly) context-free formalisms"
  • Jane "Jane is RWTH's open source statistical machine translation toolkit. Jane supports state-of-the-art techniques for phrase-based and hierarchical phrase-based machine translation."
  • Joshua Hierarchical and syntax based machine translation decoder written in Java.
  • Moses Standard open source machine translation toolkit.
  • alignment-with-openfst
  • zmert Nice Java MERT implementation by Omar F. Zaidan

Machine Learning

  • BIDData BIDMat is a matrix library intended to support large-scale exploratory data analysis. Its sister library BIDMach implements the machine learning layer.
  • libFM: Factorization Machine Library
  • sofia-ml Fast incremental learning algorithms for classification, regression, ranking from Google.
  • Spearmint Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper "Practical Bayesian Optimization of Machine Learning Algorithms" by Jasper Snoek, Hugo Larochelle and Ryan P. Adams, Advances in Neural Information Processing Systems, 2012.

Deep Learning

  • Benchmarks - Comparison of different convolution network implementations.
  • Caffe - Really active deep learning toolkit with support for cuDNN and lots of other backends.
  • cuDNN - Deep neural network library from Nvidia, with paper here. Torch 7 has support for cuDNN, and here are some Python wrappers.
  • CURRENNT - Munich Open-Source CUDA RecurREnt Neural Network Toolkit described in this paper
  • gensim - Python topic modeling toolkit with word2vec implementation. Extremely easy to use and to install.
  • GloVe Global vectors for word representation.
  • GroundHog Neural network based machine translation toolkit.
  • KALDI LSTM C++ implementation of LSTM (Long Short Term Memory), in Kaldi's nnet1 framework. Used for automatic speech recognition, possibly language modeling etc.
  • OxLM: Oxford Neural Language Modelling Toolkit Neural network toolkit for machine translation described in the paper here
  • Neural Probabilistic Language Model Toolkit "NPLM is a toolkit for training and using feedforward neural language models (Bengio, 2003). It is fast even for large vocabularies (100k or more): a model can be trained on a billion words of data in about a week, and can be queried in about 40 μs, which is usable inside a decoder for machine translation."
  • RNNLM2WFST Tool to convert RNNLMs to WFSTs
  • ViennaCL <http://viennacl.sourceforge.net/> - ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs.

Natural Language Processing

  • BLLIP reranking parser "BLLIP Parser is a statistical natural language parser including a generative constituent parser (first-stage) and discriminative maximum entropy reranker (second-stage)."
  • OpenNLP The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
  • SEAL Set expander for any language described in this paper
  • Stanford CoreNLP "Stanford CoreNLP provides a set of natural language analysis tools written in Java"


Other Tools

  • GraphViz.sty Really handy tool for adding dot language directly to a LaTeX document; useful for tweaking the small colorized WFST figures in papers and presentations.


