Deep Learning for NLP resources


State of the art resources for NLP sequence modeling tasks such as machine translation, image captioning, and dialog.

My notes on neural networks, RNNs, and LSTMs

Deep Learning for NLP

Stanford Natural Language Processing
Intro NLP course with videos. It covers no deep learning, but it is a good primer for traditional NLP. Topics include sentence segmentation, word tokenization, word normalization, n-grams, named entity recognition, and part-of-speech tagging. Currently not available.

Stanford CS 224D: Deep Learning for NLP class
Richard Socher. (2016) Class with syllabus, and slides.
Videos: [2015 lectures](https://www.youtube.com/channel/UCsGC3XXF1ThHwtDo18d7WVw/videos) / [2016 lectures](https://www.youtube.com/playlist?list=PLcGUo322oqu9n4i0X3cRJgKyVy7OkDdoi)

A Primer on Neural Network Models for Natural Language Processing
Yoav Goldberg. October 2015. Presents no new results; a 75-page summary of the state of the art.

Oxford Deep Learning for NLP class
Phil Blunsom. (2017) Class taught by the DeepMind NLP group.
Lecture slides, videos, and practicals: Github Repository
Currently ongoing

Word Vectors

Resources about word vectors, aka word embeddings, and distributed representations for words.
Word vectors are numeric representations of words where similar words have similar vectors. Word vectors are often used as input to deep learning systems. This process is sometimes called pretraining.
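
To make "similar words have similar vectors" concrete, here is a minimal sketch that compares vectors with cosine similarity, a common choice of similarity measure. The three-dimensional vectors are invented for illustration; real word vectors are learned from a corpus and usually have tens to hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, made up for this example (not from any trained model).
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
sim_cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(f"cat~dog: {sim_cat_dog:.3f}  cat~car: {sim_cat_car:.3f}")
```

With these made-up numbers, "cat" scores closer to "dog" than to "car", which is the behavior a trained embedding is expected to show for related words.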

A neural probabilistic language model.
Bengio et al. 2003. Seminal paper on word vectors.


Efficient Estimation of Word Representations in Vector Space
Mikolov et al. 2013. Word2Vec generates word vectors in an unsupervised way by attempting to predict words from a corpus. Describes Continuous Bag-of-Words (CBOW) and Continuous Skip-gram models for learning word vectors.
Skip-gram takes the center word and predicts the surrounding (outside) words. Skip-gram is better for large datasets.
CBOW takes the surrounding (outside) words and predicts the center word. CBOW is better for smaller datasets.
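
The difference between the two models is easiest to see in the training examples each one extracts from a sentence. The sketch below only builds those examples; it does not train anything. The function names and the `window` parameter are illustrative choices, not part of the word2vec code.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the center word predicts each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples: the neighbors predict the center."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

sentence = "the quick brown fox".split()
sg = skipgram_pairs(sentence, window=1)
cb = cbow_examples(sentence, window=1)
print(sg)
print(cb)
```

Skip-gram produces one example per (center, neighbor) pair, while CBOW produces one example per position with all neighbors grouped together.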

[Distributed Representations of Words and Phrases and their Compositionality](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)
Mikolov et al. 2013. Learns vectors for phrases such as "New York Times." Includes optimizations for skip-gram: hierarchical softmax and negative sampling. Also subsamples frequent words (i.e., frequent words like "the" are randomly skipped to speed up training and improve the vectors of less frequent words).
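
As a rough sketch of the subsampling step, the paper discards each occurrence of a word with probability 1 - sqrt(t / f), where f is the word's relative frequency and t is a small threshold (the paper suggests around 1e-5 for large corpora). The helper names, the toy corpus, and the much larger threshold used here are assumptions for illustration; released implementations may use a slightly different formula.

```python
import math
import random
from collections import Counter

def discard_probability(freq, threshold=1e-5):
    """P(discard) = 1 - sqrt(threshold / freq), floored at 0 for rare words."""
    return max(0.0, 1.0 - math.sqrt(threshold / freq))

def subsample(tokens, threshold=1e-5, seed=0):
    """Randomly drop occurrences of frequent words; rare words are always kept."""
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for tok in tokens:
        p_drop = discard_probability(counts[tok] / total, threshold)
        if rng.random() >= p_drop:
            kept.append(tok)
    return kept

# Toy corpus: "the" makes up 90% of tokens. The threshold is raised to 0.1
# only because the corpus is tiny; it is not a recommended setting.
corpus = ["the"] * 90 + ["aardvark"] * 10
kept = subsample(corpus, threshold=0.1)
print(Counter(kept))
```

In this toy setup every "aardvark" survives, while roughly two thirds of the "the" tokens are expected to be dropped.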

Linguistic Regularities in Continuous Space Word Representations
Mikolov et al. 2013. Performs well on word similarity and analogy tasks. Expands on the famous example: King – Man + Woman = Queen
Word2Vec source code
Word2Vec tutorial in TensorFlow
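
The analogy example can be sketched as vector arithmetic followed by a nearest-neighbor search. The vectors below are hand-built so that the arithmetic works out exactly (the dimensions loosely stand for "royal", "male", "female"); they are not learned embeddings, and the `analogy` helper is a hypothetical name.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Hypothetical toy vectors: [royal, male, female]
toy = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.2, 0.2],
}

result = analogy("king", "man", "woman", toy)
print(result)
```

Excluding the three query words from the candidates is a common convention in analogy evaluation, since the nearest vector to the result is often one of the inputs.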

word2vec Parameter Learning Explained
Rong 2014

Articles explaining word2vec: Deep Learning, NLP, and Representations and The amazing power of word vectors


GloVe: Global vectors for word representation
Pennington, Socher, Manning. 2014. Creates word vectors and relates word2vec to matrix factorizations. The evaluation section drew criticism from Yoav Goldberg.
GloVe source code and training data


Enriching Word Vectors with Subword Information
Bojanowski, Grave, Joulin, Mikolov 2016
FastText Code
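
For readers unfamiliar with "subword information": the paper represents a word through its character n-grams, with `<` and `>` marking the word boundaries, plus the whole bracketed word as one extra unit. The sketch below only extracts those n-grams. The function name is illustrative, and the defaults of 3 to 6 characters follow the range described in the paper.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of '<word>' plus the full bracketed word itself."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    grams.add(wrapped)  # special sequence standing for the whole word
    return grams

trigrams = char_ngrams("where", min_n=3, max_n=3)
print(sorted(trigrams))
```

Because rare or unseen words share n-grams with known words, a model built on these units can still produce a vector for them.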

Sentiment Analysis

Thought vectors are numeric representations for sentences, paragraphs, and documents. This concept is used for many text classification tasks such as sentiment analysis.

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Socher et al. 2013. Introduces Recursive Neural Tensor Network and dataset: "sentiment treebank." Includes demo site. Uses a parse tree.

Distributed Representations of Sentences and Documents
Le, Mikolov. 2014. Introduces Paragraph Vector. Concatenates and averages pretrained, fixed word vectors to create vectors for sentences, paragraphs and documents. Also known as paragraph2vec. Doesn't use a parse tree.
Implemented in gensim. See doc2vec tutorial

Deep Recursive Neural Networks for Compositionality in Language
Irsoy & Cardie. 2014. Uses Deep Recursive Neural Networks. Uses a parse tree.

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
Tai et al. 2015. Introduces Tree LSTM. Uses a parse tree.

Semi-supervised Sequence Learning
Dai, Le 2015
Approach: "We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach is to use a sequence autoencoder..."
Result: "With pretraining, we are able to train long short term memory recurrent networks up to a few hundred timesteps, thereby achieving strong performance in many text classification tasks, such as IMDB, DBpedia and 20 Newsgroups."

Bag of Tricks for Efficient Text Classification
Joulin, Grave, Bojanowski, Mikolov 2016 Facebook AI Research.
"Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation."
FastText blog
FastText Code

Neural Machine Translation

In 2014, neural machine translation (NMT) performance became comparable to state-of-the-art statistical machine translation (SMT).

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (abstract)
Cho et al. 2014. Breakthrough deep learning paper on machine translation. Introduces the basic sequence-to-sequence model, which consists of two RNNs: an encoder for the input and a decoder for the output.

Neural Machine Translation by jointly learning to align and translate (abstract)
Bahdanau, Cho, Bengio 2014.
Implements attention mechanism. "Each time the proposed model generates a word in a translation, it (soft-)searches for a set of positions in a source sentence where the most relevant information is concentrated"
Result: "comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation."
English to French Demo
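
The "(soft-)search" over source positions can be sketched as a softmax over relevance scores followed by a weighted average of the encoder states. This is a simplified illustration: the scores here are plain dot products, whereas the paper learns a small feed-forward network to score each position. All vectors are invented toy values.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(decoder_state, encoder_states):
    """Return (weights, context) for a single decoding step."""
    # Assumption: dot-product scoring stands in for the paper's learned scorer.
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * enc[k] for w, enc in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context

# Toy values: three source positions, two-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = soft_attention(decoder_state, encoder_states)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The weights sum to one, so the context vector is a blend of source positions rather than a single fixed-length summary of the whole sentence.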

On Using Very Large Target Vocabulary for Neural Machine Translation
Jean, Cho, Memisevic, Bengio 2014.
"we try replacing each [UNK] token with the aligned source word or its most likely translation determined by another word alignment model."
Result: English -> German BLEU score = 21.59 (target vocabulary of 50,000)

Sequence to Sequence Learning with Neural Networks
Sutskever, Vinyals, Le 2014. (NIPS presentation). Uses seq2seq to generate translations.
Result: English -> French BLEU score = 34.8 (WMT’14 dataset)
A key contribution is the improvement gained by reversing the word order of the source sentences.
seq2seq tutorial in TensorFlow.

Addressing the Rare Word Problem in Neural Machine Translation (abstract)
Luong, Sutskever, Le, Vinyals, Zaremba 2014
Replace UNK words with dictionary lookup.
Result: English -> French BLEU score = 37.5.
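
The unknown-word replacement idea can be sketched as a post-processing pass over the translated sentence. The sketch assumes the alignment from each target position to a source position is already available (how it is obtained differs between papers), and the function name, token strings, and tiny dictionary are hypothetical.

```python
def replace_unknowns(target_tokens, source_tokens, alignment, dictionary, unk="<unk>"):
    """Replace each unknown token using the source word it is aligned to.

    alignment[i] is the index of the source word aligned to target position i.
    """
    output = []
    for i, tok in enumerate(target_tokens):
        if tok == unk:
            src_word = source_tokens[alignment[i]]
            # Use the dictionary translation if there is one, else copy the source word.
            tok = dictionary.get(src_word, src_word)
        output.append(tok)
    return output

# Toy example, invented for illustration.
source = ["the", "aardvark", "sleeps", "in", "Ouagadougou"]
target = ["le", "<unk>", "dort", "à", "<unk>"]
alignment = [0, 1, 2, 3, 4]
dictionary = {"aardvark": "oryctérope"}

fixed = replace_unknowns(target, source, alignment, dictionary)
print(" ".join(fixed))
```

Copying the source word unchanged is a reasonable fallback for names and numbers, which often stay the same across languages.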

Effective Approaches to Attention-based Neural Machine Translation
Luong, Pham, Manning. 2015
2 models of attention: global and local.
Result: English -> German 25.9 BLEU points

Context-Dependent Word Representation for Neural Machine Translation
Choi, Cho, Bengio 2016
"we propose to contextualize the word embedding vectors using a nonlinear bag-of-words representation of the source sentence."
"we propose to represent special tokens (such as numbers, proper nouns and acronyms) with typed symbols to facilitate translating those words that are not well-suited to be translated via continuous vectors."

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Wu et al. 2016
blog post
"WMT’14 English-to-French, our single model scores 38.95 BLEU"
"WMT’14 English-to-German, our single model scores 24.17 BLEU"

Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
Johnson et al. 2016
blog post
Translations between untrained language pairs.

Google has started rolling out NMT to its production system, and it's a significant improvement.

Image Captioning

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Xu et al. 2015. Creates captions by feeding the image into a CNN, whose output feeds the hidden state of an RNN that generates the caption. At each time step the RNN outputs the next word and the next location to attend to, expressed as a probability distribution over grid locations. Uses two types of attention: soft and hard. Soft attention is deterministic and is trained with gradient descent and backprop. Hard attention is stochastic: it samples a single location from that distribution and is trained with reinforcement learning rather than plain backprop.
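
A minimal sketch of the difference between the two attention types, given attention weights over image locations. In the paper, hard attention draws one location at random according to the weights, which is why it is stochastic. The weights and feature vectors below are made-up toy values, and the function names are illustrative.

```python
import random

def soft_context(weights, features):
    """Deterministic: the weighted average of all location features."""
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]

def hard_context(weights, features, rng):
    """Stochastic: sample one location according to the weights, return its features."""
    idx = rng.choices(range(len(features)), weights=weights, k=1)[0]
    return features[idx]

# Toy values: three image locations with two-dimensional features.
weights = [0.7, 0.2, 0.1]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

rng = random.Random(0)
soft = soft_context(weights, features)
hard = hard_context(weights, features, rng)
print(soft, hard)
```

Because sampling is not differentiable, the hard variant cannot be trained by backprop alone, which is where the reinforcement learning objective comes in.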

Open source implementation in TensorFlow

Conversation modeling / Dialog

Neural Responding Machine for Short-Text Conversation
Shang et al. 2015. Uses Neural Responding Machine. Trained on a Weibo dataset. Achieves one-round conversations with 75% appropriate responses.

A Neural Network Approach to Context-Sensitive Generation of Conversational Responses
Sordoni et al. 2015. Generates responses to tweets.
Uses Recurrent Neural Network Language Model (RLM) architecture of (Mikolov et al., 2010). source code: RNNLM Toolkit

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Serban, Sordoni, Bengio et al. 2015. Extends hierarchical recurrent encoder-decoder neural network (HRED).

Attention with Intention for a Neural Network Conversation Model
Yao et al. 2015. The architecture consists of three recurrent networks: an encoder, an intention network, and a decoder.

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues
Serban, Sordoni, Lowe, Charlin, Pineau, Courville, Bengio 2016
Proposes a novel architecture, VHRED (Latent Variable Hierarchical Recurrent Encoder-Decoder).
Compares favorably against LSTM and HRED.


A Neural Conversation Model
Vinyals, Le 2015. Uses LSTM RNNs to generate conversational responses within the seq2seq framework. Seq2seq was originally designed for machine translation: it "translates" a single sentence, up to around 79 words, into a single-sentence response, and it has no memory of previous dialog exchanges. Used in the Google Smart Reply feature for Inbox.

Incorporating Copying Mechanism in Sequence-to-Sequence Learning
Gu et al. 2016. Proposes CopyNet, which builds on seq2seq.

A Persona-Based Neural Conversation Model
Li et al. 2016. Proposes persona-based models for handling the issue of speaker consistency in neural response generation. Builds on seq2seq.

Deep Reinforcement Learning for Dialogue Generation
Li et al. 2016. Uses reinforcement learning to generate diverse responses. Trains two agents to chat with each other. Builds on seq2seq.


Deep learning for chatbots
Article summarizing the state of the art and the challenges for chatbots.
Deep learning for chatbots, part 2
Implements a retrieval-based dialog agent using a dual-encoder LSTM with TensorFlow, based on the Ubuntu dataset [paper]. Includes source code.

Memory and Attention Models

"Attention mechanisms allow the network to refer back to the input sequence, instead of forcing it to encode all information into one fixed-length vector." (Source: Attention and Memory in Deep Learning and NLP)

Memory Networks, Weston et al. 2014, and End-To-End Memory Networks, Sukhbaatar et al. 2015.
Memory networks are implemented in MemNN. They attempt to solve tasks that require reasoning, attention, and memory.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
Weston 2015. Classifies QA tasks such as single factoid, yes/no, etc. Extends memory networks.
Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems
Dodge et al. 2015. Tests Memory Networks on 4 tasks, including a Reddit dialog task.
See Jason Weston's lecture on MemNN

Neural Turing Machines
Graves, Wayne, Danihelka 2014.
"We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples."
Olah and Carter blog on NTM

Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets
Joulin, Mikolov 2015. Stack RNN source code and blog post

Reasoning, Attention and Memory (RAM) workshop at NIPS 2015. Slides included.

https://github.com/andrewt3000/DL4NLP/blob/master/README.md


This article is reposted from stock0991's 51CTO blog. Original link: http://blog.51cto.com/qing0991/1913377

