Paper:《Generating Sequences With Recurrent Neural Networks》的翻译和解读

简介: Paper:《Generating Sequences With Recurrent Neural Networks》的翻译和解读

目录

Generating Sequences With Recurrent Neural Networks

Abstract

1、Introduction

2 Prediction Network 预测网络

2.1 Long Short-Term Memory

3 Text Prediction  文本预测

3.1 Penn Treebank Experiments Penn Treebank实验

3.2 Wikipedia Experiments  维基百科的实验

4 Handwriting Prediction  笔迹的预测

4.1 Mixture Density Outputs  混合密度输出

4.2 Experiments

4.3 Samples  样品

5 Handwriting Synthesis  字合成

5.1 Synthesis Network  合成网络

5.2 Experiments  实验

5.3 Unbiased Sampling  公正的抽样

5.4 Biased Sampling  有偏见的抽样

5.5 Primed Sampling  启动采样

6 Conclusions and Future Work  结论与未来工作

Acknowledgements  致谢

References

Generating Sequences With Recurrent Neural Networks

利用递归神经网络生成序列


论文原文:Generating Sequences With Recurrent Neural Networks

作者:

Alex Graves   Department of Computer Science  

University of Toronto      graves@cs.toronto.edu

Abstract

image.png

1、Introduction 介绍

image.png

image.png

image.png

image.png

image.png

2 Prediction Network 预测网络

image.png

image.png

image.png

2.1 Long Short-Term Memory

image.png

image.png

3 Text Prediction  文本预测


image.png

image.png

3.1 Penn Treebank Experiments Penn Treebank实验

image.png

image.png

image.png

image.png

3.2 Wikipedia Experiments  维基百科的实验

image.png

image.png

image.png

image.png

image.png

4 Handwriting Prediction  笔迹的预测

image.png

image.png

image.png

4.1 Mixture Density Outputs  混合密度输出

image.png

image.png

image.png

image.png

image.png

image.png

image.png

image.png

4.3 Samples  样品

image.png

5 Handwriting Synthesis  字合成

image.png

image.png

image.png

image.png

5.1 Synthesis Network  合成网络

image.png

image.png

image.png

image.png

5.2 Experiments  实验

image.pngimage.png

5.3 Unbiased Sampling  公正的抽样

image.png

5.4 Biased Sampling  有偏见的抽样

image.png

image.png

image.png

5.5 Primed Sampling  启动采样

image.png

6 Conclusions and Future Work  结论与未来工作

image.png

Acknowledgements  致谢


image.png


Reference[1] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, March 1994.

[2] C. Bishop. Mixture density networks. Technical report, 1994.

[3] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Inc., 1995.

[4] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In Proceedings of the Twenty-nine International Conference on Machine Learning (ICML’12), 2012.

[5] J. G. Cleary, Ian, and I. H. Witten. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, 32:396–402, 1984.

[6] D. Eck and J. Schmidhuber. A first look at music composition using lstm recurrent neural networks. Technical report, IDSIA USI-SUPSI Instituto Dalle Molle.

[7] F. Gers, N. Schraudolph, and J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3:115–143, 2002.

[8] A. Graves. Practical variational inference for neural networks. In Advances in Neural Information Processing Systems, volume 24, pages 2348–2356. 2011.

[9] A. Graves. Sequence transduction with recurrent neural networks. In ICML Representation Learning Worksop, 2012.

[10] A. Graves, A. Mohamed, and G. Hinton. Speech recognition with deep recurrent neural networks. In Proc. ICASSP, 2013.

[11] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18:602–610, 2005.

[12] A. Graves and J. Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, volume 21, 2008.

[13] P. D. Gr¨unwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning). The MIT Press, 2007.

[14] G. Hinton. A Practical Guide to Training Restricted Boltzmann Machines. Technical report, 2010.

[15] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-term Dependencies. In S. C. Kremer and J. F. Kolen, editors, A Field Guide to Dynamical Recurrent Neural Networks. 2001.

[16] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.

[17] M. Hutter. The Human Knowledge Compression Contest, 2012. [18] K.-C. Jim, C. Giles, and B. Horne. An analysis of noise in recurrent neural networks: convergence and generalization. Neural Networks, IEEE Transactions on, 7(6):1424 –1438, 1996. [19] S. Johansson, R. Atwell, R. Garside, and G. Leech. The tagged LOB corpus user’s manual; Norwegian Computing Centre for the Humanities, 1986.

[20] B. Knoll and N. de Freitas. A machine learning perspective on predictive coding with paq. CoRR, abs/1108.3298, 2011.

[21] M. Liwicki and H. Bunke. IAM-OnDB - an on-line English sentence database acquired from handwritten text on a whiteboard. In Proc. 8th Int. Conf. on Document Analysis and Recognition, volume 2, pages 956– 961, 2005.

[22] M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. Building a large annotated corpus of english: The penn treebank. COMPUTATIONAL LINGUISTICS, 19(2):313–330, 1993.

[23] T. Mikolov. Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.

[24] T. Mikolov, I. Sutskever, A. Deoras, H. Le, S. Kombrink, and J. Cernocky. Subword language modeling with neural networks. Technical report, Unpublished Manuscript, 2012.

[25] A. Mnih and G. Hinton. A Scalable Hierarchical Distributed Language Model. In Advances in Neural Information Processing Systems, volume 21, 2008.

[26] A. Mnih and Y. W. Teh. A fast and simple algorithm for training neural probabilistic language models. In Proceedings of the 29th International Conference on Machine Learning, pages 1751–1758, 2012.

[27] T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran. Lowrank matrix factorization for deep neural network training with highdimensional output targets. In Proc. ICASSP, 2013.

[28] M. Schuster. Better generative models for sequential data problems: Bidirectional recurrent mixture density networks. pages 589–595. The MIT Press, 1999.

[29] I. Sutskever, G. E. Hinton, and G. W. Taylor. The recurrent temporal restricted boltzmann machine. pages 1601–1608, 2008.

[30] I. Sutskever, J. Martens, and G. Hinton. Generating text with recurrent neural networks. In ICML, 2011.

[31] G. W. Taylor and G. E. Hinton. Factored conditional restricted boltzmann machines for modeling motion style. In Proc. 26th Annual International Conference on Machine Learning, pages 1025–1032, 2009.

[32] T. Tieleman and G. Hinton. Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude, 2012.

[33] R. Williams and D. Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity. In Back-propagation: Theory, Architectures and Applications, pages 433–486. 1995.






相关文章
采用zookeeper的EPHEMERAL节点机制实现服务集群的陷阱
在集群管理中使用Zookeeper的EPHEMERAL节点机制存在很多的陷阱,毛估估,第一次使用zk来实现集群管理的人应该有80%以上会掉坑,有些坑比较隐蔽,在网络问题或者异常的场景时才会出现,可能很长一段时间才会暴露出来。
14868 1
|
机器学习/深度学习 数据采集 自然语言处理
【机器学习实战】10分钟学会Python怎么用LASSO回归进行正则化(十二)
【机器学习实战】10分钟学会Python怎么用LASSO回归进行正则化(十二)
4429 0
|
并行计算 Linux Docker
Docker【部署 07】镜像内安装tensorflow-gpu及调用GPU多个问题处理Could not find cuda drivers+unable to find libcuda.so...
Docker【部署 07】镜像内安装tensorflow-gpu及调用GPU多个问题处理Could not find cuda drivers+unable to find libcuda.so...
1276 0
|
8月前
|
算法
一次推理,实现六大3D点云分割任务!华科发布大一统算法UniSeg3D,性能新SOTA
华中科技大学研究团队提出了一种名为UniSeg3D的创新算法,该算法通过一次推理即可完成六大3D点云分割任务(全景、语义、实例、交互式、指代和开放词汇分割),并基于Transformer架构实现任务间知识共享与互惠。实验表明,UniSeg3D在多个基准数据集上超越现有SOTA方法,为3D场景理解提供了全新统一框架。然而,模型较大可能限制实际部署。
646 15
|
算法 量子技术
|
机器学习/深度学习 人工智能 运维
智能化运维:提升IT服务效率的新引擎###
本文深入浅出地探讨了智能化运维(AIOps)如何革新传统IT运维模式,通过大数据、机器学习与自动化技术,实现故障预警、快速定位与处理,从而显著提升IT服务的稳定性和效率。不同于传统运维依赖人工响应,AIOps强调预测性维护与自动化流程,为企业数字化转型提供强有力的支撑。 ###
|
数据采集 分布式计算 并行计算
Dask与Pandas:无缝迁移至分布式数据框架
【8月更文第29天】Pandas 是 Python 社区中最受欢迎的数据分析库之一,它提供了高效且易于使用的数据结构,如 DataFrame 和 Series,以及大量的数据分析功能。然而,随着数据集规模的增大,单机上的 Pandas 开始显现出性能瓶颈。这时,Dask 就成为了一个很好的解决方案,它能够利用多核 CPU 和多台机器进行分布式计算,从而有效地处理大规模数据集。
757 1
|
机器学习/深度学习 人工智能 自然语言处理
AI 古籍修复技术
AI 古籍修复技术
969 0
|
存储 缓存 安全
解决Edge浏览器提示“此网站已被人举报不安全”
【8月更文挑战第19天】如果Edge浏览器提示“此网站已被人举报不安全”,首先确认网站可信度及安全证书有效性,避免访问可疑网站。检查浏览器是否需要更新,并确保自动更新功能已开启。可暂时关闭Microsoft Defender SmartScreen(不建议长期关闭),清除缓存和Cookies,或检查第三方安全软件设置。若问题持续,考虑重置Edge浏览器设置,保留重要数据。如仍无法解决,联系网站管理员或微软支持。
2742 7
|
算法 Java 程序员
解锁Python高效之道:并发与异步在IO与CPU密集型任务中的精准打击策略!
在数据驱动时代,高效处理大规模数据和高并发请求至关重要。Python凭借其优雅的语法和强大的库支持,成为开发者首选。本文将介绍Python中的并发与异步编程,涵盖并发与异步的基本概念、IO密集型任务的并发策略、CPU密集型任务的并发策略以及异步IO的应用。通过具体示例,展示如何使用`concurrent.futures`、`asyncio`和`multiprocessing`等库提升程序性能,帮助开发者构建高效、可扩展的应用程序。
530 0