DL: An Overview of Deep Learning Algorithms (a Collection of Neural Network Models): Notes on "THE NEURAL NETWORK ZOO" (Part 2)

FF & P

Feed-forward neural networks (FF or FFNN) and perceptrons (P) are very straightforward: they feed information from the front to the back (input and output, respectively). Neural networks are often described as having layers, where each layer consists of input, hidden or output cells in parallel. A layer on its own never has connections, and in general two adjacent layers are fully connected (every neuron in one layer is connected to every neuron in the next). The simplest somewhat practical network has two input cells and one output cell, which can be used to model logic gates. One usually trains FFNNs through back-propagation, giving the network paired datasets of "what goes in" and "what we want to have coming out". This is called supervised learning, as opposed to unsupervised learning, where we only give the network input and let it fill in the blanks. The error being back-propagated is usually some variation of the difference between the produced output and the desired output (such as MSE or just the linear difference). Given enough hidden neurons, such a network can in theory always model the relationship between input and output. In practice their use is far more limited, but they are popularly combined with other networks to form new networks.

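As a sketch of the two-input, one-output example above, here is a minimal perceptron modeling the AND gate. The weights and bias are hand-picked for illustration (an assumption), not learned through back-propagation:

```python
# A minimal perceptron: two inputs, one output, modeling the AND gate.
# Weights and bias are hand-picked for illustration; a trained FFNN
# would find similar values via back-propagation.

def step(z):
    """Heaviside step activation."""
    return 1 if z >= 0 else 0

def perceptron_and(a, b, w1=1.0, w2=1.0, bias=-1.5):
    # Fire only when the weighted sum of both inputs clears the bias.
    return step(w1 * a + w2 * b + bias)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron_and(a, b))
```

Swapping the bias (e.g. to -0.5) turns the same structure into an OR gate, which is why such tiny networks are a convenient way to model logic gates.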


Rosenblatt, Frank. “The perceptron: a probabilistic model for information storage and organization in the brain.” Psychological review 65.6 (1958): 386.

Original Paper PDF



RBF




     Radial basis function (RBF) networks are FFNNs that use radial basis functions as activation functions. There is nothing more to it. That doesn't mean they don't have their uses, but most FFNNs with other activation functions don't get their own name. This mostly comes down to being invented at the right time.

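A minimal sketch of the idea, assuming hand-picked Gaussian centres, widths and output weights (in practice these would be fitted to data): each hidden unit responds to the distance between the input and its centre, and the output is a weighted sum of those responses.

```python
import math

# Sketch of a radial basis function network. Centres, widths and
# output weights are hand-picked for illustration, not trained.

def gaussian_rbf(x, centre, width=1.0):
    """Gaussian radial basis: peaks at 1 when x sits on the centre."""
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))

def rbf_network(x, centres=(-1.0, 0.0, 1.0), out_weights=(0.5, 1.0, 0.5)):
    hidden = [gaussian_rbf(x, c) for c in centres]
    return sum(w * h for w, h in zip(out_weights, hidden))

# The response is large near the centres and decays far away from them:
print(rbf_network(0.0))
print(rbf_network(5.0))
```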


Broomhead, David S., and David Lowe. Radial basis functions, multi-variable functional interpolation and adaptive networks. No. RSRE-MEMO-4148. ROYAL SIGNALS AND RADAR ESTABLISHMENT MALVERN (UNITED KINGDOM), 1988.

Original Paper PDF



RNN




     Recurrent neural networks (RNN) are FFNNs with a time twist: they are not stateless; they have connections between passes, connections through time. Neurons are fed information not just from the previous layer but also from themselves on the previous pass. This means that the order in which you feed the input and train the network matters: feeding it "milk" and then "cookies" may yield different results compared to feeding it "cookies" and then "milk". One big problem with RNNs is the vanishing (or exploding) gradient problem: depending on the activation functions used, information rapidly gets lost over time, just as very deep FFNNs lose information with depth. Intuitively this wouldn't seem like much of a problem, because these are just weights and not neuron states, but the weights through time are actually where the information from the past is stored; if a weight reaches a value of 0 or 1,000,000, the previous state won't be very informative. RNNs can in principle be used in many fields, as most forms of data that don't actually have a timeline (i.e. unlike sound or video) can still be represented as a sequence. A picture or a string of text can be fed one pixel or character at a time, so the time-dependent weights are used for what came earlier in the sequence, not for what happened x seconds before. In general, recurrent networks are a good choice for advancing or completing information, such as autocompletion.

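The recurrent step described above can be sketched with a single scalar unit: the new hidden state mixes the current input with the state from the previous pass, so earlier inputs influence later ones. The weights below are hand-picked for illustration, not trained:

```python
import math

# One-unit recurrent step: h_t = tanh(w_in * x_t + w_rec * h_{t-1}).
# Weights are hand-picked for illustration.

def rnn_step(x, h_prev, w_in=0.5, w_rec=0.9, bias=0.0):
    return math.tanh(w_in * x + w_rec * h_prev + bias)

def run_sequence(xs):
    h = 0.0                 # initial state before anything is seen
    for x in xs:
        h = rnn_step(x, h)  # state carries information between passes
    return h

# Order matters: the same inputs in a different order give a different
# final state, which is the "milk then cookies" point from the text.
print(run_sequence([1.0, 0.0]))
print(run_sequence([0.0, 1.0]))
```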


Elman, Jeffrey L. “Finding structure in time.” Cognitive science 14.2 (1990): 179-211.

Original Paper PDF



LSTM




      Long / short term memory (LSTM) networks try to combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell. These are inspired mostly by circuitry, not so much by biology. Each neuron has a memory cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing its flow. The input gate determines how much of the information from the previous layer gets stored in the cell. The output gate takes the job on the other end and determines how much the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first, but sometimes it's good to forget: if the network is learning a book and a new chapter begins, it may be necessary to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a weight to the cell in the previous neuron, so they typically require more resources to run.


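The three gates can be sketched with a single scalar unit. All weights below are hand-picked for illustration, not trained; a real LSTM learns them, and uses vectors and matrices rather than scalars:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Single-unit LSTM step with scalar, hand-picked weights.
def lstm_step(x, h_prev, c_prev,
              w_i=1.0, w_f=1.0, w_o=1.0, w_g=1.0,
              u_i=0.5, u_f=0.5, u_o=0.5, u_g=0.5):
    i = sigmoid(w_i * x + u_i * h_prev)    # input gate: how much new info to store
    f = sigmoid(w_f * x + u_f * h_prev)    # forget gate: how much old memory to keep
    o = sigmoid(w_o * x + u_o * h_prev)    # output gate: how much of the cell to expose
    g = math.tanh(w_g * x + u_g * h_prev)  # candidate memory content
    c = f * c_prev + i * g                 # new memory cell state
    h = o * math.tanh(c)                   # new hidden state seen by the next layer
    return h, c

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c)
print(h, c)
```

Because the cell state `c` is updated additively (`f * c_prev + i * g`) rather than squashed through an activation on every pass, gradients through time survive much longer, which is the point of the gating.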


Hochreiter, Sepp, and Jürgen Schmidhuber. “Long short-term memory.” Neural computation 9.8 (1997): 1735-1780.

Original Paper PDF



GRU




      Gated recurrent units (GRU) are a slight variation on LSTMs. They have one gate fewer and are wired slightly differently: instead of an input, an output and a forget gate, they have an update gate. This update gate determines both how much information to keep from the last state and how much information to let in from the previous layer. The reset gate functions much like the forget gate of an LSTM, but it is located slightly differently. GRUs always send out their full state; they don't have an output gate. In most cases they function very similarly to LSTMs, with the biggest difference being that GRUs are slightly faster and easier to run (but also slightly less expressive). In practice these tend to cancel each other out, as you need a bigger network to regain some expressiveness, which in turn cancels out the performance benefits. In some cases where the extra expressiveness is not needed, GRUs can outperform LSTMs.

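The update and reset gates can be sketched with a single scalar unit; the weights are hand-picked for illustration, not trained:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Single-unit GRU step with scalar, hand-picked weights.
# z: update gate (how much old state to keep vs. replace),
# r: reset gate (how much old state feeds the candidate).
def gru_step(x, h_prev,
             w_z=1.0, w_r=1.0, w_h=1.0,
             u_z=0.5, u_r=0.5, u_h=0.5):
    z = sigmoid(w_z * x + u_z * h_prev)
    r = sigmoid(w_r * x + u_r * h_prev)
    g = math.tanh(w_h * x + u_h * (r * h_prev))  # candidate state
    return (1 - z) * h_prev + z * g              # full state is sent out

h = 0.0
for x in [1.0, -1.0, 0.5]:
    h = gru_step(x, h)
print(h)
```

Note there is no separate memory cell and no output gate: the single state `h` is both the memory and the output, which is why a GRU step needs fewer weights than an LSTM step.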


Chung, Junyoung, et al. “Empirical evaluation of gated recurrent neural networks on sequence modeling.” arXiv preprint arXiv:1412.3555 (2014).

Original Paper PDF



BiRNN, BiLSTM and BiGRU


      Bidirectional recurrent neural networks, bidirectional long / short term memory networks and bidirectional gated recurrent units (BiRNN, BiLSTM and BiGRU respectively) are not shown on the chart because they look exactly the same as their unidirectional counterparts. The difference is that these networks are not just connected to the past, but also to the future. As an example, unidirectional LSTMs might be trained to predict the word “fish” by being fed the letters one by one, where the recurrent connections through time remember the last value. A BiLSTM would also be fed the next letter in the sequence on the backward pass, giving it access to future information. This trains the network to fill in gaps instead of advancing information, so instead of expanding an image on the edge, it could fill a hole in the middle of an image.

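The forward-and-backward reading described above can be sketched with two simple recurrent units, one per direction, so every position in the sequence gets both past and future context. Weights are hand-picked for illustration:

```python
import math

# Sketch of a bidirectional pass: one recurrent unit reads the
# sequence left-to-right, another right-to-left, and each position
# is described by the pair of resulting states.

def rnn_step(x, h_prev, w_in=0.5, w_rec=0.9):
    return math.tanh(w_in * x + w_rec * h_prev)

def bidirectional_states(xs):
    fwd, h = [], 0.0
    for x in xs:                 # forward pass: sees what came before
        h = rnn_step(x, h)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):       # backward pass: sees what comes after
        h = rnn_step(x, h)
        bwd.append(h)
    bwd.reverse()                # realign with the original order
    # each position gets (past context, future context)
    return list(zip(fwd, bwd))

print(bidirectional_states([1.0, 0.0, -1.0]))
```

This is why bidirectional networks suit gap-filling: the state at a position already encodes the sequence on both sides of it.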


Schuster, Mike, and Kuldip K. Paliwal. “Bidirectional recurrent neural networks.” IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.

Original Paper PDF


 

