Geoffrey Hinton's Capsule Networks: A Novel Approach to Deep Learning

Introduction: The Capsule Network proposed by Dr. Geoffrey Hinton brings a new perspective to deep learning compared with traditional Convolutional Neural Networks (CNNs).


In 2012, Geoffrey Hinton changed the way machines "see" the world. Along with two of his students, Alex Krizhevsky and Ilya Sutskever, Dr. Hinton published a paper titled ImageNet Classification with Deep Convolutional Neural Networks. In the paper, they proposed a deep convolutional neural network model named AlexNet, which won first place in the ImageNet Large Scale Visual Recognition Challenge that year. AlexNet reduced the top-1 and top-5 error rates to 37.5% and 17.0% respectively, a significant improvement in image recognition accuracy. With this success, Hinton joined Google Brain, and AlexNet became one of the classic image recognition models widely used in industry.

Solving Issues with Traditional Convolutional Neural Networks (CNNs) Using the Capsule Network

In 2017, together with his two colleagues at Google Brain, Sara Sabour and Nicholas Frosst, Hinton published the paper Dynamic Routing Between Capsules. The team proposed a new neural network model called the Capsule Network, which achieves better results on certain tasks than the traditional convolutional neural network (CNN). Unlike a CNN, the Capsule Network helps machines understand images from a new perspective, similar to the three-dimensional perspective that humans have.

Although there is still much room for improvement, the Capsule Network has reached state-of-the-art accuracy on MNIST, as analyzed by Aurélien Géron, author of Hands-On Machine Learning with Scikit-Learn and TensorFlow. Its performance on the CIFAR10 dataset can still be improved, which leaves plenty of room for the further development of the Capsule Network.

The Capsule Network also requires less training data. It delivers equivariant mapping, preserving position and pose information, which is promising for image segmentation and object detection. In addition, routing by agreement handles overlapping objects well, and capsule activations nicely map the hierarchy of parts, assigning each part to a whole. The model is robust to rotation, translation, and other affine transformations, and its activation vectors are easier to interpret. Finally, as an idea coming from Hinton himself, it is undoubtedly forward-looking.
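To make routing by agreement more concrete, here is a minimal NumPy sketch of the dynamic routing procedure described in Dynamic Routing Between Capsules. The array shapes, the number of routing iterations, and the helper names (squash, dynamic_routing) are illustrative assumptions rather than the authors' reference implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Squashing non-linearity: keeps the vector's orientation,
    # scales its length into [0, 1).
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iterations=3):
    """Routing by agreement between two capsule layers.

    u_hat: prediction vectors from lower-level capsules,
           shape (num_lower, num_upper, dim_upper).
    Returns the output vectors of the upper-level capsules,
    shape (num_upper, dim_upper).
    """
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))  # routing logits
    for _ in range(num_iterations):
        # Coupling coefficients: softmax of the logits over upper capsules.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[:, :, None] * u_hat).sum(axis=0)  # weighted sum of predictions
        v = squash(s)                            # upper-level capsule outputs
        # Agreement: dot product between each prediction and the output it voted for.
        b += (u_hat * v[None, :, :]).sum(axis=-1)
    return v

# Toy example: 6 lower-level capsules routing to 10 upper-level capsules of dimension 16.
u_hat = np.random.randn(6, 10, 16) * 0.1
v = dynamic_routing(u_hat)
print(v.shape)  # (10, 16)
```

Lower-level capsules whose predictions agree with an upper-level capsule's output end up with larger coupling coefficients, which is what lets the network assign parts to wholes.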

The New York Times recently published an article about a visit to Hinton's laboratory in Toronto, interviewing Hinton and Sara Sabour, first author of Dynamic Routing Between Capsules, who described her ambitious vision for the Capsule Network.


Can you combine the two models in the picture into a pyramid? This seemingly simple task is beyond the capabilities of most computers, and even humans. Image source: New York Times

If a traditional neural network is trained on images that show a coffee cup only from the side, for example, it is unlikely to recognize a coffee cup turned upside down. This is a limitation of traditional CNNs. Hinton wants to use the Capsule Network to give machines human-like 3D vision.

In the report, the reporter describes a two-piece pyramid puzzle held by Hinton and Sabour, as shown in the picture above. The two plaster models can actually be put together to form a tetrahedral pyramid. Conceptually, it doesn't seem too hard, but most people fail this test, including the reporter and two tenured professors at the Massachusetts Institute of Technology: one declined to try, and the other insisted it wasn't possible.

Mr. Hinton explained, "We picture the whole thing sitting in three-dimensional space. And because of the way the puzzle cuts the pyramid in two, it prevents us from picturing it in 3-D space as we normally would."

With his capsule networks, Mr. Hinton aims to finally give machines the same three-dimensional perspective that humans have — allowing them to recognize a coffee cup from any angle after learning what it looks like from only one.

Utilizing Soft Decision Trees for Explainable Classification

While working to overcome the shortcomings of traditional neural networks, Hinton has also devoted himself to making deep neural networks easier to understand.

Recently, the 16th International Conference of the Italian Association for Artificial Intelligence (AIIA 2017) was held, with the Comprehensibility and Explanation in AI and ML (CEX) workshop running concurrently. As its name implies, the CEX workshop addresses fundamental questions about the nature of "comprehensibility" and "explanation" in an AI and ML context, from both a theoretical and an applied perspective. It features research into philosophical approximations of what an explanation in AI and ML is (or can be) and how the comprehensibility of an intelligent system can be formally defined, alongside work on practical questions such as how to assess a system's comprehensibility from a psychological perspective, or how to design and build better explainable AI and ML systems.


At the workshop, Hinton and his colleague Nicholas Frosst at Google Brain co-authored and submitted a paper entitled "Distilling a Neural Network Into a Soft Decision Tree".


Summary of the Paper

Deep neural networks have proved to be a very effective way to perform classification tasks. They excel when the input data is high dimensional, the relationship between the input and the output is complicated, and the number of labeled training examples is large. But it is hard to explain why a learned network makes a particular classification decision on a particular test case. This is due to their reliance on distributed hierarchical representations. If we could take the knowledge acquired by the neural net and express the same knowledge in a model that relies on hierarchical decisions instead, explaining a particular decision would be much easier. We describe a way of using a trained neural net to create a type of soft decision tree that generalizes better than one learned directly from the training data.

The excellent generalization abilities of deep neural nets depend on their use of distributed representations in their hidden layers, but these representations are hard to understand. For the first hidden layer we can understand what causes an activation of a unit, and for the last hidden layer we can understand the effects of activating a unit, but for the other hidden layers it is much harder to understand the causes and effects of a feature activation in terms of meaningful variables such as the inputs and outputs.

Also, the units in a hidden layer factor the representation of the input vector into a set of feature activations in such a way that the combined effects of the active features can cause an appropriate distributed representation in the next hidden layer. This makes it very difficult to understand the functional role of any particular feature activation in isolation since its marginal effect depends on the effects of all the other units in the same layer.

These difficulties are further compounded by the fact that deep neural nets can make decisions by modeling a very large number of weak statistical regularities in the relationship between the inputs and outputs of the training data. However, there is nothing in the neural network to distinguish the weak regularities that are true properties of the data from the spurious regularities that are created by the sampling peculiarities of the training set. Faced with all these difficulties, it seems wise to abandon the idea of trying to understand how a deep neural network makes a classification decision by understanding what the individual hidden units do.

By contrast, it is easy to explain how a decision tree makes any particular classification because this depends on a relatively short sequence of decisions and each decision is based directly on the input data. Decision trees, however, do not usually generalize as well as deep neural nets. Unlike the hidden units in a neural net, a typical node at the lower levels of a decision tree is only used by a very small fraction of the training data so the lower parts of the decision tree tend to overfit unless the size of the training set is exponentially large compared with the depth of the tree.

In this paper, the authors propose a novel way of resolving the tension between generalization and interpretability. Instead of trying to understand how a deep neural network makes its decisions, they use the deep neural network to train a decision tree that mimics the input-output function discovered by the network but works in a completely different way. The result is a model whose decisions are interpretable.
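To give a rough sense of what such a model looks like, below is a minimal NumPy sketch of inference in a soft decision tree, assuming a binary tree of fixed depth in which each inner node applies a learned sigmoid filter to the raw input and each leaf holds a learned distribution over classes. The class name SoftDecisionTree, the parameter initialization, and all shapes are illustrative assumptions, not the paper's reference implementation (which trains these parameters with gradient descent).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SoftDecisionTree:
    """Minimal soft decision tree of fixed depth (inference only).

    Each inner node i has a learned filter (w[i], b[i]) and routes the input
    to its right child with probability sigmoid(w[i] @ x + b[i]).
    Each leaf holds a learned probability distribution over classes.
    """
    def __init__(self, depth, input_dim, num_classes, rng=None):
        rng = rng or np.random.default_rng(0)
        self.depth = depth
        self.num_inner = 2 ** depth - 1
        self.num_leaves = 2 ** depth
        self.w = rng.normal(0, 0.01, size=(self.num_inner, input_dim))
        self.b = np.zeros(self.num_inner)
        logits = rng.normal(0, 0.01, size=(self.num_leaves, num_classes))
        self.leaf_dist = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    def predict_proba(self, x):
        # Probability of reaching each node at the current depth, accumulated from the root.
        path_prob = np.ones(1)
        for d in range(self.depth):
            start = 2 ** d - 1  # first inner node at this depth (heap indexing)
            nodes = np.arange(start, start + 2 ** d)
            p_right = sigmoid(self.w[nodes] @ x + self.b[nodes])
            # Each node splits its probability mass between its left and right child.
            path_prob = np.stack([path_prob * (1 - p_right),
                                  path_prob * p_right], axis=1).reshape(-1)
        # Prediction: mixture of leaf distributions weighted by path probability
        # (the paper also discusses predicting from the single most probable leaf).
        return path_prob @ self.leaf_dist

# Toy usage with MNIST-like dimensions.
tree = SoftDecisionTree(depth=4, input_dim=784, num_classes=10)
x = np.random.rand(784)
print(tree.predict_proba(x).sum())  # ≈ 1.0
```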


Classification using a soft decision tree

The image above shows a visualization of a soft decision tree of depth 4 trained on MNIST. The images at the inner nodes are the learned filters, and the images at the leaves are visualizations of the learned probability distributions over classes. The most likely classification at each leaf, as well as the likely classifications at each edge, are annotated. If we take, for example, the rightmost internal node, we can see that at that level in the tree the potential classifications are only 3 or 8, so the learned filter is simply learning to distinguish between those two digits. The result is a filter that looks for the presence of the two areas that would join the ends of the 3 to make an 8.


Analyzing the image of a Connect4 game

This is a visualization of the first two layers of a soft decision tree trained on the Connect4 dataset. From examining the learned filters we can see that the games can be split into two distinct subtypes: games where the players have placed pieces on the edges of the board, and games where the players have placed pieces in the center of the board.

The main motivation behind this work was to create a model whose behavior is easy to explain: in order to fully understand why a particular example was given a particular classification, one can simply examine all the learned filters along the path between the root and the classification's leaf node. The crux of this model is that it does not rely on hierarchical features but on hierarchical decisions instead. The hierarchical features of a traditional neural network allow it to learn robust and novel representations of the input space, but past a level or two, they become extremely difficult to engage with.

In this paper, the authors mention, "Some current attempts at explanations for neural networks rely on the use of gradient descent to find an input that particularly excites a given neuron, but this results in a single point on a manifold of inputs, meaning that other inputs could yield the same pattern of neural excitement, and so it does not reflect the entire manifold". Ribeiro et al. propose a strategy that relies on fitting an explainable model which "acts over absence/presence of interpretable components" to the behavior of a deep neural net around some area of interest in the input space. This is accomplished by sampling from the input space around the area of interest, querying the model, and then fitting an explainable model to the model's outputs. This avoids the problem of attempting to explain a particular output by visualizing a single point on a manifold, but introduces the problem of needing a new explainable model for every area of interest in the input space, and of attempting to explain changes in the model's behavior by first-order changes in a discretized interpretation of the input space. "By relying on hierarchical decisions instead of hierarchical features we side-step these problems, as each decision is made at a level of abstraction that the reader can engage with directly."
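As a concrete illustration of the first strategy criticized above, the sketch below uses gradient ascent on the input to synthesize an image that strongly excites one chosen output unit. This is a hypothetical setup written with PyTorch; the pretrained classifier `model`, the input shape, and the hyperparameters are assumptions. The result is exactly the kind of single point on the input manifold that the authors argue does not characterize the unit's full behavior.

```python
import torch

def activation_maximization(model, unit_index, input_shape=(1, 1, 28, 28),
                            steps=200, lr=0.1):
    """Gradient ascent on the input to strongly excite one output unit.

    `model` is assumed to be any differentiable classifier mapping an image
    tensor to a vector of logits. The returned image is a single synthetic
    input, not a description of all inputs that excite the unit.
    """
    x = torch.zeros(input_shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        activation = model(x)[0, unit_index]
        (-activation).backward()  # maximize the activation by minimizing its negation
        optimizer.step()
    return x.detach()
```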

If there is a large amount of unlabeled data, the neural net can be used to create a much larger labeled dataset to train a decision tree, thus overcoming the statistical inefficiency of decision trees. Even if unlabeled data is unavailable, it may be possible to use recent advances in generative modeling (such as GANs) to generate synthetic unlabeled data from a distribution that is close to the data distribution. Moreover, even without unlabeled data, it is still possible to transfer the generalization abilities of the neural net to a decision tree by using a technique called distillation (Hinton et al., 2015; Buciluǎ et al., 2006) and a type of decision tree that makes soft decisions.
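For context, a common form of distillation trains the smaller model to match the neural net's temperature-softened output distribution rather than the hard labels. Below is a minimal NumPy sketch of that idea; the temperature value, the example logits, and the function names are illustrative choices, not values from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits, temperature=2.0):
    # Soft targets: the teacher's softened class probabilities carry more
    # information about similarity between classes than hard labels do.
    return softmax(teacher_logits, temperature)

def cross_entropy(targets, predicted_probs, eps=1e-12):
    # The student (e.g. a soft decision tree) is trained to match the soft
    # targets by minimizing this cross-entropy.
    return -np.sum(targets * np.log(predicted_probs + eps), axis=-1).mean()

# Toy example with 2 training cases and 3 classes.
teacher_logits = np.array([[5.0, 2.0, 0.5], [0.2, 4.0, 1.0]])
soft_targets = distillation_targets(teacher_logits, temperature=2.0)
student_probs = softmax(np.array([[3.0, 1.5, 0.2], [0.5, 2.5, 1.2]]))
print(cross_entropy(soft_targets, student_probs))
```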

The Future of the Capsule Network

What's the latest progress in Hinton's work? Let's return to the New York Times story.

Hinton believes that the Capsule Network will eventually be applicable beyond computer vision and enjoy much broader prospects, including conversational computing. Hinton knows that many people are skeptical about the Capsule Network, just as many people were skeptical about neural networks five years ago. According to the NYT report, Hinton expressed his confidence in the Capsule Network, saying that he believes history will repeat itself.
