CV: Image Caption: A Detailed Guide to the Related Papers, Design Ideas, and Key Steps of Image Caption Algorithms, with Figures (Part 2)

Summary: a detailed guide to the related papers, design ideas, and key steps of Image Caption algorithms, with figures.

3. "What Value Do Explicit High Level Concepts Have in Vision to Language Problems?"


http://cn.arxiv.org/pdf/1506.01144v6
This paper uses high-level semantic concepts to improve model performance.




Abstract:   Much recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we investigate whether this direct approach succeeds due to, or despite, the fact that it avoids the explicit representation of high-level information. We propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. We also show that the same mechanism can be used to introduce external semantic information and that doing so further improves performance. We achieve the best reported results on both image captioning and VQA on several benchmark datasets, and provide an analysis of the value of explicit high-level concepts in V2L problems.
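As a rough illustration of the idea in the abstract, the sketch below turns per-concept classifier scores into a vector of independent probabilities and uses that vector, rather than raw image features, to initialise a decoder state. The concept list, scores, projection weights, and the two-dimensional state are all hypothetical values chosen for illustration; this is not the paper's architecture or training procedure.

```python
import math

def sigmoid(x):
    """Logistic function, mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

# Hypothetical concept vocabulary and raw scores from an image classifier.
CONCEPTS = ["dog", "grass", "running", "red"]
raw_scores = [2.0, 1.0, 0.5, -3.0]

# Multi-label prediction: each concept gets its own independent probability.
v_att = [sigmoid(s) for s in raw_scores]

# Toy projection (hypothetical weights) from concept space to a 2-d decoder state.
W_INIT = [[0.5, -0.2, 0.1, 0.0],
          [0.0, 0.3, 0.4, -0.1]]
h0 = [math.tanh(z) for z in matvec(W_INIT, v_att)]

# The most confident concepts give a readable summary of the image content.
top = sorted(zip(CONCEPTS, v_att), key=lambda p: p[1], reverse=True)[:2]
print([name for name, _ in top], h0)
```

In a full system, a state such as h0 would be handed to an RNN that emits the caption word by word.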




4. "Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation"


https://www.cv-foundation.org/openaccess/content_cvpr_2015/app/2A_022_ext.pdf




       A good image description is often said to “paint a picture in your mind’s eye.” The creation of a mental image may play a significant role in sentence comprehension in humans [3]. In fact, it is often this mental image that is remembered long after the exact sentence is forgotten [5, 7]. As an illustrative example, Figure 1 shows how a mental image may vary and increase in richness as a description is read. Could computer vision algorithms that comprehend and generate image captions take advantage of similar evolving visual representations? Recently, several papers have explored learning joint feature spaces for images and their descriptions [2, 4, 9]. These approaches project image features and sentence features into a common space, which may be used for image search or for ranking image captions. Various approaches were used to learn the projection, including Kernel Canonical Correlation Analysis (KCCA) [2], recursive neural networks [9], or deep neural networks [4]. While these approaches project both semantics and visual features to a common embedding, they are not able to perform the inverse projection. That is, they cannot generate novel sentences or visual depictions from the embedding.
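The joint-embedding idea described above can be sketched as follows: image features and sentence features are each mapped by a linear projection into a shared space, and candidate captions are ranked by cosine similarity to the image. The projection matrices and feature vectors are made-up toy values; the cited approaches learn such projections with KCCA or neural networks.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical projections into a shared 2-d embedding space.
W_IMG = [[0.5, 0.0, 0.25],
         [0.0, 1.0, 0.0]]
W_TXT = [[1.0, 0.2],
         [0.1, 1.0]]

# Toy image feature and toy sentence features for two candidate captions.
image_feat = [1.0, 0.0, 2.0]
caption_feats = {
    "a dog on grass": [1.0, 0.0],
    "a plate of food": [0.0, 1.0],
}

img_emb = matvec(W_IMG, image_feat)

# Rank captions by similarity to the image in the shared space.
ranked = sorted(
    ((text, cosine(img_emb, matvec(W_TXT, feat)))
     for text, feat in caption_feats.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked[0][0])
```

A model of this kind can rank existing captions, but it has no way to map a point in the shared space back to a new sentence, which is the limitation the passage points out.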




5. "From Captions to Visual Concepts and Back"


https://arxiv.org/pdf/1411.4952v2.pdf




Abstract: This paper presents a novel approach for automatically generating image descriptions: visual detectors and language models learn directly from a dataset of image captions. We use Multiple Instance Learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. When human judges compare the system captions to ones written by other people, the system captions have equal or better quality over 23% of the time.
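To make the Multiple Instance Learning step more concrete, here is a minimal sketch of how per-region word probabilities might be combined into one image-level probability. The abstract does not say which MIL formulation is used, so the noisy-OR rule below is an assumption, and the function name and region scores are hypothetical.

```python
def noisy_or(region_probs):
    """Image-level probability that a word applies, given per-region probabilities.

    Assumed noisy-OR rule: the word is present unless every region fails to show it,
    i.e. p = 1 - prod_j (1 - p_j).
    """
    prob_absent = 1.0
    for p in region_probs:
        prob_absent *= (1.0 - p)
    return 1.0 - prob_absent

# Hypothetical detector outputs for the word "dog" over three image regions.
region_scores = [0.1, 0.7, 0.2]
p_dog = noisy_or(region_scores)
print(round(p_dog, 3))
```

Under this rule a single confident region is enough to make the word likely for the whole image, which suits caption data where a word is known to apply to the image but not to any particular region.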







