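CV / Image Caption: A Detailed Guide to the Related Papers, Design Ideas, and Key-Step Diagrams of Image Caption Algorithms (Part 1) - Alibaba Cloud Developer Community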


2021-10-29



Papers related to Image Caption (IC) algorithms

1. "Show and Tell: A Neural Image Caption Generator"

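https://arxiv.org/pdf/1411.4555.pdf This paper takes the encoder-decoder structure used in machine translation and replaces the Encoder with a CNN so that it can be used for Image Caption.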

Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
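The abstract above reports its results as BLEU-1 scores. As a rough illustration of what that metric measures, here is a minimal sketch of a sentence-level BLEU-1 (clipped unigram precision multiplied by a brevity penalty). The function name `bleu1`, the whitespace tokenization, and the lowercasing are assumptions made for this example. The paper's numbers are corpus-level scores reported on a 0-100 scale, so this sketch is not expected to reproduce them.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    # Assumption for this sketch: lowercase and split on whitespace.
    cand_tokens = candidate.lower().split()
    ref_token_lists = [ref.lower().split() for ref in references]
    if not cand_tokens or not ref_token_lists:
        return 0.0

    # Clip each candidate word count by its maximum count in any single reference.
    cand_counts = Counter(cand_tokens)
    max_ref_counts = Counter()
    for ref_tokens in ref_token_lists:
        for word, count in Counter(ref_tokens).items():
            max_ref_counts[word] = max(max_ref_counts[word], count)
    clipped = sum(min(count, max_ref_counts[word])
                  for word, count in cand_counts.items())
    precision = clipped / len(cand_tokens)

    # Brevity penalty uses the reference length closest to the candidate length.
    cand_len = len(cand_tokens)
    closest_ref_len = min((len(r) for r in ref_token_lists),
                          key=lambda n: (abs(n - cand_len), n))
    if cand_len >= closest_ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - closest_ref_len / cand_len)
    return brevity_penalty * precision
```

For example, `bleu1("a dog runs", ["a dog runs"])` gives 1.0, while a degenerate caption such as "the the the the" is penalized both by clipping (only two occurrences of "the" are credited against "the cat is on the mat") and by the brevity penalty.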

2. "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

https://arxiv.org/pdf/1502.03044v1.pdf This paper goes a step further and introduces an attention mechanism.

Abstract: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
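To make the attention idea more concrete, the following is a minimal sketch of one step of deterministic ("soft") attention: each image region gets a score, the scores are normalized with a softmax, and the context vector is the weighted sum of the region features. The function name `soft_attention` and the dot-product scoring are assumptions chosen to keep the example dependency-free. The paper itself learns the scoring function as a small neural network, so this should be read as an illustration of the mechanism rather than the authors' implementation.

```python
import math


def soft_attention(annotations, hidden):
    """Return (weights, context) for one decoding step.

    annotations: list of L feature vectors, one per image region.
    hidden: previous decoder state, same length as each annotation vector.
    Scoring by dot product is a simplification chosen for this sketch.
    """
    scores = [sum(a * h for a, h in zip(vec, hidden)) for vec in annotations]

    # Numerically stable softmax over the L region scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the annotation vectors.
    dim = len(annotations[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, annotations))
               for d in range(dim)]
    return weights, context
```

With two regions `[[1.0, 0.0], [0.0, 1.0]]` and a zero hidden state, both regions receive equal weight. A hidden state such as `[10.0, 0.0]` pushes almost all of the weight onto the first region, which is the "gaze" behavior the abstract describes. Because every operation here is differentiable, this variant can be trained with standard backpropagation.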
