StackGAN Paper Reading Notes (Part 1) - Alibaba Cloud Developer Community


2023-02-23


Paper structure

1. Introduction
2. Related Work
3. Stacked Generative Adversarial Networks
   3.1 Preliminaries
   3.2 Conditioning Augmentation
   3.3 Stage-I GAN
   3.4 Stage-II GAN
   3.5 Implementation details
4. Experiments
   4.1 Datasets and evaluation metrics
   4.2 Quantitative and qualitative results
   4.3 Component analysis
5. Conclusions

Abstract

Original text

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) to generate 256x256 photo-realistic images conditioned on text descriptions. We decompose the hard problem into more manageable sub-problems through a sketch-refinement process. The Stage-I GAN sketches the primitive shape and colors of the object based on the given text description, yielding Stage-I low-resolution images. The Stage-II GAN takes Stage-I results and text descriptions as inputs, and generates high-resolution images with photo-realistic details. It is able to rectify defects in Stage-I results and add compelling details with the refinement process. To improve the diversity of the synthesized images and stabilize the training of the conditional-GAN, we introduce a novel Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold. Extensive experiments and comparisons with state-of-the-arts on benchmark datasets demonstrate that the proposed method achieves significant improvements on generating photo-realistic images conditioned on text descriptions.

Key points

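Samples generated by existing text-to-image methods roughly convey the meaning of the given text, but they lack necessary details and vivid object parts, so image quality is poor.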

StackGAN generates 256x256 photo-realistic images conditioned on text descriptions.

It decomposes the hard problem into more manageable sub-problems through a two-stage sketch-refinement process.

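The Stage-I GAN sketches the primitive shape and colors of the object from the given text description, producing a low-resolution image. The Stage-II GAN takes the text description and the Stage-I output as inputs, corrects defects in the sketch and adds details, and produces the final higher-resolution image.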
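To make the data flow between the two stages concrete, here is a shape-level sketch in plain Python. It is not the paper's architecture. Both "generators" are hypothetical toy functions, and images are nested lists. The 64x64 Stage-I size, the 128-dimensional conditioning vector and the 100-dimensional noise are assumptions taken from my reading of the paper's settings, not from these notes. For brevity the sketch reuses one conditioning vector for both stages, whereas, as I understand it, StackGAN computes a separate one for Stage-II.

```python
import math
import random

STAGE1_SIZE = 64   # assumed Stage-I resolution (not stated in these notes)
STAGE2_SIZE = 256  # Stage-II resolution from the abstract


def stage1_generator(c_hat, z, size=STAGE1_SIZE):
    """Hypothetical toy 'sketch' generator: (conditioning, noise) -> size x size x 3 image in [-1, 1]."""
    base = 0.01 * (sum(c_hat) + sum(z))
    return [
        [[math.tanh(base + 0.1 * ch + 0.001 * (r - c)) for ch in range(3)] for c in range(size)]
        for r in range(size)
    ]


def upsample_nearest(image, factor):
    """Nearest-neighbour upsampling of a nested-list image by an integer factor."""
    out = []
    for row in image:
        wide = [pixel[:] for pixel in row for _ in range(factor)]
        for _ in range(factor):
            out.append([pixel[:] for pixel in wide])
    return out


def stage2_generator(low_res, c_hat, size=STAGE2_SIZE):
    """Hypothetical toy 'refinement': upsample the sketch, then apply a text-dependent correction.

    No noise vector is passed in: Stage-II conditions only on the
    Stage-I image and the text.
    """
    up = upsample_nearest(low_res, size // len(low_res))
    correction = 0.05 * math.tanh(sum(c_hat))
    return [[[max(-1.0, min(1.0, v + correction)) for v in pixel] for pixel in row] for row in up]


random.seed(0)
c_hat = [random.gauss(0.0, 1.0) for _ in range(128)]  # assumed 128-d conditioning vector
z = [random.gauss(0.0, 1.0) for _ in range(100)]      # assumed 100-d noise vector

sketch = stage1_generator(c_hat, z)
refined = stage2_generator(sketch, c_hat)
print(len(sketch), len(sketch[0]), len(sketch[0][0]))     # 64 64 3
print(len(refined), len(refined[0]), len(refined[0][0]))  # 256 256 3
```

The point of the design shows even in the toy. Stage-II never starts from noise, only from the Stage-I sketch plus the text. Its job is therefore refinement rather than generation from scratch.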

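It also introduces a Conditioning Augmentation technique that encourages smoothness in the latent conditioning manifold, which improves the diversity of the synthesized images and stabilizes conditional-GAN training.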
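Conditioning Augmentation can be written down in a few lines. As I understand the paper, it works in three steps:

- The text embedding is mapped to a mean and a diagonal standard deviation.
- The conditioning vector is sampled as c_hat = mu + sigma * eps, with eps ~ N(0, I).
- A KL divergence toward N(0, I), weighted by a coefficient, is added to the generator loss as a regularizer.

The sketch below assumes mu and log_sigma are already given. In the model they would come from a fully connected layer applied to the text embedding. The numbers here are made up for illustration.

```python
import math
import random


def conditioning_augmentation(mu, log_sigma, rng):
    """Sample c_hat = mu + sigma * eps, eps ~ N(0, I) (reparameterization trick)."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0) for m, ls in zip(mu, log_sigma)]


def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions."""
    return sum(0.5 * (m * m + math.exp(2.0 * ls) - 1.0) - ls for m, ls in zip(mu, log_sigma))


# Made-up statistics standing in for the FC layer's output on one text embedding.
mu = [0.3, -0.2, 0.0, 0.1]
log_sigma = [-1.0, -0.5, 0.0, -2.0]

rng = random.Random(42)
samples = [conditioning_augmentation(mu, log_sigma, rng) for _ in range(3)]
for s in samples:
    # One description, several nearby conditioning vectors.
    print([round(v, 3) for v in s])

# Regularizer that would be added (times a weight) to the generator loss.
print(round(kl_to_standard_normal(mu, log_sigma), 4))
```

Sampling around mu, instead of feeding the fixed embedding directly, gives the generator varied conditioning inputs for the same sentence. The KL term discourages the learned distribution from collapsing onto a single point (sigma -> 0) or drifting far from N(0, I).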

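Extensive experiments show that these methods achieve significant improvements in generating photo-realistic images conditioned on text descriptions.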

Research background

Energy-Based (EB) GAN

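• Treats the discriminator as an energy function: the lower the (non-negative) energy, the more likely the data is real

• Uses an autoencoder as the discriminator (energy function)

• The discriminator can be pre-trained in advance on real data alone

• Can be trained on the ImageNet dataset to generate 256x256 images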
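The EBGAN bullets above translate into two short loss functions. This is a minimal sketch assuming the margin-based objective from the EBGAN paper. Energy is defined there as the autoencoder's reconstruction error ||Dec(Enc(x)) - x||. In the sketch, toy_autoencoder is a hypothetical stand-in for a trained network, and the margin value is arbitrary.

```python
import math


def energy(x, autoencoder):
    """EBGAN energy: reconstruction error ||Dec(Enc(x)) - x|| (always non-negative)."""
    recon = autoencoder(x)
    return math.sqrt(sum((r - v) ** 2 for r, v in zip(recon, x)))


def discriminator_loss(e_real, e_fake, margin):
    """L_D = D(x) + max(0, m - D(G(z))): push real energy down, fake energy up to the margin."""
    return e_real + max(0.0, margin - e_fake)


def generator_loss(e_fake):
    """L_G = D(G(z)): the generator wants its samples to have low energy."""
    return e_fake


def toy_autoencoder(x):
    """Hypothetical toy autoencoder: 'reconstructs' by halving every component."""
    return [0.5 * v for v in x]


real = [2.0, 0.0]
fake = [0.0, 6.0]
e_real = energy(real, toy_autoencoder)  # ||[1, 0] - [2, 0]|| = 1.0
e_fake = energy(fake, toy_autoencoder)  # ||[0, 3] - [0, 6]|| = 3.0
print(discriminator_loss(e_real, e_fake, margin=5.0))  # 1.0 + (5.0 - 3.0) = 3.0
print(generator_loss(e_fake))                          # 3.0
```

The energy only needs an autoencoder, so the discriminator can be fitted to real data alone before adversarial training. That is what the pre-training bullet refers to.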

Text-to-image generation

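• VAE

• DRAW (Deep Recurrent Attention Writer)

  • Uses a recurrent neural network combined with an attention mechanism

  • Generates objects one at a time and superimposes them to obtain the final result

• GAN

  • In the generator, the text embedding is combined with random noise, and the result is fed into the generator network

  • The discriminator learns to classify two kinds of mismatched cases as fake: a generated (fake) image paired with the correct text, and a real image paired with the wrong text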
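The matching-aware discriminator described in the GAN bullet can be sketched as a loss over three kinds of (image, text) pairs. The scores below are hypothetical discriminator outputs. The equal 1/2 weighting of the two negative terms follows my reading of the GAN-CLS algorithm in Reed et al.'s text-to-image GAN; these notes do not spell out the weighting.

```python
import math


def matching_aware_d_loss(s_real_right, s_real_wrong, s_fake_right, eps=1e-12):
    """Discriminator loss over three kinds of (image, text) pairs.

    Each s_* is the discriminator's probability that the pair is real and matching:
      s_real_right: real image + matching text       -> should be 1
      s_real_wrong: real image + mismatched text     -> should be 0
      s_fake_right: generated image + matching text  -> should be 0
    The two negative terms are averaged.
    """
    log_real = math.log(max(s_real_right, eps))
    log_wrong = math.log(max(1.0 - s_real_wrong, eps))
    log_fake = math.log(max(1.0 - s_fake_right, eps))
    # The objective is maximized, so the loss to minimize is its negation.
    return -(log_real + 0.5 * (log_wrong + log_fake))


def generator_loss(s_fake_right, eps=1e-12):
    """Non-saturating generator loss: make D score (fake image, matching text) as real."""
    return -math.log(max(s_fake_right, eps))


print(round(matching_aware_d_loss(0.5, 0.5, 0.5), 4))  # 2 * ln 2 ≈ 1.3863
print(round(matching_aware_d_loss(0.9, 0.1, 0.1), 4))  # -2 * ln 0.9 ≈ 0.2107
```

The (real image, wrong text) term is what forces the discriminator to check text-image consistency rather than image realism alone.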
