A Quick Survey of Open-Source Text-to-Image Generation Research in 2022 (Papers with Code)

2022-08-23

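This post briefly introduces several text-to-image generation projects whose code is publicly available. Most of them are research results published in 2022: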

1. DALL-E 2

"Hierarchical Text-Conditional Image Generation with CLIP Latents"

OpenAI's latest work and, at the time of writing, the state of the art (SOTA) in text-to-image generation.

Paper: https://cdn.openai.com/papers/dall-e-2.pdf

Code: https://github.com/lucidrains/DALLE2-pytorch (unofficial)

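2. Recurrent Affine Transformation for Text-to-image Synthesis

"Recurrent Affine Transformation for Text-to-image Synthesis"

Proposes a Recurrent Affine Transformation (RAT) for generative adversarial networks. RAT connects all of the fusion blocks with a recurrent neural network in order to model their long-term dependencies. The approach is closely related to DF-GAN.

Paper: https://arxiv.org/pdf/2204.10482.pdf

Code: https://github.com/senmaoy/Recurrent-Affine-Transformation-for-Text-to-image-Synthesis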
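
The idea of driving every fusion block from one recurrent state can be illustrated with a toy sketch. This is not the authors' implementation: a plain tanh RNN cell stands in for whatever recurrent unit the paper uses, the split of the hidden state into a scale and a shift is a simplification, and all names (`rnn_step`, `affine`, `recurrent_affine`, `W_h`, `W_s`) are hypothetical.

```python
import math


def rnn_step(h, s, W_h, W_s):
    """Plain tanh RNN cell (an assumption): h' = tanh(W_h h + W_s s)."""
    return [
        math.tanh(
            sum(w * hj for w, hj in zip(W_h[i], h))
            + sum(w * sj for w, sj in zip(W_s[i], s))
        )
        for i in range(len(h))
    ]


def affine(feature_map, gamma, beta):
    """Channel-wise affine transform: channel c becomes gamma[c] * x + beta[c]."""
    return [
        [gamma[c] * x + beta[c] for x in channel]
        for c, channel in enumerate(feature_map)
    ]


def recurrent_affine(feature_map, sentence, n_blocks, W_h, W_s):
    """Apply n_blocks affine fusions whose parameters share one RNN state.

    feature_map: list of channels, each a flat list of floats.
    sentence:    sentence embedding fed to the RNN at every block.
    The hidden state has 2 * channels entries: the first half is used
    as (scale - 1), the second half as the shift (a simplification).
    """
    channels = len(feature_map)
    h = [0.0] * (2 * channels)
    x = feature_map
    outputs = []
    for _ in range(n_blocks):
        h = rnn_step(h, sentence, W_h, W_s)  # state carries over between blocks
        gamma = [1.0 + v for v in h[:channels]]
        beta = h[channels:]
        x = affine(x, gamma, beta)
        outputs.append(x)
    return outputs
```

Because the hidden state `h` is carried from block to block, the scale and shift chosen for a later block depend on what was computed for the earlier ones, which is the dependency that independent per-block conditioning lacks.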

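3. Vector Quantized Diffusion Model for Text-to-Image Synthesis

"Vector Quantized Diffusion Model for Text-to-Image Synthesis"

The first work to apply a vector quantized diffusion (VQ-Diffusion) model to text-to-image generation. Compared with earlier GAN-based text-to-image methods, VQ-Diffusion can handle more complex scenes and substantially improves the quality of the synthesized images.

Venue: CVPR 2022

Paper: https://arxiv.org/abs/2111.14822

Code: https://github.com/microsoft/vq-diffusion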
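
To make "diffusion over quantized tokens" more concrete, here is a toy sketch of a single forward (noising) step on a sequence of codebook indices. It assumes a mask-and-replace style corruption, in which a token is either masked, resampled uniformly from the codebook, or kept. The function name, the `MASK` id, and the parameter values are hypothetical, and the noise schedule and the learned reverse process are omitted entirely.

```python
import random

MASK = -1  # hypothetical id for an extra [MASK] token outside the codebook


def corrupt_step(tokens, codebook_size, beta, gamma, rng=random):
    """One forward noising step over discrete image tokens.

    gamma: probability that a token is replaced by MASK.
    beta:  probability of jumping to any one particular codebook entry,
           so a uniform resample happens with probability codebook_size * beta.
    The remaining probability mass keeps the token unchanged.
    A token that is already MASK stays MASK.
    """
    resample = codebook_size * beta
    if gamma < 0.0 or resample < 0.0 or gamma + resample > 1.0:
        raise ValueError("gamma + codebook_size * beta must lie in [0, 1]")
    out = []
    for tok in tokens:
        if tok == MASK:
            out.append(MASK)
            continue
        u = rng.random()
        if u < gamma:
            out.append(MASK)
        elif u < gamma + resample:
            out.append(rng.randrange(codebook_size))
        else:
            out.append(tok)
    return out
```

Applying `corrupt_step` repeatedly drives a token sequence toward an all-`MASK` state. A model of this kind would be trained to undo such steps, conditioned on the text.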

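4. Autoregressive Image Generation using Residual Quantization

"Autoregressive Image Generation using Residual Quantization"

A two-stage framework, made up of a residual-quantized VAE (RQ-VAE) and an RQ-Transformer, for generating high-resolution images. RQ-VAE closely approximates the feature map of an image and represents the image as a stacked map of discrete codes. The RQ-Transformer then learns to predict the quantized feature vector at the next position by predicting the next stack of codes.

Venue: CVPR 2022

Paper: https://arxiv.org/abs/2203.01941

Code: https://github.com/kakaobrain/rq-vae-transformer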
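
The "stack of codes" can be illustrated with a minimal sketch of residual quantization applied to a single feature vector. It assumes one codebook shared by all depths and a greedy nearest-neighbour search. The helper names and the toy codebook are hypothetical, and no encoder, decoder, or training is shown.

```python
def nearest(codebook, v):
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)),
    )


def residual_quantize(v, codebook, depth):
    """Quantize v as a stack of `depth` codes.

    Each round picks the entry nearest to the current residual, adds it
    to the running approximation, and subtracts it from the residual.
    Returns (codes, approximation).
    """
    residual = list(v)
    approx = [0.0] * len(v)
    codes = []
    for _ in range(depth):
        k = nearest(codebook, residual)
        codes.append(k)
        approx = [a + c for a, c in zip(approx, codebook[k])]
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return codes, approx


# Toy 2-D codebook (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
codes, approx = residual_quantize([1.5, 0.5], codebook, depth=2)
# codes == [1, 3] and approx == [1.5, 0.5]
```

With a depth of 1 this reduces to ordinary vector quantization. Each additional code refines the approximation, so a position in the feature map is described by a short stack of indices rather than by a single index.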

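5. LAFITE

"LAFITE: Towards Language-Free Training for Text-to-Image Generation"

The first work to propose training a text-to-image generation model without any text data, by taking advantage of a powerful pre-trained CLIP model.

Venue: CVPR 2022

Paper: https://arxiv.org/abs/2111.13792

Code: https://github.com/drboog/Lafite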
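
One way to see how training without captions is possible: CLIP places images and texts in a shared embedding space, so a lightly perturbed image feature can stand in for the feature of a caption that was never written. The sketch below shows a fixed-noise perturbation of this kind on plain Python lists. It is an illustration under that assumption rather than the authors' code, no real CLIP model is called, and `pseudo_text_feature` and `noise_level` are hypothetical names.

```python
import math
import random


def l2_norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def pseudo_text_feature(image_feature, noise_level, rng=random):
    """Turn an image embedding into a stand-in text embedding.

    Gaussian noise is rescaled so that its length equals
    noise_level * |image_feature|, added to the image feature,
    and the result is normalized to unit length.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in image_feature]
    scale = noise_level * l2_norm(image_feature) / l2_norm(eps)
    perturbed = [x + scale * e for x, e in zip(image_feature, eps)]
    norm = l2_norm(perturbed)
    return [x / norm for x in perturbed]
```

The resulting vector would be fed to the generator wherever a real text embedding would normally go. At inference time, the embedding of an actual prompt takes its place.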

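6. DF-GAN

"DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis"

Drops the stacked architecture of earlier text-to-image GANs in favor of a single-stage backbone. The generator introduces a novel deep text-image fusion block built from affine transformations, and the discriminator adds a matching-aware gradient penalty and a one-way output.

Venue: CVPR 2022

Paper: https://arxiv.org/abs/2008.05865

Code: https://github.com/tobran/DF-GAN

Close reading: https://blog.csdn.net/air__Heaven/article/details/124288473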

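7. Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

"Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors"

Work in progress that introduces several new capabilities: (i) scene editing, (ii) text editing with anchor scenes, (iii) overcoming out-of-distribution text prompts, and (iv) story illustration generation (that is, generating illustrations from a story).

Paper: https://arxiv.org/abs/2203.13131

Code: https://github.com/CasualGANPapers/Make-A-Scene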

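8. DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers

"Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers"

Studies the reasoning abilities and social biases of text-to-image generative transformers. First, it measures four visual reasoning skills: object recognition, object counting, color recognition, and spatial relation understanding. For this purpose it proposes PaintSkills, a diagnostic dataset and evaluation toolkit covering these four skills. Second, it measures the text alignment and quality of the generated images using pre-trained image captioning, image-text retrieval, and image classification models. Third, it evaluates the social biases present in the models.

Paper: https://arxiv.org/abs/2202.04053

Code: https://github.com/j-min/DallEval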
