Adversarial Text to Continuous Image Generation

King Abdullah University of Science and Technology
HyperCGAN teaser image.

The red rectangles indicate the resolution boundaries our HyperCGAN model was trained on. By design, the model can synthesize meaningful pixels at (x, y) coordinates beyond these boundaries without any explicit training.

Abstract

Existing GAN-based text-to-image models treat images as 2D pixel arrays. In this paper, we approach the text-to-image task from a different perspective, where a 2D image is represented as an implicit neural representation (INR). We show that straightforward conditioning of the unconditional INR-based GAN method on text inputs is not enough to achieve good performance. We propose a word-level attention-based weight modulation operator that controls the generation process of INR-GAN based on hypernetworks. Our experiments on benchmark datasets show that HyperCGAN achieves competitive performance to existing pixel-based methods and retains the properties of continuous generative models.

Architecture

HyperCGAN architecture image.

The architecture of the proposed HyperCGAN: linear layers serve as hypernetworks. Given text embeddings and a noise vector, the hypernetworks generate parameters that modulate the weights of the INR-based decoder.
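The pipeline above can be sketched in a few lines of numpy. This is a toy illustration under our own assumptions (function names, dimensions, and the channel-wise modulation form are hypothetical, not the paper's exact operator): a linear hypernetwork maps the concatenated text embedding and noise vector to per-channel scales, which modulate the base weights of one INR decoder layer before it maps (x, y) coordinates to features.

```python
import numpy as np

def hypernetwork(text_emb, z, w_hyper, b_hyper):
    """Linear hypernetwork: maps [text embedding; noise] to per-channel
    modulation scales for one INR decoder layer (illustrative sketch)."""
    h = np.concatenate([text_emb, z])
    return w_hyper @ h + b_hyper  # one scale per output channel

def inr_layer(coords, w_base, scales):
    """INR decoder layer whose base weights are modulated channel-wise
    by the hypernetwork output before the forward pass."""
    w_mod = w_base * scales[:, None]  # channel-wise weight modulation
    return np.tanh(coords @ w_mod.T)

# Toy dimensions (hypothetical; not the paper's actual sizes).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=16)            # pooled text embedding
z = rng.normal(size=8)                    # noise vector
w_hyper = rng.normal(size=(32, 24)) * 0.1
b_hyper = np.ones(32)                     # bias toward identity modulation
w_base = rng.normal(size=(32, 2)) * 0.1   # base weights for 2D coordinates

coords = np.stack(np.meshgrid(np.linspace(0, 1, 4),
                              np.linspace(0, 1, 4)), axis=-1).reshape(-1, 2)
scales = hypernetwork(text_emb, z, w_hyper, b_hyper)
features = inr_layer(coords, w_base, scales)
print(features.shape)  # (16, 32): one feature vector per (x, y) coordinate
```

Because the text conditioning enters through the generated weights rather than through the input, every coordinate query is influenced by the prompt.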

Qualitative Results

Qualitative results image.

Qualitative results on three datasets: MS-COCO 256², CUB 256², and ArtEmis 256².

Sensitivity to Words

Word sensitivity results image.

Here, the input noise z is kept fixed while the color name in the prompt "a small {color} bird with white and dark gray wingbars and white breast and long tail" is varied, to assess the model's sensitivity to word-level modulation.

Continuous Properties

Continuous synthesis results image.

Demonstration of our model's capabilities for continuous image synthesis: extrapolation and super-resolution.
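Both properties follow from the fact that an INR is queried at arbitrary real-valued coordinates. A minimal sketch, assuming a [0, 1] coordinate convention during training (the helper name and ranges are our own illustration): super-resolution amounts to sampling a denser grid, and extrapolation to sampling ranges outside the trained box.

```python
import numpy as np

def sample_grid(h, w, x_range=(0.0, 1.0), y_range=(0.0, 1.0)):
    """Build an (h*w, 2) grid of (x, y) coordinates to query the INR decoder.
    A denser grid yields super-resolution; ranges outside [0, 1] yield
    extrapolation beyond the trained resolution boundaries."""
    xs = np.linspace(*x_range, w)
    ys = np.linspace(*y_range, h)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1).reshape(-1, 2)

train_grid = sample_grid(256, 256)                         # training resolution
superres = sample_grid(1024, 1024)                         # 4x denser sampling
extrap = sample_grid(256, 384, x_range=(-0.25, 1.25))      # beyond boundaries
print(train_grid.shape, superres.shape, extrap.shape)
```

No retraining is needed in either case; only the queried coordinate grid changes.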

Attention Maps

Attention maps image.

Attention maps corresponding to each word used for weight modulation.

BibTeX

@InProceedings{Haydarov_2024_CVPR,
    author    = {Haydarov, Kilichbek and Muhamed, Aashiq and Shen, Xiaoqian and Lazarevic, Jovana and Skorokhodov, Ivan and Galappaththige, Chamuditha Jayanga and Elhoseiny, Mohamed},
    title     = {Adversarial Text to Continuous Image Generation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {6316-6326}
}