Abstract
As a subtopic of text-to-image synthesis, text-to-face generation has great potential in face-related applications. In this paper, we propose a generic text-to-face framework, namely, TextFace, to achieve diverse and high-quality face image generation from text descriptions. We introduce text-to-style mapping, a novel method where the text description can be directly encoded into the latent space of a pretrained StyleGAN. Guided by our text-image similarity matching and face captioning-based text alignment, the textual latent code can be fed into the generator of a well-trained StyleGAN to produce diverse face images with high resolution (1024×1024). Furthermore, our model inherently supports semantic face editing using text descriptions. Finally, experimental results quantitatively and qualitatively demonstrate the superior performance of our model.
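The text-to-style mapping and similarity matching described in the abstract can be illustrated with a toy sketch. This is not the TextFace implementation. It assumes a single linear mapper, a hypothetical 64-dimensional text embedding, and the extended latent layout of 18 style vectors of 512 dimensions each that a 1024×1024 StyleGAN generator uses. The names `text_to_style` and `matching_loss` are illustrative only, and the cosine-based loss stands in for whatever similarity objective the paper actually uses.

```python
import math
import random

NUM_LAYERS = 18   # style inputs of a 1024x1024 StyleGAN generator
LATENT_DIM = 512  # width of each style vector
TEXT_DIM = 64     # hypothetical text-embedding size (assumption)

random.seed(0)

# Hypothetical linear mapper: weights[l][k] is a TEXT_DIM-long row that
# produces component k of the style vector for layer l.
weights = [[[random.gauss(0.0, 0.02) for _ in range(TEXT_DIM)]
            for _ in range(LATENT_DIM)]
           for _ in range(NUM_LAYERS)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def text_to_style(text_emb):
    """Map a text embedding to a NUM_LAYERS x LATENT_DIM latent code."""
    return [[dot(row, text_emb) for row in layer] for layer in weights]


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-8)


def matching_loss(text_emb, image_emb):
    """1 - cosine similarity: lower means the image embedding fits the text."""
    return 1.0 - cosine_similarity(text_emb, image_emb)


# A random vector stands in for the output of a real text encoder.
text_emb = [random.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
w_plus = text_to_style(text_emb)  # would be passed to a pretrained generator
```

In a full system, `w_plus` would be fed to the frozen generator, the resulting image would be embedded in the same space as the text, and a loss such as `matching_loss` would be minimized with respect to the mapper weights only.
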
Original language | English |
---|---|
Pages (from-to) | 3409-3419 |
Number of pages | 11 |
Journal | IEEE Transactions on Multimedia |
Volume | 25 |
Publication status | Published - 2023 |
Externally published | Yes |
Keywords
- GANs
- cross-modal
- text-guided semantic face manipulation
- text-to-face generation
- text-to-image generation
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering
- Media Technology
- Computer Science Applications