Muse, a New Text-To-Image Transformer Model Unveiled by Google AI
In a research paper titled "Muse: Text-To-Image Generation via Masked Generative Transformers," Google AI describes Muse, a new text-to-image model that generates images much faster than rival models such as DALL-E 2 and Imagen while maintaining comparable quality.
Muse is trained on a masked modeling task in discrete token space: given the text embedding from a pre-trained large language model, it learns to predict randomly masked image tokens. Instead of relying on pixel-space diffusion or autoregressive models, Muse creates images using a 900 million parameter model called a masked generative transformer.
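The masking step described above can be illustrated with a minimal Python sketch. It is not Google's implementation: the `mask_tokens` helper, the `MASK_ID` sentinel, the 8192-entry codebook, and the 50 percent mask ratio are all hypothetical choices made for illustration, and the text conditioning is left out entirely.

```python
import random

MASK_ID = -1  # hypothetical sentinel; real token ids are assumed non-negative


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of discrete image tokens with MASK_ID.

    Returns (masked, targets), where targets maps each masked position
    to the original token a model would be trained to predict there.
    """
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    # Mask at least one token, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(mask_ratio * len(tokens))))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK_ID
    return masked, targets


# Toy example: a 4x4 grid of token ids, flattened to 16 tokens.
# The codebook size of 8192 is an assumption for illustration only.
rng = random.Random(0)
tokens = [rng.randrange(8192) for _ in range(16)]
masked, targets = mask_tokens(tokens, mask_ratio=0.5, rng=rng)
```

In a real training loop, a transformer would receive `masked` together with the text embedding and be scored on how well it recovers the values stored in `targets`.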
Muse can create a 256 by 256 image in as little as 0.5 seconds on a TPUv4 chip, as opposed to 9.1 seconds for Imagen, the Google diffusion model that the company claims offers an “unprecedented degree of photorealism” and a “deep level of language understanding.” Tensor Processing Units (TPUs) are proprietary chips designed by Google as specialized AI accelerators.
The study found that conditioning on a pre-trained large language model is essential for producing photorealistic, high-quality images. Google AI trained a series of Muse models with varying sizes, ranging from 632 million to 3 billion parameters.
According to the paper, Muse is more than 10 times faster at inference than the Imagen-3B or Parti-3B models and three times faster than Stable Diffusion v1.4 in tests on equivalent hardware. Its edge over Parti, a cutting-edge autoregressive model, comes from decoding image tokens in parallel rather than one at a time.
Muse generates images that reflect the different parts of speech in the input caption, including nouns, verbs, and adjectives. It also demonstrates an understanding of visual style and of multi-object properties such as compositionality and cardinality.
In recent years, generative image models have come a long way thanks to new training methods and better deep learning architectures. Because they can produce highly realistic and detailed images, these models are becoming increasingly useful tools across a wide range of industries and applications.