Versatile Diffusion
We built Versatile Diffusion (VD), the first unified multi-flow multimodal diffusion framework, as a step towards Universal Generative AI. VD natively supports image-to-text, image-variation, text-to-image, and text-variation, and can be further extended to other applications such as semantic-style disentanglement, image-text dual-guided generation, latent image-to-text-to-image editing, and more. Future versions will support more modalities such as speech, music, video, and 3D.
Xingqian Xu, Atlas Wang, Eric Zhang, Kai Wang, and Humphrey Shi [arXiv] [GitHub]
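
The image-text dual-guided flow mentioned above can be sketched with the Hugging Face diffusers port of VD. The pipeline class, the shi-labs/versatile-diffusion checkpoint, and the call signature below come from that port rather than this repository's own scripts; the reference image path and prompt are placeholders.

```python
import torch
from diffusers import VersatileDiffusionDualGuidedPipeline
from diffusers.utils import load_image

pipe = VersatileDiffusionDualGuidedPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
)
pipe.remove_unused_weights()  # drop weights the dual-guided flow does not use
pipe = pipe.to("cuda")

reference = load_image("reference.jpg")  # placeholder reference image
generator = torch.Generator(device="cuda").manual_seed(0)

# text_to_image_strength trades off the two guidance signals:
# closer to 0.0 follows the reference image, closer to 1.0 follows the prompt.
result = pipe(
    prompt="a red car in the style of van gogh",
    image=reference,
    text_to_image_strength=0.75,
    generator=generator,
).images[0]
result.save("dual_guided.png")
```
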
Text-to-Image
  Description: Generate an image from a text prompt.
  Inputs: Text Input | Seed
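
A minimal text-to-image sketch, again assuming the diffusers port; the prompt is illustrative. Fixing the torch generator seed plays the role of the demo's Seed field.

```python
import torch
from diffusers import VersatileDiffusionTextToImagePipeline

pipe = VersatileDiffusionTextToImagePipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
)
pipe.remove_unused_weights()  # keep only the weights the text-to-image flow needs
pipe = pipe.to("cuda")

# The generator seed corresponds to the demo's Seed input.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe("an astronaut riding a horse on mars", generator=generator).images[0]
image.save("astronaut.png")
```
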
Image-Variation
  Description: Generate an image conditioned on a reference image.
  Fidelity: How closely the output image should resemble the reference image (0 = unlike (default), 1 = same).
  Focus: What the output image should focus on (0 = semantic, 0.5 = balanced (default), 1 = style).
  Inputs: Image Input | Fidelity | Focus | Color Adjustment | Seed
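
A minimal image-variation sketch, assuming the diffusers port. Note that Fidelity, Focus, and Color Adjustment are controls of this demo app; the diffusers pipeline below exposes only the plain call, so the sketch runs with default behavior. The input path is a placeholder.

```python
import torch
from diffusers import VersatileDiffusionImageVariationPipeline
from diffusers.utils import load_image

pipe = VersatileDiffusionImageVariationPipeline.from_pretrained(
    "shi-labs/versatile-diffusion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

reference = load_image("reference.jpg")  # placeholder input image
generator = torch.Generator(device="cuda").manual_seed(0)
variation = pipe(image=reference, generator=generator).images[0]
variation.save("variation.png")
```
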
Image-to-Text
  Description: Generate a text description from a reference image.
  Inputs: Image Input | Seed
Text-Variation
  Description: Generate a variation of a reference text prompt.
  Inputs: Text Input | Seed