Why Multi-Object Generation Fails: A Deep Dive into Attention Mechanisms
Investigating Limitations in Stable Diffusion
Why do objects disappear, blend, or appear awkwardly when using multi-object prompts in diffusion models?
Introduction
Recent advances in Text-to-Image (T2I) generation have been remarkable.
From a simple prompt like "A photo of a cat, dog, rabbit, eagle and horse", T2I models can generate high-resolution, photorealistic images.