Stability AI takes on OpenAI's DALL-E 3 with the new Stable Diffusion 3 model

Stability AI has unveiled Stable Diffusion 3, its latest text-to-image generation model. The company claims the new iteration offers improved performance in handling complex prompts, generates higher-quality images, and renders text within images with fewer spelling errors.

Currently, Stable Diffusion 3 is not publicly available. However, Stability AI has opened a waitlist for an early preview program, allowing users to test the model and provide feedback before its official release.

Regarding scalability, the model comes in various sizes, ranging from 800 million to 8 billion parameters. According to Stability AI, this spread is meant to cater to differing user needs and hardware capabilities, giving users options that trade off quality against resource requirements and potentially making the technology more accessible.

Stable Diffusion 3 combines a diffusion transformer architecture with flow matching. Stability AI says a detailed technical report will follow with further information about the model's inner workings.
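For readers unfamiliar with flow matching, the core training idea can be sketched in a few lines. This is an illustrative, simplified example (not Stability AI's actual implementation): a model is trained to predict the velocity that carries a noise sample along a straight line toward a data sample. The `zero_model` stand-in and all shapes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(model, x0, x1, t):
    """Conditional flow-matching objective (rectified-flow variant):
    interpolate x_t between noise x0 and data x1, then regress the
    model's predicted velocity onto the constant target (x1 - x0)."""
    t = t.reshape(-1, 1)              # broadcast timestep over features
    x_t = (1.0 - t) * x0 + t * x1     # straight-line interpolation
    target_v = x1 - x0                # ground-truth velocity field
    pred_v = model(x_t, t)
    return np.mean((pred_v - target_v) ** 2)

# Hypothetical stand-in "model": always predicts zero velocity.
zero_model = lambda x_t, t: np.zeros_like(x_t)

x0 = rng.standard_normal((4, 8))        # noise samples
x1 = rng.standard_normal((4, 8))        # "data" samples
t = rng.uniform(0.0, 1.0, size=4)       # random timesteps in [0, 1]

loss = flow_matching_loss(zero_model, x0, x1, t)
```

In practice the model is a neural network (in Stable Diffusion 3's case, reportedly a diffusion transformer) and the samples are image latents, but the objective has this same regression form.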

Stability AI has also emphasized its commitment to responsible AI development. The company says it has implemented safeguards to prevent misuse and collaborates with experts to ensure the model's safe and ethical deployment, a concern underscored by the recent spread of nonconsensual AI-generated images of Taylor Swift.

The ongoing competition in text-to-image generation is also worth noting. OpenAI with DALL-E 3, Google with Gemini, and Midjourney (reportedly in talks with X) are all actively developing and refining their text-to-image capabilities.

Overall, Stable Diffusion 3 looks like a potentially significant advance in text-to-image generation, but whether it delivers on Stability AI's claims remains to be seen.
