From predicting text to simulating reality
Google introduced Gemini Omni, a model designed to generate output in any modality from any input. Google presents it as part of a broader shift in which AI moves from predicting text to simulating reality through world models.
What it does
- Omni combines Gemini's reasoning with Google's generative media models; Google frames this combination as a significant step forward in world understanding.
- The first model in the family, Gemini Omni Flash, launches with video output, and image and text generation will be enabled over time.
- It is available now in the Gemini app, Google Flow, and YouTube Shorts, and will roll out to developers and enterprise customers via APIs in the coming weeks.
The bigger picture
Omni sits alongside Google's other world-simulation work shown at I/O, including Project Genie, which generates explorable real-world places. Together they reflect a strategic bet that unifying intelligence with generative media is the next frontier. Alongside the breakout success of Google's Nano Banana image models (more than 50 billion images generated to date), Omni underscores how central generative media has become to Google's roadmap. As with any high-quality generative video, the natural question Omni raises is provenance. That is why Google paired its I/O media news with an expansion of SynthID watermarking and Content Credentials, an effort now joined by partners including OpenAI, Kakao, and ElevenLabs.
