Vivold Consulting

Gemini Omni is Google's any-modality-in, any-modality-out model, starting with video

Key Insights

Google announced Gemini Omni, a new model family that can generate output in any modality from any input. It combines Gemini's intelligence with Google's generative media models in what Google frames as a leap in world understanding. The first release, Gemini Omni Flash, starts with video outputs (image and text will follow) and is available today in the Gemini app, Google Flow, and YouTube Shorts. API access for developers and enterprises follows in the coming weeks.

From predicting text to simulating reality

Google introduced Gemini Omni, a model designed to generate samples in any output modality from any input. The company describes it as part of a broader shift in AI, from predicting text to simulating reality through world models.

What it does

- Omni combines Gemini's reasoning with Google's generative media models, which Google frames as a significant step forward in world understanding.
- The first model in the family, Gemini Omni Flash, starts with video outputs, with image and text generation to be enabled over time.
- It's available starting now in the Gemini app, Google Flow, and YouTube Shorts, with rollout to developers and enterprise customers via APIs in the coming weeks.

The bigger picture

Omni sits alongside Google's other world-simulation work shown at I/O, including Project Genie, which generates explorable real-world places. Together they reflect a strategic bet that unifying intelligence with generative media is the next frontier. Combined with the breakout success of Google's Nano Banana image models (more than 50 billion images generated to date), Omni underscores how central generative media has become to Google's roadmap. As with any high-quality generative video, the natural question Omni raises is provenance. That is why Google paired its I/O media news with an expansion of SynthID watermarking and Content Credentials, which partners including OpenAI, Kakao, and ElevenLabs have now joined.
