A frontier model tuned for speed and cost
At I/O 2026, Google introduced Gemini 3.5 Flash, the first in a new series of models built to combine frontier-level intelligence with the ability to take action. The pitch is that you no longer have to trade capability for speed or cost.
What's new
- Compared with the previous 3.1 Pro, the new Flash scores higher on almost all benchmarks, with particularly large gains in coding and a striking jump on GDPVal, a benchmark meant to capture real-world, economically valuable tasks.
- On the intelligence-versus-speed tradeoff, Google places it in a class of its own: by the company's measure, it generates output tokens roughly 4x faster than other frontier models while remaining comparable to the best on quality.
- It delivers those capabilities at less than half the price of comparable frontier models, which Google frames as a major lever for companies burning through token budgets.
Why it matters
Google leaned hard on the economics, claiming a company processing around a trillion tokens a day could save over $1 billion annually by shifting roughly 80% of its workloads from other frontier models to 3.5 Flash. The model is already woven into Google's own development: the company says its internal AI dev tools now process more than 3 trillion tokens a day, up from half a trillion in March, and that this usage created a feedback loop that improved 3.5. Gemini 3.5 Flash is available today across Google's products and APIs, with a more capable Gemini 3.5 Pro promised the following month.
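To put the savings claim in per-token terms, here is a back-of-the-envelope sketch in Python. It rests on assumptions that are not in Google's statement: "around a trillion" is treated as exactly 1e12 tokens, "over $1 billion" as a $1 billion floor, a year as 365 days, and input and output tokens as a single blended price. The function and variable names are purely illustrative.

```python
# Back-of-the-envelope check of the savings claim.
# Assumptions (not from Google): exactly 1e12 tokens/day, savings floor of
# exactly $1B/year, 365 days/year, one blended price for all tokens.

def implied_price_gap_per_million(tokens_per_day, shifted_share,
                                  annual_savings_usd, days_per_year=365):
    """Price difference (USD per million tokens) implied by the claimed savings."""
    shifted_tokens_per_year = tokens_per_day * shifted_share * days_per_year
    return annual_savings_usd / (shifted_tokens_per_year / 1e6)

gap = implied_price_gap_per_million(
    tokens_per_day=1e12,      # "around a trillion tokens a day"
    shifted_share=0.80,       # "roughly 80% of its workloads"
    annual_savings_usd=1e9,   # "over $1 billion annually", taken as a floor
)

# Growth in Google's internal token volume: half a trillion -> 3 trillion per day.
internal_growth = 3e12 / 0.5e12

print(f"Implied price gap: at least ${gap:.2f} per million tokens")
print(f"Internal usage growth since March: {internal_growth:.0f}x")
```

Under those assumptions, the claim implies a blended price gap of at least about $3.42 per million tokens between 3.5 Flash and the models it would replace, and the internal usage figures amount to a sixfold increase since March.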
