Google's Gemini 3.5 Flash pairs frontier-level intelligence with speed at under half the price

May 19, 2026

Key Insights

Google introduced Gemini 3.5 Flash, the first in a model series combining frontier intelligence with agentic action - beating the prior 3.1 Pro on nearly all benchmarks, with a big jump on the real-world GDPVal task suite. It runs about 4x faster than other frontier models at under half the price, and is available across Google's products and APIs today. A more capable Gemini 3.5 Pro is due the following month.

Stay Updated

Get the latest insights delivered to your inbox

A frontier model tuned for speed and cost

At I/O 2026, Google introduced Gemini 3.5 Flash, the first in a new series of models built to combine frontier-level intelligence with the ability to take action. The pitch is that you no longer have to trade capability for speed or cost.

What's new

- Against the previous 3.1 Pro, the new Flash is better across almost all benchmarks, with particularly large gains in coding and a striking jump on GDPVal, a benchmark meant to capture real-world, economically valuable tasks.
- On the intelligence-versus-speed tradeoff, Google places it in a class of its own - by its measure roughly 4x faster in output tokens per second than other frontier models while remaining comparable to the best on quality.
- It delivers those capabilities at less than half the price of comparable frontier models, which Google frames as a major lever for companies burning through token budgets.

Why it matters

Google leaned hard on the economics, claiming a company processing around a trillion tokens a day could save over $1 billion annually by shifting roughly 80% of its workloads from other frontier models to 3.5 Flash. The model is already woven into Google's own development - the company says its internal AI dev tools now process more than 3 trillion tokens a day, up from half a trillion in March, creating a feedback loop that improved 3.5. Gemini 3.5 Flash is available today across Google's products and APIs, with a more capable Gemini 3.5 Pro promised the following month.

Source: blog.google

An AWS knowledge-graph deployment turned 6-month research cycles into 3 weeks - and the blueprint transfers far beyond pharma

An AWS GraphRAG deployment in pharmaceutical research cut R&D cycles by 87% - initial discovery that took six months now closes in three weeks - by fusing siloed internal databases and public literature into one queryable knowledge graph on Amazon Neptune Analytics and Bedrock (running Claude). Every answer comes with verifiable citations and a mapped reasoning path, which is exactly what regulated industries need for compliance. The architecture is modular and, crucially, transferable: any enterprise drowning in fragmented legacy data can copy this pattern.

July 9, 2026

SpaceX, Anthropic, and OpenAI listings will out-value every US VC-backed exit since 2000 - reshaping vendor economics for everyone

The new NVCA-Pitchbook Venture Monitor dropped a stunning claim: the pending OpenAI and Anthropic IPOs, together with SpaceX's listing, will generate more value than every US VC-backed exit since 2000 combined. SpaceX is already public at $1.77 trillion, and with both AI labs pushing toward trillion-dollar debuts, the trio should land north of $4 trillion - against roughly $70 billion in total US IPO proceeds last year. For anyone buying AI services, the labs' shift to public-market scrutiny will reshape pricing, transparency, and vendor stability.

July 9, 2026

A 14-person open-source team just became the default way 8.9M developers run local AI - and a lever for slashing inference bills

Ollama, the open-source tool that lets developers run open-weight AI models on their own machines in minutes, raised a $65M Series B led by Theory Ventures ($88M total), revealing it now serves 8.9 million developers monthly and sits inside 85% of the Fortune 500 - with just 14 employees. Founders Jeff Morgan and Michael Chiang previously built Docker Desktop, and they're repeating the play: abstract away the hardware pain, then monetise a cloud tier priced on GPU time rather than tokens. The backdrop is the industry's loudest cost debate: every company with heavy inference bills is under existential pressure to shift routine workloads to open models.

July 9, 2026

Key Insights

Stay Updated

A frontier model tuned for speed and cost

What's new

Why it matters

Related Articles

An AWS knowledge-graph deployment turned 6-month research cycles into 3 weeks - and the blueprint transfers far beyond pharma

SpaceX, Anthropic, and OpenAI listings will out-value every US VC-backed exit since 2000 - reshaping vendor economics for everyone

A 14-person open-source team just became the default way 8.9M developers run local AI - and a lever for slashing inference bills