AI agents talk a big game; this benchmark suggests the day job is still messy
The agent narrative is seductive: give a model tools, let it plan, and it will do knowledge work. But production work is full of edge cases, ambiguous requirements, and systems that don't behave like clean APIs.
Why benchmarks like this matter
- They pressure vendors to show task completion, not just impressive traces.
- They help separate 'agents that can plan' from 'agents that can finish.'
The gap between a good demo and a usable coworker
Workplace usefulness depends on things agents routinely struggle with:
- Handling partial information without hallucinating missing details.
- Recovering from errors when a tool call fails or returns unexpected formats (a sketch of this pattern follows the list).
- Knowing when to stop and ask a human a clarifying question.
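To make the error-recovery point concrete, here is a minimal Python sketch of the pattern, assuming a tool-calling loop you control: validate what a tool returns, retry with backoff, and escalate to a human instead of letting the model guess. The `call_tool` callable, the `required_keys` schema, and the `NeedsHuman` exception are illustrative names, not part of any specific agent framework.

```python
import json
import time


class NeedsHuman(Exception):
    """Raised when the agent should stop and ask a human instead of guessing."""


def run_tool_with_recovery(call_tool, tool_name, args, required_keys, max_retries=2):
    """Call a tool, validate its output, and escalate rather than hallucinate.

    `call_tool` stands in for whatever executes tool calls in your agent stack;
    `required_keys` is the minimal shape we expect the result to have.
    """
    for attempt in range(max_retries + 1):
        try:
            raw = call_tool(tool_name, args)
            result = json.loads(raw) if isinstance(raw, str) else raw
        except (json.JSONDecodeError, TimeoutError) as exc:
            if attempt == max_retries:
                # Don't invent the missing result; hand the task off.
                raise NeedsHuman(f"{tool_name} failed after {attempt + 1} tries: {exc}")
            time.sleep(2 ** attempt)  # simple backoff before retrying
            continue

        missing = [key for key in required_keys if key not in result]
        if missing:
            # An unexpected format is a failure, not partial truth to fill in.
            raise NeedsHuman(f"{tool_name} returned a result missing {missing}")
        return result
```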
What this means for enterprises deploying agents in 2026
- Treat agents as workflow components, not autonomous employees.
- Invest in guardrails: approvals, logging, and constraints on what the agent can change (see the sketch after this list).
- Measure success like you would any automation: completion rate, time saved, failure modes, and escalation cost.
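As a rough illustration of what "guardrails" and "measure it like any automation" can look like in code, here is a hedged Python sketch: an action allowlist, an approval gate for risky changes, logging, and simple counters for completion and escalation. The action names, the `approve` and `do_action` callables, and the `RunMetrics` fields are assumptions made for the example, not a prescribed design.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent-guardrails")

# Hypothetical policy: actions the agent may take on its own,
# and actions that must pass a human approval step first.
AUTO_ALLOWED = {"read_ticket", "draft_reply", "search_kb"}
NEEDS_APPROVAL = {"send_email", "update_record", "issue_refund"}


@dataclass
class RunMetrics:
    attempted: int = 0
    completed: int = 0
    escalated: int = 0

    @property
    def completion_rate(self) -> float:
        return self.completed / self.attempted if self.attempted else 0.0


def execute_action(action, payload, approve, do_action, metrics):
    """Gate, log, and count every action the agent tries to take.

    `approve` and `do_action` are placeholders for your approval queue and
    the integration that actually performs the change in a business system.
    """
    metrics.attempted += 1
    log.info("agent requested %s with %s", action, payload)

    if action in NEEDS_APPROVAL and not approve(action, payload):
        metrics.escalated += 1
        log.info("action %s held for human approval", action)
        return None
    if action not in AUTO_ALLOWED and action not in NEEDS_APPROVAL:
        metrics.escalated += 1
        log.warning("action %s is outside the allowlist; blocked", action)
        return None

    result = do_action(action, payload)
    metrics.completed += 1
    return result
```

The design point is the narrow one from the list above: the agent proposes, but the wrapper decides what actually reaches your systems of record, and every decision leaves a trace you can measure.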
The opportunity hiding inside the skepticism
This doesn't kill agents. It clarifies what needs building:
- Better tool interfaces, more deterministic action layers, and tighter integration with business systems.
- Evaluation harnesses that mirror real ops, not toy tasks (a minimal example follows).
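Here is one minimal sketch of such a harness, assuming each task can be expressed as messy input plus a check on the resulting system state. The `OpsTask` structure, the `agent_run` callable, and the scoring fields are hypothetical; the part that matters is grading outcomes, not transcripts.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class OpsTask:
    """One realistic task: messy input plus a check on the end state,
    not a grade on the text the agent produces along the way."""
    name: str
    input: dict
    check_end_state: Callable[[Any], bool]


def evaluate(agent_run: Callable[[dict], Any], tasks: list[OpsTask]) -> dict:
    """Score an agent the way an ops team would: did the work get done?"""
    results = {"completed": 0, "failed": 0, "errored": 0}
    for task in tasks:
        try:
            outcome = agent_run(task.input)
        except Exception:
            results["errored"] += 1
            continue
        if task.check_end_state(outcome):
            results["completed"] += 1
        else:
            results["failed"] += 1
    results["completion_rate"] = results["completed"] / len(tasks) if tasks else 0.0
    return results
```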
If your roadmap assumes agents will 'replace roles' soon, this is a reminder to get specific. The companies that win won't be the ones with the most agent hype; they'll be the ones that make agents reliable in the unglamorous corners of real work.
