Vivold Consulting

A new benchmark suggests 'agentic' AI still struggles with real work, raising the bar for enterprise adoption claims

Key Insights

A new benchmark from Mercor suggests AI agents still fall short on practical workplace tasks, despite major progress in planning and research. The takeaway for buyers is to demand measurable task success rates, tooling integration, and guardrails, because 'agentic' marketing can outrun real operational readiness.

AI agents talk a big game; this benchmark suggests the day job is still messy

The agent narrative is seductive: give a model tools, let it plan, and it will do knowledge work. But production work is full of edge cases, ambiguous requirements, and systems that don't behave like clean APIs.

Why benchmarks like this matter


- They pressure vendors to show task completion, not just impressive traces.
- They help separate 'agents that can plan' from 'agents that can finish.'

The gap between a good demo and a usable coworker


Workplace usefulness depends on things agents routinely struggle with:
- Handling partial information without hallucinating missing details.
- Recovering from errors when a tool call fails or returns unexpected formats.
- Knowing when to stop and ask a human a clarifying question.
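The three behaviors above can be made concrete in a small wrapper around a tool call. The following is a minimal sketch, not anything described in the benchmark: `run_tool_step`, `StepResult`, and the `required_keys` check are hypothetical names introduced for illustration, and it assumes a tool returns a dictionary on success. The wrapper retries on failures and unexpected formats, and escalates to a human rather than inventing fields that are missing.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepResult:
    """Outcome of one agent step (hypothetical structure for illustration)."""
    status: str          # "ok" or "needs_human"
    value: Any = None
    reason: str = ""


def run_tool_step(tool: Callable[[], Any],
                  required_keys: set[str],
                  max_attempts: int = 3) -> StepResult:
    """Call a tool, validate its output shape, and escalate instead of guessing."""
    last_error = "tool was never called"
    for attempt in range(1, max_attempts + 1):
        try:
            output = tool()
        except Exception as exc:
            # The tool call failed outright: record the error and retry.
            last_error = f"attempt {attempt}: {exc}"
            continue
        if not isinstance(output, dict):
            # Unexpected format: treat it as a failure, not as usable data.
            last_error = (
                f"attempt {attempt}: expected dict, got {type(output).__name__}"
            )
            continue
        missing = required_keys - set(output)
        if missing:
            # Partial information: ask a human rather than fill in the gaps.
            return StepResult(
                "needs_human", reason=f"missing fields: {sorted(missing)}"
            )
        return StepResult("ok", value=output)
    # Retries exhausted: stop and hand the problem to a person.
    return StepResult("needs_human", reason=last_error)
```

A caller would branch on `status`: continue the workflow on `"ok"`, and route `"needs_human"` results, with their `reason`, into whatever review queue the team already uses.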

What this means for enterprises deploying agents in 2026


- Treat agents as workflow components, not autonomous employees.
- Invest in guardrails: approvals, logging, and constraints on what the agent can change.
- Measure success like you would any automation: completion rate, time saved, failure modes, and escalation cost.
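The measurement advice above can be sketched as a small report over logged agent runs. This is an illustration under stated assumptions, not a method from the benchmark: the `TaskRun` fields, the `summarize` function, and the definition of net time saved (manual baseline time for completed tasks, minus human time spent on escalations) are all hypothetical choices that a team would adapt to its own logging.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRun:
    """One logged agent task (hypothetical schema for illustration)."""
    completed: bool
    baseline_minutes: float            # assumed manual time for the same task
    escalation_minutes: float = 0.0    # human time spent reviewing or fixing
    failure_mode: Optional[str] = None


def summarize(runs: list[TaskRun]) -> dict:
    """Compute completion rate, net time saved, failure modes, escalation cost."""
    if not runs:
        raise ValueError("no runs to summarize")
    completed = [r for r in runs if r.completed]
    escalation = sum(r.escalation_minutes for r in runs)
    # Only completed tasks save the manual baseline; escalations cost time either way.
    saved = sum(r.baseline_minutes for r in completed) - escalation
    failures = Counter(r.failure_mode for r in runs if r.failure_mode)
    return {
        "completion_rate": len(completed) / len(runs),
        "net_minutes_saved": saved,
        "escalation_minutes": escalation,
        "failure_modes": dict(failures),
    }


if __name__ == "__main__":
    example_runs = [
        TaskRun(True, 30),
        TaskRun(True, 20, escalation_minutes=5),
        TaskRun(False, 40, escalation_minutes=10, failure_mode="tool_error"),
    ]
    print(summarize(example_runs))
```

Tracking failure modes as named categories, rather than a single failure count, makes it easier to see whether problems come from tooling, missing information, or the model itself.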

The opportunity hiding inside the skepticism


This doesn't kill agents. It clarifies what needs building:
- Better tool interfaces, more deterministic action layers, and tighter integration with business systems.
- Evaluation harnesses that mirror real ops, not toy tasks.

If your roadmap assumes agents will 'replace roles' soon, this is a reminder to get specific. The companies that win won't be the ones with the most agent hype; they'll be the ones that make agents reliable in the unglamorous corners of real work.
