The agent question is the wrong question

The most common opening question we get on a discovery call in 2026 is some version of: should we build an agent?

It is the wrong question. We say so as kindly as possible, because the buyer asking it is usually a thoughtful person reading the right things, doing the right kind of homework. The question is wrong because it skips the harder one: the one whose answer decides whether the engagement ships and still works in month four.

The harder question is: is this workflow worth automating, and where is the human-in-the-loop boundary that bounds the cost of being wrong?

Most agent failures are not failures of the agent. They are failures of the framing that produced the agent.

What “should we build an agent?” does to a procurement conversation

It narrows the buy decision before it is clear whether automation is the right answer at all. The phrasing presupposes the answer (yes, an agent) and turns the rest of the engagement into a vendor selection problem. Which platform? Which framework? Which orchestration library? Buy or build? Six weeks vanish into a comparison matrix.

What that conversation skips is a set of questions that cost nothing to ask: how often does this workflow happen, what does it cost when it goes wrong, and what does “going wrong” actually look like? If the workflow happens twelve times a year, the answer is probably no automation at all. If it happens 800 times a month and a wrong answer costs $20, the answer is probably a 50-line script and a queue. The agent shape, with its multi-step deliberation, multiple tools, and ability to recover from intermediate failures, only earns its complexity in a specific subset of cases.
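
The arithmetic behind that judgment fits in a few lines. The sketch below is a back-of-envelope estimate, not a pricing model: the function name and the 5% error rate are our assumptions for illustration, since the figures above give only the volume and the cost per wrong answer.

```python
def monthly_error_cost(runs_per_month: float, error_rate: float, cost_per_error: float) -> float:
    """Expected monthly cost of wrong answers for one workflow step."""
    return runs_per_month * error_rate * cost_per_error


# Figures from the text: 800 runs a month, $20 per wrong answer.
# ASSUMPTION: a 5% error rate, picked only to make the arithmetic concrete.
high_volume = monthly_error_cost(runs_per_month=800, error_rate=0.05, cost_per_error=20.0)

# Twelve runs a year is one run a month, under the same assumed rate and cost.
low_volume = monthly_error_cost(runs_per_month=12 / 12, error_rate=0.05, cost_per_error=20.0)

print(f"high volume: ${high_volume:.2f}/month")
print(f"low volume:  ${low_volume:.2f}/month")
```

Under those assumptions the high-volume step puts roughly $800 a month at risk and the twelve-a-year step roughly $1. The first justifies a small script and a review queue; the second justifies leaving the work manual.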

Anthropic’s “Building Effective Agents” post from December 2024 put this plainly. The taxonomy distinguishes a workflow (a system where the LLM and tools are orchestrated through predefined code paths) from an agent (a system where the LLM dynamically directs its own tool use and process). Most production deployments, and most of the ones still running in month four, are workflows. The buyers who arrive asking about agents are usually describing workflows.

Where the boundary actually goes

The second half of the right question, where is the human-in-the-loop boundary, is the part most teams underweight, and it is the part whose cost shows up later.

A bounded failure is one whose worst outcome is recoverable: a wrong summarization, a misclassified ticket, a draft email the person reviews before sending. An unbounded failure is one that propagates: a sent wire, a posted message, a contract redline that someone signed assuming it had been reviewed. In computer-use systems, the question of where the human reviews is the design decision, and everything else is downstream. That holds for Claude Code and Claude for Chrome on the Anthropic side, for OpenAI’s Codex app and Atlas browser, for Perplexity’s Comet browser, and for the open-source OpenCUA project.

The teams who get this right ask the question early: which of the agent’s actions are reversible, which are not, and where do we put the gate? The teams who get it wrong ask it after the first expensive incident. The difference is not technical sophistication. It is whether the boundary was scoped before the system was built or after.
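
One way to make that gate concrete is to tag every action with whether it can be undone and to route the irreversible ones through a person. The sketch below is a minimal illustration under our own assumptions: `Action`, `run_with_gate`, and the `execute` and `approve` callables are hypothetical names, not part of any product named above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool  # ASSUMPTION: someone classified this before the system was built


def run_with_gate(
    action: Action,
    execute: Callable[[Action], str],
    approve: Callable[[Action], bool],
) -> str:
    """Run reversible actions directly; hold irreversible ones for human approval."""
    if action.reversible or approve(action):
        return execute(action)
    return f"held for review: {action.name}"


def execute(action: Action) -> str:
    # Stand-in for the real side effect (hypothetical).
    return f"executed: {action.name}"


def reviewer_declines(action: Action) -> bool:
    # Stand-in for a human reviewer who has not approved anything yet.
    return False


draft = Action(name="draft_email", reversible=True)
wire = Action(name="send_wire", reversible=False)

print(run_with_gate(draft, execute, reviewer_declines))
print(run_with_gate(wire, execute, reviewer_declines))
```

The design choice is that an irreversible action without an approval is held rather than executed, so the gate fails closed.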

What changes when you reframe

Three things, roughly in the order you notice them.

The scoping conversation gets faster. Instead of “should we build an accounting agent?” it becomes “what step in your accounts-payable process happens 800 times a month, has a checkable output, and costs 30 seconds of reviewer time per instance?” The agent-shaped question pulls you into vendor selection, framework comparison, and architecture review: six weeks of forward motion before anything ships. The workflow-shaped question takes one meeting.

The system you end up shipping gets smaller. You usually do not need an agent. You need a workflow with one tool call, a clear input contract, and an audit log. You ship it in two weeks. You evaluate it in four. You decide whether to extend it in six. The shape of the work is closer to a data pipeline with one LLM step in the middle (extract, transform, load, with the model doing the transform) than to anything Hollywood means by “agent.”
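
As a rough illustration of that shape, the sketch below wraps a single model call in an input check and an audit log. Everything in it is an assumption for illustration: the field names, `run_step`, and the `transform` callable, which stands in for whatever model call a team actually uses.

```python
import json
import time
from typing import Callable

# ASSUMPTION: a hypothetical input contract for an invoice-extraction step.
REQUIRED_FIELDS = ("invoice_id", "text")


def run_step(record: dict, transform: Callable[[str], str], audit_log: list) -> str:
    """Validate the input, run the single model step, and log both sides."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"input contract violated, missing fields: {missing}")

    output = transform(record["text"])

    # One JSON line per run, so failure cases can be found and re-run later.
    audit_log.append(json.dumps({"ts": time.time(), "input": record, "output": output}))
    return output


def fake_model(text: str) -> str:
    # Stand-in for the LLM call; a real step would call a model here.
    return text.strip().upper()


log: list = []
result = run_step({"invoice_id": "INV-1", "text": " total due: 40 "}, fake_model, log)
print(result)
print(len(log))
```

Because the log keeps the input alongside the output, failure cases can be replayed against a revised step without going back to the source system.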

The cost of being wrong stays bounded. You put the human in the obvious place. You log enough that you can re-run the failure cases later. You do not deploy a system that can take an irreversible action, on its own, before you have any history with how it behaves on edge cases. This is not a controversial position. The teams shipping production agentic work converged on it years ago. The buyers asking about agents in 2026 have not always caught up.

Why the question persists

Two reasons. The first is that “agent” is genuinely useful as a category: there are systems that earn the label and the complexity, and the taxonomy work of the past two years has made the word more useful, not less. The second is that the marketing energy around the word vastly exceeds the share of production deployments that need it. In conferences, vendor pitches, and board updates, the word “agent” signals seriousness and forward motion. “Workflow” sounds like 2010.

We are not against building agents. We have built several. We have written about what the word actually means in 2026, the failure modes, and the evaluation checklist that distinguishes the systems that work from the ones that look like they should. The position here is narrower: the question that opens a procurement conversation should not be “should we build an agent?” It should be a question that ends in a workflow definition, not a vendor name.

What we ask instead

When the agent question comes up, we redirect. The conversation that follows is some version of:

What is the highest-frequency, most-rule-based step in the work this team currently does manually?
What does the input look like, and how often is it the same shape?
What does success look like, in terms a person could check?
What does failure cost, and what is the worst version of failure?
Where does the person already check the output, and could they keep checking it?

If those answers point to a workflow with a clean input, a checkable output, a bounded downside, and enough volume to evaluate against, we have something to scope. The thing we scope might end up being agentic in some sense. It might end up being a script. The shape gets decided by the work, not by the word.

The buyers who come in asking about an agent and leave with a workflow are the ones whose systems are still quietly running a quarter later. The ones who come in asking about an agent and leave with an agent, but only after the workflow question got answered first, are doing fine too.

The ones who skip the question are the ones who chase whatever shape the buzz happens to be wearing this quarter: agent platforms one month, multi-agent orchestrators the next, autonomous research systems the one after that. They come back later with a bigger problem than the one they started with, and usually with a contract for whichever vendor’s pitch deck looked the cleanest.