What we tell first-time AI buyers when they ask where to start

When a company is buying AI consulting for the first time, the question they bring to the first call is almost always some variant of: where should we start?

Most of them are expecting an answer with a brand name in it. Sometimes the answer they want is Claude or GPT-5 or Gemini. Sometimes it is Cursor or Copilot or one of the agent platforms. Sometimes it is a vendor, a category, or a budget number.

The honest answer doesn’t have a brand name in it. It’s structural, and the shape doesn’t change with the buyer or the industry.

Start with one bounded workflow that someone in the company already wishes were faster. Audit the result honestly. Decide what you have learned.

That’s the whole answer. The rest of this post is why it’s the answer, what each phrase means, and what changes if you take it seriously.

The case against starting with a tool

When you start with the tool, you spend the first six weeks evaluating platforms and never confront the underlying question of what you would actually do differently. The tool evaluation has its own forward motion: demos, comparisons, RFPs, vendor calls. It feels like progress because it produces artifacts. By the end you have a recommendation memo and an enterprise license. You have not, at any point, done the harder thing, which is to identify a piece of work in your company where the AI question is concrete enough to test.

This isn’t hypothetical. We keep meeting buyers who arrive with a signed annual contract for a model provider, an agent platform, or both, and no shipped use case to evaluate against. The contract was the easy part. It does nothing for the hard part.

The bounded-workflow filter

A workflow worth starting with has three properties, and it needs all three.

A well-defined input. Something has to come in. It has to come in often enough and in a similar enough shape that you can measure performance. Customer support tickets qualify; all incoming email does not. Vendor invoices qualify; all PDFs do not. The shape of the input determines the shape of the eval, which determines whether you can ever know if the system works.

A checkable output. Someone in the company has to be able to look at the output and say “yes, that is right” or “no, that is wrong” without a six-week training program. A summary the lead reviews before sending qualifies; a strategic recommendation does not. The check is what gives you data. Without it you have no signal, and no way to decide what to do next.

Recurrence. The workflow has to happen more than a few times a month. Recurrence is what gives you the volume to evaluate. A workflow that happens four times a year cannot be evaluated; it can only be hoped for. A workflow that happens 300 times a month gives you, within thirty days, a calibrated sense of how often it works, when it fails, and what its failure modes are.
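The recurrence requirement is, at bottom, a sample-size argument. As a minimal sketch, assume each run is graded pass or fail by a reviewer and you want a plausible range for the true pass rate. The Wilson score interval below is one standard way to get that range; the counts are hypothetical, chosen to show the same 75% observed pass rate after a year of a quarterly workflow and after a month of a busy one.

```python
import math


def wilson_interval(passes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        # No runs at all: the true rate could be anything.
        return (0.0, 1.0)
    p = passes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Hypothetical volumes: 4 runs (a year at four per year) vs 300 runs (one month).
for passes, n in [(3, 4), (225, 300)]:
    lo, hi = wilson_interval(passes, n)
    print(f"{passes}/{n} passed: plausible true rate {lo:.0%} to {hi:.0%}")
```

At four runs the interval spans roughly 30% to 95%, which is hope rather than evaluation; at 300 runs it narrows to roughly 70% to 80%.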

The intersection of those three properties is small. That’s the point. Most of the workflows your team executes don’t pass all three filters.

A clarification, because this is where the conversation usually goes sideways. AI in the workflow-or-agent sense, the kind that gets scoped into a build engagement and shows up as software in your stack, does not help with workflows that fail the filter. AI in the chat-and-brainstorm sense, where someone opens Claude or ChatGPT to draft a memo, talk through a decision, or sanity-check a plan, can still help almost anyone, almost anywhere. Those are different uses of the same technology, and the bounded-workflow filter applies only to the first.

The workflows that do pass all three filters are where the first engagement should sit: contract review, weekly report generation, accounts-payable triage, customer-support routing, code-PR summarization, internal-document Q&A.

What “audit honestly” actually means

Once the workflow is shipped, the audit is the part most teams skip. It is also the part where most of the value compounds.

The audit needs measurement. Not vibes. Specifically:

How much time did it actually save, end to end? Including the review time, the failure handling, and the time spent re-prompting when the output was wrong.
What was the error rate? Stratified by input type when input types are heterogeneous.
What were the failure modes? Not summary statistics: the actual cases. The specific input that produced the wrong output. Pattern-recognition on the failures is where the next iteration of work comes from.
Did the people doing the work want to keep using it? Adoption is a result, not an input. If the answer is no, the system has failed regardless of the metrics.
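For the first audit, the spreadsheet can be as simple as one row per run. The sketch below assumes hypothetical column names (a reviewer's correct/incorrect verdict, the manual baseline time, and the review and rework minutes) and computes the first three measurements: end-to-end human time saved, error rate by input type, and the list of failing cases to read through. Adoption is a question for the people doing the work, so it is not computed here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Run:
    """One row of the audit spreadsheet (hypothetical column names)."""
    case_id: str
    input_type: str       # e.g. ticket category or invoice vendor
    correct: bool         # reviewer's yes/no verdict
    baseline_min: float   # how long the task took by hand
    review_min: float     # time spent checking the AI output
    rework_min: float     # re-prompting or fixing when the output was wrong


def audit(runs: list[Run]) -> dict:
    # End-to-end saving: manual baseline minus the human time the new process still costs.
    saved = sum(r.baseline_min - r.review_min - r.rework_min for r in runs)
    # Error rate stratified by input type, since a blended rate can hide a bad segment.
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for r in runs:
        totals[r.input_type] += 1
        if not r.correct:
            errors[r.input_type] += 1
    error_rate = {t: errors[t] / totals[t] for t in totals}
    # Keep the actual failing cases, not just counts, for pattern-finding.
    failures = [r.case_id for r in runs if not r.correct]
    return {"minutes_saved": saved, "error_rate": error_rate, "failures": failures}


# Hypothetical rows for illustration only.
runs = [
    Run("T-1", "billing", True, 12, 3, 0),
    Run("T-2", "billing", False, 12, 4, 10),
    Run("T-3", "shipping", True, 8, 2, 0),
    Run("T-4", "shipping", True, 8, 2, 0),
]
print(audit(runs))
```

The second row shows why end-to-end accounting matters: a wrong output that needs ten minutes of rework costs more than doing the task by hand, and a blended error rate of 25% would hide that every error sits in one input type.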

You don’t need a sophisticated eval platform for the first audit. A spreadsheet will do. Once the workflow earns a second iteration, the open-source Inspect framework from the UK AI Security Institute is a clean place to formalize the eval. Commercial options like Braintrust and LangSmith start to make sense once several workflows run in parallel. On day one, none of them do.

What you actually buy

Most first-time buyers arrive with a budget that is too large for the right scope and too small for the wrong scope. They have heard what an AI transformation is supposed to cost, and their willingness to spend is anchored to that number. The right first engagement costs a fraction of it: one workflow, two weeks of scoping, four weeks of building, six weeks of running. It produces a runbook, an eval suite, and a clear answer about whether to extend.

The rest of the budget is better held back. Spend it on the second and third workflows, after the first one tells you what kinds of work pay off. Spend it on the people who maintain the systems once they exist; maintenance is the half of the work that decides whether the system is still running in month four, and we have written a whole essay on it. Don’t spend it on a platform license you aren’t yet using.

Why this works

The companies that get AI right are not the ones who picked the best vendor. They are the ones who picked the right first workflow. Once you have one workflow shipped, audited, and decided, the next decision is not abstract anymore. You know what your team’s review tolerance looks like. You know which model handled your data well and which did not. You know what the 80th-percentile failure looks like, and whether your people wanted the system to keep running.

Build vs. buy, which vendor, which platform, what architecture: most of the strategic AI questions a leadership team asks are easier to answer with one shipped workflow behind you than with zero. The first workflow is not the AI strategy. It is the thing that lets the AI strategy be written by people who know something concrete.

That’s why we lead with this answer. The buyers who take it seriously have a different conversation in three months than the ones who don’t. The buyers who skip it usually call back from the same place they started, six months later, with a contract they haven’t yet used.