I’ve watched too many chatbot projects get sold as strategy when they’re really interface decisions with a lot of wishful thinking attached.
A vendor demos a polished text box. The team imagines deflected support tickets, happier customers, maybe a little brand halo from having something “AI-native” on the site. Then the real questions start landing. The odd edge cases. The half-documented product behavior. The tickets that require judgment, not retrieval. Suddenly the thing that looked clean in the demo starts demanding constant babysitting.
That is not a model problem. It is a scoping problem.
What You’re Actually Buying
When you buy a “chatbot,” you’re buying an interface pattern: a box that accepts language and returns language. That can be useful. But on its own it says almost nothing about what work the system is supposed to do, what it is allowed to touch, or how anyone will know when it has gone off the rails.
What tends to work better is much narrower. A support triage operator. A renewal-risk flagger. A workflow that drafts responses for one specific ticket type. Something with edges.
That usually means defining a few boring things up front:
- what event starts the workflow
- what information the system can use
- what shape the output should take
- who reviews it before anything customer-facing happens
- how bad outputs get corrected
That list is not glamorous. It is, however, the actual work.
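The boring list above translates almost directly into a data structure, which is one way to test whether a scope is real. A minimal sketch, with hypothetical field names and an invented example workflow:

```python
from dataclasses import dataclass

@dataclass
class WorkflowSpec:
    """One narrowly scoped AI workflow, pinned down before any build starts."""
    trigger: str                 # what event starts the workflow
    allowed_sources: list[str]   # what information the system can use
    output_shape: str            # what shape the output should take
    reviewer: str                # who reviews it before anything customer-facing happens
    correction_path: str         # how bad outputs get corrected

# An invented example: if you can't fill in all five fields for your
# candidate workflow, the scope isn't defined yet.
triage_draft = WorkflowSpec(
    trigger="new support ticket tagged 'billing'",
    allowed_sources=["billing docs", "past billing tickets"],
    output_shape="internal draft reply plus a one-line issue summary",
    reviewer="support lead",
    correction_path="reviewer edits the draft; edits are logged as examples",
)
```

The point is not the code; it is that every field has to hold a specific answer, not "customer questions."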
Why These Projects Stall
The teams that get stuck usually get stuck for one of three reasons.
Scope mush. The system is supposed to handle “customer questions,” which sounds clear until you look at the actual mix. Pricing questions. Setup questions. Broken integrations. Edge-case billing issues. Feature misunderstandings. Renewal anxiety. Those are different jobs pretending to be one job.
Maintenance surprise. Generic assistants sound lightweight until the upkeep starts. Product changes have to be reflected. Exceptions accumulate. Confidence drops the first few times the system gives a polished wrong answer. Then someone on the team quietly becomes the chatbot babysitter. (Nobody wanted that role, by the way.)
No usable feedback loop. A bad customer-facing answer is expensive because the customer sees it first. A bad internal draft is annoying, but teachable. That distinction matters more than most teams realize.
What A Better First Build Looks Like
If a support team has manageable ticket volume but response times are slipping, my instinct is not to put a chatbot in front of customers. My instinct is to look for the narrowest internal workflow that keeps repeating and ask whether the team is doing the same coordination work over and over again.
Take support triage. A ticket arrives. Someone figures out what kind of issue it is. They pull the relevant docs or past cases. They draft a response or handoff note. Someone else reviews. Then it goes out.
That is already a workflow. The question is whether you are willing to define it clearly enough for a system to help.
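The triage steps above can be written down as explicit stages with a named human gate at the end. This is a sketch of the shape, not an implementation; the `classify`, `retrieve`, and `draft_reply` helpers here are hypothetical keyword stand-ins for whatever your actual classifier, doc lookup, and model call would be:

```python
def classify(ticket: str) -> str:
    # Stand-in for "someone figures out what kind of issue it is."
    return "billing" if "invoice" in ticket.lower() else "general"

def retrieve(issue_type: str) -> list[str]:
    # Stand-in for pulling the relevant docs or past cases.
    return {"billing": ["billing-faq.md"], "general": ["getting-started.md"]}[issue_type]

def draft_reply(ticket: str, context: list[str]) -> str:
    # Stand-in for the model call that writes an internal-only draft.
    return f"Draft (based on {context[0]}) for: {ticket}"

def triage(ticket: str) -> dict:
    issue_type = classify(ticket)
    draft = draft_reply(ticket, retrieve(issue_type))
    # Everything stops here until a human looks at it.
    return {"issue_type": issue_type, "draft": draft, "status": "needs_review"}

def approve(result: dict, reviewer: str) -> dict:
    # Nothing goes out without a named person signing off.
    return {**result, "status": "sent", "reviewed_by": reviewer}

result = triage("Invoice shows a double charge for March")
approved = approve(result, reviewer="support lead")
```

Notice that the review gate is structural, not a policy memo: the workflow cannot emit a customer-facing artifact on its own.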
That kind of first build gives you somewhere to learn. You can measure how often the draft is usable. You can see where the system reaches for the wrong information. You can tell whether the review burden is dropping or just moving around. Most importantly, you get those lessons without making customers the QA layer.
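Measuring "how often the draft is usable" does not require instrumentation heroics. One sketch, assuming reviewers tag each draft with a hypothetical `edit_level` field when they approve or rewrite it:

```python
def draft_usable_rate(reviews: list[dict]) -> float:
    """Share of drafts sent with no or minor edits: a crude but honest signal."""
    usable = sum(1 for r in reviews if r["edit_level"] in ("none", "minor"))
    return usable / len(reviews)

# Invented sample data: two usable drafts out of four reviewed.
reviews = [
    {"edit_level": "none"},
    {"edit_level": "minor"},
    {"edit_level": "rewrite"},
    {"edit_level": "rewrite"},
]
rate = draft_usable_rate(reviews)  # 0.5
```

If that number climbs over a few weeks, the system is learning your workflow. If it doesn't, you found that out internally, cheaply.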
And that is usually the missing move. Teams reach for the public-facing artifact first because it looks more impressive. The internal operator is less flashy. It is also much more likely to survive contact with reality.
The Scope Creep Trap
Once a vendor is in motion, there is always a temptation to widen the job: While we're here, can it also handle refunds? Can it look up accounts? Can it process cancellations? Can it route feature requests to product?
Each addition sounds reasonable in isolation. Together they turn one understandable workflow into a pile of loosely related behaviors. That is when teams end up with something brittle, overconnected, and hard to trust.
I prefer a first workflow that does one thing cleanly. Give it one job. Let the team see it work. Let the review loop teach you where the real complexity lives. Expansion is much easier once you have earned some clarity.
What To Look For In A Vendor
If you are evaluating AI vendors right now, the useful signal is not how magical the demo feels. It is how quickly they move from interface talk to workflow definition.
The better conversations sound like this:
- What is the highest-volume repeatable workflow you want to tighten up first?
- What starts it?
- What information does it need?
- What does a good output look like?
- Who reviews it?
- What counts as failure?
That is operations language. It is grounded, a little less sexy, and much more likely to ship.
The polished chatbot demo has its place. But if nobody can describe the workflow underneath it, the team is buying theater with a chat input.
If you’re evaluating AI vendors and want a second opinion on scope, I do short audits of workflow automation proposals. Get in touch if you want to talk through it.