“Is AI actually working for us?”
If you have to ask, the answer is probably no. Not because AI isn’t working, but because nobody set it up to be measurable.
ROI on AI isn’t mystical. It’s the same math as anything else: what did it cost, what did it return, and how confidently can you tell. Most companies fail at the third part. They deploy something, vibes say it’s helpful, and six months later there’s no defensible number.
Here’s how to fix that, before, during, and after you build.
Why most AI ROI conversations go sideways
Three usual reasons:
- No baseline. Nobody wrote down what the workflow cost in time, money, or errors before AI touched it. So “improvement” is a feeling.
- The wrong metric. Teams measure “messages sent” or “tickets summarized” instead of the business outcome those activities exist for.
- Hidden costs ignored. Model spend, integration maintenance, prompt iteration, oversight time: all real, often uncounted.
A good ROI framework forces all three into the open.
The Majoto ROI framework
We use four numbers per agent. That’s it.
1. Time reclaimed
Hours per week (or per case) that humans no longer spend on the automated work. Convert to dollars at loaded cost, not base salary. A $30/hour role usually costs the business closer to $45-55/hour fully loaded.
This is the obvious number. It’s also the easiest to inflate. Be honest: if a human still reviews every output, you didn’t reclaim the full hour. You reclaimed the drafting time, not the deciding time.
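Here’s what that math looks like as a minimal sketch in Python. The wage, the hours, and the 1.5x loaded-cost multiplier are illustrative assumptions, not benchmarks; substitute your own:

```python
# Time reclaimed, valued at loaded cost. Every figure here is an
# illustrative assumption, not a benchmark.
base_hourly = 30.00                  # base wage for the role
loaded_hourly = base_hourly * 1.5    # fully loaded: benefits, overhead, tools

drafting_hours_per_week = 10.0       # hours the agent now drafts instead of a human
review_fraction = 0.25               # share of each hour a human still spends deciding

reclaimed_hours = drafting_hours_per_week * (1 - review_fraction)
weekly_value = reclaimed_hours * loaded_hourly

print(f"Reclaimed {reclaimed_hours:.1f} h/week, worth about ${weekly_value:,.2f}/week")
```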
2. Outcome lift
The actual business outcome the workflow was supposed to drive. Examples:
- Lead-to-booked-call rate
- Time to first response on inbound leads
- No-show rate after AI reminders + rebooking
- Tickets resolved without human escalation
- Renewal rate after AI-driven check-ins
- Days sales outstanding after automated AR follow-up
This is the number that separates real ROI from cosmetic ROI. A faster first response that also lifts the conversion rate is real. A faster first response that doesn’t move conversion is just speed for its own sake.
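As a sketch of how outcome lift converts into dollars, assuming a hypothetical lead follow-up agent (every figure below is a placeholder):

```python
# Outcome lift in dollars: rate delta x volume x value per outcome.
# All figures below are hypothetical placeholders.
leads_per_month = 500
baseline_booked_rate = 0.18          # measured before the agent shipped
new_booked_rate = 0.21               # measured after
value_per_booked_call = 90.00        # expected value of one booked call

extra_calls = leads_per_month * (new_booked_rate - baseline_booked_rate)
outcome_lift_value = extra_calls * value_per_booked_call

print(f"{extra_calls:.0f} extra booked calls, about ${outcome_lift_value:,.2f}/month")
```

If new_booked_rate never moves off the baseline, the lift is zero no matter how fast the responses got. That’s the cosmetic case.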
3. Total cost to run
Everything it takes to keep the agent alive:
- Model and infrastructure spend
- Third-party tools the agent uses (CRM seats, calendar APIs, etc.)
- Maintenance: prompt updates, integration fixes, data refreshes
- Oversight: human review time, including edge-case escalations
If you don’t measure cost honestly, your “ROI” is fiction.
4. Risk-adjusted return
Some workflows have a downside if the agent gets it wrong: a misquoted price, a wrong appointment, a compliance-adjacent error. The ROI calculation has to account for the cost of mistakes and the cost of the controls that prevent them.
Risk-adjusted return is just: (time reclaimed value + outcome lift value) − (run cost + expected error cost).
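In code, the verdict is a few lines. The inputs below are placeholders; what matters is that all four numbers show up:

```python
# The four numbers for one agent over one month. All inputs are placeholders.
time_reclaimed_value = 1_350.00   # reclaimed hours x loaded hourly cost
outcome_lift_value = 1_350.00     # extra outcomes x value per outcome
run_cost = 900.00                 # model + tools + maintenance + oversight
expected_error_cost = 150.00      # error rate x cost per error, plus control costs

risk_adjusted_return = (
    (time_reclaimed_value + outcome_lift_value) - (run_cost + expected_error_cost)
)

print(f"Risk-adjusted return: ${risk_adjusted_return:,.2f}/month")
```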
That’s the whole framework. Four numbers per agent, one verdict per quarter.
Baseline before you build
The single biggest mistake we see: building first, baselining never.
Before any agent ships, write down the current state of each of the four numbers in a document signed off by the operator who owns the workflow:
- How many hours per week does this workflow consume today?
- What is the outcome rate today?
- What does the current process cost (tools, time, error rework)?
- What does it cost when it goes wrong?
This takes a few hours. It pays back forever. It’s the only way to make “before vs after” a real comparison instead of a vibe.
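One way to make that sign-off concrete is a plain record like the sketch below. The field names and values are illustrative, not a standard:

```python
# A baseline record the workflow owner signs off on before anything ships.
# Field names and values are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the baseline should not drift after sign-off
class WorkflowBaseline:
    workflow: str
    owner: str               # the operator accountable for the four numbers
    hours_per_week: float    # what the workflow consumes today
    outcome_rate: float      # e.g., lead-to-booked-call rate today
    monthly_cost: float      # tools + time + error rework, in dollars
    cost_per_failure: float  # what one mistake costs when it goes wrong

baseline = WorkflowBaseline(
    workflow="inbound lead follow-up",
    owner="sales ops lead",
    hours_per_week=12.0,
    outcome_rate=0.18,
    monthly_cost=3_100.00,
    cost_per_failure=450.00,
)
```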
A serious architecture review includes this step. If a vendor wants to ship AI for you without baselining first, they’re selling you something they don’t want measured.
Pick metrics that match the agent
Different narrow agents earn their keep differently. A starter map:
Lead follow-up agent
- Time to first response (target: minutes, not hours)
- % of inbound leads that reach a qualified conversation
- Lead-to-booked-call rate
- Loaded SDR hours reclaimed
Scheduling agent
- Booking completion rate from first contact
- No-show rate after AI reminders
- Front-desk minutes reclaimed per day
- Patient/client satisfaction signal (CSAT or NPS proxy)
Internal knowledge agent
- Time to answer common internal questions
- Onboarding time for new hires
- % of repeat questions deflected from senior staff
- Support escalation rate
Operations / back-office agent
- Cycle time from input to “done”
- Error rate vs. baseline manual process
- Rework hours per week
- Throughput per FTE
If you’re tracking metrics the agent doesn’t control, you’ll get noise. If you’re tracking activity instead of outcome, you’ll get theater.
A simple review cadence
You don’t need a dashboard the size of a war room. You need a rhythm.
- Weekly (first 4-6 weeks after launch): 15-minute review. Is the agent doing its job? Where is it failing? What needs a prompt, data, or guardrail change?
- Monthly: the four numbers, reported plainly. Time reclaimed, outcome lift, run cost, risk events.
- Quarterly: the ROI verdict. Keep, expand, rework, or kill.
Killing an agent that isn’t earning its keep is a feature, not a failure. It’s the discipline that makes the rest of your AI portfolio trustworthy.
What good AI ROI actually looks like
Two patterns we see when narrow AI is working:
Pattern A: concentrated payback. One agent collapses a multi-hour daily workflow into minutes. The math is obvious within 60 days. This is the most common shape for first builds.
Pattern B: small lift across a large volume. A modest improvement (e.g., a few percentage points on conversion or no-show rate) on a high-volume workflow. The per-event lift is small, the total lift is large. This is common in scheduling, lead follow-up, and renewals.
A red flag pattern: agents that produce a lot of output (“we sent 4,000 AI-drafted messages this month”) without a matching outcome lift. That’s activity, not ROI.
Costs operators usually forget
Make sure the cost side of the ledger includes:
- Model usage at production volume, not test volume
- Vector store / retrieval infrastructure if you’re running RAG
- Integration fees and per-seat costs on tools the agent uses
- Time spent reviewing edge cases (especially in the first 90 days)
- Periodic re-tuning when your business changes (pricing, products, policies)
These aren’t reasons not to build. They’re reasons to build deliberately, with the math written down.
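A minimal sketch of what an honest monthly ledger might look like, with placeholder line items and figures:

```python
# Monthly run cost at production volume. All figures are placeholders.
monthly_requests = 40_000            # production volume, not test volume
cost_per_request = 0.012             # model + retrieval infrastructure, per request

run_cost = {
    "model_and_infra": monthly_requests * cost_per_request,
    "tool_seats_and_apis": 250.00,   # CRM seats, calendar APIs, integration fees
    "maintenance": 6 * 95.00,        # hours of prompt/integration fixes, loaded rate
    "oversight": 10 * 55.00,         # hours of human review, incl. edge cases
}

total = sum(run_cost.values())
print(f"Total monthly run cost: ${total:,.2f}")  # the run-cost number in the ROI math
```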
How to set yourself up to actually measure
A few practical moves:
- One agent, one owner. A human operator who is accountable for the four numbers.
- One scoreboard. A single doc or dashboard, not five. Reviewed on a calendar invite, not a whim.
- Versioned changes. When you change the agent (prompts, data, tools), log it so you can correlate changes with metric movement (see the sketch after this list).
- Comparable baselines. Don’t let the baseline drift. If you change the workflow, restate the baseline.
- An exit criterion. Define, in writing, what would cause you to retire or replace the agent. This is what makes the system trustworthy.
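For the versioned-changes item, something as light as an append-only log is enough. A sketch; the file name and field names are illustrative:

```python
# Append-only change log so metric movement can be traced back to a cause.
# The path and field names are illustrative.
import datetime
import json

def log_change(path: str, component: str, description: str) -> None:
    """Append a timestamped record of a prompt, data, or tool change."""
    entry = {
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "component": component,  # e.g., "prompt", "retrieval data", "tool config"
        "description": description,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_change("agent_changes.jsonl", "prompt", "tightened qualification questions for inbound leads")
```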
How Majoto builds for measurable ROI
When we architect a narrow agent, the metrics are part of the build, not an afterthought:
- We baseline the workflow before we touch it.
- We instrument the agent so the four numbers are observable from day one.
- We define the review cadence and the exit criteria with the operator.
- We deliver an agent you can defend in a board meeting, not just one that demos well.
That’s what an AI system looks like when it’s built for a business, not for a pitch deck.
Ready to find the first workflow worth automating?
Book a free architecture review. We’ll map the bottlenecks, identify the safest first build, and show where AI can create leverage without adding operational mess.
Book a Free Architecture Review