“Is AI actually working for us?”
If you have to ask, the answer is probably no. Not because AI isn’t working, but because nobody set it up to be measurable.
ROI on AI isn’t mystical. It’s the same math as anything else: what did it cost, what did it return, and how confidently can you tell. Most companies fail at the third part. They deploy something, vibes say it’s helpful, and six months later there’s no defensible number.
Here’s how to fix that, before, during, and after you build.
Why most AI ROI conversations go sideways
Three usual reasons:
- No baseline. Nobody wrote down what the workflow cost in time, money, or errors before AI touched it. So “improvement” is a feeling.
- The wrong metric. Teams measure “messages sent” or “tickets summarized” instead of the business outcome those activities exist for.
- Hidden costs ignored. Model spend, integration maintenance, prompt iteration, oversight time: all real, often uncounted.
A good ROI framework forces all three into the open.
The Majoto ROI framework
We use four numbers per agent. That’s it.
1. Time reclaimed
Hours per week (or per case) that humans no longer spend on the automated work. Convert to dollars at loaded cost, not base salary. A $30/hour role usually costs the business closer to $45-55/hour fully loaded.
This is the obvious number. It’s also the easiest to inflate. Be honest: if a human still reviews every output, you didn’t reclaim the full hour. You reclaimed the drafting time, not the deciding time.
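Here’s what that math looks like as a minimal sketch in Python. The wage, the hours, and the 1.5x loaded-cost multiplier are illustrative assumptions, not benchmarks; substitute your own:

```python
# Time reclaimed, valued at loaded cost. Every figure here is an
# illustrative assumption, not a benchmark.
base_hourly = 30.00                  # base wage for the role
loaded_hourly = base_hourly * 1.5    # fully loaded: benefits, overhead, tools

drafting_hours_per_week = 10.0       # hours the agent now drafts instead of a human
review_fraction = 0.25               # share of each hour a human still spends deciding

reclaimed_hours = drafting_hours_per_week * (1 - review_fraction)
weekly_value = reclaimed_hours * loaded_hourly

print(f"Reclaimed {reclaimed_hours:.1f} h/week, worth about ${weekly_value:,.2f}/week")
```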
2. Outcome lift
The actual business outcome the workflow was supposed to drive. Examples:
- Lead-to-booked-call rate
- Time to first response on inbound leads
- No-show rate after AI reminders + rebooking
- Tickets resolved without human escalation
- Renewal rate after AI-driven check-ins
- Days sales outstanding after automated AR follow-up
This is the number that separates real ROI from cosmetic ROI. A faster first response that also lifts the conversion rate is real. A faster first response that doesn’t move conversion is just speed for its own sake.
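As a sketch of how outcome lift converts into dollars, assuming a hypothetical lead follow-up agent (every figure below is a placeholder):

```python
# Outcome lift in dollars: rate delta x volume x value per outcome.
# All figures below are hypothetical placeholders.
leads_per_month = 500
baseline_booked_rate = 0.18          # measured before the agent shipped
new_booked_rate = 0.21               # measured after
value_per_booked_call = 90.00        # expected value of one booked call

extra_calls = leads_per_month * (new_booked_rate - baseline_booked_rate)
outcome_lift_value = extra_calls * value_per_booked_call

print(f"{extra_calls:.0f} extra booked calls, about ${outcome_lift_value:,.2f}/month")
```

If new_booked_rate never moves off the baseline, the lift is zero no matter how fast the responses got. That’s the cosmetic case.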
3. Total cost to run
Everything it takes to keep the agent alive:
- Model and infrastructure spend
- Third-party tools the agent uses (CRM seats, calendar APIs, etc.)
- Maintenance: prompt updates, integration fixes, data refreshes
- Oversight: human review time, including edge-case escalations
If you don’t measure cost honestly, your “ROI” is fiction.
4. Risk-adjusted return
Some workflows have a downside if the agent gets it wrong: a misquoted price, a wrong appointment, a compliance-adjacent error. The ROI calculation has to account for the cost of mistakes and the cost of the controls that prevent them.
Risk-adjusted return is just: (time reclaimed value + outcome lift value) − (run cost + expected error cost).
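In code, the verdict is a few lines. The inputs below are placeholders; what matters is that all four numbers show up:

```python
# The four numbers for one agent over one month. All inputs are placeholders.
time_reclaimed_value = 1_350.00   # reclaimed hours x loaded hourly cost
outcome_lift_value = 1_350.00     # extra outcomes x value per outcome
run_cost = 900.00                 # model + tools + maintenance + oversight
expected_error_cost = 150.00      # error rate x cost per error, plus control costs

risk_adjusted_return = (
    (time_reclaimed_value + outcome_lift_value) - (run_cost + expected_error_cost)
)

print(f"Risk-adjusted return: ${risk_adjusted_return:,.2f}/month")
```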
That’s the whole framework. Four numbers per agent, one verdict per quarter.
Baseline before you build
The single biggest mistake we see: building first, baselining never.
Before any agent ships, write down the current state of each of the four numbers in a document signed off by the operator who owns the workflow:
- How many hours per week does this workflow consume today?
- What is the outcome rate today?
- What does the current process cost (tools, time, error rework)?
- What does it cost when it goes wrong?
This takes a few hours. It pays back forever. It’s the only way to make “before vs after” a real comparison instead of a vibe.
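One way to make that sign-off concrete is a plain record like the sketch below. The field names and values are illustrative, not a standard:

```python
# A baseline record the workflow owner signs off on before anything ships.
# Field names and values are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the baseline should not drift after sign-off
class WorkflowBaseline:
    workflow: str
    owner: str               # the operator accountable for the four numbers
    hours_per_week: float    # what the workflow consumes today
    outcome_rate: float      # e.g., lead-to-booked-call rate today
    monthly_cost: float      # tools + time + error rework, in dollars
    cost_per_failure: float  # what one mistake costs when it goes wrong

baseline = WorkflowBaseline(
    workflow="inbound lead follow-up",
    owner="sales ops lead",
    hours_per_week=12.0,
    outcome_rate=0.18,
    monthly_cost=3_100.00,
    cost_per_failure=450.00,
)
```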
A serious architecture review includes this step. If a vendor wants to ship AI for you without baselining first, they’re selling you something they don’t want measured.
Pick metrics that match the agent
Different narrow agents earn their keep differently. A starter map:
Lead follow-up agent
- Time to first response (target: minutes, not hours)
- % of inbound leads that reach a qualified conversation
- Lead-to-booked-call rate
- Loaded SDR hours reclaimed
Scheduling agent
- Booking completion rate from first contact
- No-show rate after AI reminders
- Front-desk minutes reclaimed per day
- Patient/client satisfaction signal (CSAT or NPS proxy)
Internal knowledge agent
- Time to answer common internal questions
- Onboarding time for new hires
- % of repeat questions deflected from senior staff
- Support escalation rate
Operations / back-office agent
- Cycle time from input to “done”
- Error rate vs. baseline manual process
- Rework hours per week
- Throughput per FTE
If you’re tracking metrics the agent doesn’t control, you’ll get noise. If you’re tracking activity instead of outcome, you’ll get theater.
A simple review cadence
You don’t need a dashboard the size of a war room. You need a rhythm.
- Weekly (first 4-6 weeks after launch): 15-minute review. Is the agent doing its job? Where is it failing? What needs a prompt, data, or guardrail change?
- Monthly: the four numbers, reported plainly. Time reclaimed, outcome lift, run cost, risk events.
- Quarterly: the ROI verdict. Keep, expand, rework, or kill.
Killing an agent that isn’t earning its keep is a feature, not a failure. It’s the discipline that makes the rest of your AI portfolio trustworthy.
What good AI ROI actually looks like
Two patterns we see when narrow AI is working:
Pattern A: concentrated payback. One agent collapses a multi-hour daily workflow into minutes. The math is obvious within 60 days. This is the most common shape for first builds.
Pattern B: small lift across a large volume. A modest improvement (e.g., a few percentage points on conversion or no-show rate) on a high-volume workflow. The per-event lift is small, the total lift is large. This is common in scheduling, lead follow-up, and renewals.
A red flag pattern: agents that produce a lot of output (“we sent 4,000 AI-drafted messages this month”) without a matching outcome lift. That’s activity, not ROI.
Costs operators usually forget
Make sure the cost side of the ledger includes:
- Model usage at production volume, not test volume
- Vector store / retrieval infrastructure if you’re running RAG
- Integration fees and per-seat costs on tools the agent uses
- Time spent reviewing edge cases (especially in the first 90 days)
- Periodic re-tuning when your business changes (pricing, products, policies)
These aren’t reasons not to build. They’re reasons to build deliberately, with the math written down.
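A minimal sketch of what an honest monthly ledger might look like, with placeholder line items and figures:

```python
# Monthly run cost at production volume. All figures are placeholders.
monthly_requests = 40_000            # production volume, not test volume
cost_per_request = 0.012             # model + retrieval infrastructure, per request

run_cost = {
    "model_and_infra": monthly_requests * cost_per_request,
    "tool_seats_and_apis": 250.00,   # CRM seats, calendar APIs, integration fees
    "maintenance": 6 * 95.00,        # hours of prompt/integration fixes, loaded rate
    "oversight": 10 * 55.00,         # hours of human review, incl. edge cases
}

total = sum(run_cost.values())
print(f"Total monthly run cost: ${total:,.2f}")  # the run-cost number in the ROI math
```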
How to set yourself up to actually measure
A few practical moves:
- One agent, one owner. A human operator who is accountable for the four numbers.
- One scoreboard. A single doc or dashboard, not five. Reviewed on a calendar invite, not a whim.
- Versioned changes. When you change the agent (prompts, data, tools), log it so you can correlate changes with metric movement (see the sketch after this list).
- Comparable baselines. Don’t let the baseline drift. If you change the workflow, restate the baseline.
- An exit criterion. Define, in writing, what would cause you to retire or replace the agent. This is what makes the system trustworthy.
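For the versioned-changes item, something as light as an append-only log is enough. A sketch; the file name and field names are illustrative:

```python
# Append-only change log so metric movement can be traced back to a cause.
# The path and field names are illustrative.
import datetime
import json

def log_change(path: str, component: str, description: str) -> None:
    """Append a timestamped record of a prompt, data, or tool change."""
    entry = {
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "component": component,  # e.g., "prompt", "retrieval data", "tool config"
        "description": description,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

log_change("agent_changes.jsonl", "prompt", "tightened qualification questions for inbound leads")
```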
How Majoto builds for measurable ROI
When we architect a narrow agent, the metrics are part of the build, not an afterthought:
- We baseline the workflow before we touch it.
- We instrument the agent so the four numbers are observable from day one.
- We define the review cadence and the exit criteria with the operator.
- We deliver an agent you can defend in a board meeting, not just one that demos well.
That’s what an AI system looks like when it’s built for a business, not for a pitch deck.
Ready to find the first workflow worth automating?
Book a free architecture review. We’ll map the bottlenecks, identify the safest first build, and show where AI can create leverage without adding operational mess.
Book a Free Architecture Review