“Is AI actually working for us?”

If you have to ask, the answer is probably no, not because AI isn’t working, but because nobody set it up to be measurable.

ROI on AI isn’t mystical. It’s the same math as anything else: what did it cost, what did it return, how confidently can you tell. Most companies fail at the third part. They deploy something, vibes say it’s helpful, and six months later there’s no defensible number.

Here’s how to fix that, before, during, and after you build.

Why most AI ROI conversations go sideways

Three usual reasons:

  1. No baseline. Nobody wrote down what the workflow cost in time, money, or errors before AI touched it. So “improvement” is a feeling.
  2. The wrong metric. Teams measure “messages sent” or “tickets summarized” instead of the business outcome those activities exist for.
  3. Hidden costs ignored. Model spend, integration maintenance, prompt iteration, oversight time, all real, often uncounted.

A good ROI framework forces all three into the open.

The Majoto ROI framework

We use four numbers per agent. That’s it.

1. Time reclaimed

Hours per week (or per case) that humans no longer spend on the automated work. Convert to dollars at loaded cost, not base salary. A $30/hour role usually costs the business closer to $45-55/hour fully loaded.

This is the obvious number. It’s also the easiest to inflate. Be honest: if a human still reviews every output, you didn’t reclaim the full hour. You reclaimed the drafting time, not the deciding time.

2. Outcome lift

The actual business outcome the workflow was supposed to drive. Examples:

  - Lead follow-up exists to lift the lead-to-appointment conversion rate, not to send messages
  - Scheduling exists to cut the no-show rate
  - Renewals exist to keep customers, not to send reminders

This is the number that separates real ROI from cosmetic ROI. A faster first response that also lifts the conversion rate is real. A faster first response that doesn’t move conversion is just speed for its own sake.

3. Total cost to run

Everything it takes to keep the agent alive:

  - Model and API spend
  - Integration maintenance as the tools around it change
  - Prompt iteration and tuning time
  - Human oversight and review time

If you don’t measure cost honestly, your “ROI” is fiction.

4. Risk-adjusted return

Some workflows have a downside if the agent gets it wrong: a misquoted price, a wrong appointment, a compliance-adjacent error. The ROI calculation has to account for the cost of mistakes and the cost of the controls that prevent them.

Risk-adjusted return is just: (time reclaimed value + outcome lift value) − (run cost + expected error cost).
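The arithmetic fits in a few lines. Here is a minimal sketch of that formula in Python; every figure and parameter name is an illustrative assumption, not a real benchmark:

```python
# Risk-adjusted return for one agent, per the four-number framework.
# All inputs below are illustrative placeholders.

def risk_adjusted_return(
    hours_reclaimed_per_week: float,
    loaded_hourly_cost: float,     # fully loaded, not base salary
    outcome_lift_value: float,     # e.g., dollar value of a conversion lift
    run_cost: float,               # model spend + integrations + oversight
    error_rate: float,             # probability an output is wrong
    events_per_week: float,
    cost_per_error: float,
) -> float:
    time_value = hours_reclaimed_per_week * loaded_hourly_cost
    expected_error_cost = error_rate * events_per_week * cost_per_error
    return (time_value + outcome_lift_value) - (run_cost + expected_error_cost)

# Example: 10 hours/week reclaimed at $50/hr loaded, $400 of outcome lift,
# $300/week to run, 2% error rate on 200 events at $25 per mistake.
weekly = risk_adjusted_return(10, 50, 400, 300, 0.02, 200, 25)
print(weekly)  # 500.0
```

Note that the expected error cost scales with volume: an agent with a "small" error rate on a high-volume workflow can quietly erase its own return.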

That’s the whole framework. Four numbers per agent, one verdict per quarter.

Baseline before you build

The single biggest mistake we see: building first, baselining never.

Before any agent ships, write down, in a document, signed off by the operator who owns the workflow, the current state on each of the four numbers:

  - Hours per week (or per case) the workflow currently consumes
  - The current level of the outcome metric it exists to drive
  - What the workflow currently costs to run
  - The current error rate, and what a mistake costs when it happens

This takes a few hours. It pays back forever. It’s the only way to make “before vs after” a real comparison instead of a vibe.
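A baseline doesn't need tooling, but even a structured stub forces the four numbers to exist before the build. A sketch of what that record might hold; the field names and values are illustrative, not a Majoto schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class Baseline:
    """Pre-build state of one workflow. All fields are illustrative."""
    workflow: str
    owner: str              # the operator who signs off
    hours_per_week: float   # current human time spent
    outcome_metric: str     # the business outcome this work drives
    outcome_value: float    # its current level
    cost_per_week: float    # current cost to run the workflow
    error_rate: float       # current mistake rate

lead_followup = Baseline(
    workflow="lead follow-up",
    owner="J. Operator",
    hours_per_week=12.0,
    outcome_metric="lead-to-appointment conversion",
    outcome_value=0.18,
    cost_per_week=660.0,    # 12 h at $55/hr fully loaded
    error_rate=0.03,
)
print(asdict(lead_followup))
```

Six months later, "before vs after" is a diff against this record instead of a debate.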

A serious architecture review includes this step. If a vendor wants to ship AI for you without baselining first, they’re selling you something they don’t want measured.

Pick metrics that match the agent

Different narrow agents earn their keep differently. A starter map:

Lead follow-up agent

Speed to first response, and the lead-to-appointment conversion rate that speed is supposed to lift.

Scheduling agent

Appointments booked without human touch, and the no-show rate.

Internal knowledge agent

Time to a correct answer: how long someone spends getting unblocked versus digging through docs or interrupting a colleague.

Operations / back-office agent

Hours reclaimed on the processing work, and the error rate on outputs that used to be produced manually.

If you’re tracking metrics the agent doesn’t control, you’ll get noise. If you’re tracking activity instead of outcome, you’ll get theater.

A simple review cadence

You don’t need a dashboard the size of a war room. You need a rhythm. Monthly: check the four numbers against the baseline. Quarterly: render one verdict per agent (scale it, adjust it, or retire it).

Killing an agent that isn’t earning its keep is a feature, not a failure. It’s the discipline that makes the rest of your AI portfolio trustworthy.

What good AI ROI actually looks like

Two patterns we see when narrow AI is working:

Pattern A: concentrated payback. One agent collapses a multi-hour daily workflow into minutes. The math is obvious within 60 days. This is the most common shape for first builds.

Pattern B: small lift across a large volume. A modest improvement (e.g., a few percentage points on conversion or no-show rate) on a high-volume workflow. The per-event lift is small, the total lift is large. This is common in scheduling, lead follow-up, and renewals.

A red flag pattern: agents that produce a lot of output (“we sent 4,000 AI-drafted messages this month”) without a matching outcome lift. That’s activity, not ROI.

Costs operators usually forget

Make sure the cost side of the ledger includes:

  - Prompt iteration: someone has to keep tuning the agent as inputs drift
  - Integration maintenance when the tools around it change
  - Human oversight and review time on every output that still gets checked
  - Time spent investigating and cleaning up the agent's mistakes

These aren’t reasons not to build. They’re reasons to build deliberately, with the math written down.

How to set yourself up to actually measure

A few practical moves:

  1. One agent, one owner. A human operator who is accountable for the four numbers.
  2. One scoreboard. A single doc or dashboard, not five. Reviewed on a calendar invite, not a whim.
  3. Versioned changes. When you change the agent (prompts, data, tools), log it. So you can correlate changes with metric movement.
  4. Comparable baselines. Don’t let the baseline drift. If you change the workflow, restate the baseline.
  5. An exit criterion. Define, in writing, what would cause you to retire or replace the agent. This is what makes the system trustworthy.
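The quarterly verdict from the review cadence can be made mechanical. A minimal sketch; the thresholds here are assumptions to tune per business, not Majoto defaults:

```python
# Quarterly verdict for one agent: scale, keep, or retire.
# Thresholds are illustrative assumptions, not recommendations.

def quarterly_verdict(risk_adjusted_return: float, run_cost: float) -> str:
    """Compare an agent's return to what it costs to run."""
    if risk_adjusted_return >= 2 * run_cost:
        return "scale"     # clearly earning its keep
    if risk_adjusted_return > 0:
        return "keep"      # positive, but watch it
    return "retire"        # the written exit criterion triggered

print(quarterly_verdict(1200.0, 400.0))  # scale
print(quarterly_verdict(150.0, 400.0))   # keep
print(quarterly_verdict(-50.0, 400.0))   # retire
```

The point is not the specific thresholds; it is that the retire branch exists in writing before anyone is emotionally invested in the agent.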

How Majoto builds for measurable ROI

When we architect a narrow agent, the metrics are part of the build, not an afterthought:

  - A baseline, signed off by the workflow's owner, before anything ships
  - One named owner accountable for the four numbers
  - A single scoreboard, reviewed on a calendar cadence
  - Versioned changes, so metric movement can be traced to what changed
  - A written exit criterion, agreed up front

That’s what an AI system looks like when it’s built for a business, not for a pitch deck.

Ready to find the first workflow worth automating?

Book a free architecture review. We’ll map the bottlenecks, identify the safest first build, and show where AI can create leverage without adding operational mess.

Book a Free Architecture Review