How to Implement AI in Small Business: Measure, Cut, Build — Dad's Advice About a Treehouse

May 30, 2026 • Jake Botticello

My father had a line he’d deploy any time I reached for the wrong tool: “Use the right tool for the job, or waste all your time.” I heard it a hundred times, but the lesson that stuck came from a treehouse. I wanted to start nailing boards to the maple in the backyard the day we bought the lumber. He made me climb the tree first — with a tape measure. Measure the trunk. Measure the spread of the branches. Figure out what the tree could actually hold before a single cut. Then cut to the measurements. Then build. And every spring after, he’d send me up to check the joints, because the tree keeps growing whether your treehouse does or not.

Thirty-some years later, I watch businesses do the opposite with AI every single week. They buy the lumber first — the chatbot, the agent platform, the automation suite — nail it to whatever branch is closest, and then stand back wondering why the floor’s crooked.

Here’s the thesis, and it’s the same one that runs through everything we build: AI alone doesn’t fix a business. It’s a tool — a genuinely remarkable one — that only pays off as part of a bigger solution built around how your business actually works. If you’ve read our pillar guide on how to scale small business operations, you already know the doctrine: process first, then technology. AI doesn’t get an exemption from that rule. If anything, AI punishes you harder for skipping it.

So this is the method, in my father’s three words: Measure, Cut, Build. Measure the tree. Cut to fit. Build it — and keep climbing up to check the joints. If you’d rather start by measuring, meet Cyris, our AI Operations Associate, and run your readiness consultation — then come back for the method behind the numbers.

What we’ll cover:

Why Does AI Adoption Fail in Small Business? (It’s Not the AI)
Stage 1 — Measure: Map How Your Team Actually Works Before You Buy Anything
How to Run an AI Readiness Assessment (and Set Your Pre-AI Benchmarks)
Stage 2 — Cut: Choosing the AI Investments That Actually Fit the Job
Hype vs. Fit: A Decision Filter for Choosing AI Tools
Stage 3 — Build: Implementation Is the Easy Part. Governance Is the Part Everyone Skips.
Who Uses Which Agent, With Which SOP: An Oversight Framework
Measuring AI ROI Against Your Pre-AI Benchmarks: Closing the Loop
Preventing AI Drift: The Tree Keeps Growing
The Closed Loop: How Measure, Cut, Build Compounds

Why Does AI Adoption Fail in Small Business? (It’s Not the AI)

AI adoption fails primarily because nobody redesigns the workflow around the tool — not because the AI is weak. Businesses bolt AI onto undefined processes, skip baseline measurement, and never establish oversight, so the tool has no defined job, no yardstick, and no maintenance. The technology gets blamed for a planning failure.

The numbers on this are genuinely brutal:

MIT’s NANDA research found 95% of enterprise generative-AI pilots fail to deliver measurable P&L impact — and the core issue wasn’t model quality. It was what MIT calls the “learning gap”: organizations that never adapted their processes to the tool, or the tool to their processes.
42% of companies abandoned most of their AI initiatives in 2025, up from 17% the year before — and RAND puts overall AI project failure above 80%, roughly double the failure rate of ordinary technology projects.
The companies that do win look structurally different: McKinsey found AI high performers are nearly three times as likely to have fundamentally redesigned their workflows, and workflow redesign was one of the strongest predictors of actual business impact.

Read those three findings together and the pattern is impossible to miss. The tool isn’t the variable. The work done around the tool is the variable. Nailing boards to an unmeasured tree fails at the same rate whether the lumber is premium or not.

The rest of this article is the work done around the tool.

Stage 1 — Measure: Map How Your Team Actually Works Before You Buy Anything

Before my dad let a single board near that maple, I was up the tree with a tape measure. Before a single AI subscription touches your business, you’re doing the same climb: gathering actual data about how your employees work — not the org-chart version, not the version in your head, the real one.

This means digging into the live workstreams. Who actually handles an inbound lead, in what tool, with how many handoffs? How does a project really move from sold to delivered — including the part where it sits in someone’s inbox for two days waiting on an approval everyone forgot exists? Where does information get re-typed from one system into another by a human being whose time you’re paying for?

We covered the full audit method in the pillar’s Process Mapping 101 section, and that exact exercise is the front door to AI readiness. But for AI specifically, you’re hunting two extra things:

Where the repetitive, rules-describable work lives. AI earns its keep on work that’s frequent, pattern-shaped, and currently eating skilled-human hours. You can’t identify that work from memory — you find it in the data, and it’s almost never where the owner guesses it is.

What the friction between tools is already costing you. Harvard Business Review research found workers toggle between apps roughly 1,200 times a day, losing about four hours a week just re-orienting — call it five working weeks a year, per person, spent switching contexts. If that sounds familiar, it’s the same disease we diagnosed in the pillar’s broken tech stack section — and bolting an unintegrated AI tool onto that pile doesn’t cure the toggle tax. It adds a toll booth.
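To put a rough number on the toggle tax for your own team, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement: the four hours a week comes from the research quoted above, while the 40-hour week, the headcount, and the hourly cost are placeholder assumptions you should replace with your own figures.

```python
# Rough "toggle tax" estimate. Only the four-hours-a-week figure comes from
# the article; everything else here is an assumption for illustration.
HOURS_LOST_PER_WEEK = 4.0    # per person, from the research quoted above
WEEKS_PER_YEAR = 52
HOURS_PER_WORK_WEEK = 40     # assumption: a standard full-time week


def toggle_tax(headcount: int, loaded_hourly_cost: float) -> dict:
    """Return yearly hours, working weeks, and team cost lost to app switching."""
    hours_per_person = HOURS_LOST_PER_WEEK * WEEKS_PER_YEAR
    return {
        "hours_per_person": hours_per_person,
        "work_weeks_per_person": hours_per_person / HOURS_PER_WORK_WEEK,
        "team_cost": hours_per_person * headcount * loaded_hourly_cost,
    }


# Hypothetical inputs: a fifteen-person team at a $50 loaded hourly cost.
estimate = toggle_tax(headcount=15, loaded_hourly_cost=50.0)
print(estimate)
```

With those inputs, the per-person figure works out to 208 hours a year, which is where "call it five working weeks" comes from.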

The output of Stage 1 is a map of the real work and — critically — the numbers attached to it. Which brings us to the part almost everyone skips.

How to Run an AI Readiness Assessment (and Set Your Pre-AI Benchmarks)

An AI readiness assessment answers two questions: is this business prepared to implement AI successfully, and what do our key processes cost us today? That second half — the baseline — is the step nearly every business skips, and skipping it is why they can never prove their AI worked. Here’s the practitioner’s version:

1. Inventory your core processes. Same list as any operations audit: lead handling, sales, onboarding, delivery, communication, invoicing. Five to eight big rivers.
2. For each process, capture the baseline numbers. Time per instance (how long does one proposal, one onboarding, one report actually take, end to end?). Volume (how many per week?). Touch count (how many people and handoffs?). Error and rework rate (how often does it come back?). These don’t need to be laboratory-grade. They need to be honest and written down.
3. Document the current SOP for each process, even if it’s ugly. If the process lives in someone’s head, this is the moment it comes out. This matters more than people realize, because your SOPs are what any AI agent will eventually be trained on or measured against. Garbage in, confident-sounding garbage out.
4. Score data readiness. Where does each process’s information live? One system, or scattered across email, spreadsheets, and a group text? AI can only work with what it can reach.
5. Score the team. Who’s eager, who’s skeptical, who’s quietly terrified? Adoption is a people problem wearing a technology costume.
6. Rank processes by AI opportunity. High volume + high repetition + clean data + measurable baseline = top of the list. Anything you couldn’t benchmark goes to the bottom: if you can’t measure it now, you’ll never know if AI improved it.
7. Seal the baseline. Date it. Save it. This document is the single most valuable artifact of your entire AI journey, because in Stage 3 it becomes the yardstick. Without it, “is the AI working?” gets answered with vibes, and vibes always say yes for the first ninety days.
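If your baseline lives in a spreadsheet, the capture-and-rank steps above could be sketched roughly like this. Treat it as one possible shape, not a prescribed tool: the field names, the 1-to-5 judgment scales, and the scoring formula are all assumptions made for illustration, and the sample numbers are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a sealed baseline should not be edited later
class ProcessBaseline:
    name: str
    hours_per_instance: float  # time per instance, end to end
    volume_per_week: int       # how many per week
    touch_count: int           # people and handoffs involved
    rework_rate: float         # share that comes back, 0.0 to 1.0
    repetition: int            # 1 (bespoke) to 5 (rules-describable), a judgment call
    data_readiness: int        # 1 (scattered) to 5 (one system), a judgment call
    benchmarked: bool          # False if you could not capture honest numbers
    sealed_on: date


def opportunity_score(p: ProcessBaseline) -> float:
    """Illustrative score: weekly hours consumed, scaled by repetition and data readiness."""
    if not p.benchmarked:
        return 0.0  # anything you could not benchmark goes to the bottom
    weekly_hours = p.hours_per_instance * p.volume_per_week
    return weekly_hours * (p.repetition / 5) * (p.data_readiness / 5)


def rank_by_opportunity(processes):
    """Highest-scoring process first."""
    return sorted(processes, key=opportunity_score, reverse=True)


# Made-up sample numbers, for illustration only.
sealed = date(2026, 5, 30)
baselines = [
    ProcessBaseline("proposals", 3.0, 10, 4, 0.20, 4, 4, True, sealed),
    ProcessBaseline("onboarding", 5.0, 4, 6, 0.20, 3, 2, True, sealed),
    ProcessBaseline("invoicing", 0.5, 40, 2, 0.05, 5, 5, True, sealed),
    ProcessBaseline("strategy reviews", 0.0, 0, 0, 0.0, 1, 1, False, sealed),
]
for p in rank_by_opportunity(baselines):
    print(f"{p.name}: {opportunity_score(p):.1f}")
```

The formula matters less than the discipline: every process gets the same fields, the unbenchmarked ones sink, and the whole record carries a date.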

Take our AI Readiness & Benchmark Assessment — run the readiness check and seal your baseline with Cyris, our AI Operations Associate. The seven steps above, made live.

Stage 2 — Cut: Choosing the AI Investments That Actually Fit the Job

Now — and only now — you’re allowed in the lumberyard.

“Cut” means cutting to the measurements: matching specific AI and automation investments to the specific jobs your Stage 1 data identified, instead of buying whatever the keynote speaker was excited about. The difference sounds subtle. It’s the whole game. McKinsey’s finding bears repeating here: the businesses getting real impact from AI are the ones that redesigned the workflow around it, meaning the tool was chosen for a defined workflow, not parachuted into an undefined one. Meanwhile, only 39% of AI adopters report any earnings impact at all, and most of those put it at under 5% of earnings. The gap between those two groups isn’t budget. It’s sequence.

Your Stage 1 ranking already did the hard part: it told you which two or three processes have the volume, repetition, and clean baseline that make them genuine AI candidates. Stage 2 is about resisting everything else. The lumberyard is enormous and everything in it is on sale and shouting. Your measurements are the only defense.

Hype vs. Fit: A Decision Filter for Choosing AI Tools

To choose the right AI tool, evaluate fit against your mapped workflow — not features against a demo. A tool earns adoption when it serves a specific measured process, integrates with the systems that process already touches, and can be evaluated against your pre-AI baseline. Run every candidate through this filter:

Does it map to a Stage 1 process? If you can’t name the exact workflow — and its baseline numbers — this tool would improve, it’s hype. Pass.
Does it fit the workflow, or does the workflow have to contort to fit it? Small contortions are normal. If adopting the tool means rebuilding how your team works around the tool’s opinions, the tail is wagging the dog.
Does it integrate with where the data already lives? A brilliant AI that can’t reach your actual systems just adds another app to the 1,200 daily toggles.
Can you measure it against the baseline? If there’s no clean before/after comparison available, you will never be able to defend the spend — to yourself or anyone else.
Who maintains it? Every AI tool is a small employee that never sleeps but also never tells you when it’s confused. Someone owns it, or nobody does — and “nobody” is how Stage 3 fails.
Build, buy, or partner? The data here is humbling for the DIY-inclined: MIT found purpose-built tools from specialized vendors and partnered builds succeed about 67% of the time, while internal DIY builds succeed only a third as often. So: buy off-the-shelf for commodity work, skip the solo DIY build, and for the processes that are your business — the ones generic tools will never quite fit — a partnered custom build gets you proprietary fit without the DIY failure rate. Either way, spend your energy on the integration and governance — which, conveniently, is where we’re headed.

Pass all six and you’ve cut a board that fits. Most tools fail at the first gate, which is the filter doing its job. The goal of Stage 2 isn’t a long list of AI investments. It’s a short one you can actually govern.
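For anyone who wants the filter as a repeatable checklist rather than a conversation, here is a minimal sketch. The gate wording is paraphrased from the six questions above, and treating the build, buy, or partner question as a yes/no "sourcing decided" gate is a simplifying assumption, as is the function name.

```python
from typing import Optional

# The six gates, in the order a candidate tool should face them.
GATES = [
    "maps to a Stage 1 process with baseline numbers",
    "fits the workflow without major contortion",
    "integrates with where the data already lives",
    "can be measured against the baseline",
    "has a named owner who maintains it",
    "sourcing decided: buy, partner, or build",
]


def first_failed_gate(answers: list) -> Optional[str]:
    """Return the first gate a candidate tool fails, or None if it passes all six."""
    if len(answers) != len(GATES):
        raise ValueError(f"expected {len(GATES)} answers, got {len(answers)}")
    for gate, passed in zip(GATES, answers):
        if not passed:
            return gate
    return None


# Hypothetical candidate: impressive demo, but no Stage 1 process behind it.
print(first_failed_gate([False, True, True, True, True, True]))
# Hypothetical candidate that clears every gate.
print(first_failed_gate([True, True, True, True, True, True]))
```

Stopping at the first failure is deliberate: a tool that cannot name its Stage 1 process does not need to be evaluated on the other five.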

Stage 3 — Build: Implementation Is the Easy Part. Governance Is the Part Everyone Skips.

AI governance for a small business is the set of rules and routines that control who uses which AI tools, on which processes, with what data — plus how usage is tracked, how outputs are checked, and who corrects course when something’s off. It’s quality control for a workforce that works at machine speed and never raises its hand.

Here’s the uncomfortable truth about the build stage: the implementation itself — the configuring, the connecting, the launching — is the easy part. Vendors have gotten good at onboarding. The treehouse goes up fast.

What nobody hands you at onboarding is the maintenance schedule. My dad’s spring ritual — climb up, check the joints, see what the tree had moved — wasn’t paranoia. It was the understanding that anything anchored to a living thing goes out of true quietly, and an hour with a level is a lot cheaper than a surprise.

In AI terms, governance at SMB scale means deciding, in writing: which tools are sanctioned for which processes (and which are off-limits — your client data does not belong in whatever free chatbot an employee found on Tuesday); what data each tool may touch; what gets human review before it leaves the building; who owns each tool’s performance; and on what rhythm the whole thing gets checked. None of this requires an enterprise compliance department. For a fifteen-person service business, the entire governance layer can be two pages and a monthly hour. But it has to exist, and it has to be written, because — as the next section covers — the alternative is an org full of agents and people playing an expensive game of telephone.

Who Uses Which Agent, With Which SOP: An Oversight Framework

This is the section the rest of the internet hasn’t written, so let’s write it properly.

The moment AI agents enter your workflows, you’ve created a new species of dependency: agents trained on, or guided by, your SOPs. That’s enormously powerful, because your documented processes become leverage instead of binder-filler. It’s also the single most dangerous joint in the structure, for one reason: train an agent on the wrong thing, and you can’t be sure how far the bad output has traveled or where to stop unraveling the sweater. A human employee taught a bad process makes bounded mistakes and eventually complains. An agent taught a bad process executes it flawlessly, at volume, with total confidence, until someone notices. By then the bad output has propagated into downstream work, client deliverables, and possibly other agents’ inputs.

So the oversight framework, at minimum, is a living register with four columns:

The agent/tool — every AI surface in the business, named. Including the unofficial ones; the register’s first draft always surfaces two tools nobody approved.
Who uses it — which roles, for which tasks. Not “everyone, for stuff.”
Which SOP governs it — the versioned, dated process document the agent’s behavior is built on or checked against. When the SOP updates, the agent’s instructions update the same day, or you’ve forked reality.
How it’s checked — the review cadence, the sample size, and who signs off. High-stakes outputs (anything client-facing, anything touching money) get human review every time; low-stakes internal work gets spot-checked on a schedule.
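The register does not need special software, but if you keep it as structured data, a sketch might look like the following. The field names, the SOP identifiers, and the sample entries are hypothetical; the only things taken from the framework above are the four columns and the idea that an agent left on an old SOP version has forked reality.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    agent: str        # column 1: the agent/tool, named
    used_by: str      # column 2: which roles, for which tasks
    sop_id: str       # column 3: which SOP governs it
    sop_version: str  # the SOP version the agent's instructions were built on
    review: str       # column 4: cadence, sample size, and who signs off


def agents_governed_by(register, sop_id):
    """Every agent that must be updated when this SOP changes."""
    return [e.agent for e in register if e.sop_id == sop_id]


def out_of_sync(register, current_sop_versions):
    """Entries whose SOP version differs from the current one, or whose SOP is not listed."""
    return [e for e in register
            if current_sop_versions.get(e.sop_id) != e.sop_version]


# Hypothetical entries, for illustration only.
register = [
    RegisterEntry("proposal-drafter", "sales: first drafts of proposals",
                  "SOP-SALES-03", "v4", "every output reviewed by the sales lead"),
    RegisterEntry("inbox-triage", "ops: sorting inbound requests",
                  "SOP-OPS-01", "v2", "monthly spot check of 20 items, ops manager signs off"),
]
current = {"SOP-SALES-03": "v5", "SOP-OPS-01": "v2"}

print(agents_governed_by(register, "SOP-SALES-03"))
print([e.agent for e in out_of_sync(register, current)])
```

A shared spreadsheet with the same five columns does the same job; the point is that the link between agent and SOP version is written down somewhere you can query.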

Then track usage. Not surveillance: effectiveness. Which agents get used, how often, by whom, with what outcomes against the Stage 1 baseline. Usage data tells you which investments are earning their keep, which need retraining, and which are shelf-ware wearing a subscription. And there’s a compounding bonus most businesses never reach: once you can see usage and outcomes across the whole register, the cross-pollination starts, and the prompt pattern that’s working in sales turns out to fix the bottleneck in onboarding. The register is what makes those connections visible, and that visibility is what turns a handful of separate tools into a system that compounds.

Measuring AI ROI Against Your Pre-AI Benchmarks: Closing the Loop

Here’s where Stage 1 pays for itself, and where almost everyone else is flying blind. Forbes’ 2025 research found only 12% of executives are even using AI to help measure their AI’s ROI, 39% name ROI measurement as a top obstacle, and just 3% report significant returns. Industry-wide, the standard approach to “is it working?” is a shrug with a dashboard on it.

You don’t have that problem, because you sealed a baseline in Stage 1. ROI measurement now is just arithmetic:

Take each process the AI touched and re-measure the exact numbers you benchmarked. Time per instance, volume capacity, touch count, error and rework rate. Same definitions, same honesty. Proposal turnaround was 6 days, now it’s 36 hours? That’s a number. Rework on onboarding docs dropped from 1-in-5 to 1-in-20? Number. Nothing moved? Also a number — an important one, because it tells you to retrain, reconfigure, or cut the tool loose before renewal, not after three more quarters of hope.
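The before/after arithmetic for the two examples above can be sketched in a few lines. One assumption to flag: the six-day turnaround is counted as calendar days (144 hours) so it can be compared with the 36-hour figure; if your baseline was in business days, convert accordingly. The function name is illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Percent change from the sealed baseline; negative means the number went down."""
    if before == 0:
        raise ValueError("baseline must be non-zero to compute a percent change")
    return (after - before) / before * 100


# Proposal turnaround: 6 days (assumed calendar days, so 144 hours) down to 36 hours.
turnaround = percent_change(before=6 * 24, after=36)

# Rework on onboarding docs: 1-in-5 down to 1-in-20.
rework = percent_change(before=1 / 5, after=1 / 20)

print(f"proposal turnaround: {turnaround:.1f}%")
print(f"onboarding rework:   {rework:.1f}%")
```

Both examples come out to a 75% reduction, and a process where nothing moved would come out to 0%, which is just as useful to know before renewal.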

Run the comparison quarterly, against the register from the oversight framework, and AI ROI stops being a leap of faith and becomes a line item. This is the closed loop the whole method exists for: the measurements from before the build are the only honest way to grade the build. Skip Stage 1 and Stage 3’s question is unanswerable — which is exactly the trap the other 88% of the market is standing in.

Preventing AI Drift: The Tree Keeps Growing

One more thing my dad understood that the AI industry is just now learning: the tree keeps growing. The treehouse that fit perfectly in June is tilted by next spring — not because anyone broke it, but because the living thing it’s attached to kept changing.

Your business is the tree. Your processes evolve, your offerings shift, your SOPs get updated — and every AI agent configured against last year’s reality starts to drift: still running, still confident, increasingly wrong. Drift is the most insidious failure mode in the whole stack precisely because nothing breaks. The outputs just degrade — a little staler, a little less accurate, quietly compounding as agent outputs feed people and people’s outputs feed agents, the signal mutating a little at each handoff like a giant game of telephone. By the time a client notices, the drift is months deep.

The defenses are unglamorous and they work:

The same-day rule: when an SOP changes, every agent that depends on it gets updated the same day. The register tells you which ones those are — this is half of why it exists.
The seasonal climb: a scheduled review (quarterly is right for most SMBs) where each agent’s outputs get sampled against the current process definition, not the one it was born with.
Watch the benchmark trendline: your quarterly ROI re-measurement doubles as a drift detector. Numbers that improved and then quietly slide backward are drift announcing itself — listen.
Keep humans at the joints: the people using the agents daily notice staleness first. Give them a dead-simple way to flag “this output smells off,” and treat every flag as free inspection data.
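The benchmark-trendline check in particular lends itself to a simple rule. Below is a minimal sketch, assuming a metric where lower is better (hours, rework rate) and a 10% tolerance before you call it drift; both the tolerance and the function name are assumptions to tune, not a standard.

```python
def is_drifting(history, tolerance=0.10):
    """Flag a lower-is-better metric that improved on the baseline, then slid back.

    history: quarterly readings, oldest first, with history[0] as the sealed baseline.
    """
    if len(history) < 3:
        return False  # need a baseline, an improvement, and a later reading
    best = min(history[:-1])          # best reading before the latest one
    latest = history[-1]
    improved = best < history[0]      # the tool beat the baseline at some point
    slid_back = latest > best * (1 + tolerance)
    return improved and slid_back


# Hypothetical proposal turnaround in hours: baseline 144, then quarterly readings.
print(is_drifting([144, 40, 36, 48]))  # improved to 36, now 48
print(is_drifting([144, 40, 36, 37]))  # still within tolerance of the best reading
```

A flag here is not a verdict. It is the cue to pull the register, check which SOP version the agent is on, and sample its recent outputs.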

Drift prevention is the least exciting section of this article and the one most predictive of whether your AI still works in eighteen months. The businesses that climb the tree every season keep their treehouse level: they catch the shifted branch, the pulled joint, the platform sitting a few degrees off true, and they adjust it while it’s still a Saturday-morning fix. The ones that skip the climb discover the structural problem the hard way, usually with somebody standing on it.

The Closed Loop: How Measure, Cut, Build Compounds

Step back and look at what the three stages do together. Measure produces the map and the baseline. Cut uses the map to pick tools that actually fit. Build implements them under governance — and then grades them against the baseline, feeding what you learn back into better measurements, sharper selections, and tighter oversight. It’s not a checklist you finish. It’s a loop that compounds, the same way the operational systems in our scaling guide compound: every cycle, the business gets more measured, more deliberate, and harder for chaos to reclaim.

And the upside of getting the sequence right is real: 91% of small businesses using AI report revenue gains, and 78% plan to increase their investment. The technology works. It just doesn’t work alone. It works as the right tool, cut to honest measurements, checked every season. Dad was right about the treehouse. He’s right about this too.

If you’d rather not run the loop solo, this is precisely what we build at Pyris: the readiness audit, the benchmarks, and then the build itself — custom AI agents and automations designed for your business and wired into your actual workflows, under a governance layer that keeps it all honest. Not tool recommendations. Built systems, done the way the 95% skipped.

Take our AI Readiness & Benchmark Assessment or book a free discovery call — 20 minutes, no pitch deck, no pressure. Bring us the chaos, or the half-built treehouse. We’ve seen both.

Jake Botticello is the founder of Pyris Consulting, where he and his team build custom processes, integrated systems, and purpose-built AI automations for founder-led service businesses. He writes from the practitioner’s seat — every framework in this article comes from real builds, not theory. The treehouse is real too.