STAY · We don't disappear after go-live
The 90-day post-launch playbook: what a real AI agency does after go-live
Four phases. One internal owner. A Day 75 independence test. A Day 91 decision that is never a default.
The question almost nobody asks before signing
Most of the AI-agency conversation is written as if the only question is "which agency should I hire to build it?" It isn't. The harder question, the one almost nobody asks until it's too late, is who is going to be looking at this system on a random Tuesday in month four.
Picture a founder who's paid roughly $80,000 to a well-known consulting firm for a lead-scoring system. We could be describing any of three or four agencies' clients we've inherited in the last eighteen months. The shape was identical every time, so we're composing a single picture from real material we can't attribute by name. The system shipped. The slide deck was beautiful. Then on Day 91 the inbox went quiet, the consultants moved on to the next engagement, and the ops lead opened the dashboard about twice more before she stopped opening it at all. The system didn't break. The team just stopped using it. By month seven it might as well not have existed. The invoice was paid. The build was done. Nobody owned what came next.
If you're skimming, the bolded beats carry the article. If you've got ten minutes, the rest is where the actual answer lives, including a real Day 23 metrics scenario about halfway down that's the most important section in this piece. We mean it.
This is the failure mode an August 2025 MIT NANDA study tracked. 95% of enterprise GenAI pilots produced zero P&L impact. The core issue, per the report, was the "learning gap for both tools and organizations." Translation: the build worked in the demo. The build worked in the first week. Then production data drifted, vendor pricing moved, and the team that was supposed to own it had been quietly assigned three other priorities. The same MIT study found that buying from specialized vendors and building partnerships succeeded about 67% of the time, while internal builds succeeded only one third as often. The 67/33 split is not about the model. It's about which version of the agency was still in the room on Day 23.
We built the 90-Day Playbook because we got tired of inheriting clients from agencies who priced the build and forgot the rest of the year. The playbook is the rest of the year, written down.
Day 1-7. The handoff that isn't an email
The first week is not a handover email. It's a protocol.
We name a single internal owner before launch day. Not a committee. One person with the authority and the calendar to run the system. This is the rule we won't bend on: name your internal owner before you hire, give them real authority and real accountability, or don't start. If we can't get one name during scoping, we stop. We've walked away from contracts because the answer to "who owns this on Monday?" was "we'll figure that out later." "Later" is the most expensive word in this category.
The handoff itself is documented. A runbook with the workflow logic, the prompts, the credentials, the failure modes, and the rollback plan. Three video walkthroughs at most, because more than three nobody watches. Access transfer with every key rotated. The baseline metrics locked in. Whatever the system is supposed to do, we measure it on Day 1, so we know exactly what drift looks like by Day 30.
Day 7 is also the first review call, and it's not what most agencies run. We sit with the internal owner and watch them operate the system for an hour. No coaching from us. We are looking for the small fumbles they wouldn't mention on a status call. Those fumbles are the rest of the engagement.
Day 8-30. Drift hides here
The first month after launch is when drift hides. The system works on the demo data. It works on the first week of production data. Then production data drifts, a vendor changes their API, a model gets deprecated, and the team starts using the system for an edge case nobody scoped. None of it announces itself.
We watch six metrics daily for the first 30 days. Throughput. Error rate. Manual override rate. Average response time. Token cost per run. Champion engagement, measured by whether the named owner is actually logging in. Manual override rate is the one that catches the most rot. If the team is bypassing the system to do it the old way, the build is failing and nobody has told us yet.
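To make the daily check concrete, here is a minimal sketch of comparing one day's numbers against the Day 1 baseline. Everything named here is an assumption for illustration: the `DailyMetrics` fields, the `drifted_metrics` helper, the units, and the 20% tolerance are hypothetical, not the thresholds used on real engagements. The sample values are invented, with the Day 23 token cost chosen to show a 34% jump.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DailyMetrics:
    throughput: float            # runs completed that day
    error_rate: float            # fraction of runs that failed or were flagged
    manual_override_rate: float  # fraction of runs a human finished by hand
    avg_response_time_s: float   # seconds from input to output
    token_cost_per_run: float    # dollars per run
    owner_logins: float          # champion engagement: owner logins that day


def drifted_metrics(baseline, today, tolerance=0.20):
    """Return {metric name: relative change} for every metric that moved
    more than `tolerance` (hypothetical default) away from the baseline."""
    flagged = {}
    for f in fields(DailyMetrics):
        base = getattr(baseline, f.name)
        now = getattr(today, f.name)
        if base == 0:
            # No meaningful ratio against a zero baseline; flag any movement.
            change = float("inf") if now != 0 else 0.0
        else:
            change = (now - base) / base
        if abs(change) > tolerance:
            flagged[f.name] = round(change, 2)
    return flagged


# Invented sample data: only token cost per run has moved materially.
day_1 = DailyMetrics(400, 0.02, 0.05, 3.0, 0.050, 2)
day_23 = DailyMetrics(410, 0.02, 0.05, 3.1, 0.067, 2)
print(drifted_metrics(day_1, day_23))  # {'token_cost_per_run': 0.34}
```

A single shared tolerance is the simplest possible choice. In practice each metric would likely need its own band, and manual override rate would probably deserve the tightest one.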
Help Net Security observed in early 2026 that roughly one in five AI systems showed material drift inside the first 60 days post-launch. Drift is not failure. Drift is the system doing exactly what it was built to do while the world around it changes. The job in this phase is prompt tuning, threshold adjustment, and catching vendor pricing or API changes before they become outages.
Most agencies don't do this work, because it isn't glamorous and it doesn't bill well as a separate engagement. We price it into the build because it's the actual difference between a system that's running at Day 90 and a system that's been quietly retired by Day 23.
Day 31-60. The conversation that gets harder
By Day 31, every edge case the original scope missed has shown up. This is where the conversation gets harder, and where most agencies pull the bait and switch.
There are three categories of edge case, and each one has a different rule.
One: edge cases the build should have handled and didn't. We fix those inside the engagement. No upcharge. The original scope was the contract. If we missed something obvious during scoping, that's on us.
Two: edge cases the team is asking the system to do that were never in scope. We surface those, price them honestly, and let the client decide. This is where most agencies pull the bait and switch. They scope a narrow build, watch the team use it for more, then sell the expansion at panic prices in month three. We document the request, give a real number with a real timeline, and the client says yes or no on the facts.
Three: edge cases that reveal the original scope was wrong. The team thought they needed lead scoring. The actual problem was lead routing. The audit missed it. We surface that on Day 35, not Day 90. The honest conversation is always faster than the polite one.
The Big 4 version of this failure has been described plainly in 2026 industry writeups: pilots that don't scale, strategies that don't stick, clients who nod through the final presentation and then quietly file the deck. The reason pilots don't scale is rarely the model. It's that nobody ran the edge case triage. The pilot worked in the demo and broke on contact with the seventh real edge case. Day 31-60 is the phase that catches that, and we think it's the most overlooked stretch of the entire engagement, because there's no shiny artifact at the end of it. Just decisions, written down, signed off.
Day 61-90. Designing for our own exit
By Day 61 we are designing for our exit. The independence test happens on Day 75, and it's specific enough that we want to spell it out.
What "watch" actually means. For two weeks the named internal owner runs the system entirely without us. We monitor the same six metrics from the drift detection phase, plus we log every Slack message, every email, every "quick question" the owner sends to our support channel. We do not respond to operational questions during the 14 days. Genuine outages we step in for. Everything else we record and address at the end.
The pass/fail bar. Pass means three things at once: the six metrics stay inside the bands they held during Days 31-60, the owner reaches for our channel fewer than three times in the two weeks, and the team's usage of the system stays at or above the Day 30 baseline. Anything less than all three is a fail, and a fail is information, not embarrassment.
The artifact at the end. A two-page independence report. Page one is the metrics with annotations on every dip or spike. Page two is the gap list, ranked, with closure plans for each item. The owner signs it. We sign it. That report is what either ends the engagement cleanly or extends it for the right reasons.
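The three-part pass bar described above can be sketched as a small check. This is an illustration under stated assumptions, not the tooling used on engagements: `IndependenceWindow`, its field names, and `independence_result` are hypothetical, and how each metric's Days 31-60 band is computed is left outside the sketch. The list of failed conditions it returns is the kind of raw material that could seed the gap list on page two of the report.

```python
from dataclasses import dataclass


@dataclass
class IndependenceWindow:
    metrics_in_band: dict   # metric name -> True if it stayed inside its Days 31-60 band
    support_contacts: int   # times the owner reached for the agency's channel
    usage: float            # team usage during the two weeks
    day_30_usage: float     # Day 30 usage baseline, in the same unit as `usage`


def independence_result(window):
    """Return (passed, failed_conditions). All three conditions must hold to pass."""
    failures = []
    out_of_band = sorted(name for name, ok in window.metrics_in_band.items() if not ok)
    if out_of_band:
        failures.append("metrics out of band: " + ", ".join(out_of_band))
    if window.support_contacts >= 3:  # the bar is "fewer than three"
        failures.append(f"owner contacted support {window.support_contacts} times")
    if window.usage < window.day_30_usage:  # the bar is "at or above" the baseline
        failures.append("usage below Day 30 baseline")
    return (not failures, failures)


# Invented example: two contacts is fine, but one metric left its band and usage dipped.
window = IndependenceWindow(
    metrics_in_band={"throughput": True, "error_rate": True, "manual_override_rate": False},
    support_contacts=2,
    usage=180,
    day_30_usage=200,
)
passed, gaps = independence_result(window)
print(passed, gaps)
```

Returning the reasons alongside the verdict keeps a fail usable as information: each entry names something to close in the remaining days.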
The independence test on Day 75 is how we prevent the Day 91 silent exit.
The last 15 days of the engagement close whatever the independence test surfaced. By Day 90 the client has one of three real outcomes: a system they can run themselves with a runbook that survives the owner leaving, a renewed retainer because they want us watching, or an honest acknowledgement that the build needs work and a concrete plan for what comes next. All three are real outcomes. The default is not "we disappear and you find out at month four."
Curious whether your situation fits this playbook?
The assessment screens us as much as it screens you. If your situation doesn't fit the 90-Day Playbook, we'll tell you in the results, not three months into a contract. Fifteen minutes. Free. Written report by Monday.
Take the 5-min assessment
Theater vs. the playbook
A lot of agencies will sell you what sounds like the playbook and turns out to be theater. The shape is the same. The substance is missing.
| Theater | The playbook |
|---|---|
| "We'll be on retainer for the first 90 days." | Four phases, each with specific deliverables and a Day 75 independence test. |
| Status meeting once a month. | Six metrics watched daily for the first 30 days. |
| Tickets logged if something breaks. | Prompt tuning + vendor monitoring + edge case triage as ongoing work. |
| "We'll hand off documentation at the end." | Runbook written before launch, tested under independence on Day 75. |
| Day 91 is a renewal email. | Day 91 is a decision named on Day 75, never a surprise. |
The consulting version of the theater model gets described the same way across every 2026 industry writeup we've read: the partner who sold the engagement is exceptional, the team that delivers it is more variable, and when that partner moves on to the next engagement, continuity disappears. That's the model in one paragraph. Sell senior, deliver junior, exit before anyone notices. It's also why MIT NANDA found internal builds succeed only about a third of the time. The agency that promised to be your partner was your partner until the next pitch.
We've watched it firsthand. A client we inherited last year got exactly this from a Big 4 firm. The consultants had already moved on to the next client by Day 91. The system was technically delivered. The team that knew how it worked was two states away on a different engagement. The build went dark inside six months. We've stopped being surprised when this is the story we walk into.
The Air AI bankruptcy in 2025 and the wave of vendor disappearances tracked by the AI Graveyard project (where dozens of AI vendors have shut down, been acquired, or disappeared in the last year) are the extreme version of the same pattern. Vendor risk is real, and it has to be designed for, not hoped away. Designing for it means the build can survive the agency. The playbook is how we design for it.
The Day 91 conversation
Day 91 is a conversation, not a default. It sounds like one of three things, and we know which one it'll be by Day 75.
Renewal sounds like: "We want you watching this for another quarter, and here's the next workflow we want to scope." We walk through the metrics, the retainer scope, and the next build. The retainer is optional. Most clients keep it the first year because vendors raise prices, models get deprecated, and APIs change, and watching that is genuine work.
Release sounds like: "The runbook is solid, the owner can run it, we'll check in at the quarter." We close the engagement, archive the project files with the client, and they own the system outright. We've done this. It's a successful outcome. The agency does not have to be the answer forever, and pretending otherwise is the thing that breaks trust.
Silence is the outcome we plan against. The clients who go silent on agencies go silent because the agency surprised them at month three. The independence test on Day 75 is how we remove the surprise. By the time Day 90 arrives, we already know which of the three conversations we're having, and so does the client. The Day 91 conversation becomes a confirmation, never a discovery.
Why agencies skip the post-launch work
Most agencies are optimized for new-project revenue. The math is brutal. A new project bills five to ten times what a retention retainer brings in the same month. The incentive is to ship the build and pitch the next one. Retention work shows up on the agency P&L as a tax on growth.
It also shows up on the client P&L as a tax on results when it's missing. The MIT 67/33 finding is the proof. Specialized vendors who stay win twice as often as internal builds. That delta isn't the model. It's the agency that noticed on Day 23 that token cost per run had jumped 34% because a vendor changed their pricing tier. That kind of watching doesn't happen at an agency that already pitched the next deal in week six.
We price the 90 days into the build because the build is not done at go-live. The build is done at Day 91, when the system either renews, releases cleanly, or honestly fails into a smaller scope. Anything before Day 91 is still the demo.
What to do tomorrow
Whether you hire Wordwise, hire someone else, or build internally, three actions move the dial inside the next week.
One. Name the internal owner of every AI system you've already deployed. One person, real authority, calendar time blocked. If you can't name one, that system is on a clock and the clock has been running.
Two. Pull the last 30 days of metrics for every AI workflow you have in production. Throughput, error rate, manual override rate at minimum. If those numbers don't exist anywhere, that absence is the first symptom. The system that nobody is watching is the system that nobody owns.
Three. Put the Day 91 conversation on your calendar for every active vendor engagement. Three months before the contract ends, not after. The vendors who let Day 91 sneak up on you are the ones planning to disappear quietly. The vendors who put it on the calendar early are the ones planning to stay.
Why we even built this
Here's the thing about post-launch playbooks. They aren't really about Day 90.
We didn't write the 90-Day Playbook because the world needed another agency methodology document. The world has plenty. We wrote it because the conversation the playbook forces is the actual product. When an owner walks through the four phases with us during scoping, the contract isn't the point. The point is that they now know who's going to be looking at the dashboard on Day 23. They know what manual override rate means and why it's the leading indicator. They know what the Day 75 independence test will ask of their named champion. They couldn't have answered any of that the morning they signed. By the afternoon they signed, they could.
That's the playbook. The four phases are the artifact. The honest conversation it forces, three months before anyone normally has it, is the work.
We benefit when SMB owners run this on their own existing vendor engagements and notice what's missing. We benefit more when an owner who never works with Wordwise still ends up naming an internal champion and pulling 30 days of metrics on a Tuesday afternoon in June, because a framework was findable.
See whether your situation fits the playbook
Fifteen minutes. Free. Written report by Monday. No call. No sequence.
Take the 5-min assessment
FAQs
What does post-launch ownership cost on top of the build?
We price the 90-Day Playbook into the build, not on top. The fixed engagement covers all four phases described in this piece. After Day 91 clients pick from three paths. Release means we leave a documented system and a trained internal owner and you carry it forward without us. Retainer means we stay on for monthly check-ins, drift detection, and the next quarter of small changes. Rebuild means we widen scope into the next workflow. The retainer is optional. About two thirds of clients keep it for the first year because vendors raise prices, models get deprecated, and APIs change. Watching that is real work. The other third take the documented system internal and check in quarterly. Both are honest outcomes. The shape of the next 90 days is a conversation we have on Day 75, never a surprise on Day 91.
How is the playbook different from a normal support contract?
A support contract starts when something breaks. The 90-Day Playbook starts the morning the system goes live. We are reading metrics on Day 8, not waiting for a ticket on Day 60. Most support contracts cover bug fixes and uptime. Ours covers prompt tuning, vendor changes, model deprecations, edge case triage, and a documented independence test on Day 75. Support is reactive. The playbook is structured. The pattern we keep seeing in this space is that relationships live or die in the first 90 days, and a contract that only activates on a broken endpoint will never catch a system that is quietly being abandoned. The most expensive failure mode in AI builds is not a system that breaks. It is a system that nobody opens on Tuesday. Support contracts do not protect against that. The playbook does.
Why 90 days and not 60 or 180?
Sixty days is too short to see real drift. The first month after launch is a honeymoon. Edge cases have not arrived. Vendor pricing has not shifted. The internal champion is still genuinely excited. By Day 90 you have seen at least one model update, at least one edge case the original scope did not anticipate, and at least one Tuesday morning where the team almost reverted to the old way. Those three events are the real test. One hundred eighty days is too long to stay on a fixed engagement without renegotiating scope. The 90-day window also matches the period where most AI initiatives without a named internal champion die, the majority inside the first year [estimate, based on our engagement intake]. Ninety days is long enough to find out and short enough to decide what comes next.
What happens if we want to take the system internal after Day 90?
Good. That is one of the three exits we plan for from Day 1. On Day 75 we run the independence test. The named internal owner operates the system for two weeks without us touching anything. We log every moment they reach for the support channel. Those moments are the gaps. The last 15 days of the engagement close them one by one. On Day 91 you have a working system, a trained owner, and a runbook that survives that owner leaving. That is the version of independence most consulting engagements promise on the cover page and almost none deliver by the back page.
What if the system is not working at Day 75?
Then we tell you. The independence test is also a working test. If the system cannot survive two weeks without us, it will not survive the next quarter with us either. By Day 75 we know whether the build is doing the job we scoped it for. If it is not, the answer is rarely "add more AI." It is usually that the scope was wrong, the data was not ready, or the named owner is the wrong person for the role. We surface that on Day 75 with the metrics to back it up. Then we have the harder conversation. Sometimes that means a partial refund. Sometimes it means rescoping the build into something smaller. Sometimes it means we built the right thing and the business has changed underneath us. Honest is faster than spinning.
What signals do you watch in the daily drift detection phase?
Six metrics, every day, for the first 30 days. Throughput, the number of runs the system completes per day. Error rate, the percent that fail or get flagged. Manual override rate, the percent where a human takes over and finishes the job by hand. Average response time, the median latency from input to output. Token cost per run, which catches vendor pricing changes inside 48 hours instead of inside an invoice. Champion engagement, measured by whether the named internal owner is actually logging in. Manual override rate is the one that catches the most rot. If the team is bypassing the system to do it the old way, the build is failing and nobody has told us yet. We log all six in a dashboard the client owns, not in a deck we present monthly. Day 23 is when the first vendor price bump usually shows up. Day 28 is when the first edge case the scope missed usually arrives. We are watching for both.
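For readers who want to see how those six definitions might turn into numbers, here is a minimal sketch that collapses one day of run records into the daily summary. The `Run` record, its field names, and `summarize_day` are hypothetical. A real system would read these from its own logs, and the sample values are invented. Response time uses the median, as the definition above states.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Run:
    failed_or_flagged: bool  # the run failed or was flagged
    overridden: bool         # a human took over and finished the job by hand
    latency_s: float         # seconds from input to output
    token_cost: float        # dollars spent on tokens for this run


def summarize_day(runs, owner_logged_in):
    """Collapse one day of run records into the six daily metrics."""
    n = len(runs)
    if n == 0:
        # A day with no runs is itself a signal; report it without dividing by zero.
        return {
            "throughput": 0,
            "error_rate_pct": None,
            "manual_override_rate_pct": None,
            "median_response_time_s": None,
            "token_cost_per_run": None,
            "owner_logged_in": owner_logged_in,
        }
    return {
        "throughput": n,
        "error_rate_pct": 100 * sum(r.failed_or_flagged for r in runs) / n,
        "manual_override_rate_pct": 100 * sum(r.overridden for r in runs) / n,
        "median_response_time_s": median(r.latency_s for r in runs),
        "token_cost_per_run": sum(r.token_cost for r in runs) / n,
        "owner_logged_in": owner_logged_in,
    }


# Invented sample: four runs, one flagged, one finished by hand.
runs = [
    Run(False, False, 2.0, 0.04),
    Run(True, False, 4.0, 0.06),
    Run(False, True, 3.0, 0.05),
    Run(False, False, 5.0, 0.05),
]
summary = summarize_day(runs, owner_logged_in=True)
print(summary)
```

Keeping the output as a plain dictionary makes it easy to append one row per day to whatever dashboard the client owns.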