2 June 2026
From Pilot to Production: Why 90% of AI Projects Stall (and How to Avoid It)
By We Are Heylo
The number gets quoted endlessly: somewhere between 80% and 90% of AI projects never reach production. Different surveys, different sectors, similar number. The interesting question is not whether the number is real (it is), but what's actually driving it.
Having shipped AI inside regulated UK pharmacies and watched plenty of other people's pilots fail, we've found the reasons surprisingly consistent. They're not technical, mostly. They're operational and structural. This is a practical guide to what kills production deployment, and the moves that get a project across the canyon.
The five reasons pilots don't ship
After looking at roughly forty AI pilots across various businesses, we've found that the reasons they fail to reach production cluster into five patterns.
1. The data was good enough for a pilot, not for production
The pilot used a cleaned subset of data, curated for the demo. Production needs to handle the full, messy, real data. The pilot took 30 days. Cleaning the production data takes 5 months and nobody scoped it.
The fix. During Phase 0, score the production data conditions, not the pilot data conditions. Plan the data engineering work explicitly as part of the build. Budget 25 to 40% of project effort for data work, not 5%.
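To make the 25 to 40% guideline concrete, here is a minimal sketch of the arithmetic. The function name and the 120 person-day project size are assumptions chosen for illustration, not figures from a real engagement.

```python
def data_work_budget(total_effort_days: float,
                     low: float = 0.25,
                     high: float = 0.40) -> tuple[float, float]:
    """Return the (minimum, maximum) person-days to reserve for data work."""
    if total_effort_days < 0:
        raise ValueError("total_effort_days must be non-negative")
    return total_effort_days * low, total_effort_days * high


# Hypothetical project sized at 120 person-days in total.
total_days = 120
low_days, high_days = data_work_budget(total_days)
print(f"Reserve {low_days:.0f} to {high_days:.0f} person-days for data work")
print(f"A 5% allowance would cover only {total_days * 0.05:.0f} person-days")
```

Putting the two numbers side by side in the project plan makes the gap between a realistic data budget and a 5% allowance hard to ignore.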
2. Nobody owns the system in production
The team that built it disbanded after the pilot. Operations doesn't know how to monitor it. Engineering doesn't know how to fix it when it breaks. Within 60 days, the system is degrading and no-one is responsible.
The fix. Name the production owner at the start of Phase 0. Involve them in design decisions. Plan the handover during Phase 1, not after. Document the system so the handover is actually possible.
3. There's no production-quality evaluation
The pilot was evaluated on cherry-picked examples. Production needs to handle everything users throw at it. When the first awkward case shows up, nobody can tell whether the AI is performing as expected or has quietly degraded.
The fix. Define evaluation metrics during Phase 0. Instrument them in Phase 1. Run them automatically. Build an evaluation harness that someone non-technical can read.
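As a rough illustration of an evaluation harness a non-technical owner can read, the sketch below runs a fixed set of labelled cases through the system and prints a plain-language summary. Everything here is hypothetical: the `EvalCase` structure, the 90% threshold, and the keyword-based `toy_predict` standing in for a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str      # human-readable label shown in the report
    text: str      # input passed to the system
    expected: str  # the answer a reviewer agreed is correct


def run_eval(predict: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case through `predict` and collect pass/fail results."""
    failures = [c.name for c in cases if predict(c.text) != c.expected]
    passed = len(cases) - len(failures)
    return {
        "total": len(cases),
        "passed": passed,
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }


def format_report(result: dict, threshold: float = 0.9) -> str:
    """Render the result as plain text; the threshold is an assumed target."""
    status = "OK" if result["pass_rate"] >= threshold else "BELOW TARGET"
    lines = [
        f"{result['passed']} of {result['total']} cases passed "
        f"({result['pass_rate']:.0%}), status: {status}"
    ]
    lines += [f"  failed: {name}" for name in result["failures"]]
    return "\n".join(lines)


# Stand-in for the real system: a hypothetical keyword-based ticket router.
def toy_predict(text: str) -> str:
    return "billing" if "invoice" in text.lower() else "general"


cases = [
    EvalCase("plain invoice query", "Where is my invoice?", "billing"),
    EvalCase("double charge, no keyword", "I was charged twice", "billing"),
    EvalCase("opening hours", "When are you open?", "general"),
]
result = run_eval(toy_predict, cases)
print(format_report(result))
```

Running the same cases on a schedule, and after every change, is what lets someone tell a quiet degradation apart from one awkward input.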
4. Integration debt
The pilot ran in isolation. Production needs to integrate with five systems, two of which have terrible APIs and one of which has no API at all. The integration work was estimated at 2 weeks. It actually takes 12.
The fix. Map the integrations during Phase 0. Pull actual data through actual APIs as part of the audit, not after. If an integration has no API, factor in the time to build one or change the workflow.
5. The use case was never that important
The pilot ran because someone wanted to try AI. There was no operational metric being moved. After the demo, nobody can articulate why the system should live in production. The project quietly dies because no-one is fighting for it.
The fix. Refuse to start projects that can't name a specific operational metric to move. "Try AI" is not a project. "Reduce mean handling time for tier-2 support tickets by 25%" is.
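A target like the one above is only useful if it can be checked against numbers. The sketch below shows one way to do that check; the handling times are invented sample values and the function name is an assumption.

```python
from statistics import mean


def reduction_achieved(baseline: list[float], current: list[float]) -> float:
    """Fractional drop in the mean: 0.25 means 25% lower than baseline."""
    base = mean(baseline)
    return (base - mean(current)) / base


# Hypothetical handling times in minutes for tier-2 tickets.
baseline_minutes = [40, 44, 36, 40]  # before the system went live
current_minutes = [30, 28, 32, 30]   # after the system went live
target = 0.25

achieved = reduction_achieved(baseline_minutes, current_minutes)
print(f"Reduction: {achieved:.1%} against a target of {target:.0%}")
print(f"Target met: {achieved >= target}")
```

If nobody can supply the baseline list, that is a sign the metric has not really been named yet.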
The canyon between pilot and production
Most AI projects look like this:
- Week 1-6: Pilot built. Demo. Excitement.
- Week 7-12: Stakeholders ask "can we put this into production?"
- Week 13-24: Engineering team realises how much work it will take to productionise.
- Week 25-36: Project quietly stalls. The pilot becomes a slide in a deck.
The canyon between week 6 and week 25 is where most of the engineering investment ends up: it was supposed to deliver value, and instead it sits as sunk cost.
The way to avoid the canyon is to build for production from day one. Skip the pilot. Build a thin slice of the production system, deploy it to a small audience, measure real outcomes. The first version is smaller in functionality, but it's a real system, not a prototype.
What "production from day one" actually looks like
A few principles we apply to keep the canyon from forming.
Pick a small but real scope. One workflow, one team, one use case. Not three workflows in parallel.
Build the integration first, the AI second. Wire up the data inputs and the action outputs before the AI capability. Make sure the system can move information end to end with a dumb model. Then upgrade the model.
Deploy to a small audience in week 2 or 3, even with limited functionality. Real users find issues that internal testing never does.
Instrument success metrics from the first deploy. You can't tell if the system is improving if you don't measure.
Have a kill switch. If the AI is doing something unexpected, you should be able to disable it in 60 seconds without taking down the rest of the system.
Plan ongoing operations explicitly. Who monitors it? What's the on-call rota? How do model updates roll out? Where do bugs get filed?
This is unglamorous. It's also what separates AI that ships from AI that doesn't.
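The kill-switch principle above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed design: the flag file name, the `handle` function, and the `manual_queue` fallback are all hypothetical, and a real deployment might use a feature-flag service or a config table instead of a file.

```python
from pathlib import Path
from typing import Callable

# Hypothetical flag file: an operator creates it to disable the AI path.
KILL_SWITCH = Path("ai_disabled.flag")


def ai_enabled(flag: Path = KILL_SWITCH) -> bool:
    """Checked on every request, so creating the file takes effect at once."""
    return not flag.exists()


def handle(ticket: str,
           ai_route: Callable[[str], str],
           flag: Path = KILL_SWITCH) -> str:
    """Use the AI path when it is enabled, else fall back to a human queue."""
    if not ai_enabled(flag):
        return "manual_queue"
    try:
        return ai_route(ticket)
    except Exception:
        # A failure in the AI path must not take down the rest of the system.
        return "manual_queue"


# Stand-in for the real model call.
def toy_route(ticket: str) -> str:
    return "billing"


print(handle("Where is my invoice?", toy_route))
```

Because the switch is read per request and the fallback is an ordinary code path, disabling the AI does not require a redeploy or a restart.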
The "deploy thin slice early" pattern
If you only remember one heuristic, remember this: deploy a thin slice of the production system to a real audience as early as possible. Week 2 if you can.
A thin slice is the simplest possible end-to-end version. Data ingestion → storage → retrieval → AI inference → action → result. Every layer is present, but each layer is at its simplest.
This works because:
- It exposes integration problems immediately when they're cheap to fix
- It surfaces data issues early
- It builds the operational rhythm before scope is large
- It gives real users a chance to flag problems no engineer will catch
- It proves to stakeholders that the project will reach production
Most projects that ship use some version of this. Most projects that stall don't.
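To show what "every layer present, each at its simplest" might look like, here is a small sketch of a thin slice with a placeholder model. All names are hypothetical: the in-memory `Store` stands in for real storage, `DumbModel` is a keyword rule rather than an AI model, and the queue names are invented.

```python
from typing import Protocol


class Model(Protocol):
    def predict(self, text: str) -> str: ...


class DumbModel:
    """Placeholder rule so the rest of the pipeline can be wired up first."""

    def predict(self, text: str) -> str:
        return "urgent" if "urgent" in text.lower() else "normal"


def ingest(raw: str) -> str:
    """Ingestion layer: here it only trims whitespace."""
    return raw.strip()


class Store:
    """In-memory stand-in for the storage and retrieval layers."""

    def __init__(self) -> None:
        self._rows: dict[int, str] = {}

    def save(self, text: str) -> int:
        row_id = len(self._rows) + 1
        self._rows[row_id] = text
        return row_id

    def load(self, row_id: int) -> str:
        return self._rows[row_id]


def act(label: str) -> str:
    """Action layer: here it only picks a queue name."""
    return "escalate" if label == "urgent" else "standard_queue"


def run_slice(raw: str, store: Store, model: Model) -> dict:
    row_id = store.save(ingest(raw))  # ingestion, then storage
    text = store.load(row_id)         # retrieval
    label = model.predict(text)       # inference
    return {"id": row_id, "label": label, "action": act(label)}  # action, result


print(run_slice("  URGENT: system down ", Store(), DumbModel()))
```

Because `run_slice` depends only on the `Model` protocol, a real model can later replace `DumbModel` without touching the ingestion, storage, or action code.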
A practical engineering checklist
If you're about to scope a new AI build, run through this checklist before starting Phase 1.
- One workflow, one named owner, one operational metric
- Production data scoring done (not pilot data)
- Integration touchpoints mapped and prototyped, not assumed
- Evaluation framework defined with concrete metrics
- Kill switch / human override path identified
- Production owner involved from day one
- Realistic timeline for data work (25 to 40% of total)
- Deploy-to-real-users milestone within the first 4 weeks
- Monitoring, logging, and on-call plan written
- Compliance touchpoints (PDPA, sector regulation) documented
If you can't check most of these, the project is at risk of joining the 90% that stall. Take a week to fix the gaps before committing to a build budget.
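For teams that prefer to keep the checklist next to the code, a sketch like the following could track it. The item wording is shortened from the list above, and the cutoff of 7 out of 10 is our assumed reading of "most", not a fixed rule.

```python
CHECKLIST = [
    "One workflow, one named owner, one operational metric",
    "Production data scoring done",
    "Integration touchpoints mapped and prototyped",
    "Evaluation framework defined with concrete metrics",
    "Kill switch / human override path identified",
    "Production owner involved from day one",
    "Realistic timeline for data work",
    "Deploy-to-real-users milestone within the first 4 weeks",
    "Monitoring, logging, and on-call plan written",
    "Compliance touchpoints documented",
]


def review(checked: set[str], min_checked: int = 7) -> tuple[bool, list[str]]:
    """Return (ready, missing items); min_checked is an assumed cutoff."""
    missing = [item for item in CHECKLIST if item not in checked]
    return len(CHECKLIST) - len(missing) >= min_checked, missing


# Hypothetical project that has ticked only the first four items.
ready, missing = review(set(CHECKLIST[:4]))
print("Ready to budget the build" if ready else "Fix the gaps first:")
for item in missing:
    print(f"  - {item}")
```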
What we've watched go wrong (and how we avoided it)
A short list of mistakes we've made or watched close-up, and the lesson each one taught.
Built a recommendation engine on pilot data. Couldn't ship because production user IDs didn't match. Lesson: pull production data samples in Phase 0, not pilot data.
Built an excellent classifier. No-one in the business wanted to use it. Lesson: name the user who will consume the output, and validate they want it, before building.
Built a system that worked but was scary to change. Lesson: invest in evaluation infrastructure. If you can't tell whether a change made things better or worse, you stop changing things.
Built a system that depended on one engineer's personal API key. Lesson: production discipline from week one. Service accounts, documentation, on-call.
Built a system without a kill switch. Lesson: kill switch first. Capability second.
The bottom line
90% of AI pilots don't reach production because they were built as pilots, not as production systems. The fix is to refuse to build pilots. Build the smallest possible production-ready slice instead. Deploy it early. Measure it. Operate it. Then grow it. The projects that ship and stay shipped are the ones that treated week 1 like a production week, not a prototype week.
This article was written by the team at We Are Heylo.
We're an AI consulting and product engineering studio for operators who need the numbers to move. Singapore-based, UK delivery experience.

