Productionizing AI Agents: Do what you did with ERP

The real test isn’t the model. It’s how you contain its fragility.

Here’s an uncomfortable truth from Gartner about AI agents: of the thousands of companies claiming to deploy them, only about 130 are building real AI agents; the rest are rebranded bots and RPA workflows. Even more interesting? The best-performing ones fail more than two-thirds of the time.

But before we talk about why this gap exists, we need to understand what AI agents are. Because until you understand the mechanics, you can’t understand why the journey from proof-of-concept to production is where most AI dreams go to die.

What agents actually do (hint: it’s not magic)

At its core, an AI agent is sophisticated orchestration software built around large language models. But here’s what most vendors won’t explain: the LLM itself can only do one thing - take a context window (think of it as the model’s working memory) and generate text based on that context. That’s it. Everything else that makes an “agent” is traditional software engineering.

When you build these systems at scale, you realise that agents have roughly four capabilities. First, they evaluate the context you provide to generate responses - the classic, basic LLM function. Second, over the past couple of years, they have gained the ability to suggest using certain tools to inject more information into that context. So when you press “Search the web” in your ChatGPT app or on the web page, you are not actually asking the LLM to search anything. You are saying that a tool needs to be called. That tool calling is traditional software.

Sidebar: Here is what happens each time an agent uses a tool. The software defines what tools are available. Every call to the model includes these tool options. The LLM either generates text or recommends calling a specific tool with specific parameters. The orchestration software then decides whether to honor that request - checking permissions, whether the user is allowed to access that data, and so on. Only then does it actually call the tool and feed the results back to the LLM.
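To make that loop concrete, here is a minimal sketch in Python. Every name in it - call_llm, TOOLS, user_can_access - is a hypothetical stand-in for your own model client and orchestration code, not any particular vendor’s API.

```python
# Hypothetical tool registry: name -> handler. In production this would also
# carry descriptions and parameter schemas that get sent to the model.
TOOLS = {
    "lookup_order": lambda params: {"order_id": params["order_id"], "status": "shipped"},
}

def call_llm(context, tools):
    # Stub standing in for the real model call. A real client sends the
    # context plus the tool definitions and gets back either text or a
    # recommendation to call a tool with specific parameters.
    return {"type": "tool", "tool": "lookup_order", "parameters": {"order_id": "A123"}}

def user_can_access(user, tool_name, params):
    # The permission check lives in the orchestration layer, not in the model.
    return tool_name in user.get("allowed_tools", [])

def run_turn(user, context):
    response = call_llm(context, tools=TOOLS)   # every call includes the tool options
    if response["type"] == "text":
        return response["text"]                 # plain generation, nothing to orchestrate

    tool_name, params = response["tool"], response["parameters"]
    if not user_can_access(user, tool_name, params):
        return "Sorry, I can't look that up on this account."

    result = TOOLS[tool_name](params)           # only now is the tool actually called
    context.append({"role": "tool", "name": tool_name, "content": str(result)})
    # A real loop would call the model again with the enriched context;
    # the sketch stops here to stay short.
    return str(result)

print(run_turn({"allowed_tools": ["lookup_order"]},
               [{"role": "user", "content": "Where is order A123?"}]))
```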

The third capability - and this is critical - is workflow control. The agent software manages how tools get used through rules (“no refunds over $100 without approval”) and statistics (“this request is 99th-percentile unusual, so escalate to a human”). Without this layer, you have an uncontrolled system that will inevitably fail in spectacular ways.
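In code, that control layer can be as plain as a rule plus an anomaly threshold. A minimal sketch using the two examples above; the limit and the cutoff are illustrative, not recommendations.

```python
APPROVAL_LIMIT = 100.0  # "no refunds over $100 without approval"

def control_refund(request, anomaly_score):
    # Decide whether the agent may act on its own, or must hand off to a human.
    if request["amount"] > APPROVAL_LIMIT:
        return "escalate: refund exceeds auto-approval limit"
    if anomaly_score > 0.99:  # request is 99th-percentile unusual
        return "escalate: statistically unusual request, route to a human"
    return "allow"

print(control_refund({"amount": 45.0}, anomaly_score=0.20))   # allow
print(control_refund({"amount": 250.0}, anomaly_score=0.20))  # escalates on the rule
print(control_refund({"amount": 45.0}, anomaly_score=0.995))  # escalates on statistics
```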

The fourth capability? Agents are software programs. They can do anything software can do - maintain memory across conversations, trigger workflows, integrate with your existing systems. But that means they inherit all the complexity of enterprise software development.
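For instance, “memory across conversations” is ordinary state management. A tiny sketch, assuming sqlite purely for illustration; a real deployment would use whatever durable store your stack already runs on.

```python
import sqlite3

db = sqlite3.connect("agent_memory.db")  # illustrative store, not a recommendation
db.execute("CREATE TABLE IF NOT EXISTS memory (user_id TEXT, turn TEXT)")

def remember(user_id, turn):
    db.execute("INSERT INTO memory VALUES (?, ?)", (user_id, turn))
    db.commit()

def recall(user_id, limit=10):
    rows = db.execute(
        "SELECT turn FROM memory WHERE user_id = ? ORDER BY rowid DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
    return [r[0] for r in rows]

remember("user-42", "user: my order never arrived")
remember("user-42", "agent: order A123 is delayed by two days")
print(recall("user-42"))  # prior turns, ready to be folded into the next context
```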

Disciplined software > magical thinking

Now that you understand the mechanics, the failures make sense. When Air Canada’s chatbot promised retroactive bereavement fares that didn’t exist, it wasn’t because the AI was “confused.” The agent lacked proper control to verify policy information before making promises. When McDonald’s drive-thru AI added 260 Chicken McNuggets to an order, it wasn’t a model problem - it was a failure in the orchestration layer to apply common-sense boundaries to order quantities.

This isn’t about AI capabilities; it’s about engineering discipline. Every impressive demo you’ve seen was carefully orchestrated, with controlled inputs and predetermined tool access. Production means millions of unpredictable users, each finding new ways to expose the gaps in your flow control. Poor software and systems design = poor agents. It’s that simple.

Context engineering - the art of building the right context window for each interaction - becomes exponentially complex at scale. A customer service agent needs not just the current query, but relevant history, account state, policy guidelines, and tool permissions. Get any piece wrong, and you have either a useless or dangerous system. Remember, ‘wrong’ in one step means a cascading failure in the ensuing steps.
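A rough sketch of what that assembly might look like. The helper and its token-budget heuristic are assumptions for illustration, not a prescription.

```python
def build_context(query, history, account, policies, allowed_tools, budget_tokens=4000):
    # Order matters: policy and account state frame the conversation; the
    # query itself always goes last.
    parts = [
        ("policy", policies),
        ("account", f"Account state: {account}"),
        ("history", "\n".join(history[-5:])),          # only the most relevant recent turns
        ("tools", f"You may call: {allowed_tools}"),
        ("query", query),
    ]
    context, used = [], 0
    for label, text in parts:
        cost = len(text) // 4                          # rough estimate: ~4 characters per token
        if used + cost > budget_tokens:
            continue                                   # in production: summarize, don't silently drop
        context.append(f"[{label}]\n{text}")
        used += cost
    return "\n\n".join(context)

print(build_context(
    query="Why was I charged twice?",
    history=["user: hi", "agent: hello, how can I help?"],
    account={"tier": "gold", "open_tickets": 1},
    policies="Refunds over $100 require supervisor approval.",
    allowed_tools=["lookup_order", "issue_refund"],
))
```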

The demo-to-production valley of death

Here’s what happens after the applause dies down from your proof-of-concept presentation. That elegant demo that handles three use cases now needs to handle three thousand. Your context window that worked perfectly with clean test data now has to deal with messy, inconsistent, real-world information.

Production means building permission systems that understand not just whether a tool can be called, but whether this specific user, with their specific role, at this specific time, should be allowed to call it with these specific parameters. It means statistical monitoring to catch when your agent starts behaving abnormally - before it promises every customer a full refund.
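A sketch of what such a check could look like. The policy table, roles, hours, and limits below are invented for illustration; the point is that the decision depends on the user, the role, the time, and the parameters together.

```python
from datetime import datetime

POLICY = {
    "issue_refund": {
        "roles": {"support_lead", "supervisor"},
        "hours": range(8, 20),        # business hours only
        "max_amount": 100.0,          # parameter-level limit
    },
}

def may_call(user, tool, params, now=None):
    now = now or datetime.now()
    rule = POLICY.get(tool)
    if rule is None:
        return False                                  # unknown tool: deny by default
    if user["role"] not in rule["roles"]:
        return False                                  # wrong role
    if now.hour not in rule["hours"]:
        return False                                  # outside the allowed window
    if params.get("amount", 0) > rule["max_amount"]:
        return False                                  # parameters out of bounds
    return True

user = {"id": "u42", "role": "support_lead"}
print(may_call(user, "issue_refund", {"amount": 40.0}))   # True, during business hours
print(may_call(user, "issue_refund", {"amount": 400.0}))  # False: amount over the limit
```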

You need continuous integration pipelines, real-time monitoring for hallucinations, and composable data stores for consistent serving. Each agent requires production prompts of 500-1,000 words just to establish proper context. You need fallback mechanisms for when agents fail, because they will fail. And token management alone can mean a 10x cost difference if done poorly.
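A minimal sketch of one such fallback path, assuming hypothetical run_agent and enqueue_for_human functions: try the agent, retry once, then hand off to a human rather than letting the failure reach the customer unguarded.

```python
def run_agent(query):
    # Stub that always fails, standing in for the real agent call.
    raise TimeoutError("model call exceeded latency budget")

def enqueue_for_human(query, reason):
    return f"Queued for a human agent ({reason})."

def answer(query, retries=1):
    last_error = None
    for attempt in range(retries + 1):
        try:
            return run_agent(query)
        except Exception as exc:          # timeouts, tool errors, guardrail blocks
            last_error = exc
    return enqueue_for_human(query, reason=str(last_error))

print(answer("Where is my order?"))
```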

The technical reality is that every AI agent is really three systems: the LLM for generation, the tool orchestration for capabilities, and the control layer for safety. Miss any one, and you don’t have a production system - you have an expensive experiment.

The expertise gap nobody talks about

This is why companies continue to abandon AI initiatives. The expertise required isn’t just “AI knowledge” - it’s the specific intersection of distributed systems engineering, context engineering, and enterprise integration.

You need engineers who understand that model versioning is as critical as code versioning, who know that context window management is a systems design problem, not an AI problem. You need teams that have learned the hard way that safety control rules will never cover every edge case, so you also need statistical controls. Most importantly, you need partners honest enough to tell you when not to build - when a simpler solution will serve you better than a complex agent.

I’ve seen too many organizations hire “AI experts” who understand models but not systems, or systems engineers who don’t grasp context engineering. The result is always the same: impressive demos that never quite make it to production, or worse, production systems that cause more problems than they solve.

How to go live with agents?

As a business decision maker, start with the obvious, simple use cases. Don’t attempt to show off a cool AI feature. Your customers are beyond cool demos; they want demonstrable outcomes that cut costs or increase revenue. And do not overpromise on features or timelines: a demo that takes a month to build takes six months to get to production.

Make the team interdisciplinary, with all hands on deck for the first production endeavor - over-communicate and over-measure. Having Business, Tech, IT, and Product orgs in the same room, aware of every call made each week, helps. This isn’t just software engineering, where you write a PRD and a deterministic result comes out the other end.

Ironically, taking an AI agent to production is like cutting over to production with a multi-location ERP. We may blame ERP consultants for all the right reasons, but they got one thing right: the acceptance that software engineering that stitches together complex, brittle processes needs the active participation of the business and of translators (functional consultants). For agents, that functional consulting is an LLM with guardrails. You can fire a functional consultant. But you have to make an LLM work.

The more things change, the more they remain the same.

Enjoy your Sunday!

When Bharath Moro, Head of Products at Moative, explains guardrails using a salt shaker and tablecloth, people usually take notes on their napkins.