Your AI System Has a Half-Life. Know What It Is?
The real question is whether you built in a way that makes the next migration cheap

Some decisions are one-way doors: hard to reverse, worth agonizing over. Others are two-way doors: easy to walk back through, worth making quickly and correcting as you learn.
Tursio is an enterprise analytics search product. In 2023, they built it on GPT-4-32k. It worked beautifully. Then OpenAI deprecated that model. They migrated to GPT-4.5-preview, which was deprecated three months later. Now they run on GPT-4.1, which interprets prompts so literally that their carefully tuned system broke in ways nobody noticed at first. Their regression pass rate dropped from 100% to 97%, which meant search queries that once surfaced the right answer now quietly returned the wrong one. (arXiv, July 2025)
Three percent sounds small. When your enterprise customers depend on search accuracy to find the right contract clause, the right policy document, the right compliance record, it is a crisis. And here is the part that should unsettle any leader investing in AI: Tursio picked the best available model each time. The problem was not their choice. The problem was structural.
We are seeing teams that went deep on specific models in late 2023 and 2024 now being punished for having been early. Worse, some of these teams are taking their carefully tuned prompts, running them on a completely different model, and declaring the new model to be hype when the results disappoint. That is like switching from Spanish to Portuguese and blaming the language when your instructions are misunderstood.
But let us be honest about the other side of this. Early adopters also learned faster, shipped first, and won customers while others were still running evaluations. The cost of prompt migration is real but finite. We have seen migrations take anywhere from two to six weeks of engineering time depending on complexity. The cost of having built nothing while waiting for the ecosystem to stabilize is a different kind of loss entirely. The real question is whether you built in a way that makes the next migration cheap.
Models have dialects
If you have ever managed a team across geographies, you know that the same instruction lands differently depending on who receives it. Some people need explicit step-by-step scaffolding. Others take a brief and run with it. Give the brief-and-run person a detailed checklist and they will ignore half of it. Give the scaffolding person a vague brief and they will fill in the gaps incorrectly.
AI models work the same way. Each model family has developed its own prompting personality. Some, like GPT-4.1, interpret instructions hyper-literally. Others, like Claude, infer intent generously. Google's Gemini tends toward verbosity and needs firm constraints to stay concise. OpenAI's own prompting guide acknowledges that prompt migration is likely required when moving between model versions within the same family, let alone across providers.
A five-page system prompt tuned for one model does not transfer to another. The Tursio team made a sharp observation in their paper: newer models are getting smarter on benchmarks but less flexible. More capable but less forgiving. They do exactly what you tell them, which means you had better say exactly the right thing, for that specific model.
This is where the emerging discipline of prompt management enters. Prompt engineering is getting a prompt to work. Prompt management is keeping it working as models change underneath you. Most teams have invested heavily in the former. Very few have started on the latter.
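One way to make prompt management concrete, sketched below: treat prompts as versioned, per-model assets rather than string literals buried in code. Everything here, the model names, the prompt wordings, the registry shape, is illustrative rather than any particular tool's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prompt:
    """A prompt is an asset with a version and a target model, not a string literal."""
    model: str    # the model this wording was tuned for
    version: str  # bump on every wording change so regressions are traceable
    text: str

# One task, several dialects. Migrating to a new model means adding an
# entry and re-running the eval suite: a data change, not a code change.
SEARCH_REWRITE_PROMPTS = {
    "gpt-4.1": Prompt(
        model="gpt-4.1",
        version="3.2",
        text="Rewrite the user's query for keyword search. Output ONLY the rewritten query.",
    ),
    "claude-sonnet-4": Prompt(
        model="claude-sonnet-4",
        version="1.0",
        text="Rewrite the user's query to maximize keyword-search recall.",
    ),
}

def prompt_for(model: str) -> Prompt:
    try:
        return SEARCH_REWRITE_PROMPTS[model]
    except KeyError:
        # Fail loudly: silently reusing another model's prompt is how
        # a 97% pass rate sneaks in unnoticed.
        raise ValueError(f"No tuned prompt for {model}; migration work needed first.")
```

The design choice that matters is the loud failure at the end: a model without its own tuned prompt should refuse to run, not borrow a neighbor's dialect.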
The tempting wrong answer
When you see this problem clearly, the instinct is to reach for an abstraction layer. Build a model-agnostic interface. Swap providers like changing a battery.
It sounds right. It is how we solved vendor lock-in in every previous era of enterprise software. But LLMs are not databases or cloud providers. The interface to an LLM is natural language. A traditional software abstraction translates your intent into machine code, which is useful because machine code is hard to write. An LLM abstraction translates your intent into human language, through an intermediate framework. You have added complexity to reach a destination that was already in plain English.
The results speak for themselves. Teams that built elaborate abstraction frameworks in 2023 are in just as much pain today as teams that hardcoded their model calls directly. They traded model lock-in for framework lock-in. In a domain this young, you cannot correctly anticipate what to abstract. An abstraction layer is only useful if you have guessed right about the axes of future change. Right now, nobody is guessing right consistently.
What actually works
Jeff Bezos has a useful framework for thinking about decisions under uncertainty. Some decisions are one-way doors: hard to reverse, worth agonizing over. Others are two-way doors: easy to walk back through, worth making quickly and correcting as you learn.
Model choice is a two-way door. So is your prompt syntax. So is your framework. These will change, probably more than once. The goal is not to prevent that change but to make it cheap.
Your evaluation methodology, on the other hand, is a one-way door. So are your data quality standards and your deep understanding of what “good” looks like for your specific use case. These are the decisions worth agonizing over because they compound over time and they are expensive to rebuild.
This is where evals become the real migration strategy. If you have a robust evaluation suite, one that is task-specific, tied to real failure modes, and validated against human judgment, you can swap models and know within hours whether the new one works. Without evals, no amount of architecture saves you. You are flying blind on every migration.
The Tursio team learned this the hard way. They built what they call a "migration testbed": queries categorized by complexity, edge cases that had failed before, automated regression checks across model versions. What had been a months-long trial-and-error process became a repeatable multi-week cycle. The testbed, and not any abstraction layer, is what made their third migration tolerable.
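A minimal sketch of such a testbed follows, assuming a generic `ask_model` callable that wraps whatever provider SDK is in play. The case data, tags, and document ids are invented for illustration; the shape is what matters: tagged cases, a baseline, and a pass-rate gate that runs on every candidate model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    query: str
    expected: str  # the document id the system must surface
    tag: str       # "simple", "multi-hop", or "regression" (a past failure)

CASES = [
    Case("termination notice period", "clause-14.2", "simple"),
    Case("notice period if acquired mid-contract", "clause-14.5", "multi-hop"),
    Case("data retention policy EU", "policy-dr-07", "regression"),
]

def run_testbed(ask_model: Callable[[str], str], baseline_pass_rate: float) -> None:
    """Run every case against a candidate model and compare to the current baseline."""
    results: dict[str, list[bool]] = {}
    for case in CASES:
        results.setdefault(case.tag, []).append(ask_model(case.query) == case.expected)

    passed = sum(sum(outcomes) for outcomes in results.values())
    total = sum(len(outcomes) for outcomes in results.values())
    rate = passed / total
    for tag, outcomes in results.items():
        print(f"{tag:10s} {sum(outcomes)}/{len(outcomes)} passed")
    print(f"overall    {rate:.0%} (baseline {baseline_pass_rate:.0%})")

    # Gate the migration: any drop below the baseline blocks the model swap.
    assert rate >= baseline_pass_rate, "Candidate model regresses; do not migrate."
```

The gate at the end is the point. A candidate model either matches the baseline pass rate, including on the regression cases that failed before, or the migration stops there.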
The half-life question
Every decision in an AI system has a half-life. Some are long: your evaluation methodology, your data quality practices, your domain understanding, your team's accumulated intuition about what good looks like. Some are shockingly short: your model choice, your prompt syntax, your framework dependencies.
The teams that will thrive over the next two years are the ones that have developed an intuition for which of their decisions are decaying and which are compounding. For leaders, that means knowing which decisions to protect from churn and which to let your team swap freely.
Hold your strong architecture decisions loosely, yes. But hold them loosely in the right places. The skill is knowing the difference between what to grip and what to hold with open hands.
Enjoy your Sunday!
When Bharath Moro, Co-founder and Head of Products at Moative, explains guardrails using a salt shaker and tablecloth, guests usually take notes on their napkins.