Practice / AI in Production

AI systems fail at the deployment architecture, not at the model.

Every AI deployment failure has an official story: wrong model, insufficient data, hallucinations, misaligned expectations. The structural explanation is usually different — and usually invisible until the system is already in use.

This is relevant if

The situation looks like this
Eval scores improved. Something else broke that no metric was tracking.
The model performs well in isolation. In production, the context is different — and so is the output.
Human reviewers are still in the loop. They stopped reading the outputs weeks ago.
The same class of error keeps returning after every prompt fix or model update.
The pilot worked. That's exactly why it's still a pilot eight months later.
Less relevant if
You are still selecting a model or comparing providers.
The problem is clearly scoped and the path to fixing it is visible.
You need prompt engineering guidance, model fine-tuning advice, or standard MLOps practices.
The constraint is compute budget or timeline, not structural architecture.
01

What this series is about

The diagnostic frame

AI deployments produce a specific class of structural contradiction: a system optimized for one thing sits inside an operational context that requires something different — and the gap between them is never visible at the model level.

Evaluation metrics, human-in-the-loop designs, and retrieval architectures all make sense at design time. The contradiction appears in what they cannot hold simultaneously: accuracy and context, oversight and throughput, capability and operational constraints. This series documents that gap — how it forms, where it migrates, and why it keeps returning after fixes.

02

What we look at

Three structural zones where AI deployment contradictions concentrate.

01
Evaluation architecture

The metric improves. The system degrades. The contradiction is not a measurement error — it is a missing layer: what the metric optimizes for and what the deployment actually needs are different problems.

02
Capability–context gap

The model can do the task. The operational context — privacy constraints, human review cycles, legacy integrations — changes what the model is actually asked to do. The gap is architectural, not technical.

03
Feedback dynamics

The system learns from what users do. What users do is shaped by the system. The feedback loop has no mechanism to distinguish between useful signal and the artifact it is producing. The trap closes gradually and invisibly.

03

Cases

Each case is a documented structural contradiction — not a failure story, but an architectural analysis of what made the failure structurally predictable.

04

From cases to intervention

Each case points to a specific structural move — not a fix, but a layer that was missing.

Evaluation architecture
Eval scores went up. Users got more frustrated.
The missing layer is metric–goal alignment as a design decision, not an assumption. The intervention is to make it explicit before optimizing: what the metric measures and what the deployment needs must be specified as separate problems — then checked for structural compatibility.
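A minimal sketch of what "specified as separate problems" can look like in practice. The names here (Metric, DeploymentGoal, compatibility_gap) and the example values are hypothetical illustrations, not part of any prescribed method; the point is only that the metric and the goal become explicit objects that can be checked against each other before anyone optimizes.

```python
# Minimal sketch: specify the metric and the deployment goal as
# separate problems, then check them for structural compatibility.
# All names and example values are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    optimizes_for: frozenset[str]  # what moving this number actually rewards


@dataclass(frozen=True)
class DeploymentGoal:
    name: str
    requires: frozenset[str]       # what the operational context actually needs


def compatibility_gap(metric: Metric, goal: DeploymentGoal) -> frozenset[str]:
    """Properties the deployment needs that the metric does not reward.
    Anything in this set is where degradation can hide while the score rises."""
    return goal.requires - metric.optimizes_for


eval_metric = Metric("offline_relevance",
                     frozenset({"topical_match", "fluency"}))
prod_goal = DeploymentGoal("support_deflection",
                           frozenset({"topical_match", "policy_compliance",
                                      "current_answers"}))

print(compatibility_gap(eval_metric, prod_goal))
# frozenset({'policy_compliance', 'current_answers'}): the untracked surface
```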
Capability–context gap
Human reviewers are in the loop. The loop moves without them.
The missing layer is throughput–oversight compatibility. The intervention is to make it explicit before deployment: the review architecture must be designed for the actual cognitive load and decision frequency — not for the idealized version of human oversight.
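One way to make throughput–oversight compatibility explicit before deployment is back-of-envelope arithmetic on decision frequency and genuine attention per item. A sketch, with illustrative numbers only:

```python
# Minimal sketch: is genuine review feasible at the actual decision
# frequency? All numbers are illustrative assumptions, not benchmarks.

def required_reviewers(items_per_hour: float,
                       seconds_per_genuine_review: float,
                       reviewer_utilization: float = 0.7) -> float:
    """Headcount needed for every item to get a real read rather than a
    rubber stamp. Utilization below 1.0 reserves slack for fatigue and
    context switching."""
    review_hours_per_hour = (items_per_hour * seconds_per_genuine_review) / 3600
    return review_hours_per_hour / reviewer_utilization


staffed = 2
needed = required_reviewers(items_per_hour=900, seconds_per_genuine_review=45)
if staffed < needed:
    # The loop will move without the humans: approvals become clicks.
    print(f"Oversight infeasible as designed: "
          f"need {needed:.1f} reviewers, have {staffed}")
```

When the arithmetic fails, the design question is which decisions still go through review, not how to make reviewers faster.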
Feedback dynamics
The system kept getting better at giving people what they wanted. That was the problem.
The missing layer is feedback loop audit — a mechanism that can distinguish between satisfied users and narrowed users. The intervention is to make it explicit: what the system is learning from, and whether that signal is a proxy for the actual goal.
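A feedback loop audit needs at least one signal that is not itself produced by the loop. A minimal sketch, assuming hypothetical telemetry (served output categories plus an acceptance flag): track the diversity of what is served alongside how satisfied users appear.

```python
# Minimal sketch of a feedback-loop audit: acceptance rate alone cannot
# distinguish satisfied users from narrowed ones, so pair it with the
# diversity of what the system is serving. Telemetry fields are hypothetical.
import math
from collections import Counter


def shannon_entropy(categories: list[str]) -> float:
    """Entropy (bits) of the category distribution of served outputs.
    A steady decline across windows means the loop is constricting
    the very signal it learns from."""
    counts = Counter(categories)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def audit_window(served_categories: list[str], accepted: list[bool]) -> dict:
    return {
        "acceptance_rate": sum(accepted) / len(accepted),
        "served_entropy": shannon_entropy(served_categories),
    }


week_1 = audit_window(["billing", "auth", "billing", "export"],
                      [True, False, True, True])    # 0.75 accepted, 1.5 bits
week_8 = audit_window(["billing"] * 4, [True] * 4)  # 1.00 accepted, 0.0 bits
# Rising acceptance with falling entropy is the trap closing: every
# per-window metric reads as improvement while the input narrows.
```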
Pattern — across all three zones

The absent layer determines where the contradiction migrates.

In each case, the structural contradiction moves to the next available layer — the one nobody explicitly designed. In evaluation architecture it is metric–goal alignment. In oversight design it is cognitive load at scale. In feedback systems it is the difference between satisfaction and constriction. The layer is absent not by accident, but because the deployment architecture had no place for it. This is what makes the conflict structurally predictable and locally invisible at the same time.

Diagnostic rule: when the same class of error keeps returning after model updates or prompt fixes, look for the layer that was never designed — not the output that was wrong.

Across domains

The structural pattern
The same class of contradiction appears across different systems. AI in production is where the cost becomes visible first.
AI in Production
Model capability vs operational context. Evaluation metrics optimize one thing while degrading another. The absent layer is usually evaluation architecture or oversight design.
ERP
System logic vs organizational logic. The gap gets encoded into customizations, workarounds, and unowned data. The absent layer is reconciliation ownership.
Decision systems
Speed vs completeness. Clarity vs accuracy. The contradiction is in the output layer — not in the decision logic itself.