Ask most teams running AI agents how much their system costs per task, and you'll get a blank stare. They know their monthly API bill — roughly — but they can't tell you the cost of a single workflow execution, which model contributed the most to that cost, or whether they're spending money effectively.
This is a problem. Not because costs need to be low, but because costs need to be understood. A system that costs £2 per task is fine if it's replacing a £50 manual process. A system that costs £0.10 per task is wasteful if a £0.01 approach would deliver the same quality.
The anatomy of AI system costs
AI system costs break down into four categories, and most teams only track the first one.
1. Model inference costs
This is what appears on your OpenAI or Anthropic invoice. It's the cost of sending tokens to a model and getting tokens back. It's the most visible cost, but it's rarely the largest when you account for everything else.
What makes inference costs tricky is the variance. A simple classification task might cost £0.001. A complex multi-step reasoning task with a large context window might cost £0.50. If you're sending everything to the same model, you're overpaying on the simple tasks.
The fix: Model routing. Route tasks to the cheapest model that can handle them. For most production workloads, we see an 80/20 split work well — 80% of tasks go to fast, cheap models (GPT-4o-mini, Claude Haiku, or similar), and only 20% escalate to frontier models for genuinely complex reasoning.
interface RoutingPolicy {
defaultModel: 'fast'; // Cheap, quick model
escalationThreshold: 0.7; // Confidence below this triggers escalation
maxCostPerTask: 0.10; // Hard ceiling
fallbackModel: 'frontier'; // For complex tasks
} 2. Orchestration overhead
Every multi-step workflow has overhead beyond the model calls themselves. Task queuing, state management, result aggregation, retry logic. These consume compute resources and add latency.
In serverless architectures (Lambda, Cloud Functions), this overhead is directly billed. In container-based deployments, it's part of your infrastructure cost. Either way, it's real money that most teams don't attribute to their AI system costs.
3. Data costs
AI systems consume data. They read from databases, fetch from APIs, process documents, and store results. The data layer costs include:
- Storage (S3, DynamoDB, vector databases) - Data transfer (API calls, cross-region transfers) - Processing (document parsing, embedding generation) - Caching (to avoid recomputing expensive operations)
Vector database costs are particularly sneaky. Embedding storage and similarity search at scale can easily exceed your model inference costs if you're not careful.
4. Human costs
The most expensive cost in any AI system is the humans who build, maintain, and supervise it. Debugging a production failure at 2am. Reviewing flagged outputs. Updating prompts when model behaviour changes. Training the system on new domains.
These costs are real, even if they don't appear on a cloud bill. A well-governed system reduces human costs by making failures visible, debuggable, and rare.
A cost framework that works
We use a simple framework for every system we build:
Cost per task
Every workflow execution should have a known cost. This means instrumenting every model call, every API request, every data operation. When a workflow completes, you should know exactly what it cost.
This isn't just accounting — it's product design. If you know your cost per task, you can price your service correctly, set budget limits that make sense, and identify optimisation opportunities.
Cost attribution
When your monthly bill arrives, you should be able to attribute every pound to a specific workflow, pipeline, or workspace. "Our AI costs £10,000/month" is useless information. "Pipeline A costs £3,000, Pipeline B costs £6,000, and Pipeline C costs £1,000" is actionable.
Budget controls
Every workspace, every pipeline, every user should have a budget ceiling. Not a soft warning that someone checks weekly. A hard limit that the system enforces in real time.
// Budget enforcement at the orchestration layer
async function executeTask(task: Task, budget: Budget): Promise<Result> {
const estimatedCost = await estimateTaskCost(task);
if (budget.remaining < estimatedCost) {
return { status: 'budget_exceeded', task, estimatedCost };
}
const result = await runTask(task);
budget.spend(result.actualCost);
return result;
} This seems obvious. It's not. Most production AI systems have no budget controls at all. They run until someone notices the bill.
Cost-quality tradeoffs
For every task, there's a relationship between cost and quality. Sending a document classification task to GPT-4 will give you marginally better accuracy than Claude Haiku, but at 20x the cost. Is that marginal improvement worth it?
This is a business decision, not a technical one. But the technical system needs to support it. Your routing layer should let you express preferences like "optimise for cost unless accuracy drops below 95%".
Common cost mistakes
Mistake 1: Using frontier models for everything
The most common mistake. Teams pick GPT-4 or Claude Opus because it gives the best results in testing, then use it for every task in production. This is like hiring a barrister to review your grocery list.
Rule of thumb: Start with the cheapest model. Only escalate when you can demonstrate the cheaper model doesn't meet quality requirements for that specific task type.
Mistake 2: No caching
If your system processes the same or similar inputs frequently, you're paying for the same computation repeatedly. Semantic caching — recognising that a new input is similar enough to a cached one — can reduce inference costs by 30-60% for many workloads.
Mistake 3: Ignoring context window costs
Long context windows are expensive. A 128k context request costs significantly more than a 4k context request. If you're stuffing entire documents into every prompt, you're paying for tokens the model doesn't need.
Better approach: Extract relevant sections first using a cheaper retrieval step, then send only what's needed to the reasoning model.
Mistake 4: No monitoring
If you're not measuring cost per task in real time, you're reacting to bills instead of managing costs. By the time you see the monthly invoice, you've already overspent. Real-time cost monitoring lets you catch anomalies early — a sudden spike in API calls, a new workflow that's unexpectedly expensive, a model routing rule that's not firing correctly.
What good looks like
A well-instrumented AI system gives you a dashboard that shows:
- Cost per task — broken down by model, pipeline, and workspace - Cost trends — daily, weekly, monthly, with anomaly detection - Model utilisation — which models are handling which tasks, and at what cost - Budget status — real-time budget consumption for every workspace - Cost-quality correlation — are you spending more for better results, or just spending more?
This isn't luxury. This is the minimum for running AI in production responsibly. Without it, you're guessing. And in our experience, the guesses are always wrong — usually by an order of magnitude.
The economics of AI systems
The companies that will win with AI aren't the ones with the best models. They're the ones with the best economics. The ones who can run workflows profitably, at scale, without surprise bills.
Building cost awareness into your AI system isn't a constraint. It's a competitive advantage. When you know exactly what every task costs, you can make better decisions about what to automate, how to price your services, and where to invest in optimisation.
The model is the easy part. The economics are the hard part. And the hard part is what separates systems that scale from systems that fail.