Which model, when. And do we really need Fable?

Every week someone asks me which AI model is the best. It is the wrong question, and we already know it is wrong, because nobody asks which employee is the best. You ask: best at what, and at what cost.

Here is the thing the leaderboard charts hide. As of mid-2026, all three big labs sell the same shape of product line. Once you see the shape, the choice stops being a religion question and becomes a staffing question.

The three tiers

Every lab now offers a frontier tier, a workhorse tier, and a volume tier.

Frontier. Claude Fable 5 ($10 in, $50 out), OpenAI’s GPT-5.5 Pro, Gemini 3.1 Pro and Deep Think. The deepest reasoning, the highest bill.
Workhorse. Claude Opus 4.8 and Sonnet 4.6 ($3 to $5 in), GPT-5.5, Gemini 3.5 Flash. Strong, fast, priced for daily work.
Volume. Claude Haiku 4.5 ($1 in, $5 out), GPT Mini and Nano, Gemini 3.1 Flash-Lite at $0.25 in. Pennies, built for scale.
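To put the gap between tiers in numbers, here is a rough sketch of the arithmetic. It uses only the two price pairs quoted in full above. The per-million-token unit, the model keys, and the token counts in the usage note are my assumptions for illustration, not anything a lab publishes in this form.

```python
# Rough cost arithmetic for the two fully quoted price points.
# ASSUMPTION: prices are US dollars per million tokens.
# ASSUMPTION: the dictionary keys are illustrative labels, not real API model IDs.
PRICES_USD = {
    "fable-5": (10.00, 50.00),    # (input price, output price)
    "haiku-4.5": (1.00, 5.00),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the assumed per-million-token prices."""
    price_in, price_out = PRICES_USD[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def daily_cost(model: str, calls_per_day: int,
               input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a day of identical calls."""
    return calls_per_day * call_cost(model, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 calls a day, 2,000 tokens in, 500 out.
    for model in PRICES_USD:
        print(model, round(daily_cost(model, 1000, 2000, 500), 2))
```

Under those assumptions the same workload costs ten times more on the frontier tier than on the volume tier, which is simply the ratio of the quoted prices. Swap in your own token counts before drawing conclusions.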

The tiers matter more than the logos. A workhorse from any of the three labs will do most jobs well. The expensive mistake is not picking the wrong lab. It is running everything on the wrong tier.

The routing rule

Match the model tier to the cost of being wrong, not to the difficulty of the question.

A hard question with a cheap mistake belongs on the workhorse. You will read the answer anyway, and you are the safety net. An easy-looking task with an expensive mistake (a contract clause, a migration plan, an overnight agent run nobody reviews until morning) deserves the frontier tier. The model is not the safety net there. It is the last line.

This is exactly how you staff a programme. The intern does high-volume work that gets checked. The senior engineer does the work where checking is harder than doing. Nobody hires a principal architect to rename variables, yet teams route variable-renaming to frontier models every day and then wonder about the invoice.

Picking between the labs

Honest answers, briefly. Claude is strongest where I work: long agentic coding sessions, instruction following, and prose that does not sound like a press release. This journal runs on it, so weigh my bias accordingly. OpenAI has the broadest product surface, and if your team already lives inside ChatGPT, the default usually beats a better model nobody opens. Gemini is the price aggressor at the volume tier and the natural pick if your organisation is already inside Google Workspace.

The differences between labs are now smaller than the difference between a good and a bad process around them. Benchmark on your own work, not on someone else’s leaderboard. Switching costs are real, and the rankings reshuffle every quarter.

The Fable effect, and do we need it

A new top tier does two things to an organisation. It raises the ceiling: work that used to need a senior human review pass can now survive without one more often. And it quietly raises the floor of spend, because people default to the best model the way they default to business class when someone else pays.

The capability also gets absorbed by more deliberation, not less. Give a team a model that can run forty agents, and they will run forty agents. I have watched a fleet of them burn six hundred thousand tokens deliberating over a 120-word post. The answer was better. The bill was a wedding.

So, do we really need Fable? For ninety-five percent of tasks, no. The workhorse tier is genuinely good now, and pretending otherwise is how budgets die. For the other five percent, the tasks where a wrong answer costs more than every token you will buy this year, it pays for itself the first time it catches what you would have missed.

The field checklist

Default every task to the workhorse tier. Sonnet, GPT-5.5, Gemini Flash. Promote only with a reason you can say out loud.
Escalate to frontier when two things are true at once. The task is ambiguous, and the mistake is expensive. One alone does not qualify.
Drop to the volume tier past a thousand calls a day. Classification, tagging, routing and extraction do not need a thinker. They need a clerk.
Re-run your comparison every quarter on your own tasks. The tiers reshuffle every few months. Loyalty is for people, not for model versions.
Budget for checking. The model that needs no review does not exist yet, at any price.
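The first three checklist items can be written down as a small routing function. This is a minimal sketch, not a production router. The `Task` fields and the `pick_tier` name are hypothetical, and the checklist does not say which rule wins when a task is both high-volume and risky, so the ordering below (escalation first) is my assumption.

```python
from dataclasses import dataclass

# "Past a thousand calls a day" from the checklist.
VOLUME_THRESHOLD_CALLS_PER_DAY = 1000


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a task; the field names are illustrative."""
    name: str
    ambiguous: bool
    mistake_expensive: bool
    calls_per_day: int


def pick_tier(task: Task) -> str:
    """Return 'frontier', 'volume', or 'workhorse' for a task."""
    # Escalate only when both conditions hold; one alone does not qualify.
    # ASSUMPTION: escalation is checked before volume.
    if task.ambiguous and task.mistake_expensive:
        return "frontier"
    # High-volume clerical work drops a tier.
    if task.calls_per_day > VOLUME_THRESHOLD_CALLS_PER_DAY:
        return "volume"
    # Everything else stays on the default.
    return "workhorse"


if __name__ == "__main__":
    examples = [
        Task("vague contract clause review", True, True, 5),
        Task("rename variables", False, False, 20),
        Task("tag support tickets", False, False, 50_000),
    ]
    for task in examples:
        print(task.name, "->", pick_tier(task))
```

The last two items, the quarterly re-run and the checking budget, are process rather than code, so they stay out of the function on purpose.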

The future is a negative waiting for the right chemicals. The chemicals are cheaper one shelf down.