Hands On With Orchestra: An Agent Framework That Finally Respects Your Token Budget
We spent two weeks building real workflows on Orchestra, a Singapore-built open-source framework for multi-step AI agents. It is opinionated, fast, and occasionally maddening — and it gets the hard parts right.
SINGAPORE — Most AI-agent frameworks fail in the same place. They demo beautifully on a three-step task and then collapse into a fog of retries, runaway token bills and unexplained loops the moment you point them at real work. I have built throwaway agents on half a dozen of them, and the graveyard is large.
Orchestra, an open-source framework that came out of a small Singapore team late last year, is the first in a while that survived two weeks of me actively trying to break it. It is not perfect — its opinionated design will irritate anyone who wants to do things their own way — but it gets the genuinely hard parts of agent engineering right, and it does so without pretending the problems are solved.
This is a hands-on review based on building three real workflows: a document-extraction pipeline over messy PDFs, a customer-support triage agent with tool calls, and a research agent that fans out web searches and synthesises results. Here is what held up and what did not.
What it gets right
Orchestra's central idea is a typed graph of steps, where each node declares what it consumes and produces, and the framework enforces those contracts at the boundary between steps. In practice this means a malformed model output fails loudly at the node that produced it, not three steps later in a stack trace that tells you nothing. After years of debugging agents by archaeology, this alone is worth the price of admission.
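To make the idea concrete, here is a minimal Python sketch of contract checking at step boundaries. It is not Orchestra's API: the names Step, ContractError and run_pipeline are hypothetical, and the sketch uses a straight-line pipeline rather than a full graph to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


class ContractError(Exception):
    """Raised when a step's input or output does not match its declared schema."""


@dataclass
class Step:
    # Hypothetical step definition, not a real framework class.
    name: str
    consumes: dict[str, type]  # field name -> expected type on the way in
    produces: dict[str, type]  # field name -> expected type on the way out
    fn: Callable[[dict[str, Any]], dict[str, Any]]


def check(step_name: str, side: str, schema: dict[str, type], data: dict[str, Any]) -> None:
    """Fail loudly, naming the step, if data does not satisfy schema."""
    for field, expected in schema.items():
        if field not in data:
            raise ContractError(f"{step_name}: {side} is missing field '{field}'")
        if not isinstance(data[field], expected):
            raise ContractError(
                f"{step_name}: {side} field '{field}' should be {expected.__name__}, "
                f"got {type(data[field]).__name__}"
            )


def run_pipeline(steps: list[Step], state: dict[str, Any]) -> dict[str, Any]:
    """Run steps in order, validating the contract on both sides of every step."""
    for step in steps:
        check(step.name, "input", step.consumes, state)
        out = step.fn(state)
        check(step.name, "output", step.produces, out)  # error surfaces at the producer
        state = {**state, **out}
    return state
```

If a hypothetical "extract" step declares that it produces a float total and the model hands back the string "12.50", the run stops with an error that names "extract", rather than failing later inside whichever step tries to format the number.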
The second thing it gets right is token accounting as a first-class citizen. Every run produces a ledger: tokens per node, retries per node, cost per node, wall-clock per node. My research agent's first version was quietly spending most of its budget on a single over-eager synthesis step. Orchestra showed me that in one table. Most frameworks make you instrument this yourself, badly.
An agent you cannot see the cost of is an agent you cannot run in production. Orchestra treats the token ledger the way a good database treats the query plan — as something you are meant to look at.
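For readers who want a feel for what such a ledger involves, the following is a rough sketch of per-node accounting in plain Python. The Ledger and NodeStats names are my own, and any figures you feed it are yours to supply; it is an illustration of the idea, not the framework's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NodeStats:
    tokens: int = 0
    retries: int = 0
    cost_usd: float = 0.0
    seconds: float = 0.0


class Ledger:
    """Hypothetical per-node ledger: tokens, retries, cost and wall-clock time."""

    def __init__(self):
        self.nodes = defaultdict(NodeStats)

    def record(self, node, tokens, cost_usd, seconds, retry=False):
        """Add one model call to the running totals for a node."""
        stats = self.nodes[node]
        stats.tokens += tokens
        stats.cost_usd += cost_usd
        stats.seconds += seconds
        if retry:
            stats.retries += 1

    def by_cost(self):
        """Return (node, stats) pairs, most expensive node first."""
        return sorted(self.nodes.items(), key=lambda kv: kv[1].cost_usd, reverse=True)

    def share_of_cost(self, node):
        """Fraction of the run's total cost spent in one node."""
        total = sum(s.cost_usd for s in self.nodes.values())
        spent = self.nodes[node].cost_usd if node in self.nodes else 0.0
        return spent / total if total else 0.0
```

Sorting by cost is the whole trick: the node at the top of by_cost() is where to look first, and share_of_cost() tells you whether it is worth restructuring.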
Third, its retry and fallback semantics are sane. You declare, per node, what a failure means and what to do about it: retry with a different prompt, fall back to a cheaper model, escalate to a human, or fail the run. These policies live in configuration, not buried in imperative code, which made my support-triage agent's behaviour legible to a non-engineer on the team for the first time.
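The shape of that idea, policy as data with a small generic runner, can be sketched in a few lines of Python. Everything here is assumed for illustration: Policy, Escalated and run_with_policy are hypothetical names, the model names in the note below are placeholders, and the sketch covers retry, fallback model, escalate and fail but leaves out prompt switching.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class Escalated(Exception):
    """Signals that a human should take over (hypothetical)."""


@dataclass(frozen=True)
class Policy:
    # Declarative failure policy for one node; plain data, no control flow.
    max_retries: int = 0
    fallback_model: Optional[str] = None
    on_exhausted: str = "fail"  # "fail" or "escalate"


def run_with_policy(call: Callable[[str], str], model: str, policy: Policy) -> str:
    """Try the primary model, then the fallback, retrying each as the policy allows."""
    models = [model]
    if policy.fallback_model:
        models.append(policy.fallback_model)

    last_error: Optional[Exception] = None
    for current in models:
        for _ in range(policy.max_retries + 1):  # first attempt plus retries
            try:
                return call(current)
            except Exception as exc:  # a real system would catch narrower errors
                last_error = exc

    if policy.on_exhausted == "escalate":
        raise Escalated("all attempts failed; handing off to a human") from last_error
    raise RuntimeError("all attempts failed") from last_error
```

A policy such as Policy(max_retries=1, fallback_model="small-model") reads almost like a sentence: try twice on the primary model, then twice on the cheaper one, then fail. That readability is what makes this style reviewable by someone who does not write code.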
Where it fights you
Orchestra is opinionated to a fault. The typed-graph model that makes production agents robust makes throwaway experiments tedious. For a five-minute prototype I do not want to declare schemas for every step, and Orchestra makes me. There is a 'sketch' mode that relaxes this, but it feels bolted on and you will outgrow it the moment the prototype becomes real.
The documentation is the weakest part. The concepts are deep and the guides are shallow; I learned more from reading the test suite than from the official tutorials, which stop exactly where the interesting problems begin. For an early open-source project this is forgivable, but it is a real tax on adoption, and the team should treat it as a priority rather than an afterthought.
Model-provider support is also uneven. The framework is provider-agnostic in principle, with clean adapters for the major APIs, but streaming behaviour and tool-call formats differ enough between providers that I hit edge cases the abstraction did not fully hide. Sticking to one well-supported provider for a given project is the pragmatic move today.
The performance picture
On the workflows I built, Orchestra's overhead was negligible relative to the model calls themselves, which is the only place latency matters in an agent. The graph executor parallelises independent nodes automatically, and my research agent's fan-out of eight searches ran concurrently without my doing anything clever. That parallelism, free and correct, is harder to get right than it looks.
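For comparison, this is roughly what a hand-rolled fan-out looks like with Python's standard asyncio library. The search coroutine is a stand-in that sleeps instead of calling a real search API, and none of the names come from Orchestra; the point is only to show the pattern a graph executor spares you from writing.

```python
import asyncio


async def search(query: str) -> str:
    """Stand-in for one web search; the sleep simulates network latency."""
    await asyncio.sleep(0.05)
    return f"results for {query}"


async def fan_out(queries: list) -> list:
    """Run all searches concurrently; results come back in input order."""
    return await asyncio.gather(*(search(q) for q in queries))


def run_searches(queries: list) -> list:
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(fan_out(queries))
```

Because the eight calls overlap, total wall-clock time is close to that of the slowest single search rather than the sum of all eight. What this sketch does not handle is the hard part: partial failures, rate limits and dependencies between nodes.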
The token savings were the headline result. After using the ledger to restructure two of my three agents — caching an expensive step, downgrading a model where quality allowed, and cutting a redundant verification pass — I cut the per-run cost of the document pipeline by roughly 40 percent with no measurable quality loss. The framework did not do that for me, but it made the waste visible enough that fixing it was obvious.
Caching deserves a specific mention. Orchestra's content-addressed cache keys on the actual inputs to a node, so deterministic steps are free on re-runs. During development, where you run the same pipeline dozens of times, this turned an expensive iteration loop into a cheap one.
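A content-addressed cache of this kind is simple to sketch. The version below hashes a node's name and JSON-serialisable inputs with SHA-256 and keeps results in memory; ContentCache and its methods are hypothetical, and I am assuming inputs that serialise to JSON, which may not match how Orchestra actually derives its keys.

```python
import hashlib
import json


class ContentCache:
    """Hypothetical in-memory cache keyed on the content of a node's inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(node: str, inputs: dict) -> str:
        """Derive a stable key: same node and same inputs give the same hash."""
        payload = json.dumps(
            {"node": node, "inputs": inputs},
            sort_keys=True,  # key order in the dict must not change the hash
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, node: str, inputs: dict, fn):
        """Return the cached result if present, otherwise compute and store it."""
        k = self.key(node, inputs)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        value = fn(inputs)
        self._store[k] = value
        return value
```

The caveat is in the word deterministic: this is only safe for steps whose output depends on nothing but their inputs. A step that samples from a model at non-zero temperature, or reads the clock, will return a stale first answer on every re-run.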
Who should use it
Orchestra is not for the developer who wants to wire up a quick demo and move on; the friction of its type discipline will outweigh the benefit. It is for teams that are putting agents into production and are tired of discovering their costs, failure modes and loops the hard way. If you are running agents that cost real money and must not silently misbehave, Orchestra's opinionated rigour is a feature, not a bug.
It is also a genuinely encouraging sign of where Asia's developer-tools scene is heading. This is infrastructure built by a small team for a hard, universal problem, released openly and good enough to compete with anything coming out of the West. The documentation needs work and the rough edges are real, but the bones are excellent.
Two weeks in, I am keeping it for the production agents and reaching for something lighter for prototypes — which is, in the end, the most honest endorsement I can give a tool: it earned a permanent place in the part of my stack where mistakes are expensive.