Usage & Token Tracking
Monitor token consumption across agent runs
Every agent run tracks token usage. Access RunResult.usage to monitor costs, enforce limits, and debug consumption. Usage is aggregated across all model calls in a run, including tool loops.
Accessing Usage
After a run() completes, the result includes accumulated usage across all model calls:
import { Agent, run } from "stratus-sdk/core";
const agent = new Agent({ name: "assistant", model });
const result = await run(agent, "Explain TypeScript generics");
console.log(result.usage.promptTokens); // Total prompt tokens
console.log(result.usage.completionTokens); // Total completion tokens
console.log(result.usage.totalTokens); // Sum of prompt + completion
console.log(result.usage.cacheReadTokens); // Tokens read from cache (if available)
console.log(result.usage.cacheCreationTokens); // Tokens written to cache (if available)
UsageInfo Reference
| Property | Type | Description |
|---|---|---|
| promptTokens | number | Total tokens in the prompt (system + messages + tool definitions) |
| completionTokens | number | Total tokens generated by the model |
| totalTokens | number | Sum of promptTokens and completionTokens |
| cacheReadTokens | number? | Tokens served from Azure's prompt cache |
| cacheCreationTokens | number? | Tokens written to the prompt cache |
| reasoningTokens | number? | Tokens used for internal reasoning (reasoning models only) |
Optional fields (cacheReadTokens, cacheCreationTokens, reasoningTokens) are undefined, not 0, when the model doesn't report them.
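Because these fields may be missing, coalesce them before doing arithmetic or logging. A minimal sketch:
// Treat unreported optional fields as zero when aggregating or logging.
const usage = result.usage;
const cacheRead = usage.cacheReadTokens ?? 0;
const reasoning = usage.reasoningTokens ?? 0;
console.log(`prompt=${usage.promptTokens} completion=${usage.completionTokens} cacheRead=${cacheRead} reasoning=${reasoning}`);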
Usage Across Tool Loops
When a run involves multiple model calls (tool calls followed by a final response), usage is the sum across all calls.
Each model call in the run loop adds its tokens to the running total. A run that calls two tools makes at least two model calls - one that produces the tool calls, and one that generates the final response:
import { Agent, run, tool } from "stratus-sdk/core";
import { z } from "zod";
const getWeather = tool({
name: "get_weather",
description: "Get weather for a city",
parameters: z.object({ city: z.string() }),
execute: async (_ctx, { city }) => `72F in ${city}`,
});
const agent = new Agent({
name: "weather",
model,
tools: [getWeather],
});
const result = await run(agent, "Weather in NYC and London?");
// Usage includes BOTH model calls:
// 1. Model call that produced the tool calls
// 2. Model call that generated the final response
console.log(result.usage.promptTokens); // ~300 (sum of both calls)
console.log(result.usage.completionTokens); // ~80 (sum of both calls)
console.log(result.usage.totalTokens); // ~380 (sum of both calls)
Usage in Streaming
When streaming, usage arrives in the done event's response.usage field. This is the usage for a single model call. The aggregated total is available on the final RunResult:
import { Agent, stream } from "stratus-sdk/core";
const agent = new Agent({ name: "writer", model });
const { stream: s, result } = stream(agent, "Write a haiku");
for await (const event of s) {
if (event.type === "content_delta") {
process.stdout.write(event.content);
}
if (event.type === "done") {
// Per-call usage from this model response
console.log("This call:", event.response.usage?.totalTokens);
}
}
// Aggregated usage across the entire run
const finalResult = await result;
console.log("Total:", finalResult.usage.totalTokens); Tracking Costs
Built-in cost estimator
Use createCostEstimator() to build a cost function from your model's pricing. Pass it as costEstimator in run options to get automatic cost tracking on every result.
import { Agent, run, createCostEstimator } from "stratus-sdk/core";
const estimator = createCostEstimator({
inputTokenCostPer1k: 0.005,
outputTokenCostPer1k: 0.015,
cachedInputTokenCostPer1k: 0.0025, // optional: discounted rate for cached tokens
});
const agent = new Agent({ name: "assistant", model });
const result = await run(agent, "Summarize this document", {
costEstimator: estimator,
});
console.log(`Cost: $${result.totalCostUsd.toFixed(4)}`);
console.log(`Turns: ${result.numTurns}`);
totalCostUsd accumulates across all model calls in the run. Without a costEstimator, it's always 0.
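For intuition, the per-1k rates map to dollars roughly as follows. This is an illustration of the arithmetic only; it assumes cacheReadTokens are counted inside promptTokens and billed at the discounted rate, which createCostEstimator may handle differently.
// Illustration only: approximate cost for one run from per-1k rates.
const u = result.usage;
const cached = u.cacheReadTokens ?? 0; // assumed to be a subset of promptTokens
const approxUsd =
  ((u.promptTokens - cached) / 1000) * 0.005 + // uncached input
  (cached / 1000) * 0.0025 + // cached input at the discounted rate
  (u.completionTokens / 1000) * 0.015; // output
console.log(`Approximate cost: $${approxUsd.toFixed(4)}`);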
Budget limits
Set maxBudgetUsd to automatically stop runs that exceed a dollar threshold. Requires costEstimator.
import { run, createCostEstimator, MaxBudgetExceededError } from "stratus-sdk/core";
const estimator = createCostEstimator({
inputTokenCostPer1k: 0.005,
outputTokenCostPer1k: 0.015,
});
try {
const result = await run(agent, "Research this topic thoroughly", {
costEstimator: estimator,
maxBudgetUsd: 0.50,
});
} catch (error) {
if (error instanceof MaxBudgetExceededError) {
console.error(`Spent $${error.spentUsd.toFixed(4)} — budget was $${error.budgetUsd.toFixed(4)}`);
}
}
The budget is checked after each model call. The onStop hook fires with reason: "max_budget" before the error is thrown.
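To react before the error surfaces, use the onStop hook. A minimal sketch, assuming onStop can be passed in run options and receives an object carrying the stop reason (the exact hook signature may differ):
// Sketch: log when the budget stop fires, then handle the thrown error.
try {
  await run(agent, "Research this topic thoroughly", {
    costEstimator: estimator,
    maxBudgetUsd: 0.50,
    onStop: ({ reason }) => {
      if (reason === "max_budget") console.warn("Stopping: budget exhausted");
    },
  });
} catch (error) {
  if (error instanceof MaxBudgetExceededError) {
    // onStop has already fired; recover or report here
  }
}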
Usage with Sessions
When using sessions, each turn produces its own RunResult with usage for that turn, available via session.result once the turn's stream completes:
import { createSession } from "stratus-sdk/core";
const session = createSession({ model, instructions: "You are a helpful assistant." });
session.send("What is TypeScript?");
for await (const event of session.stream()) {
if (event.type === "content_delta") process.stdout.write(event.content);
}
const turn1 = await session.result;
console.log("Turn 1 tokens:", turn1.usage.totalTokens);
session.send("How do generics work?");
for await (const event of session.stream()) {
if (event.type === "content_delta") process.stdout.write(event.content);
}
const turn2 = await session.result;
console.log("Turn 2 tokens:", turn2.usage.totalTokens);
// Aggregate across turns manually
const totalTokens = turn1.usage.totalTokens + turn2.usage.totalTokens;
console.log("Session total:", totalTokens);Each session.result contains usage for that turn only. To track cumulative session usage, sum across turns yourself.
Usage with Tracing
Combine withTrace() with usage tracking for full observability. Model call spans automatically include usage in their metadata:
import { withTrace, run, Agent } from "stratus-sdk/core";
const agent = new Agent({ name: "assistant", model, tools: [getWeather] });
const { result, trace } = await withTrace("weather_request", async () => {
return run(agent, "What's the weather in Tokyo?");
});
// Run-level usage
console.log("Total tokens:", result.usage.totalTokens);
// Per-span usage from trace metadata
for (const span of trace.spans) {
if (span.type === "model_call" && span.metadata?.usage) {
console.log(`${span.name}: ${JSON.stringify(span.metadata.usage)}`);
}
}
The trace gives you per-call breakdowns while result.usage gives you the aggregate. Together they show exactly where tokens were spent.
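As a sanity check, the per-span numbers should add up to the run-level aggregate. A sketch, assuming each model_call span's metadata.usage mirrors the UsageInfo shape:
// Sum per-span usage and compare with the run aggregate.
let spanTotal = 0;
for (const span of trace.spans) {
  if (span.type === "model_call" && span.metadata?.usage) {
    spanTotal += span.metadata.usage.totalTokens ?? 0;
  }
}
console.log(`Per-span sum: ${spanTotal}, run aggregate: ${result.usage.totalTokens}`);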