stratus

Usage & Token Tracking

Monitor token consumption across agent runs

Every agent run tracks token usage. Access RunResult.usage to monitor costs, enforce limits, and debug consumption. Usage is aggregated across all model calls in a run, including tool loops.

Accessing Usage

After run() resolves, result.usage holds the usage accumulated across all model calls in that run:

usage.ts
import { Agent, run } from "stratus-sdk/core";

// `model` is a model instance configured elsewhere in your app.
const agent = new Agent({ name: "assistant", model });
const result = await run(agent, "Explain TypeScript generics");

console.log(result.usage.promptTokens);      // Total prompt tokens
console.log(result.usage.completionTokens);  // Total completion tokens
console.log(result.usage.totalTokens);       // Sum of prompt + completion
console.log(result.usage.cacheReadTokens);   // Tokens read from cache (if available)
console.log(result.usage.cacheCreationTokens); // Tokens written to cache (if available)

UsageInfo Reference

| Property | Type | Description |
| --- | --- | --- |
| promptTokens | number | Total tokens in the prompt (system + messages + tool definitions) |
| completionTokens | number | Total tokens generated by the model |
| totalTokens | number | Sum of promptTokens and completionTokens |
| cacheReadTokens | number? | Tokens served from Azure's prompt cache |
| cacheCreationTokens | number? | Tokens written to the prompt cache |
| reasoningTokens | number? | Tokens used for internal reasoning (reasoning models only) |

Optional fields (cacheReadTokens, cacheCreationTokens, reasoningTokens) are undefined when the model doesn't report them, not 0.
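
Because an unreported field is undefined rather than 0, it helps to keep "not reported" distinct from "zero" when logging. A minimal sketch, assuming a local UsageInfo interface that mirrors the reference above; formatOptional is a hypothetical helper, not an SDK export:

```typescript
// Local mirror of the documented UsageInfo shape (an assumption for this sketch).
interface UsageInfo {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
}

// Hypothetical helper: renders an optional counter without collapsing
// "the model did not report this" into "the value was 0".
function formatOptional(label: string, value: number | undefined): string {
  return value === undefined ? `${label}: not reported` : `${label}: ${value}`;
}

// Illustrative usage record: cache reads reported as 0, reasoning tokens absent.
const sampleUsage: UsageInfo = {
  promptTokens: 200,
  completionTokens: 50,
  totalTokens: 250,
  cacheReadTokens: 0,
};

console.log(formatOptional("cacheReadTokens", sampleUsage.cacheReadTokens)); // cacheReadTokens: 0
console.log(formatOptional("reasoningTokens", sampleUsage.reasoningTokens)); // reasoningTokens: not reported

// When a number is required for arithmetic, default explicitly at the call site.
const reasoningForMath: number = sampleUsage.reasoningTokens ?? 0;
console.log(reasoningForMath); // 0
```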

Usage Across Tool Loops

When a run involves multiple model calls (tool calls followed by a final response), usage is the sum across all calls.

Each model call in the run loop adds its tokens to the running total. A run that calls tools makes at least two model calls: one that produces the tool calls and one that generates the final response:

tool-usage.ts
import { Agent, run, tool } from "stratus-sdk/core";
import { z } from "zod";

const getWeather = tool({
  name: "get_weather",
  description: "Get weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async (_ctx, { city }) => `72F in ${city}`,
});

const agent = new Agent({
  name: "weather",
  model,
  tools: [getWeather],
});

const result = await run(agent, "Weather in NYC and London?");

// Usage includes BOTH model calls:
//   1. Model call that produced the tool calls
//   2. Model call that generated the final response
console.log(result.usage.promptTokens);     // ~300 (sum of both calls)
console.log(result.usage.completionTokens); // ~80  (sum of both calls)
console.log(result.usage.totalTokens);      // ~380
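
To make the aggregation concrete, the sketch below sums two per-call usage records into a run total. The per-call numbers are illustrative, chosen only to match the approximate totals in the comments above, and sumUsage is a hypothetical helper rather than part of the SDK. The second call typically has the larger prompt because it resends the conversation along with the tool results.

```typescript
// Minimal per-call usage shape, assumed for this sketch.
interface CallUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Hypothetical helper: field-by-field sum over every model call in a run.
function sumUsage(calls: CallUsage[]): CallUsage {
  return calls.reduce<CallUsage>(
    (acc, call) => ({
      promptTokens: acc.promptTokens + call.promptTokens,
      completionTokens: acc.completionTokens + call.completionTokens,
      totalTokens: acc.totalTokens + call.totalTokens,
    }),
    { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
  );
}

// Illustrative numbers only.
const modelCalls: CallUsage[] = [
  { promptTokens: 120, completionTokens: 30, totalTokens: 150 }, // call 1: produces the tool calls
  { promptTokens: 180, completionTokens: 50, totalTokens: 230 }, // call 2: final response
];

const runUsage = sumUsage(modelCalls);
console.log(runUsage.promptTokens);     // 300
console.log(runUsage.completionTokens); // 80
console.log(runUsage.totalTokens);      // 380
```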

Usage in Streaming

When streaming, usage arrives in the done event's response.usage field. This is the usage for a single model call. The aggregated total is available on the final RunResult:

stream-usage.ts
import { Agent, stream } from "stratus-sdk/core";

const agent = new Agent({ name: "writer", model });
const { stream: s, result } = stream(agent, "Write a haiku");

for await (const event of s) {
  if (event.type === "content_delta") {
    process.stdout.write(event.content);
  }
  if (event.type === "done") {
    // Per-call usage from this model response
    console.log("This call:", event.response.usage?.totalTokens); 
  }
}

// Aggregated usage across the entire run
const finalResult = await result;
console.log("Total:", finalResult.usage.totalTokens); 
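
If you want a per-call breakdown while streaming, one option is to record the usage from each done event as it arrives. The sketch below assumes simplified event shapes and one done event per model call; recordUsage and the simulated event list are hypothetical, and the event array stands in for the real stream so the example runs on its own.

```typescript
// Simplified event shapes, assumed for this sketch; real SDK events may carry more fields.
type StreamEvent =
  | { type: "content_delta"; content: string }
  | { type: "done"; response: { usage?: { totalTokens: number } } };

// Hypothetical helper: appends the per-call total whenever a done event reports usage.
function recordUsage(perCall: number[], event: StreamEvent): void {
  if (event.type === "done" && event.response.usage !== undefined) {
    perCall.push(event.response.usage.totalTokens);
  }
}

// Simulated events for a run with two model calls (illustrative numbers).
const simulatedEvents: StreamEvent[] = [
  { type: "done", response: { usage: { totalTokens: 150 } } },
  { type: "content_delta", content: "It is 72F in NYC." },
  { type: "done", response: { usage: { totalTokens: 230 } } },
];

const perCallTotals: number[] = [];
for (const event of simulatedEvents) {
  recordUsage(perCallTotals, event);
}

const streamedTotal = perCallTotals.reduce((sum, n) => sum + n, 0);
console.log(perCallTotals.length); // 2
console.log(streamedTotal);        // 380
```

With a real stream, the same recordUsage call would sit inside the for await loop, and the summed value can be compared against the aggregated usage on the final result.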

Tracking Costs

Built-in cost estimator

Use createCostEstimator() to build a cost function from your model's pricing. Pass it as costEstimator in run options to get automatic cost tracking on every result.

costs.ts
import { Agent, run, createCostEstimator } from "stratus-sdk/core";

const estimator = createCostEstimator({ 
  inputTokenCostPer1k: 0.005,
  outputTokenCostPer1k: 0.015,
  cachedInputTokenCostPer1k: 0.0025, // optional: discounted rate for cached tokens
});

const agent = new Agent({ name: "assistant", model });
const result = await run(agent, "Summarize this document", {
  costEstimator: estimator, 
});

console.log(`Cost: $${result.totalCostUsd.toFixed(4)}`); 
console.log(`Turns: ${result.numTurns}`); 

totalCostUsd accumulates across all model calls in the run. Without a costEstimator, it's always 0.
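
As a worked example of per-1k pricing, the sketch below computes a cost by hand from token counts. This is an assumption about the arithmetic, not the SDK's implementation: estimateCostUsd is a hypothetical function, and it ignores cached-token pricing because the way cachedInputTokenCostPer1k is applied is not described here.

```typescript
// Pricing fields named after the options shown above; the shape is assumed for this sketch.
interface Pricing {
  inputTokenCostPer1k: number;
  outputTokenCostPer1k: number;
}

// Hypothetical helper: tokens / 1000 * rate, summed over input and output.
function estimateCostUsd(pricing: Pricing, promptTokens: number, completionTokens: number): number {
  const inputCost = (promptTokens / 1000) * pricing.inputTokenCostPer1k;
  const outputCost = (completionTokens / 1000) * pricing.outputTokenCostPer1k;
  return inputCost + outputCost;
}

const pricing: Pricing = { inputTokenCostPer1k: 0.005, outputTokenCostPer1k: 0.015 };

// 300 prompt tokens    -> 0.3  * 0.005 = 0.0015
// 80 completion tokens -> 0.08 * 0.015 = 0.0012
const exampleCost = estimateCostUsd(pricing, 300, 80);
console.log(`Cost: $${exampleCost.toFixed(4)}`); // Cost: $0.0027
```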

Budget limits

Set maxBudgetUsd to stop a run automatically once its cost exceeds a dollar threshold. This option requires a costEstimator.

budget.ts
import { run, createCostEstimator, MaxBudgetExceededError } from "stratus-sdk/core";

const estimator = createCostEstimator({
  inputTokenCostPer1k: 0.005,
  outputTokenCostPer1k: 0.015,
});

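// `agent` is an Agent instance created as in the earlier examples.
try {
  const result = await run(agent, "Research this topic thoroughly", {
    costEstimator: estimator,
    maxBudgetUsd: 0.50,
  });
  console.log(`Cost: $${result.totalCostUsd.toFixed(4)}`);
} catch (error) {
  if (error instanceof MaxBudgetExceededError) {
    console.error(`Spent $${error.spentUsd.toFixed(4)}, budget was $${error.budgetUsd.toFixed(4)}`);
  } else {
    throw error; // Re-throw anything that is not a budget error
  }
}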

The budget is checked after each model call. The onStop hook fires with reason: "max_budget" before the error is thrown.
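
One consequence of checking after each call is that the reported spend can end up above the budget by as much as the cost of the last call. The simulation below illustrates this under stated assumptions: simulateBudget is a hypothetical stand-in for the run loop, the per-call costs are invented, and the strict "greater than" comparison is a guess at the SDK's rule.

```typescript
// Outcome of the simulated run loop (shape assumed for this sketch).
interface BudgetOutcome {
  spentUsd: number;
  completedCalls: number;
  exceeded: boolean;
}

// Hypothetical simulation: each call's cost is added first, then the budget is checked.
function simulateBudget(callCostsUsd: number[], maxBudgetUsd: number): BudgetOutcome {
  let spentUsd = 0;
  let completedCalls = 0;
  for (const cost of callCostsUsd) {
    spentUsd += cost; // the call has already been paid for at this point
    completedCalls += 1;
    if (spentUsd > maxBudgetUsd) {
      return { spentUsd, completedCalls, exceeded: true };
    }
  }
  return { spentUsd, completedCalls, exceeded: false };
}

// Four calls at $0.20 each against a $0.50 budget (illustrative numbers).
const outcome = simulateBudget([0.2, 0.2, 0.2, 0.2], 0.5);
console.log(outcome.exceeded);            // true
console.log(outcome.completedCalls);      // 3
console.log(outcome.spentUsd.toFixed(2)); // 0.60
```

If a hard ceiling matters, leaving headroom below the real limit when choosing maxBudgetUsd is one way to absorb that overshoot.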

Usage with Sessions

When using sessions, each stream() call produces its own RunResult with usage for that turn:

session-usage.ts
import { createSession } from "stratus-sdk/core";

const session = createSession({ model, instructions: "You are a helpful assistant." });

session.send("What is TypeScript?");
for await (const event of session.stream()) {
  if (event.type === "content_delta") process.stdout.write(event.content);
}

const turn1 = await session.result;
console.log("Turn 1 tokens:", turn1.usage.totalTokens);

session.send("How do generics work?");
for await (const event of session.stream()) {
  if (event.type === "content_delta") process.stdout.write(event.content);
}

const turn2 = await session.result;
console.log("Turn 2 tokens:", turn2.usage.totalTokens); 

// Aggregate across turns manually
const totalTokens = turn1.usage.totalTokens + turn2.usage.totalTokens;
console.log("Session total:", totalTokens);

Each session.result contains usage for that turn only. To track cumulative session usage, sum across turns yourself.
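
For longer conversations, a small accumulator avoids summing fields by hand. The sketch below is one way to do it, assuming a local UsageInfo shape that mirrors the reference table; addUsage and addOptional are hypothetical helpers. Optional fields stay undefined until at least one turn reports them, which matches how the SDK treats unreported values.

```typescript
// Local mirror of the documented usage shape (an assumption for this sketch).
interface UsageInfo {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheReadTokens?: number;
}

// Adds two optional counters, staying undefined if neither side reported a value.
function addOptional(a: number | undefined, b: number | undefined): number | undefined {
  if (a === undefined && b === undefined) return undefined;
  return (a ?? 0) + (b ?? 0);
}

// Hypothetical helper: combines the usage of two turns into a cumulative record.
function addUsage(a: UsageInfo, b: UsageInfo): UsageInfo {
  return {
    promptTokens: a.promptTokens + b.promptTokens,
    completionTokens: a.completionTokens + b.completionTokens,
    totalTokens: a.totalTokens + b.totalTokens,
    cacheReadTokens: addOptional(a.cacheReadTokens, b.cacheReadTokens),
  };
}

// Illustrative per-turn usage; in practice these come from each turn's result.
const turnUsages: UsageInfo[] = [
  { promptTokens: 400, completionTokens: 100, totalTokens: 500 },
  { promptTokens: 900, completionTokens: 150, totalTokens: 1050, cacheReadTokens: 256 },
];

const emptyUsage: UsageInfo = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
const sessionUsage = turnUsages.reduce(addUsage, emptyUsage);

console.log(sessionUsage.totalTokens);     // 1550
console.log(sessionUsage.cacheReadTokens); // 256
```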

Usage with Tracing

Combine withTrace() with usage tracking for full observability. Model call spans automatically include usage in their metadata:

traced-usage.ts
import { withTrace, run, Agent } from "stratus-sdk/core";

// `getWeather` is the tool defined in tool-usage.ts above.
const agent = new Agent({ name: "assistant", model, tools: [getWeather] });

const { result, trace } = await withTrace("weather_request", async () => {
  return run(agent, "What's the weather in Tokyo?");
});

// Run-level usage
console.log("Total tokens:", result.usage.totalTokens);

// Per-span usage from trace metadata
for (const span of trace.spans) {
  if (span.type === "model_call" && span.metadata?.usage) {
    console.log(`${span.name}: ${JSON.stringify(span.metadata.usage)}`); 
  }
}

The trace gives you per-call breakdowns while result.usage gives you the aggregate. Together they show exactly where tokens were spent.
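
To see which model call dominated a run, the per-span usage can be ranked against the run total. The sketch below assumes a simplified span shape based on the fields used above (type, name, metadata.usage); rankModelCalls, the span names, and the token counts are all hypothetical.

```typescript
// Simplified span shape, assumed for this sketch.
interface SpanLike {
  type: string;
  name: string;
  metadata?: { usage?: { totalTokens: number } };
}

interface SpanShare {
  name: string;
  totalTokens: number;
  share: number; // fraction of the run total, between 0 and 1
}

// Hypothetical helper: keeps model_call spans that report usage, largest first.
function rankModelCalls(spans: SpanLike[], runTotalTokens: number): SpanShare[] {
  const rows: SpanShare[] = [];
  for (const span of spans) {
    const usage = span.metadata?.usage;
    if (span.type !== "model_call" || usage === undefined) continue;
    rows.push({
      name: span.name,
      totalTokens: usage.totalTokens,
      share: runTotalTokens > 0 ? usage.totalTokens / runTotalTokens : 0,
    });
  }
  return rows.sort((a, b) => b.totalTokens - a.totalTokens);
}

// Illustrative spans; names and numbers are invented.
const sampleSpans: SpanLike[] = [
  { type: "model_call", name: "model_call_1", metadata: { usage: { totalTokens: 150 } } },
  { type: "tool_call", name: "get_weather" },
  { type: "model_call", name: "model_call_2", metadata: { usage: { totalTokens: 230 } } },
];

const ranked = rankModelCalls(sampleSpans, 380);
for (const row of ranked) {
  console.log(`${row.name}: ${row.totalTokens} tokens (${(row.share * 100).toFixed(1)}%)`);
}
// model_call_2: 230 tokens (60.5%)
// model_call_1: 150 tokens (39.5%)
```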

Next Steps

  • Streaming - Stream events include per-call usage in the done event
  • Tracing - Inspect per-span usage metadata for detailed breakdowns
  • Sessions - Track usage across multi-turn conversations
  • Tools - Understand how tool loops affect token consumption