
Finish Reasons

Understand why a model stopped generating and how the run loop responds

Every model response includes a finishReason -- the reason the model stopped generating. The run loop uses this value to decide what happens next: execute tool calls, return a result, or throw an error.

Finish reason values

Value | Meaning | Run loop behavior
stop | The model finished naturally and produced a complete response. | Returns the result. The run is done.
tool_calls | The model wants to call one or more tools. | Executes the tool calls, then calls the model again with the results.
length | The response was truncated because it hit the maxTokens limit. | Returns the partial result. No error is thrown.
content_filter | Azure's content filter blocked the request or response. | Throws a ContentFilterError. The run does not continue.
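As a quick mental model, the table above can be mirrored by a small helper. The FinishReason union matches the documented values, but the helper itself is illustrative, not part of the SDK API:

```typescript
// Illustrative sketch: maps each documented finish reason to the run
// loop's next action. Not a real SDK function.
type FinishReason = "stop" | "tool_calls" | "length" | "content_filter";

function nextAction(reason: FinishReason): string {
  switch (reason) {
    case "stop":
      return "return result"; // run is done
    case "tool_calls":
      return "execute tools"; // then call the model again
    case "length":
      return "return partial result"; // truncated, no error thrown
    case "content_filter":
      return "throw ContentFilterError";
  }
}

console.log(nextAction("stop")); // "return result"
console.log(nextAction("length")); // "return partial result"
```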

How the run loop uses finish reasons

When the model responds, the run loop checks the response and branches:

Model returns a response

The run loop calls the model and receives a ModelResponse containing content, toolCalls, and finishReason.

Check for tool calls

If toolCalls is non-empty (finish reason is tool_calls), the run loop executes all tool calls in parallel, appends the results as tool messages, and calls the model again. This repeats until the model responds without tool calls or maxTurns is exceeded.

No tool calls -- return the result

If toolCalls is empty, the run is finished. The model's text output becomes result.output. The finish reason is stored on result.finishReason -- typically stop or length.

Model response
├── toolCalls present?
│   ├── Yes → execute tools → call model again (loop)
│   └── No  → finishReason is "stop" or "length"
│       └── return RunResult
└── finishReason is "content_filter"?
    └── Yes → throw ContentFilterError
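The branching above can be sketched as a small helper. The ModelResponse shape mirrors the description in this section, but this is not the SDK's actual run-loop implementation:

```typescript
// Illustrative sketch of the run-loop branch. Field and function names
// are assumptions for this example, not the SDK's internals.
interface ModelResponse {
  content: string;
  toolCalls: string[];
  finishReason: "stop" | "length" | "tool_calls";
}

function nextStep(response: ModelResponse): "execute_tools" | "return_result" {
  if (response.toolCalls.length > 0) {
    return "execute_tools"; // run tools, append results, call the model again
  }
  return "return_result"; // finishReason is "stop" or "length"
}

console.log(nextStep({ content: "", toolCalls: ["search"], finishReason: "tool_calls" })); // "execute_tools"
console.log(nextStep({ content: "Done.", toolCalls: [], finishReason: "stop" })); // "return_result"
```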

The content_filter finish reason is intercepted at the model layer before the run loop sees it. Both AzureResponsesModel and AzureChatCompletionsModel throw a ContentFilterError immediately, so the run loop never receives a response with finishReason: "content_filter".
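Because the error is thrown rather than returned, callers handle content filtering with a try/catch. The sketch below uses a local stand-in ContentFilterError class to keep the example self-contained; real code imports the class from the stratus SDK:

```typescript
// Sketch of the catch pattern. ContentFilterError here is a local
// stand-in; in real code it is imported from the stratus SDK.
class ContentFilterError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ContentFilterError";
  }
}

// Simulates a model call that Azure's content filter blocks.
function callModel(): string {
  throw new ContentFilterError("response blocked by content filter");
}

function runSafely(): string {
  try {
    return callModel();
  } catch (e) {
    if (e instanceof ContentFilterError) {
      // Rephrase the input, or surface a friendly message to the user.
      return "blocked: " + e.message;
    }
    throw e; // other errors propagate unchanged
  }
}

console.log(runSafely()); // "blocked: response blocked by content filter"
```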

Accessing finishReason

From run()

run-finish-reason.ts
import { Agent, run } from "stratus-sdk/core";

const agent = new Agent({ name: "assistant", model });
const result = await run(agent, "What is the capital of France?");

console.log(result.finishReason); // "stop"
console.log(result.output);       // "The capital of France is Paris."

From stream()

stream-finish-reason.ts
import { Agent, stream } from "stratus-sdk/core";

const agent = new Agent({ name: "writer", model });
const { stream: s, result } = stream(agent, "Write a haiku");

for await (const event of s) {
  if (event.type === "content_delta") {
    process.stdout.write(event.content);
  }
  if (event.type === "done") {
    // Per-call finish reason from this model response
    console.log(event.response.finishReason); 
  }
}

const finalResult = await result;
console.log(finalResult.finishReason); // "stop" - from the last model call

From a session

session-finish-reason.ts
import { createSession } from "stratus-sdk/core";

const session = createSession({ model, instructions: "You are a helpful assistant." });

session.send("Explain TypeScript generics");
for await (const event of session.stream()) {
  if (event.type === "content_delta") process.stdout.write(event.content);
}

const result = await session.result;
console.log(result.finishReason); // "stop"

Finish reasons vs errors

A finish reason is part of a successful response. An error means no usable response was produced.

Condition | Type | How it surfaces | Recoverable?
stop | Finish reason | result.finishReason | N/A -- this is the normal case
tool_calls | Finish reason | result.finishReason (of the last call) | N/A -- the run loop handles this
length | Finish reason | result.finishReason | Yes -- increase maxTokens or shorten input
content_filter | Thrown error | catch (e) { e instanceof ContentFilterError } | Depends -- rephrase the input or output
API failure | Thrown error | catch (e) { e instanceof ModelError } | Retry or check credentials
Timeout | Thrown error | catch (e) { e instanceof RunAbortedError } | Increase timeout or simplify the task
Too many turns | Thrown error | catch (e) { e instanceof MaxTurnsExceededError } | Increase maxTurns
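The thrown-error rows translate into an instanceof dispatch in a catch block. The stand-in error classes below keep the sketch self-contained; real code imports them from the stratus SDK:

```typescript
// Stand-in error classes for illustration; real code imports these from
// the stratus SDK rather than defining them locally.
class ContentFilterError extends Error {}
class ModelError extends Error {}
class RunAbortedError extends Error {}
class MaxTurnsExceededError extends Error {}

// Maps a caught error to the recovery advice from the table above.
function recoveryHint(e: unknown): string {
  if (e instanceof ContentFilterError) return "rephrase the input or output";
  if (e instanceof ModelError) return "retry or check credentials";
  if (e instanceof RunAbortedError) return "increase timeout or simplify the task";
  if (e instanceof MaxTurnsExceededError) return "increase maxTurns";
  return "unknown error";
}

console.log(recoveryHint(new ModelError("503 from API"))); // "retry or check credentials"
console.log(recoveryHint(new MaxTurnsExceededError())); // "increase maxTurns"
```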

A length finish reason is not an error. The run completes successfully, but the output may be incomplete. Always check finishReason if you need to guarantee the model finished its response.

Handling truncated responses

When finishReason is "length", the model hit the token limit before finishing. The output is cut off mid-sentence or mid-thought. Here are your options:

Increase maxTokens -- Give the model more room to respond.

increase-max-tokens.ts
const agent = new Agent({
  name: "writer",
  model,
  modelSettings: {
    maxTokens: 4096, 
  },
});

const result = await run(agent, "Write a detailed analysis of TypeScript's type system");
if (result.finishReason === "length") {
  console.warn("Response was truncated - consider increasing maxTokens");
}

Shorten the input -- Reduce the prompt length so more tokens are available for the response.

Split into multiple calls -- Break a large task into smaller, focused prompts that each fit within the token limit.

Detect and retry -- Check the finish reason and automatically retry with a higher limit.

retry-on-truncation.ts
import { Agent, run } from "stratus-sdk/core";

const agent = new Agent({
  name: "writer",
  model,
  modelSettings: { maxTokens: 1024 },
});

let result = await run(agent, "Summarize this document");

if (result.finishReason === "length") { 
  const retryAgent = agent.clone({
    modelSettings: { maxTokens: 4096 },
  });
  result = await run(retryAgent, "Summarize this document");
}

console.log(result.output);

In streaming

During streaming, the finish reason is not available until the model finishes its response. It arrives in the final done event for each model call.

streaming-finish-reason.ts
import { Agent, stream } from "stratus-sdk/core";

const agent = new Agent({ name: "assistant", model });
const { stream: s, result } = stream(agent, "Tell me a story");

for await (const event of s) {
  switch (event.type) {
    case "content_delta":
      process.stdout.write(event.content);
      break;
    case "done":
      // Available here - one 'done' event per model call
      console.log("\nFinish reason:", event.response.finishReason); 
      break;
  }
}

// Also available on the final RunResult
const finalResult = await result;
console.log("Last finish reason:", finalResult.finishReason);

If the run involves tool calls, you will see multiple done events -- one per model call. The finishReason on the RunResult is always from the last model call in the run.
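For example, a run with two tool-call rounds might surface finish reasons like this (a sketch using a plain array, not the SDK's event stream):

```typescript
// Sketch: finish reasons observed across a run's model calls. With tool
// calls, every call but the last typically reports "tool_calls".
const perCallFinishReasons = ["tool_calls", "tool_calls", "stop"];

const lastReason = perCallFinishReasons[perCallFinishReasons.length - 1];
console.log(lastReason); // "stop" -- what RunResult.finishReason reports
```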
