
Model Settings

Configure temperature, tool choice, and other model parameters

Model settings control how the model generates responses. You set them on the agent at construction time.

Setting on an agent

Pass a modelSettings object when creating an agent:

agent-settings.ts
import { Agent } from "stratus-sdk/core";

const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    temperature: 0.7,
    maxTokens: 1000,
  },
});

Settings are sent to the model on every call the agent makes. To change settings between runs, clone the agent with new values:

clone-settings.ts
const creativeAgent = agent.clone({
  modelSettings: { temperature: 1.2, topP: 0.95 },
});

ModelSettings reference

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| temperature | number | Model default | Sampling temperature. Higher values (closer to 2) produce more random output; lower values (closer to 0) produce more deterministic output. Range: 0–2. |
| topP | number | Model default | Nucleus sampling. The model considers tokens whose cumulative probability exceeds this threshold. Range: 0–1. |
| maxTokens | number | Model default | Maximum number of tokens to generate in the response. |
| stop | string[] | undefined | Stop sequences. The model stops generating when it produces any of these strings. |
| presencePenalty | number | 0 | Penalizes tokens that have already appeared, encouraging the model to talk about new topics. Range: -2 to 2. |
| frequencyPenalty | number | 0 | Penalizes tokens proportional to how often they've appeared, reducing repetition. Range: -2 to 2. |
| toolChoice | ToolChoice | "auto" | Controls which tools the model can call. See Tool choice. |
| parallelToolCalls | boolean | true | Whether the model can call multiple tools in a single turn. |
| seed | number | undefined | Seed for deterministic sampling. Repeated requests with the same seed and parameters should return the same result. |
| reasoningEffort | ReasoningEffort | undefined | Controls how much reasoning effort the model spends. See Reasoning models. |
| maxCompletionTokens | number | undefined | Max tokens for the model's completion, including reasoning tokens. Use instead of maxTokens for reasoning models. |
| promptCacheKey | string | undefined | Influences prompt cache routing. Requests with the same key and prefix are more likely to hit cache. See Prompt caching. |
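For illustration, the shape described in the table can be sketched as a local TypeScript interface. The declarations below are written from the table, not taken from the SDK:

```typescript
// Illustrative shapes written from the reference table -- not the SDK's own types.
type ToolChoice =
  | "auto"
  | "required"
  | "none"
  | { type: "function"; function: { name: string } };

interface ModelSettings {
  temperature?: number;        // 0-2; higher = more random
  topP?: number;               // 0-1; nucleus sampling threshold
  maxTokens?: number;
  stop?: string[];
  presencePenalty?: number;    // -2 to 2
  frequencyPenalty?: number;   // -2 to 2
  toolChoice?: ToolChoice;     // defaults to "auto"
  parallelToolCalls?: boolean; // defaults to true
  seed?: number;
}

// A reproducible, low-randomness configuration:
const deterministic: ModelSettings = {
  temperature: 0,
  seed: 42,
  stop: ["\n\n"],
};
```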

Reasoning models

For reasoning models (o1, o3, etc.), use reasoningEffort and maxCompletionTokens instead of temperature and maxTokens.

reasoningEffort controls how much internal reasoning the model does before responding. Higher effort produces more thorough answers but uses more tokens and takes longer.

reasoning-settings.ts
import { Agent } from "stratus-sdk/core";

const agent = new Agent({
  name: "analyst",
  model,
  modelSettings: {
    reasoningEffort: "high", 
    maxCompletionTokens: 16384, 
  },
});

Valid values for reasoningEffort:

| Value | Description |
| --- | --- |
| "none" | No reasoning |
| "minimal" | Minimal reasoning |
| "low" | Low effort |
| "medium" | Medium effort (default for reasoning models) |
| "high" | High effort |
| "xhigh" | Maximum effort |

maxCompletionTokens includes both reasoning tokens and output tokens. If the model uses 1000 tokens for reasoning and 500 for the response, that's 1500 total against the limit. Reasoning tokens are tracked in UsageInfo.reasoningTokens.
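To make the accounting concrete, here is a small sketch. The UsageInfo shape below is written from this page's description (only reasoningTokens is mentioned here), not the SDK's declaration:

```typescript
// Illustrative UsageInfo shape -- field name from this page, the rest assumed.
interface UsageInfo {
  reasoningTokens: number;
}

// Reasoning tokens and visible output tokens both count against
// maxCompletionTokens.
function tokensAgainstLimit(usage: UsageInfo, outputTokens: number): number {
  return usage.reasoningTokens + outputTokens;
}

const used = tokensAgainstLimit({ reasoningTokens: 1000 }, 500); // 1500
const fitsInLimit = used <= 16384; // true for maxCompletionTokens: 16384
```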

Prompt caching

Azure automatically caches prompt prefixes for requests over 1,024 tokens. Use promptCacheKey to improve cache hit rates when many requests share long common prefixes.

cache-key.ts
const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    promptCacheKey: "support-agent-v2", 
  },
});

Cache hits appear as cacheReadTokens in UsageInfo and are billed at a discount. No opt-in is needed for basic caching — promptCacheKey is only for improving hit rates across requests with shared prefixes.
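As a sketch of how you might monitor hit rates: cacheReadTokens comes from UsageInfo per this page, while promptTokens is an assumed name for the total prompt-token count.

```typescript
// Fraction of prompt tokens served from cache. cacheReadTokens is from
// UsageInfo per this page; promptTokens is an assumed field name.
function cacheHitRatio(promptTokens: number, cacheReadTokens: number): number {
  return promptTokens === 0 ? 0 : cacheReadTokens / promptTokens;
}

// A 2,048-token prompt whose first 1,024 tokens hit the cache:
const ratio = cacheHitRatio(2048, 1024); // 0.5
```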

Tool choice

The toolChoice setting controls whether and how the model calls tools. Set it inside modelSettings.

"auto" is the default. The model decides whether to call a tool or respond with text.

tool-choice-auto.ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather],
  modelSettings: {
    toolChoice: "auto", 
  },
});

"required" forces the model to call at least one tool. It will not respond with text alone.

tool-choice-required.ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather, searchDocs],
  modelSettings: {
    toolChoice: "required", 
  },
});

"none" prevents the model from calling any tools, even if tools are defined on the agent. The model responds with text only.

tool-choice-none.ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather],
  modelSettings: {
    toolChoice: "none", 
  },
});

A function object forces the model to call one specific tool by name. Useful when you know exactly which tool should run.

tool-choice-function.ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather, searchDocs],
  modelSettings: {
    toolChoice: { 
      type: "function", 
      function: { name: "get_weather" }, 
    }, 
  },
});

Tool use behavior

toolUseBehavior is separate from modelSettings. It is set directly on the agent and controls what happens after a tool executes -- not what the model generates.

"run_llm_again" is the default. After a tool executes, the result is sent back to the model so it can generate a follow-up response or call more tools.

behavior-run-again.ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather],
  toolUseBehavior: "run_llm_again", 
});

"stop_on_first_tool" stops the run immediately after the first tool call completes. The tool's return value becomes the run output. The model is not called again.

behavior-stop-first.ts
const agent = new Agent({
  name: "data-fetcher",
  model,
  tools: [fetchData],
  toolUseBehavior: "stop_on_first_tool", 
});

const result = await run(agent, "Get the latest sales data");
// result.output is the return value of fetchData

This is useful when the agent's only job is to pick and invoke the right tool.

An object with stopAtToolNames stops the run only when one of the named tools is called. Other tools feed their results back to the model as usual.

behavior-stop-at.ts
const agent = new Agent({
  name: "researcher",
  model,
  tools: [searchDocs, summarize, finalAnswer],
  toolUseBehavior: { 
    stopAtToolNames: ["final_answer"], 
  }, 
});

The agent can call searchDocs and summarize as many times as it needs. The run stops only when it calls final_answer.
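The stop rule can be sketched as a simple membership check. This is illustrative, not the SDK's internals:

```typescript
// Illustrative stop check: the run ends only when the called tool's name
// appears in stopAtToolNames.
function shouldStopRun(calledTool: string, stopAtToolNames: string[]): boolean {
  return stopAtToolNames.includes(calledTool);
}

const continues = shouldStopRun("search_docs", ["final_answer"]); // false: result goes back to the model
const stops = shouldStopRun("final_answer", ["final_answer"]);    // true: run stops
```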


Response format

Structured output is configured via outputType on the agent, not through modelSettings directly. When you set outputType to a Zod schema, Stratus sends the appropriate response_format to Azure automatically.

import { z } from "zod";

const agent = new Agent({
  name: "extractor",
  model,
  outputType: z.object({
    name: z.string(),
    age: z.number(),
  }),
});

See Structured Output for details.

Next steps

  • Tools -- define functions the model can call
  • Agents -- agent configuration reference
  • Streaming -- stream responses in real time
  • Hooks -- intercept tool calls and handoffs before they execute