# Model Settings
Configure temperature, tool choice, and other model parameters
Model settings control how the model generates responses. You set them on the agent at construction time.
## Setting on an agent
Pass a `modelSettings` object when creating an agent:
```ts
import { Agent } from "stratus-sdk/core";

const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    temperature: 0.7,
    maxTokens: 1000,
  },
});
```

Settings are sent to the model on every call the agent makes. To change settings between runs, clone the agent with new values:
```ts
const creativeAgent = agent.clone({
  modelSettings: { temperature: 1.2, topP: 0.95 },
});
```

## ModelSettings reference
| Setting | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | Model default | Sampling temperature. Higher values (toward 2) produce more random output; lower values (toward 0) produce more deterministic output. Range: 0 to 2. |
| `topP` | number | Model default | Nucleus sampling. The model samples from the smallest set of tokens whose cumulative probability reaches this threshold. Range: 0 to 1. |
| `maxTokens` | number | Model default | Maximum number of tokens to generate in the response. |
| `stop` | string[] | `undefined` | Stop sequences. The model stops generating as soon as it produces any of these strings. |
| `presencePenalty` | number | 0 | Penalizes tokens that have already appeared, encouraging the model to introduce new topics. Range: -2 to 2. |
| `frequencyPenalty` | number | 0 | Penalizes tokens in proportion to how often they have appeared, reducing repetition. Range: -2 to 2. |
| `toolChoice` | ToolChoice | `"auto"` | Controls which tools the model can call. See Tool choice. |
| `parallelToolCalls` | boolean | `true` | Whether the model can call multiple tools in a single turn. |
| `seed` | number | `undefined` | Seed for deterministic sampling. Repeated requests with the same seed and parameters should return the same result. |
| `reasoningEffort` | ReasoningEffort | `undefined` | Controls how much reasoning effort the model spends. See Reasoning models. |
| `maxCompletionTokens` | number | `undefined` | Maximum tokens for the model's completion, including reasoning tokens. Use this instead of `maxTokens` for reasoning models. |
| `promptCacheKey` | string | `undefined` | Influences prompt cache routing. Requests with the same key and a shared prefix are more likely to hit the cache. See Prompt caching. |
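
These settings can be combined freely in a single `modelSettings` object. The sketch below is illustrative only; the agent name and every value in it are arbitrary examples, not recommendations:

```ts
// Illustrative combination of several settings from the table above.
// All values here are arbitrary examples.
const reportAgent = new Agent({
  name: "reporter",
  model,
  modelSettings: {
    temperature: 0,          // favor deterministic output
    seed: 42,                // repeated requests should return the same result
    maxTokens: 500,          // cap the response length
    stop: ["END_OF_REPORT"], // stop as soon as the model emits this marker
    frequencyPenalty: 0.5,   // discourage repetitive phrasing
  },
});
```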
## Reasoning models

For reasoning models (o1, o3, etc.), use `reasoningEffort` and `maxCompletionTokens` instead of `temperature` and `maxTokens`.

`reasoningEffort` controls how much internal reasoning the model does before responding. Higher effort produces more thorough answers but uses more tokens and takes longer.
```ts
import { Agent } from "stratus-sdk/core";

const agent = new Agent({
  name: "analyst",
  model,
  modelSettings: {
    reasoningEffort: "high",
    maxCompletionTokens: 16384,
  },
});
```

Valid values for `reasoningEffort`:
| Value | Description |
|---|---|
"none" | No reasoning |
"minimal" | Minimal reasoning |
"low" | Low effort |
"medium" | Medium effort (default for reasoning models) |
"high" | High effort |
"xhigh" | Maximum effort |
`maxCompletionTokens` includes both reasoning tokens and output tokens. If the model uses 1,000 tokens for reasoning and 500 for the response, that's 1,500 tokens counted against the limit. Reasoning tokens are tracked in `UsageInfo.reasoningTokens`.
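
To see how much of that budget went to reasoning, inspect the usage info after a run. The sketch below assumes the run result exposes its `UsageInfo` as `result.usage`; that property name is an assumption, so check the run result type in your version:

```ts
// Hypothetical sketch: read reasoning-token usage after a run.
// Assumes the run result exposes UsageInfo as `result.usage`
// (an assumption, not a documented property).
const result = await run(agent, "Analyze last quarter's churn numbers");

console.log("reasoning tokens:", result.usage.reasoningTokens);
```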
## Prompt caching

Azure automatically caches prompt prefixes for requests over 1,024 tokens. Use `promptCacheKey` to improve cache hit rates when many requests share long common prefixes.
```ts
const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    promptCacheKey: "support-agent-v2",
  },
});
```

Cache hits appear as `cacheReadTokens` in `UsageInfo` and are billed at a discount. No opt-in is needed for basic caching; `promptCacheKey` is only for improving hit rates across requests with shared prefixes.
## Tool choice

The `toolChoice` setting controls whether and how the model calls tools. Set it inside `modelSettings`.

### `"auto"`

The default. The model decides whether to call a tool or respond with text.
```ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather],
  modelSettings: {
    toolChoice: "auto",
  },
});
```

### `"required"`

Forces the model to call at least one tool. It will not respond with text alone.
```ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather, searchDocs],
  modelSettings: {
    toolChoice: "required",
  },
});
```

### `"none"`

Prevents the model from calling any tools, even if tools are defined on the agent. The model responds with text only.
```ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather],
  modelSettings: {
    toolChoice: "none",
  },
});
```

### Forcing a specific tool

Forces the model to call one specific tool by name. Useful when you know exactly which tool should run.
```ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather, searchDocs],
  modelSettings: {
    toolChoice: {
      type: "function",
      function: { name: "get_weather" },
    },
  },
});
```

## Tool use behavior
`toolUseBehavior` is separate from `modelSettings`. It is set directly on the agent and controls what happens after a tool executes, not what the model generates.

### `"run_llm_again"`

The default. After a tool executes, the result is sent back to the model so it can generate a follow-up response or call more tools.
```ts
const agent = new Agent({
  name: "assistant",
  model,
  tools: [getWeather],
  toolUseBehavior: "run_llm_again",
});
```

### `"stop_on_first_tool"`

Stops the run immediately after the first tool call completes. The tool's return value becomes the run output. The model is not called again.
```ts
const agent = new Agent({
  name: "data-fetcher",
  model,
  tools: [fetchData],
  toolUseBehavior: "stop_on_first_tool",
});

const result = await run(agent, "Get the latest sales data");
// result.output is the return value of fetchData
```

This is useful when the agent's only job is to pick and invoke the right tool.

### `stopAtToolNames`
Stops only when a specific tool is called. Other tools feed their results back to the model as usual.
```ts
const agent = new Agent({
  name: "researcher",
  model,
  tools: [searchDocs, summarize, finalAnswer],
  toolUseBehavior: {
    stopAtToolNames: ["final_answer"],
  },
});
```

The agent can call `searchDocs` and `summarize` as many times as it needs. The run stops only when it calls `final_answer`.
## Response format

Structured output is configured via `outputType` on the agent, not through `modelSettings` directly. When you set `outputType` to a Zod schema, Stratus sends the appropriate `response_format` to Azure automatically.
```ts
import { z } from "zod";

const agent = new Agent({
  name: "extractor",
  model,
  outputType: z.object({
    name: z.string(),
    age: z.number(),
  }),
});
```

See Structured Output for details.
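
A run of this agent then yields a parsed object rather than plain text. The sketch below assumes the parsed value is exposed on `result.output`, mirroring how tool output is surfaced in the `stop_on_first_tool` example above; treat that property as an assumption and verify it against your version:

```ts
// Hypothetical usage sketch: the parsed object is assumed to be on result.output
// (an assumption, mirroring the stop_on_first_tool example above).
const result = await run(agent, "Alice is 34 years old.");
console.log(result.output.name); // "Alice"
console.log(result.output.age);  // 34
```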