# Azure OpenAI

Configure Azure Chat Completions and Responses API models.
Stratus includes two built-in Azure OpenAI model implementations. Both implement the `Model` interface and work with all Stratus APIs (agents, tools, sessions, streaming, etc.).
| Model | API | Best for |
|---|---|---|
| `AzureResponsesModel` | Responses API | **Recommended.** Latest API format with full feature support |
| `AzureChatCompletionsModel` | Chat Completions | Legacy support, widest compatibility |
## AzureResponsesModel
The recommended model for new projects. Uses the Azure Responses API.
```typescript
import { AzureResponsesModel } from "stratus-sdk/azure";

const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
  apiVersion: "2025-04-01-preview", // optional, this is the default
});
```

### Config Options
| Property | Type | Description |
|---|---|---|
| `endpoint` | `string` | **Required.** Any supported endpoint format |
| `apiKey` | `string` | **Required.** API key for authentication |
| `deployment` | `string` | **Required.** Sent as `model` in the request body |
| `apiVersion` | `string` | API version (default: `"2025-04-01-preview"`) |
## AzureChatCompletionsModel
Uses the Azure Chat Completions API. Use this if your deployment doesn't support the Responses API.
```typescript
import { AzureChatCompletionsModel } from "stratus-sdk/azure";

const model = new AzureChatCompletionsModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
  apiVersion: "2025-03-01-preview", // optional, this is the default
});
```

### Config Options
| Property | Type | Description |
|---|---|---|
| `endpoint` | `string` | **Required.** Any supported endpoint format |
| `apiKey` | `string` | **Required.** API key for authentication |
| `deployment` | `string` | **Required.** Model deployment name |
| `apiVersion` | `string` | API version (default: `"2025-03-01-preview"`) |
Both models are interchangeable. Swap one for the other without changing any agent, tool, or session code.
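Because both classes accept the same required options, the swap can be a single construction change. A minimal sketch (the environment-variable toggle is illustrative, not an SDK feature):

```typescript
import { AzureChatCompletionsModel, AzureResponsesModel } from "stratus-sdk/azure";

const config = {
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
};

// Swap the API family without touching any agent, tool, or session code.
const model = process.env.USE_RESPONSES_API
  ? new AzureResponsesModel(config)
  : new AzureChatCompletionsModel(config);
```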
## Endpoint Formats
Pass any Azure endpoint URL as `endpoint`; the SDK auto-detects the type and builds the correct request URL.
```typescript
// Azure OpenAI
endpoint: "https://your-resource.openai.azure.com"

// Cognitive Services
endpoint: "https://your-resource.cognitiveservices.azure.com"

// AI Foundry project
endpoint: "https://your-project.services.ai.azure.com/api/projects/my-project"

// Full URL (used as-is; deployment and apiVersion are ignored)
endpoint: "https://your-resource.openai.azure.com/openai/deployments/gpt-5.2/chat/completions?api-version=2025-03-01-preview"
```

Trailing slashes are normalized automatically.
## Non-OpenAI Models (Model Inference API)
`AzureChatCompletionsModel` works with any model deployed through the Azure AI Model Inference API, not just OpenAI models. Pass the full Model Inference URL as the `endpoint` and the model name as the `deployment`:
```typescript
import { AzureChatCompletionsModel } from "stratus-sdk/azure";

const model = new AzureChatCompletionsModel({
  endpoint: "https://your-resource.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview",
  apiKey: "your-api-key",
  deployment: "Kimi-K2.5", // model name sent in request body
});
```

The `deployment` value is sent as the `model` field in the request body, which the Model Inference API uses to route to the correct model. All Stratus features (tools, streaming, handoffs, sessions, etc.) work with any model that supports the Chat Completions format.
Not all models support every feature. For example, some models don't support tool calling or structured output. The SDK will surface the API error if an unsupported feature is used.
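If you need to detect support at runtime, one option is to attempt the call and catch the surfaced error, as sketched below. `weatherTool` is a placeholder and the `ToolDefinition` import location is an assumption; `ModelError` is covered under Error Handling below.

```typescript
import { ModelError } from "stratus-sdk/core";
import type { ToolDefinition } from "stratus-sdk/core"; // assumed export location

// Placeholder -- stands in for a real tool definition.
declare const weatherTool: ToolDefinition;

try {
  await model.getResponse({
    messages: [{ role: "user", content: "What's the weather in Oslo?" }],
    tools: [weatherTool], // rejected by models without tool-calling support
  });
} catch (error) {
  if (error instanceof ModelError) {
    console.warn(`Unsupported feature on this deployment: ${error.message}`);
  }
}
```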
### Tested Models

The following non-OpenAI models have been verified with `AzureChatCompletionsModel`:
| Model | Tools | Structured Output | Streaming | Handoffs |
|---|---|---|---|---|
| Kimi-K2.5 | Yes | Yes | Yes | Yes |
| Kimi-K2-Thinking | Yes | Yes | Yes | Yes |
## Usage

Both models implement the `Model` interface and work identically with all Stratus APIs:
```typescript
// With run()
const result = await run(agent, "Hello", { model });

// With createSession()
const session = createSession({ model, instructions: "..." });

// With prompt()
const promptResult = await prompt("Hello", { model });
```

## Model Interface

Any model provider can be used with Stratus by implementing the `Model` interface:
```typescript
interface Model {
  getResponse(request: ModelRequest, options?: ModelRequestOptions): Promise<ModelResponse>;
  getStreamedResponse(request: ModelRequest, options?: ModelRequestOptions): AsyncIterable<StreamEvent>;
}

interface ModelRequestOptions {
  signal?: AbortSignal;
}
```

The `options` parameter is optional and backward compatible. When provided, `signal` is used for request cancellation.
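As an illustration, a minimal custom provider might look like the sketch below. It assumes the interface types are importable from `stratus-sdk/core` (this page only confirms the error classes live there) and uses the request and response shapes defined in the subsections that follow; the message shape and class name are illustrative.

```typescript
import type {
  Model,
  ModelRequest,
  ModelRequestOptions,
  ModelResponse,
  StreamEvent,
} from "stratus-sdk/core"; // assumed export location

// Toy provider that echoes the last message back.
class EchoModel implements Model {
  async getResponse(
    request: ModelRequest,
    options?: ModelRequestOptions
  ): Promise<ModelResponse> {
    // Simulate a network round trip, then honor cancellation.
    await new Promise((resolve) => setTimeout(resolve, 10));
    options?.signal?.throwIfAborted();

    const last = request.messages[request.messages.length - 1];
    return {
      content: `echo: ${last?.content ?? ""}`, // assumes ChatMessage has a content field
      toolCalls: [],
    };
  }

  async *getStreamedResponse(): AsyncIterable<StreamEvent> {
    // StreamEvent's shape is SDK-defined; a real provider would yield
    // events here as chunks arrive from its API.
    throw new Error("streaming not implemented in this sketch");
  }
}
```

Cancellation then works as described above:

```typescript
const controller = new AbortController();
const pending = new EchoModel().getResponse(
  { messages: [{ role: "user", content: "Hello" }] },
  { signal: controller.signal }
);
controller.abort();
pending.catch((err) => console.log(err.name)); // "AbortError"
```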
### ModelRequest

```typescript
interface ModelRequest {
  messages: ChatMessage[];
  tools?: ToolDefinition[];
  modelSettings?: ModelSettings;
  responseFormat?: ResponseFormat;
}
```

### ModelResponse
```typescript
interface ModelResponse {
  content: string | null;
  toolCalls: ToolCall[];
  usage?: UsageInfo;
  finishReason?: FinishReason;
}
```

### UsageInfo
```typescript
interface UsageInfo {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
}
```

Cache token fields are populated when the Azure API returns prompt caching details. `reasoningTokens` is populated for reasoning models (o1, o3, etc.) from `completion_tokens_details.reasoning_tokens` (Chat Completions) or `output_tokens_details.reasoning_tokens` (Responses API). All optional fields are `undefined` when the API does not report them.
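To read these fields in practice, inspect `usage` on a response. A minimal sketch, assuming the `{ role, content }` message shape (the `ChatMessage` type itself is SDK-defined):

```typescript
const response = await model.getResponse({
  messages: [{ role: "user", content: "Hello" }],
});

if (response.usage) {
  const { promptTokens, completionTokens, totalTokens, reasoningTokens } = response.usage;
  console.log(`prompt=${promptTokens} completion=${completionTokens} total=${totalTokens}`);
  // Optional fields are undefined unless the API reported them.
  if (reasoningTokens !== undefined) {
    console.log(`reasoning tokens: ${reasoningTokens}`);
  }
}
```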
## Prompt Caching

Both models support Azure's automatic prompt caching. Cache hits appear as `cacheReadTokens` in `UsageInfo` and are billed at a discount. Use `promptCacheKey` in `ModelSettings` to improve hit rates:
```typescript
const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    promptCacheKey: "my-app-v1",
  },
});
```

Both `AzureChatCompletionsModel` and `AzureResponsesModel` parse cached token counts from their respective response formats.
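One way to confirm caching is taking effect is to issue the same request twice with the same `promptCacheKey` and compare `cacheReadTokens`. A sketch using the low-level `Model` interface (the message shape is assumed, and whether a given request is actually cached is up to the Azure service):

```typescript
const request = {
  messages: [{ role: "user", content: "Summarize the onboarding guide." }],
  modelSettings: { promptCacheKey: "my-app-v1" },
};

await model.getResponse(request); // first call warms the cache
const second = await model.getResponse(request);

// Populated only when Azure reports prompt caching details.
console.log(`cached tokens: ${second.usage?.cacheReadTokens ?? 0}`);
```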
## Authentication

Both implementations authenticate with the `api-key` header, sent with every request.
## Streaming

Both models use Server-Sent Events (SSE) with a shared zero-dependency parser. Events are yielded as `StreamEvent` objects as they arrive from the Azure API.
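At the lowest level you can consume the stream directly from the model. A sketch, assuming the message shape used above (the `StreamEvent` payload is SDK-defined, so this simply logs each event):

```typescript
for await (const event of model.getStreamedResponse({
  messages: [{ role: "user", content: "Tell me a short story." }],
})) {
  console.log(event); // each StreamEvent as it arrives over SSE
}
```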
## Error Handling

Both models throw the same errors for failure modes:

- `ModelError`: General API errors (4xx/5xx responses)
- `ContentFilterError`: Azure content filter blocked the request or response
```typescript
import { ModelError, ContentFilterError } from "stratus-sdk/core";

try {
  const result = await run(agent, input);
} catch (error) {
  if (error instanceof ContentFilterError) {
    // Handle content filter
  } else if (error instanceof ModelError) {
    console.error(`API error ${error.status}: ${error.message}`);
  }
}
```

Both models also retry on 429 (rate limit) responses with exponential backoff, respecting the `Retry-After` header when present.