Azure OpenAI
Configure Azure Chat Completions and Responses API models
Stratus includes two built-in Azure OpenAI model implementations. Both implement the Model interface and work with all Stratus APIs (agents, tools, sessions, streaming, etc.).
| Model | API | Best for |
|---|---|---|
| AzureResponsesModel | Responses API | Recommended. Latest API format with full feature support |
| AzureChatCompletionsModel | Chat Completions | Legacy support, widest compatibility |
Quick Start with createModel()
The fastest way to get started. Reads AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and AZURE_OPENAI_DEPLOYMENT from environment variables:
```typescript
import { createModel } from "@usestratus/sdk/azure";

const model = createModel();
```

Defaults to the Responses API. Pass "chat-completions" for the legacy API:

```typescript
const model = createModel("chat-completions");
```

Override any env var with explicit options:
```typescript
const model = createModel({
  endpoint: "https://my-resource.openai.azure.com",
  deployment: "gpt-5.2",
  apiKey: process.env.MY_KEY!,
  store: true,
});
```

| Env Variable | Fallback | Description |
|---|---|---|
| AZURE_OPENAI_ENDPOINT | options.endpoint | Azure OpenAI endpoint URL |
| AZURE_OPENAI_API_KEY | options.apiKey | API key (or use options.azureAdTokenProvider) |
| AZURE_OPENAI_DEPLOYMENT | options.deployment | Model deployment name |
| AZURE_OPENAI_API_VERSION | options.apiVersion | API version (optional) |
If a required value is missing, createModel() throws a StratusError with a message telling you exactly which env var to set.
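For intuition, the resolution logic might look like the following sketch. This is a hypothetical helper, not the SDK's actual code: a plain `Error` stands in for `StratusError`, and the precedence (explicit option wins over the env var) is assumed from the override behavior described above.

```typescript
// Hypothetical sketch of createModel()'s config resolution.
// Assumption: explicit options override env vars, and a missing value
// produces an error naming the env var to set.
function resolveConfigValue(
  envVar: string,
  env: Record<string, string | undefined>,
  explicit?: string,
): string {
  const value = explicit ?? env[envVar]; // explicit option overrides the env var
  if (value === undefined || value === "") {
    throw new Error(
      `Missing required config: set ${envVar} or pass the corresponding option`,
    );
  }
  return value;
}
```

For example, `resolveConfigValue("AZURE_OPENAI_ENDPOINT", process.env)` returns the env value, while passing an explicit third argument overrides it.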
AzureResponsesModel
The recommended model for new projects. Uses the Azure Responses API.
```typescript
import { AzureResponsesModel } from "@usestratus/sdk/azure";

const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
  apiVersion: "2025-04-01-preview", // optional, this is the default
});
```

Config Options
| Property | Type | Description |
|---|---|---|
| endpoint | string | Required. Any supported endpoint format |
| apiKey | string | API key for authentication. Required unless azureAdTokenProvider is set. |
| azureAdTokenProvider | () => Promise<string> | Entra ID token provider function. Required unless apiKey is set. See Authentication. |
| deployment | string | Required. Sent as model in request body |
| apiVersion | string | API version (default: "2025-04-01-preview") |
| store | boolean | Whether to persist responses server-side (default: false). Enable for previous_response_id optimization. |
| maxRetries | number | Maximum number of retries on 429 rate limits and network errors (default: 3). See Retry behavior. |
AzureChatCompletionsModel
Uses the Azure Chat Completions API. Use this if your deployment doesn't support the Responses API.
```typescript
import { AzureChatCompletionsModel } from "@usestratus/sdk/azure";

const model = new AzureChatCompletionsModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
  apiVersion: "2025-03-01-preview", // optional, this is the default
});
```

Config Options
| Property | Type | Description |
|---|---|---|
| endpoint | string | Required. Any supported endpoint format |
| apiKey | string | API key for authentication. Required unless azureAdTokenProvider is set. |
| azureAdTokenProvider | () => Promise<string> | Entra ID token provider function. Required unless apiKey is set. See Authentication. |
| deployment | string | Required. Model deployment name |
| apiVersion | string | API version (default: "2025-03-01-preview") |
| maxRetries | number | Maximum number of retries on 429 rate limits and network errors (default: 3). See Retry behavior. |
Both models are interchangeable for function tools. Swap one for the other without changing any agent, tool, or session code. Built-in tools (web search, code interpreter, MCP, image generation) are only supported by AzureResponsesModel.
Endpoint Formats
Pass any Azure endpoint URL as endpoint — the SDK auto-detects the type and builds the correct request URL.
```typescript
// Azure OpenAI
endpoint: "https://your-resource.openai.azure.com"

// Cognitive Services
endpoint: "https://your-resource.cognitiveservices.azure.com"

// AI Foundry project
endpoint: "https://your-project.services.ai.azure.com/api/projects/my-project"

// Full URL (used as-is, deployment and apiVersion are ignored)
endpoint: "https://your-resource.openai.azure.com/openai/deployments/gpt-5.2/chat/completions?api-version=2025-03-01-preview"
```

Trailing slashes are normalized automatically.
Non-OpenAI Models (Model Inference API)
AzureChatCompletionsModel works with any model deployed through the Azure AI Model Inference API, not just OpenAI models. Pass the full Model Inference URL as the endpoint and the model name as the deployment:
```typescript
import { AzureChatCompletionsModel } from "@usestratus/sdk/azure";

const model = new AzureChatCompletionsModel({
  endpoint: "https://your-resource.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview",
  apiKey: "your-api-key",
  deployment: "Kimi-K2.5", // model name sent in request body
});
```

The deployment value is sent as the model field in the request body, which the Model Inference API uses to route to the correct model. All Stratus features (tools, streaming, handoffs, sessions, etc.) work with any model that supports the Chat Completions format.
Not all models support every feature. For example, some models don't support tool calling or structured output. The SDK will surface the API error if an unsupported feature is used.
Tested Models
The following non-OpenAI models have been verified with AzureChatCompletionsModel:
| Model | Tools | Structured Output | Streaming | Handoffs |
|---|---|---|---|---|
| Kimi-K2.5 | Yes | Yes | Yes | Yes |
| Kimi-K2-Thinking | Yes | Yes | Yes | Yes |
Usage
Both models implement the Model interface and work identically with all Stratus APIs:
```typescript
// With run()
const result = await run(agent, "Hello", { model });

// With createSession()
const session = createSession({ model, instructions: "..." });

// With prompt()
const result = await prompt("Hello", { model });
```

Model Interface
Any model provider can be used with Stratus by implementing the Model interface:
```typescript
interface Model {
  getResponse(request: ModelRequest, options?: ModelRequestOptions): Promise<ModelResponse>;
  getStreamedResponse(request: ModelRequest, options?: ModelRequestOptions): AsyncIterable<StreamEvent>;
}

interface ModelRequestOptions {
  signal?: AbortSignal;
}
```

The options parameter is optional and backward compatible. When provided, signal is used for request cancellation.
ModelRequest
```typescript
interface ModelRequest {
  messages: ChatMessage[];
  tools?: (ToolDefinition | Record<string, unknown>)[];
  modelSettings?: ModelSettings;
  responseFormat?: ResponseFormat;
  previousResponseId?: string;
  rawInputItems?: Record<string, unknown>[];
}
```

The tools array accepts both ToolDefinition (function tools) and Record<string, unknown> (hosted tool definitions). previousResponseId is forwarded by the run loop for Responses API optimization when store is enabled.
rawInputItems appends raw items to the Responses API input array. Use this to pass back opaque items from the API — compaction items, encrypted reasoning, MCP approval responses — that the SDK doesn't serialize from ChatMessage.
ModelResponse
```typescript
interface ModelResponse {
  content: string | null;
  toolCalls: ToolCall[];
  usage?: UsageInfo;
  finishReason?: FinishReason;
  responseId?: string;
  incompleteDetails?: { reason?: string };
  outputItems?: Record<string, unknown>[];
}
```

responseId is populated by AzureResponsesModel and tracked across turns by the run loop. It's also available on RunResult.responseId.
incompleteDetails is populated when a response is truncated (e.g. due to max_output_tokens). The reason field describes why.
outputItems is an escape hatch for Responses API output item types the SDK doesn't have first-class support for — such as mcp_approval_request, image_generation_call results, and code_interpreter_call results. These items are passed through as raw objects so you can inspect them directly.
UsageInfo
```typescript
interface UsageInfo {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
}
```

Cache token fields are populated when the Azure API returns prompt caching details. reasoningTokens is populated for reasoning models (o1, o3, etc.) from completion_tokens_details.reasoning_tokens (Chat Completions) or output_tokens_details.reasoning_tokens (Responses API). All optional fields are undefined when not active.
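When tracking spend across a multi-turn run, a small helper like the following can aggregate UsageInfo values. This is an illustrative utility, not part of the SDK; it preserves the convention that optional fields stay undefined when never reported.

```typescript
interface UsageInfo {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheReadTokens?: number;
  cacheCreationTokens?: number;
  reasoningTokens?: number;
}

// Illustrative helper (not part of the SDK): sums usage across turns.
function addUsage(a: UsageInfo, b: UsageInfo): UsageInfo {
  // Optional fields remain undefined unless at least one side reported them.
  const opt = (x?: number, y?: number) =>
    x === undefined && y === undefined ? undefined : (x ?? 0) + (y ?? 0);
  return {
    promptTokens: a.promptTokens + b.promptTokens,
    completionTokens: a.completionTokens + b.completionTokens,
    totalTokens: a.totalTokens + b.totalTokens,
    cacheReadTokens: opt(a.cacheReadTokens, b.cacheReadTokens),
    cacheCreationTokens: opt(a.cacheCreationTokens, b.cacheCreationTokens),
    reasoningTokens: opt(a.reasoningTokens, b.reasoningTokens),
  };
}
```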
Prompt Caching
Both models support Azure's automatic prompt caching. Cache hits appear as cacheReadTokens in UsageInfo and are billed at a discount. Use promptCacheKey in ModelSettings to improve hit rates:
```typescript
const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    promptCacheKey: "my-app-v1",
  },
});
```

Both AzureChatCompletionsModel and AzureResponsesModel parse cached token counts from their respective response formats.
Authentication
Both models support two authentication methods. Exactly one of apiKey or azureAdTokenProvider must be provided — the constructor throws if neither or both are set.
API Key
The simplest option. The key is sent as an api-key header with every request.
```typescript
const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: process.env.AZURE_API_KEY!,
  deployment: "gpt-5.2",
});
```

Microsoft Entra ID
For enterprise environments, pass a token provider function instead of an API key. Stratus calls it before each request and sends the token as a Bearer header. This works with managed identities, service principals, and any @azure/identity credential.
Install @azure/identity in your project (Stratus has no hard dependency on it):
```shell
bun add @azure/identity
```

Then pass a token provider:
```typescript
import { AzureResponsesModel } from "@usestratus/sdk/azure";
import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";

const credential = new DefaultAzureCredential();
const tokenProvider = getBearerTokenProvider(
  credential,
  "https://cognitiveservices.azure.com/.default",
);

const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  azureAdTokenProvider: tokenProvider,
  deployment: "gpt-5.2",
});
```

The token provider is called fresh on each API request — token caching and refresh are handled by @azure/identity.
DefaultAzureCredential automatically picks the right credential for your environment: managed identity in Azure, Azure CLI locally, and environment variables in CI. See the @azure/identity docs for the full chain.
Streaming
Both models use Server-Sent Events (SSE) with a shared zero-dependency parser. Events are yielded as StreamEvent objects as they arrive from the Azure API.
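For intuition, a data-line extractor in the spirit of that parser might look like the sketch below. This is a simplified illustration, not the SDK's parser, which also handles multi-line events, comment lines, and buffering across chunk boundaries.

```typescript
// Simplified sketch of SSE data-line extraction (illustrative only).
// Returns the JSON payload strings from a chunk of SSE text, skipping
// the "[DONE]" sentinel that ends OpenAI-style streams.
function parseSseChunk(chunk: string): string[] {
  const payloads: string[] = [];
  for (const rawLine of chunk.split(/\r?\n/)) {
    if (rawLine.startsWith("data:")) {
      const payload = rawLine.slice(5).trimStart();
      if (payload !== "[DONE]") payloads.push(payload);
    }
  }
  return payloads;
}
```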
Error Handling
Both models throw the same errors for failure modes:
- ModelError: General API errors (4xx/5xx responses)
- ContentFilterError: Azure content filter blocked the request or response
```typescript
import { ModelError, ContentFilterError } from "@usestratus/sdk/core";

try {
  const result = await run(agent, input);
} catch (error) {
  if (error instanceof ContentFilterError) {
    // Handle content filter
  } else if (error instanceof ModelError) {
    console.error(`API error ${error.status}: ${error.message}`);
  }
}
```

Retry Behavior
Both models automatically retry on transient errors and network errors (timeouts, connection resets, DNS failures). Retries are transparent to the caller — the AbortSignal from RunOptions.signal still propagates through, so timeouts work across retries.
Retryable status codes
| Code | Meaning |
|---|---|
| 429 | Rate limited — too many requests |
| 500 | Internal server error — transient capacity issue |
| 502 | Bad gateway — upstream infrastructure issue |
| 503 | Service unavailable — server temporarily down |
The default is 3 retries. Configure it per model:
```typescript
const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: process.env.AZURE_API_KEY!,
  deployment: "gpt-5.2",
  maxRetries: 5,
});
```

Backoff strategy
- retry-after-ms header — Azure returns this with millisecond precision on 429s. Used when present.
- retry-after header — Standard header in seconds. Used as fallback.
- Exponential backoff with jitter — 1s × 2^attempt + random(0–1s). Used when no headers are present.
All delays are capped at 30 seconds — including server-provided values — to prevent a misbehaving server from stalling requests indefinitely.
Backoff sleeps are abort-aware: if you cancel via AbortSignal, the retry exits immediately rather than waiting out the full delay.
Retries are logged via console.warn with the wait duration and attempt count.
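The delay rules above can be sketched as a small pure function. This is an illustration of the documented behavior, not the SDK's actual internals; the parameter names are assumptions.

```typescript
// Illustrative sketch of the documented backoff rules.
function computeBackoffMs(
  attempt: number,        // 0-based retry attempt
  retryAfterMs?: number,  // parsed retry-after-ms header, if present
  retryAfterSec?: number, // parsed retry-after header, if present
): number {
  const CAP_MS = 30_000; // all delays are capped at 30 seconds
  let delay: number;
  if (retryAfterMs !== undefined) {
    delay = retryAfterMs; // millisecond-precision server hint wins
  } else if (retryAfterSec !== undefined) {
    delay = retryAfterSec * 1000; // standard header, in seconds
  } else {
    // exponential backoff with jitter: 1s × 2^attempt + random(0–1s)
    delay = 1000 * 2 ** attempt + Math.random() * 1000;
  }
  return Math.min(delay, CAP_MS); // server-provided values are capped too
}
```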
Proxy error detection
Azure proxy errors sometimes return HTTP 200 with an HTML body instead of JSON/SSE. Both models detect this by checking the content-type header — if it's present but doesn't contain json or event-stream, the response is treated as a transient proxy error and retried with the same backoff logic. If retries are exhausted, a ModelError is thrown with the first 200 characters of the body.
As a safety net, getResponse() also catches SyntaxError from response.json() and wraps it in a ModelError with the raw body snippet for debugging.
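The content-type check described above amounts to a predicate like this sketch (an assumed shape for illustration, not the SDK's actual code):

```typescript
// Sketch of the proxy-error heuristic: a present content-type header
// that mentions neither JSON nor an event stream is treated as a
// transient proxy error and retried.
function looksLikeProxyError(contentType: string | null): boolean {
  if (contentType === null) return false; // absent header is not flagged
  const ct = contentType.toLowerCase();
  return !ct.includes("json") && !ct.includes("event-stream");
}
```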
AzureResponsesModel also retries SSE-level rate limits — when the HTTP response is 200 but the stream contains a too_many_requests error event before any content has been yielded. SSE retries use a fixed budget of 3, independent of maxRetries, to avoid quadratic retry multiplication.
Responses API Methods
AzureResponsesModel exposes additional methods beyond the Model interface for Responses API features that don't fit the standard getResponse / getStreamedResponse pattern.
Compact endpoint
Shrink a conversation's context window while preserving essential information. Useful for long-running sessions before continuing.
```typescript
// Compact by passing conversation items
const compacted = await model.compact({
  input: [
    { role: "user", content: "Explain quantum computing in detail." },
    {
      type: "message",
      role: "assistant",
      content: [{ type: "output_text", text: longResponse }],
    },
  ],
});

// Use compacted output as context for the next request
const followUp = await model.getResponse({
  messages: [{ role: "user", content: "What are the practical applications?" }],
  rawInputItems: compacted.output,
});
```

You can also compact by referencing a stored response:
```typescript
const compacted = await model.compact({
  previousResponseId: "resp_abc123",
});
```

CompactOptions:
| Property | Type | Description |
|---|---|---|
| model | string | Model override. Defaults to the deployment configured on the model instance. |
| input | Record<string, unknown>[] | Conversation items to compact. |
| previousResponseId | string | ID of a stored response to compact. Alternative to input. |
| signal | AbortSignal | Abort signal for cancellation. |
Background tasks
Run long-running requests asynchronously. Best for reasoning models (o3, o4-mini) that can take minutes to complete.
```typescript
// Start a background task
const bg = await model.createBackgroundResponse({
  messages: [
    { role: "user", content: "Write a detailed analysis of this codebase." },
  ],
});

console.log(bg.id); // "resp_abc123"
console.log(bg.status); // "queued" | "in_progress"

// Poll until done
let response = bg;
while (response.status !== "completed" && response.status !== "failed") {
  await new Promise((r) => setTimeout(r, 2000));
  response = await model.retrieveResponse(response.id);
}
console.log(response.output); // completed response
```

Cancel a running background task:
```typescript
const cancelled = await model.cancelResponse("resp_abc123");
```

Resume streaming from a specific point (useful for dropped connections):
```typescript
let cursor: number | undefined;

for await (const event of model.streamBackgroundResponse("resp_abc123", {
  startingAfter: cursor, // resume from last known position
})) {
  // process events
}
```

Background mode requires store: true. Not all deployments support it — it's designed for reasoning models like o3 and o4-mini.
Retrieve, delete, and list
Manage stored responses directly.
```typescript
// Retrieve a stored response
const response = await model.retrieveResponse("resp_abc123");

// List the input items that were sent
const items = await model.listInputItems("resp_abc123");
console.log(items.data); // input item objects
console.log(items.hasMore); // pagination

// Delete a stored response
await model.deleteResponse("resp_abc123");
```

retrieveResponse(id) — Returns the full RawResponse including id, status, output, usage, and error.
listInputItems(id) — Returns { data, hasMore, firstId, lastId } with the input items from the original request.
deleteResponse(id) — Deletes the stored response. Subsequent retrieval returns 404.
Stored responses are retained for 30 days by default. Use deleteResponse() to clean up earlier.
MCP approval flow
When using the MCP built-in tool with requireApproval, the API returns an mcp_approval_request in outputItems instead of executing the tool. You approve or deny it by passing an mcp_approval_response back.
```typescript
const result = await model.getResponse({
  messages: [{ role: "user", content: "Search the docs" }],
});

// Check for pending approvals
const approval = result.outputItems?.find(
  (item) => item.type === "mcp_approval_request"
);

if (approval) {
  // Approve and continue
  const continued = await model.getResponse({
    messages: [{ role: "user", content: "Search the docs" }],
    previousResponseId: result.responseId,
    modelSettings: { store: true },
    rawInputItems: [{
      type: "mcp_approval_response",
      approve: true,
      approval_request_id: approval.id as string,
    }],
  });
}
```