Azure OpenAI

Configure Azure Chat Completions and Responses API models

Stratus includes two built-in Azure OpenAI model implementations. Both implement the Model interface and work with all Stratus APIs (agents, tools, sessions, streaming, etc.).

Model                     | API              | Best for
--------------------------|------------------|----------------------------------------------------------
AzureResponsesModel       | Responses API    | Recommended. Latest API format with full feature support
AzureChatCompletionsModel | Chat Completions | Legacy support, widest compatibility

AzureResponsesModel

The recommended model for new projects. Uses the Azure Responses API.

responses.ts
import { AzureResponsesModel } from "stratus-sdk/azure";

const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
  apiVersion: "2025-04-01-preview", // optional, this is the default
});

Config Options

Property   | Type   | Description
-----------|--------|---------------------------------------------
endpoint   | string | Required. Any supported endpoint format
apiKey     | string | Required. API key for authentication
deployment | string | Required. Sent as model in request body
apiVersion | string | API version (default: "2025-04-01-preview")

AzureChatCompletionsModel

Uses the Azure Chat Completions API. Use this if your deployment doesn't support the Responses API.

chat-completions.ts
import { AzureChatCompletionsModel } from "stratus-sdk/azure";

const model = new AzureChatCompletionsModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
  apiVersion: "2025-03-01-preview", // optional, this is the default
});

Config Options

Property   | Type   | Description
-----------|--------|---------------------------------------------
endpoint   | string | Required. Any supported endpoint format
apiKey     | string | Required. API key for authentication
deployment | string | Required. Model deployment name
apiVersion | string | API version (default: "2025-03-01-preview")

Both models are interchangeable. Swap one for the other without changing any agent, tool, or session code.
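
Since both constructors take the same options, the swap is a one-line change (placeholder values as above):

import { AzureResponsesModel } from "stratus-sdk/azure";

// Was: new AzureChatCompletionsModel({ ... }) with the same options.
const model = new AzureResponsesModel({
  endpoint: "https://your-resource.openai.azure.com",
  apiKey: "your-api-key",
  deployment: "gpt-5.2",
});
// Agents, tools, and sessions that consume `model` need no changes.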

Endpoint Formats

Pass any Azure endpoint URL as endpoint — the SDK auto-detects the type and builds the correct request URL.

// Azure OpenAI
endpoint: "https://your-resource.openai.azure.com"

// Cognitive Services
endpoint: "https://your-resource.cognitiveservices.azure.com"

// AI Foundry project
endpoint: "https://your-project.services.ai.azure.com/api/projects/my-project"

// Full URL (used as-is, deployment and apiVersion are ignored)
endpoint: "https://your-resource.openai.azure.com/openai/deployments/gpt-5.2/chat/completions?api-version=2025-03-01-preview"

Trailing slashes are normalized automatically.
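
Schematically, the resolution works like this (an illustrative sketch, not the SDK's actual implementation; the deployment route shown matches the Chat Completions full-URL example above):

function resolveRequestUrl(endpoint: string, deployment: string, apiVersion: string): string {
  const base = endpoint.replace(/\/+$/, ""); // trailing slashes are normalized
  // A full URL that already names an API route is used as-is.
  if (base.includes("/chat/completions") || base.includes("/responses")) {
    return base;
  }
  // Otherwise, build the deployment route from the base endpoint.
  return `${base}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`;
}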

Non-OpenAI Models (Model Inference API)

AzureChatCompletionsModel works with any model deployed through the Azure AI Model Inference API, not just OpenAI models. Pass the full Model Inference URL as the endpoint and the model name as the deployment:

model-inference.ts
import { AzureChatCompletionsModel } from "stratus-sdk/azure";

const model = new AzureChatCompletionsModel({
  endpoint: "https://your-resource.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview",
  apiKey: "your-api-key",
  deployment: "Kimi-K2.5", // model name sent in request body
});

The deployment value is sent as the model field in the request body, which the Model Inference API uses to route to the correct model. All Stratus features (tools, streaming, handoffs, sessions, etc.) work with any model that supports the Chat Completions format.

Not all models support every feature. For example, some models don't support tool calling or structured output. The SDK will surface the API error if an unsupported feature is used.

Tested Models

The following non-OpenAI models have been verified with AzureChatCompletionsModel:

Model            | Tools | Structured Output | Streaming | Handoffs
-----------------|-------|-------------------|-----------|---------
Kimi-K2.5        | Yes   | Yes               | Yes       | Yes
Kimi-K2-Thinking | Yes   | Yes               | Yes       | Yes

Usage

Both models implement the Model interface and work identically with all Stratus APIs:

// With run()
const result = await run(agent, "Hello", { model });

// With createSession()
const session = createSession({ model, instructions: "..." });

// With prompt()
const result = await prompt("Hello", { model });

Model Interface

Any model provider can be used with Stratus by implementing the Model interface:

model-interface.ts
interface Model {
  getResponse(request: ModelRequest, options?: ModelRequestOptions): Promise<ModelResponse>;
  getStreamedResponse(request: ModelRequest, options?: ModelRequestOptions): AsyncIterable<StreamEvent>;
}

interface ModelRequestOptions {
  signal?: AbortSignal; 
}

The options parameter is optional, keeping existing implementations backward compatible. When provided, signal is used for request cancellation.
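
A minimal sketch of a custom provider, assuming the interface types are exported from stratus-sdk/core (EchoModel is hypothetical, not a real provider):

custom-model.ts
import type { Model, ModelRequest, ModelRequestOptions, ModelResponse, StreamEvent } from "stratus-sdk/core";

class EchoModel implements Model {
  async getResponse(request: ModelRequest, options?: ModelRequestOptions): Promise<ModelResponse> {
    options?.signal?.throwIfAborted(); // honor cancellation if a signal was passed
    const last = request.messages[request.messages.length - 1];
    return {
      content: `echo: ${JSON.stringify(last)}`,
      toolCalls: [],
      usage: { promptTokens: 0, completionTokens: 0, totalTokens: 0 },
    };
  }

  async *getStreamedResponse(request: ModelRequest, options?: ModelRequestOptions): AsyncIterable<StreamEvent> {
    const response = await this.getResponse(request, options);
    // StreamEvent's exact shape is SDK-defined; the cast is for this sketch only.
    yield { type: "content_delta", delta: response.content } as unknown as StreamEvent;
  }
}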

ModelRequest

types.ts
interface ModelRequest {
  messages: ChatMessage[];
  tools?: ToolDefinition[];
  modelSettings?: ModelSettings;
  responseFormat?: ResponseFormat;
}
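
For instance, a direct call with a minimal request (the ChatMessage role/content shape is assumed here):

const response = await model.getResponse(
  { messages: [{ role: "user", content: "Hello" }] },
  { signal: AbortSignal.timeout(30_000) }, // cancel if the request exceeds 30s
);

console.log(response.content);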

ModelResponse

types.ts
interface ModelResponse {
  content: string | null;
  toolCalls: ToolCall[];
  usage?: UsageInfo;
  finishReason?: FinishReason; 
}

UsageInfo

types.ts
interface UsageInfo {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  cacheReadTokens?: number; 
  cacheCreationTokens?: number; 
  reasoningTokens?: number; 
}

Cache token fields are populated when the Azure API returns prompt caching details. reasoningTokens is populated for reasoning models (o1, o3, etc.) from completion_tokens_details.reasoning_tokens (Chat Completions) or output_tokens_details.reasoning_tokens (Responses API). All optional fields are undefined when the API does not report them.
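
Schematically, the Chat Completions mapping looks like this (a sketch of the field mapping onto the UsageInfo interface above, not the SDK's actual code):

// `u` is the `usage` object from a raw Azure Chat Completions response.
function toUsageInfo(u: any): UsageInfo {
  return {
    promptTokens: u.prompt_tokens,
    completionTokens: u.completion_tokens,
    totalTokens: u.total_tokens,
    cacheReadTokens: u.prompt_tokens_details?.cached_tokens,        // undefined when absent
    reasoningTokens: u.completion_tokens_details?.reasoning_tokens, // undefined when absent
  };
}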

Prompt Caching

Both models support Azure's automatic prompt caching. Cache hits appear as cacheReadTokens in UsageInfo and are billed at a discount. Use promptCacheKey in ModelSettings to improve hit rates:

const agent = new Agent({
  name: "assistant",
  model,
  modelSettings: {
    promptCacheKey: "my-app-v1", 
  },
});

Both AzureChatCompletionsModel and AzureResponsesModel parse cached token counts from their respective response formats.
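
To check for cache hits, read cacheReadTokens from the response usage (sketch; message shape assumed as above):

const response = await model.getResponse({
  messages: [{ role: "user", content: "Hello" }],
});

if (response.usage?.cacheReadTokens) {
  console.log(`${response.usage.cacheReadTokens} prompt tokens were served from cache`);
}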

Authentication

Both implementations use api-key header authentication: the key is sent in the api-key header on every request.
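
For illustration, the equivalent raw request looks like this (raw fetch shown only to make the header visible; the SDK handles this for you):

const apiKey = "your-api-key";
const url =
  "https://your-resource.openai.azure.com/openai/deployments/gpt-5.2/chat/completions?api-version=2025-03-01-preview";

await fetch(url, {
  method: "POST",
  headers: {
    "api-key": apiKey, // Azure's api-key header authentication
    "content-type": "application/json",
  },
  body: JSON.stringify({ messages: [{ role: "user", content: "Hello" }] }),
});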

Streaming

Both models use Server-Sent Events (SSE) with a shared zero-dependency parser. Events are yielded as StreamEvent objects as they arrive from the Azure API.
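
Consuming a stream directly is plain async iteration over getStreamedResponse (sketch; the fields on each event depend on the SDK's StreamEvent type):

const controller = new AbortController();

for await (const event of model.getStreamedResponse(
  { messages: [{ role: "user", content: "Tell me a story" }] },
  { signal: controller.signal }, // abort to stop the stream mid-flight
)) {
  console.log(event); // inspect per the SDK's StreamEvent type
}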

Error Handling

Both models throw the same error types for common failure modes:

  • ModelError - General API errors (4xx/5xx responses)
  • ContentFilterError - Azure content filter blocked the request or response

error-handling.ts
import { ModelError, ContentFilterError } from "stratus-sdk/core";

try {
  const result = await run(agent, input);
} catch (error) {
  if (error instanceof ContentFilterError) {
    // Handle content filter
  } else if (error instanceof ModelError) {
    console.error(`API error ${error.status}: ${error.message}`);
  }
}

Both models also retry on 429 (rate limit) responses with exponential backoff, respecting the Retry-After header when present.
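
Illustratively, that retry behavior is equivalent to a loop like this (a sketch of the policy, not the SDK's internals; maxRetries is a hypothetical knob):

async function fetchWithRetry(url: string, init: RequestInit, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, init);
    if (response.status !== 429 || attempt >= maxRetries) return response;

    // Respect Retry-After (seconds) when present; otherwise back off exponentially.
    const retryAfter = Number(response.headers.get("retry-after"));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}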
