Custom Model Providers
Implement the Model interface to use any LLM provider
Stratus is provider-agnostic at its core. The stratus/core package defines a Model interface that any LLM provider can implement. Azure is the built-in implementation, but you can plug in OpenAI, Anthropic, local models, or anything else. Custom models work with all SDK features - tools, handoffs, guardrails, sessions, and tracing.
The Model interface
import type {
Model,
ModelRequest,
ModelRequestOptions,
ModelResponse,
StreamEvent,
} from "stratus-sdk/core";

The Model interface requires two methods:
interface Model {
getResponse(
request: ModelRequest,
options?: ModelRequestOptions,
): Promise<ModelResponse>;
getStreamedResponse(
request: ModelRequest,
options?: ModelRequestOptions,
): AsyncIterable<StreamEvent>;
}

getResponse() makes a single request and returns the full response. getStreamedResponse() returns an async iterable of StreamEvent objects that the SDK consumes as they arrive. Both methods receive the same ModelRequest input and an optional ModelRequestOptions with an AbortSignal.
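As a sketch of that contract, here is how a caller might invoke a model directly. The myModel name is hypothetical, the request assumes only messages needs to be supplied (the other ModelRequest fields described below are treated as optional, as the examples on this page do), and the abort wiring mirrors how a signal is passed through ModelRequestOptions:

```ts
import type { Model } from "stratus-sdk/core";

// Hypothetical sketch: myModel is any object implementing Model.
declare const myModel: Model;

const controller = new AbortController();

// Single request/response. The signal lets the caller cancel the provider call.
const response = await myModel.getResponse(
  { messages: [{ role: "user", content: "Hello!" }] },
  { signal: controller.signal },
);
console.log(response.content);

// Streaming: consume StreamEvents as they arrive.
for await (const event of myModel.getStreamedResponse(
  { messages: [{ role: "user", content: "Hello!" }] },
  { signal: controller.signal },
)) {
  if (event.type === "content_delta") process.stdout.write(event.content);
}
```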
Implementing a custom model
Here is a minimal model that echoes the last user message and supports both methods:
import type {
Model,
ModelRequest,
ModelRequestOptions,
ModelResponse,
StreamEvent,
} from "stratus-sdk/core";
export class EchoModel implements Model {
async getResponse(
request: ModelRequest,
_options?: ModelRequestOptions,
): Promise<ModelResponse> {
const lastMessage = request.messages.at(-1);
const text = lastMessage?.role === "user"
? typeof lastMessage.content === "string"
? lastMessage.content
: "echo"
: "echo";
return {
content: `Echo: ${text}`,
toolCalls: [],
usage: {
promptTokens: 0,
completionTokens: 0,
totalTokens: 0,
},
finishReason: "stop",
};
}
async *getStreamedResponse(
request: ModelRequest,
options?: ModelRequestOptions,
): AsyncGenerator<StreamEvent> {
// Reuse getResponse and emit the result as stream events
const response = await this.getResponse(request, options);
const content = response.content ?? "";
// Stream content one word at a time
const words = content.split(" ");
for (const [i, word] of words.entries()) {
const chunk = i < words.length - 1 ? `${word} ` : word;
yield { type: "content_delta", content: chunk };
}
yield { type: "done", response };
}
}

This works with run(), stream(), sessions, and every other SDK feature.
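For example, wiring EchoModel into an agent needs nothing beyond what the "Using your custom model" section shows below. A minimal sketch - the agent name and instructions are placeholders:

```ts
import { Agent, run } from "stratus-sdk/core";

// EchoModel needs no API key, which makes it handy for smoke tests.
const agent = new Agent({
  name: "echo-assistant",
  model: new EchoModel(),
  instructions: "Ignored by EchoModel.",
});

const result = await run(agent, "Hello!");
console.log(result.output); // expect something like "Echo: Hello!"
```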
ModelRequest
The ModelRequest object is passed to both model methods. It contains everything the model needs to generate a response.
| Field | Type | Description |
|---|---|---|
| messages | ChatMessage[] | The conversation history (system, user, assistant, tool messages) |
| tools | ToolDefinition[] | Tool definitions the model can call. Optional - omitted when the agent has no tools |
| modelSettings | ModelSettings | Temperature, max tokens, top-p, stop sequences, tool choice, and other generation parameters |
| responseFormat | ResponseFormat | Output format constraint (text, json_object, or json_schema for structured output) |
ChatMessage is a union of SystemMessage, UserMessage, AssistantMessage, and ToolMessage. User messages support multimodal content via ContentPart[].
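Providers that only accept plain strings therefore need to normalize user content before building their payload. The part shape below (type and text fields) is an assumption for illustration - check the actual ContentPart definition in the core package:

```ts
// Hypothetical helper: collapse user-message content to plain text.
// The { type, text } fields are assumed; consult the real ContentPart type.
type PartLike = { type: string; text?: string };

function contentToText(content: string | PartLike[]): string {
  if (typeof content === "string") return content;
  return content
    .map((part) => (part.type === "text" && typeof part.text === "string" ? part.text : ""))
    .join("");
}
```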
ModelResponse
The ModelResponse is what both methods must produce. For streaming, the final done event must include the complete ModelResponse.
| Field | Type | Description |
|---|---|---|
| content | string \| null | The text content of the model's response. null when the model only made tool calls |
| toolCalls | ToolCall[] | Tool calls the model wants to execute. Empty array when there are no tool calls |
| usage | UsageInfo | Token usage statistics (prompt, completion, total, cache tokens). Optional |
| finishReason | string | Why the model stopped generating (stop, tool_calls, length, content_filter). Optional |
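For instance, a response in which the model produced no text and requested a single tool call looks like this (the ToolCall shape matches the one built in the provider example further down; the id and token counts are made up):

```ts
import type { ModelResponse } from "stratus-sdk/core";

const toolCallResponse: ModelResponse = {
  content: null, // no text - the model only asked for a tool call
  toolCalls: [
    {
      id: "call_abc123",
      type: "function",
      function: { name: "getWeather", arguments: '{"city":"Paris"}' },
    },
  ],
  usage: { promptTokens: 42, completionTokens: 9, totalTokens: 51 },
  finishReason: "tool_calls",
};
```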
StreamEvent types
The getStreamedResponse() method yields a sequence of StreamEvent objects. The SDK processes these to build up the response incrementally.
| Event Type | Payload | Description |
|---|---|---|
| content_delta | { content: string } | A chunk of text content. Emitted as the model generates text |
| tool_call_start | { toolCall: { id: string; name: string } } | A new tool call has started. Emitted once per tool call |
| tool_call_delta | { toolCallId: string; arguments: string } | A chunk of JSON arguments for an in-progress tool call |
| tool_call_done | { toolCallId: string } | A tool call's arguments are complete |
| done | { response: ModelResponse } | The stream is finished. Must include the full ModelResponse with all content and tool calls |
The done event is required. The SDK relies on it to finalize the response, update usage tracking, and determine the finish reason.
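Putting the table together, a stream for a response that calls one tool might yield the following sequence. The chunk boundaries for the JSON arguments are arbitrary and the id is illustrative:

```ts
import type { StreamEvent } from "stratus-sdk/core";

const events: StreamEvent[] = [
  { type: "tool_call_start", toolCall: { id: "call_abc123", name: "getWeather" } },
  { type: "tool_call_delta", toolCallId: "call_abc123", arguments: '{"city":' },
  { type: "tool_call_delta", toolCallId: "call_abc123", arguments: '"Paris"}' },
  { type: "tool_call_done", toolCallId: "call_abc123" },
  {
    type: "done",
    response: {
      content: null,
      toolCalls: [
        {
          id: "call_abc123",
          type: "function",
          function: { name: "getWeather", arguments: '{"city":"Paris"}' },
        },
      ],
      finishReason: "tool_calls",
    },
  },
];
```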
Example: OpenAI-compatible provider
Here is a sketch of how you would wrap the OpenAI chat completions API:
import type {
Model,
ModelRequest,
ModelRequestOptions,
ModelResponse,
StreamEvent,
UsageInfo,
} from "stratus-sdk/core";
import type { ToolCall } from "stratus-sdk/core";
interface OpenAIModelConfig {
apiKey: string;
model: string;
baseUrl?: string;
}
export class OpenAIModel implements Model {
private readonly apiKey: string;
private readonly model: string;
private readonly baseUrl: string;
constructor(config: OpenAIModelConfig) {
this.apiKey = config.apiKey;
this.model = config.model;
this.baseUrl = config.baseUrl ?? "https://api.openai.com/v1";
}
async getResponse(
request: ModelRequest,
options?: ModelRequestOptions,
): Promise<ModelResponse> {
const body = this.buildBody(request);
const res = await fetch(`${this.baseUrl}/chat/completions`, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${this.apiKey}`,
},
body: JSON.stringify(body),
signal: options?.signal,
});
if (!res.ok) {
throw new Error(`OpenAI API error: ${res.status}`);
}
const json = await res.json();
return this.parseResponse(json);
}
async *getStreamedResponse(
request: ModelRequest,
options?: ModelRequestOptions,
): AsyncGenerator<StreamEvent> {
const body = { ...this.buildBody(request), stream: true };
const res = await fetch(`${this.baseUrl}/chat/completions`, {
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${this.apiKey}`,
},
body: JSON.stringify(body),
signal: options?.signal,
});
if (!res.ok) {
throw new Error(`OpenAI API error: ${res.status}`);
}
// Parse SSE stream, accumulate content and tool calls,
// yield content_delta / tool_call_start / tool_call_delta events,
// then yield tool_call_done for each tool call and a final done event.
// See AzureResponsesModel source for a complete SSE implementation.
let content = "";
const toolCalls: ToolCall[] = [];
// ... SSE parsing logic here ...
yield {
type: "done",
response: { content: content || null, toolCalls, finishReason: "stop" },
};
}
private buildBody(request: ModelRequest): Record<string, unknown> {
const body: Record<string, unknown> = {
model: this.model,
messages: request.messages,
};
if (request.tools?.length) {
body.tools = request.tools;
}
if (request.responseFormat) {
body.response_format = request.responseFormat;
}
const s = request.modelSettings;
if (s?.temperature !== undefined) body.temperature = s.temperature;
if (s?.maxTokens !== undefined) body.max_tokens = s.maxTokens;
if (s?.topP !== undefined) body.top_p = s.topP;
if (s?.stop !== undefined) body.stop = s.stop;
if (s?.toolChoice !== undefined) body.tool_choice = s.toolChoice;
return body;
}
private parseResponse(json: any): ModelResponse {
const choice = json.choices[0];
const toolCalls: ToolCall[] = (choice.message.tool_calls ?? []).map(
(tc: any) => ({
id: tc.id,
type: "function" as const,
function: { name: tc.function.name, arguments: tc.function.arguments },
}),
);
const usage: UsageInfo | undefined = json.usage
? {
promptTokens: json.usage.prompt_tokens,
completionTokens: json.usage.completion_tokens,
totalTokens: json.usage.total_tokens,
}
: undefined;
return {
content: choice.message.content,
toolCalls,
usage,
finishReason: choice.finish_reason,
};
}
}

The built-in AzureResponsesModel is a reference implementation. Use its source code as a guide for building your own - it covers SSE parsing, retry logic, abort signal handling, and error mapping.
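To make the elided streaming branch more concrete, here is one way to turn a chat completions SSE stream into StreamEvents. This is a simplified sketch, not the AzureResponsesModel implementation: it assumes the standard data:-prefixed chunk format with delta fragments under choices[0].delta, and it skips retries, error mapping, and usage accounting.

```ts
import type { StreamEvent, ToolCall } from "stratus-sdk/core";

// Simplified SSE parser for the chat completions streaming format (a sketch).
async function* parseSse(body: ReadableStream<Uint8Array>): AsyncGenerator<StreamEvent> {
  const decoder = new TextDecoder();
  let buffer = "";
  let content = "";
  let finishReason: string | undefined;
  // Tool calls arrive as fragments keyed by index; accumulate them here.
  const partial: { id: string; name: string; arguments: string }[] = [];

  const reader = body.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    // Each SSE line looks like `data: {...}`; keep any trailing partial line.
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? "";

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;
      const data = trimmed.slice(5).trim();
      if (data === "[DONE]") continue;

      const chunk = JSON.parse(data);
      const choice = chunk.choices?.[0];
      if (!choice) continue;
      finishReason = choice.finish_reason ?? finishReason;
      const delta = choice.delta ?? {};

      if (typeof delta.content === "string" && delta.content.length > 0) {
        content += delta.content;
        yield { type: "content_delta", content: delta.content };
      }

      for (const tc of delta.tool_calls ?? []) {
        if (tc.id) {
          // A fresh id marks the start of a new tool call.
          partial[tc.index] = { id: tc.id, name: tc.function?.name ?? "", arguments: "" };
          yield { type: "tool_call_start", toolCall: { id: tc.id, name: tc.function?.name ?? "" } };
        }
        const entry = partial[tc.index];
        if (entry && tc.function?.arguments) {
          entry.arguments += tc.function.arguments;
          yield { type: "tool_call_delta", toolCallId: entry.id, arguments: tc.function.arguments };
        }
      }
    }
  }

  const toolCalls: ToolCall[] = partial.map((tc) => ({
    id: tc.id,
    type: "function" as const,
    function: { name: tc.name, arguments: tc.arguments },
  }));
  for (const tc of toolCalls) {
    yield { type: "tool_call_done", toolCallId: tc.id };
  }
  yield {
    type: "done",
    response: {
      content: content || null,
      toolCalls,
      finishReason: finishReason ?? (toolCalls.length > 0 ? "tool_calls" : "stop"),
    },
  };
}
```

Inside getStreamedResponse(), after checking that res.body is not null, you could replace the elided block with yield* parseSse(res.body).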
Using your custom model
Custom models are passed anywhere the SDK accepts a Model. There is no registration step - just instantiate and use.
With an Agent and run()
import { Agent, run } from "stratus-sdk/core";
import { OpenAIModel } from "./openai-model"; // wherever you placed the class above
const model = new OpenAIModel({
apiKey: process.env.OPENAI_API_KEY!,
model: "gpt-4o",
});
const agent = new Agent({
name: "assistant",
model,
instructions: "You are a helpful assistant.",
});
const result = await run(agent, "Hello!");
console.log(result.output);

With stream()
import { Agent, stream } from "stratus-sdk/core";
const { stream: events, result } = await stream(
agent,
"Explain quantum computing.",
);
for await (const event of events) {
if (event.type === "content_delta") {
process.stdout.write(event.content);
}
}
const final = await result;
console.log(final.usage);

With sessions
import { createSession } from "stratus-sdk/core";
const session = createSession({
model,
instructions: "You are a helpful assistant.",
tools: [getWeather],
});
session.send("What's the weather in Paris?");
for await (const event of session.stream()) {
if (event.type === "content_delta") {
process.stdout.write(event.content);
}
}

Override at call site
You can also set a default model on the agent and override it per-call:
const agent = new Agent({
name: "assistant",
model: defaultModel,
});
// Use a different model for this specific run
const result = await run(agent, "Hello", { model: otherModel });