# Multimodal Input

Send text, images, files (PDFs), audio, or any combination to agents using `ContentPart` arrays.
## Sending an image
Pass a `ChatMessage[]` array to `run()` with a `UserMessage` whose `content` is a `ContentPart[]`:
```ts
import { Agent, run } from "@usestratus/sdk/core";
import type { ChatMessage } from "@usestratus/sdk/core";

const agent = new Agent({
  name: "vision",
  model,
  instructions: "Describe what you see in the image.",
});

const messages: ChatMessage[] = [
  {
    role: "user",
    content: [
      {
        type: "image_url",
        image_url: { url: "https://example.com/photo.png" },
      },
    ],
  },
];

const result = await run(agent, messages);
console.log(result.output);
```

Base64 data URLs work the same way:
```ts
import { readFile } from "node:fs/promises";

const buffer = await readFile("./chart.png");
const dataUrl = `data:image/png;base64,${buffer.toString("base64")}`;

const messages: ChatMessage[] = [
  {
    role: "user",
    content: [
      {
        type: "image_url",
        image_url: { url: dataUrl },
      },
    ],
  },
];

const result = await run(agent, messages);
```

## Mixed text and images
Combine text and image parts in a single message:
```ts
import type { ContentPart } from "@usestratus/sdk/core";

const parts: ContentPart[] = [
  { type: "text", text: "Compare these two charts and summarize the differences." },
  { type: "image_url", image_url: { url: "https://example.com/chart-q1.png" } },
  { type: "image_url", image_url: { url: "https://example.com/chart-q2.png" } },
];

const messages: ChatMessage[] = [{ role: "user", content: parts }];
const result = await run(agent, messages);
console.log(result.output);
```

## Image detail levels
The `detail` parameter controls how the model processes the image:
| Level | Description |
|---|---|
| `"auto"` | The model decides based on image size (default) |
| `"low"` | Fixed low-resolution processing. Faster and uses fewer tokens |
| `"high"` | High-resolution processing with tiled analysis. More accurate for detailed images |
Set the detail level on the `image_url` object:
```ts
const parts: ContentPart[] = [
  { type: "text", text: "Read the fine print in this contract." },
  {
    type: "image_url",
    image_url: {
      url: "https://example.com/contract.png",
      detail: "high",
    },
  },
];
```

Use `"low"` when you only need a general understanding of the image; it processes faster and consumes fewer tokens. Use `"high"` when fine details matter, such as reading text in screenshots or analyzing charts.
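As a counterpart to the high-detail example above, here is a minimal sketch of a low-detail part for quick triage. The image URL is hypothetical, and the local `ImagePart` type is a stand-in that assumes the same shape as `ImageContentPart` in the type definitions later on.

```typescript
// Local stand-in for the SDK's ImageContentPart shape (assumption: same fields).
type ImagePart = {
  type: "image_url";
  image_url: { url: string; detail?: "auto" | "low" | "high" };
};

// "low" trades resolution for speed and token cost, which is
// fine when you only need the general gist of the image.
const triagePart: ImagePart = {
  type: "image_url",
  image_url: {
    url: "https://example.com/thumbnail.png", // hypothetical image URL
    detail: "low",
  },
};
```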
## Sending a file (PDF)
Pass PDF files as base64 data URLs or file IDs. File input is only supported by `AzureResponsesModel`.
```ts
import { readFile } from "node:fs/promises";
import type { ChatMessage } from "@usestratus/sdk/core";

const buffer = await readFile("./report.pdf");
const dataUrl = `data:application/pdf;base64,${buffer.toString("base64")}`;

const messages: ChatMessage[] = [
  {
    role: "user",
    content: [
      { type: "file", file: { url: dataUrl }, filename: "report.pdf" },
      { type: "text", text: "Summarize this PDF" },
    ],
  },
];

const result = await run(agent, messages);
```

If you've uploaded the file via the Azure Files API, use a file ID instead:
```ts
const messages: ChatMessage[] = [
  {
    role: "user",
    content: [
      { type: "file", file: { file_id: "assistant-KaVLJQ..." } },
      { type: "text", text: "What does this document say?" },
    ],
  },
];
```

## Sending audio
Pass audio as a URL or as inline base64 data. Audio input is only supported by `AzureResponsesModel`.
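The inline base64 form is shown in the example below. For the URL form, a minimal sketch (the audio URL is hypothetical, and the local `AudioPart` type is a stand-in assuming the same shape as `AudioContentPart` in the type definitions later on):

```typescript
// Local stand-in for the SDK's AudioContentPart shape (assumption: same fields).
type AudioPart = {
  type: "audio";
  audio: { url: string } | { data: string; format: "wav" | "mp3" };
};

// URL form: point at hosted audio instead of inlining base64 data.
const urlAudioPart: AudioPart = {
  type: "audio",
  audio: { url: "https://example.com/meeting.wav" }, // hypothetical URL
};
```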
```ts
const messages: ChatMessage[] = [
  {
    role: "user",
    content: [
      { type: "audio", audio: { data: base64AudioData, format: "wav" } },
      { type: "text", text: "Transcribe this audio" },
    ],
  },
];
```

## With sessions
`session.send()` accepts a `ContentPart[]` directly:
```ts
import { createSession } from "@usestratus/sdk/core";
import type { ContentPart } from "@usestratus/sdk/core";

const session = createSession({
  model,
  instructions: "You are a helpful vision assistant.",
});

const parts: ContentPart[] = [
  { type: "text", text: "What is in this image?" },
  { type: "image_url", image_url: { url: "https://example.com/photo.png" } },
];

session.send(parts);
for await (const event of session.stream()) {
  if (event.type === "content_delta") process.stdout.write(event.content);
}
```

Follow-up messages can reference the image from the previous turn:
```ts
session.send("What colors are most prominent in that image?");
for await (const event of session.stream()) {
  if (event.type === "content_delta") process.stdout.write(event.content);
}
```

## With prompt()
`prompt()` also accepts a `ContentPart[]` as input:
```ts
import { prompt } from "@usestratus/sdk/core";
import type { ContentPart } from "@usestratus/sdk/core";

const parts: ContentPart[] = [
  { type: "text", text: "Describe this image in one sentence." },
  { type: "image_url", image_url: { url: "https://example.com/sunset.png" } },
];

const result = await prompt(parts, { model });
console.log(result.output);
```

Image support depends on the model deployment. Most gpt-5.x deployments support vision.
## `ContentPart` types
```ts
interface TextContentPart {
  type: "text";
  text: string;
}

interface ImageContentPart {
  type: "image_url";
  image_url: {
    url: string;
    detail?: "auto" | "low" | "high";
  };
}

interface FileContentPart {
  type: "file";
  file: { url: string } | { file_id: string };
  filename?: string;
}

interface AudioContentPart {
  type: "audio";
  audio: { url: string } | { data: string; format: "wav" | "mp3" };
}

type ContentPart = TextContentPart | ImageContentPart | FileContentPart | AudioContentPart;
```

`UserMessage.content` accepts either a plain string or a `ContentPart[]` array. When you pass a string, it behaves as a single text part.
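A minimal sketch of that equivalence, using locally defined stand-in types (assumption: they match the SDK's shapes):

```typescript
// Local stand-ins mirroring the SDK shapes (assumption: identical structure).
type TextPart = { type: "text"; text: string };
type Message = { role: "user"; content: string | TextPart[] };

// These two messages carry the same content.
const asString: Message = { role: "user", content: "Hello!" };
const asParts: Message = { role: "user", content: [{ type: "text", text: "Hello!" }] };

// Normalizing the string form yields the single-text-part form.
const normalize = (m: Message): TextPart[] =>
  typeof m.content === "string" ? [{ type: "text", text: m.content }] : m.content;
```

Here `normalize` is a hypothetical helper, not an SDK export; it only illustrates how a plain string maps onto a one-element `ContentPart[]`.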
`FileContentPart` and `AudioContentPart` are only supported by `AzureResponsesModel`. They are converted to the Responses API's `input_file` and `input_audio` types respectively.
## Next steps
- Sessions - Multi-turn conversations with persistent history
- Streaming - Stream responses token by token
- Structured Output - Parse model output into typed objects
- Tools - Give agents the ability to call functions