The InferenceSession class is the main interface for loading and running ONNX models in JavaScript environments.
Importing
Browser (ES Modules)
import * as ort from 'onnxruntime-web';
Node.js
const ort = require('onnxruntime-node');
// or
import * as ort from 'onnxruntime-node';
Creating Sessions
create()
Creates an inference session from a model.
static async create(
  path: string | Uint8Array | ArrayBufferLike,
  options?: InferenceSession.SessionOptions
): Promise<InferenceSession>
Parameters:
path: Model file path, URL, or binary data
options: Optional session configuration
Returns: Promise resolving to InferenceSession
From URL
const session = await ort.InferenceSession.create('./model.onnx');
From ArrayBuffer
const response = await fetch('./model.onnx');
const arrayBuffer = await response.arrayBuffer();
const session = await ort.InferenceSession.create(arrayBuffer);
From Uint8Array
const modelData = new Uint8Array(arrayBuffer);
const session = await ort.InferenceSession.create(modelData);
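The fetch-based example above passes whatever the server returns to create(), so a 404 page would surface as a confusing model-parsing error. A minimal sketch of a guard, assuming a hypothetical helper named fetchModelBytes (not part of onnxruntime) and a fetch implementation that follows the standard Fetch API:

```javascript
// Hypothetical helper (not part of onnxruntime): download model bytes and
// fail early on HTTP errors instead of handing an error page to create().
// `fetchImpl` defaults to the global fetch (browsers, Node.js 18+).
async function fetchModelBytes(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch model at ${url}: HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}

// Usage, inside an async function with `ort` imported as shown earlier:
//   const bytes = await fetchModelBytes('./model.onnx');
//   const session = await ort.InferenceSession.create(bytes);
```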
With Options
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm'],
  graphOptimizationLevel: 'all',
  enableCpuMemArena: true
});
Session Properties
inputNames
Gets the array of input names.
readonly inputNames: readonly string[]
Example:
const inputs = session.inputNames;
console.log('Model inputs:', inputs);
// Output: Model inputs: ['input']
outputNames
Gets the array of output names.
readonly outputNames: readonly string[]
Example:
const outputs = session.outputNames;
console.log('Model outputs:', outputs);
// Output: Model outputs: ['output']
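Since inputNames lists exactly the keys the model expects, it can be used to check a feeds object before calling run(). A minimal sketch, assuming a hypothetical helper named checkFeeds that only needs an object with an inputNames array:

```javascript
// Hypothetical helper: compare a feeds object against session.inputNames.
// Returns the input names that are absent and the keys the model does not know.
function checkFeeds(session, feeds) {
  const expected = new Set(session.inputNames);
  const provided = Object.keys(feeds);
  return {
    missing: session.inputNames.filter((name) => !provided.includes(name)),
    unexpected: provided.filter((name) => !expected.has(name))
  };
}

// Usage:
//   const { missing, unexpected } = checkFeeds(session, feeds);
//   if (missing.length > 0) throw new Error(`Missing inputs: ${missing.join(', ')}`);
```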
Running Inference
run()
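Runs inference on the model. An overload also accepts the outputs to fetch as its second argument.
async run(
  feeds: InferenceSession.FeedsType,
  options?: InferenceSession.RunOptions
): Promise<InferenceSession.ReturnType>
async run(
  feeds: InferenceSession.FeedsType,
  fetches: InferenceSession.FetchesType,
  options?: InferenceSession.RunOptions
): Promise<InferenceSession.ReturnType>
Parameters:
feeds: Object mapping input names to tensors
fetches: Optional list of output names (or object keyed by output name) selecting which outputs to compute
options: Optional run configuration
Returns: Promise resolving to an object that maps output names to tensors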
Basic Usage
import * as ort from 'onnxruntime-web';
// Create session
const session = await ort.InferenceSession.create('./model.onnx');
// Prepare input
const inputData = Float32Array.from([1, 2, 3, 4]);
const tensor = new ort.Tensor('float32', inputData, [1, 4]);
// Run inference
const feeds = { input: tensor };
const results = await session.run(feeds);
// Get output
const output = results.output;
console.log('Output data:', output.data);
console.log('Output shape:', output.dims);
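The tensor in the example above holds 4 values for dims [1, 4]; the data length has to equal the product of the dims. A small sketch of that check, using hypothetical helper names (elementCount, assertShapeMatches) that are not part of onnxruntime:

```javascript
// Hypothetical helpers: verify that a data buffer matches the requested dims
// before constructing a tensor from it.
function elementCount(dims) {
  // Product of all dimensions; an empty dims array (scalar) yields 1.
  return dims.reduce((total, dim) => total * dim, 1);
}

function assertShapeMatches(data, dims) {
  const expected = elementCount(dims);
  if (data.length !== expected) {
    throw new RangeError(
      `dims [${dims.join(', ')}] need ${expected} elements, got ${data.length}`
    );
  }
}

// Usage:
//   assertShapeMatches(inputData, [1, 4]);
//   const tensor = new ort.Tensor('float32', inputData, [1, 4]);
```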
With Specific Outputs
// Request specific outputs by name
const results = await session.run(
  { input: tensor },
  ['output1', 'output2'] // Only these outputs are returned
);
With Run Options
const results = await session.run(
  { input: tensor },
  {
    logSeverityLevel: 2,
    logVerbosityLevel: 0,
    tag: 'inference-1'
  }
);
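When comparing run options or execution providers, it can help to time each call. A minimal sketch, assuming a hypothetical wrapper named timedRun and the global performance.now() clock (available in browsers and recent Node.js versions):

```javascript
// Hypothetical wrapper: run inference and report wall-clock time in milliseconds.
// Works with any object exposing an async run(feeds, options) method.
async function timedRun(session, feeds, options) {
  const start = performance.now();
  const results = options === undefined
    ? await session.run(feeds)
    : await session.run(feeds, options);
  return { results, elapsedMs: performance.now() - start };
}

// Usage:
//   const { results, elapsedMs } = await timedRun(session, { input: tensor }, { tag: 'inference-1' });
//   console.log(`Inference took ${elapsedMs.toFixed(1)} ms`);
```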
SessionOptions
Configuration options for creating sessions.
executionProviders
Specifies execution providers in priority order.
executionProviders?: ExecutionProviderConfig[]
Example:
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: ['webgpu', 'wasm']
});
// With provider options (object form). deviceType and powerPreference
// are options of the WebNN provider.
const sessionWithOptions = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: [
    {
      name: 'webnn',
      deviceType: 'gpu',
      powerPreference: 'high-performance'
    },
    'wasm'
  ]
});
graphOptimizationLevel
Sets graph optimization level.
graphOptimizationLevel?: 'disabled' | 'basic' | 'extended' | 'all'
Example:
const session = await ort.InferenceSession.create('./model.onnx', {
  graphOptimizationLevel: 'all'
});
executionMode
Controls sequential vs parallel execution.
executionMode?: 'sequential' | 'parallel'
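Thread Configuration
Sets the number of threads used to parallelize work inside a single operator (intraOpNumThreads) and across independent operators (interOpNumThreads). interOpNumThreads only takes effect when executionMode is 'parallel'.
intraOpNumThreads?: number
interOpNumThreads?: number
Example:
const session = await ort.InferenceSession.create('./model.onnx', {
  intraOpNumThreads: 4,
  interOpNumThreads: 1,
  executionMode: 'parallel'
});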
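The hard-coded 4 in the example above may exceed what the machine offers. A minimal sketch of deriving the value from the logical core count, assuming a hypothetical helper named pickIntraOpThreads and an arbitrary cap of 4 chosen purely for illustration:

```javascript
// Hypothetical helper: clamp the thread count to the available logical cores.
// The default cap of 4 is an illustrative assumption, not a recommendation
// from the library.
function pickIntraOpThreads(logicalCores, cap = 4) {
  const cores = Number.isInteger(logicalCores) && logicalCores > 0 ? logicalCores : 1;
  return Math.max(1, Math.min(cap, cores));
}

// Usage:
//   Browser: pickIntraOpThreads(navigator.hardwareConcurrency)
//   Node.js: pickIntraOpThreads(require('os').cpus().length)
```

In onnxruntime-web the WebAssembly thread count is typically set through ort.env.wasm.numThreads, so the same value may need to go there instead of into the session options.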
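Memory Options
enableCpuMemArena turns the CPU memory arena on or off, and enableMemPattern turns memory pattern optimization on or off.
enableCpuMemArena?: boolean
enableMemPattern?: boolean
Example:
const session = await ort.InferenceSession.create('./model.onnx', {
  enableCpuMemArena: true,
  enableMemPattern: true
});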
Logging
logId labels the session's log output, and logSeverityLevel sets the minimum severity that gets logged.
logId?: string
logSeverityLevel?: 0 | 1 | 2 | 3 | 4 // Verbose, Info, Warning, Error, Fatal
Example:
const session = await ort.InferenceSession.create('./model.onnx', {
  logId: 'my-model',
  logSeverityLevel: 2 // Warning
});
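Bare numbers such as 2 are easy to misread. A small sketch that maps the severity names from the comment above to their numeric levels, using hypothetical names (LOG_SEVERITY, toLogSeverity) that are not part of onnxruntime:

```javascript
// Hypothetical lookup table mirroring the documented levels:
// 0 Verbose, 1 Info, 2 Warning, 3 Error, 4 Fatal.
const LOG_SEVERITY = new Map([
  ['verbose', 0],
  ['info', 1],
  ['warning', 2],
  ['error', 3],
  ['fatal', 4]
]);

function toLogSeverity(name) {
  const level = LOG_SEVERITY.get(String(name).toLowerCase());
  if (level === undefined) {
    throw new RangeError(`Unknown log severity: ${name}`);
  }
  return level;
}

// Usage:
//   const options = { logId: 'my-model', logSeverityLevel: toLogSeverity('warning') };
```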
extra
Passes additional session configuration entries to the runtime.
extra?: Record<string, unknown>
Example:
const session = await ort.InferenceSession.create('./model.onnx', {
  extra: {
    session: {
      set_denormal_as_zero: '1',
      disable_prepacking: '1'
    }
  }
});
Complete Examples
Image Classification (Browser)
import * as ort from 'onnxruntime-web';
class ImageClassifier {
  constructor() {
    this.session = null;
  }
  async initialize(modelPath) {
    this.session = await ort.InferenceSession.create(modelPath, {
      executionProviders: ['webgpu', 'wasm'],
      graphOptimizationLevel: 'all'
    });
    console.log('Model loaded');
    console.log('Inputs:', this.session.inputNames);
    console.log('Outputs:', this.session.outputNames);
  }
  async classify(imageElement) {
    // Preprocess image
    const tensor = await this.preprocessImage(imageElement);
    // Run inference
    const feeds = { [this.session.inputNames[0]]: tensor };
    const results = await this.session.run(feeds);
    // Get output
    const output = results[this.session.outputNames[0]];
    return this.postprocess(output);
  }
  async preprocessImage(img) {
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');
    canvas.width = 224;
    canvas.height = 224;
    ctx.drawImage(img, 0, 0, 224, 224);
    const imageData = ctx.getImageData(0, 0, 224, 224);
    const pixels = imageData.data;
    // Convert to CHW format and normalize
    const mean = [0.485, 0.456, 0.406];
    const std = [0.229, 0.224, 0.225];
    const data = new Float32Array(3 * 224 * 224);
    for (let i = 0; i < 224 * 224; i++) {
      data[i] = (pixels[i * 4] / 255 - mean[0]) / std[0];
      data[224 * 224 + i] = (pixels[i * 4 + 1] / 255 - mean[1]) / std[1];
      data[224 * 224 * 2 + i] = (pixels[i * 4 + 2] / 255 - mean[2]) / std[2];
    }
    return new ort.Tensor('float32', data, [1, 3, 224, 224]);
  }
  postprocess(output) {
    const predictions = Array.from(output.data)
      .map((prob, idx) => ({ class: idx, probability: prob }))
      .sort((a, b) => b.probability - a.probability)
      .slice(0, 5);
    return predictions;
  }
}
// Usage
const classifier = new ImageClassifier();
await classifier.initialize('./resnet50.onnx');
const img = document.getElementById('image');
const predictions = await classifier.classify(img);
console.log('Top predictions:', predictions);
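The postprocess step above labels the raw output values as probabilities. Many exported classification models emit unnormalized logits instead, in which case a softmax is needed first. A minimal sketch, assuming a hypothetical helper named topKFromLogits and that the model output is a flat array of logits:

```javascript
// Hypothetical helper: convert raw logits into the top-k classes with probabilities.
function topKFromLogits(logits, k = 5) {
  const values = Array.from(logits); // accepts Float32Array or plain arrays
  // Subtract the maximum before exponentiating for numerical stability.
  const max = values.reduce((m, x) => (x > m ? x : m), -Infinity);
  const exps = values.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps
    .map((e, idx) => ({ class: idx, probability: e / sum }))
    .sort((a, b) => b.probability - a.probability)
    .slice(0, k);
}

// Usage inside postprocess(output):
//   return topKFromLogits(output.data, 5);
```

If the model already ends in a softmax layer, the original postprocess is sufficient and this step should be skipped.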
Text Processing (Node.js)
const ort = require('onnxruntime-node');
const fs = require('fs');
class TextClassifier {
  async initialize(modelPath) {
    const modelBuffer = fs.readFileSync(modelPath);
    this.session = await ort.InferenceSession.create(modelBuffer, {
      intraOpNumThreads: 4,
      graphOptimizationLevel: 'all'
    });
  }
  async classify(tokenIds, attentionMask) {
    // Create input tensors
    const inputIds = new ort.Tensor(
      'int64',
      new BigInt64Array(tokenIds.map(x => BigInt(x))),
      [1, tokenIds.length]
    );
    const mask = new ort.Tensor(
      'int64',
      new BigInt64Array(attentionMask.map(x => BigInt(x))),
      [1, attentionMask.length]
    );
    // Run inference
    const results = await this.session.run({
      input_ids: inputIds,
      attention_mask: mask
    });
    // Get logits
    const logits = results.logits;
    return this.softmax(Array.from(logits.data));
  }
  softmax(arr) {
    const max = Math.max(...arr);
    const exps = arr.map(x => Math.exp(x - max));
    const sum = exps.reduce((a, b) => a + b);
    return exps.map(x => x / sum);
  }
}
// Usage
(async () => {
  const classifier = new TextClassifier();
  await classifier.initialize('./bert.onnx');
  const tokenIds = [101, 2023, 2003, 1037, 3231, 102];
  const attentionMask = [1, 1, 1, 1, 1, 1];
  const probs = await classifier.classify(tokenIds, attentionMask);
  console.log('Classification probabilities:', probs);
})();
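The classifier above handles one sequence at a time. To feed several sequences of different lengths in a single call, they are usually padded to a common length with a matching attention mask. A minimal sketch, assuming a hypothetical helper named padTokenBatch and a padding token id of 0 (an assumption; the correct id depends on the tokenizer):

```javascript
// Hypothetical helper: pad token id sequences to the longest length and build
// the attention mask (1 for real tokens, 0 for padding). padId = 0 is an
// assumption; check the tokenizer's actual padding id.
function padTokenBatch(sequences, padId = 0) {
  if (sequences.length === 0) {
    throw new Error('padTokenBatch needs at least one sequence');
  }
  const maxLen = sequences.reduce((m, seq) => Math.max(m, seq.length), 0);
  const inputIds = new BigInt64Array(sequences.length * maxLen).fill(BigInt(padId));
  const attentionMask = new BigInt64Array(sequences.length * maxLen); // starts as all zeros
  sequences.forEach((seq, row) => {
    seq.forEach((id, col) => {
      inputIds[row * maxLen + col] = BigInt(id);
      attentionMask[row * maxLen + col] = 1n;
    });
  });
  return { inputIds, attentionMask, dims: [sequences.length, maxLen] };
}

// Usage:
//   const { inputIds, attentionMask, dims } = padTokenBatch([[101, 2023, 102], [101, 102]]);
//   const ids = new ort.Tensor('int64', inputIds, dims);
//   const mask = new ort.Tensor('int64', attentionMask, dims);
```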
Batch Processing
class BatchProcessor {
  constructor(session) {
    this.session = session;
  }
  async processBatch(inputs) {
    const results = [];
    for (const input of inputs) {
      const tensor = new ort.Tensor('float32', input.data, input.shape);
      const feeds = { input: tensor };
      const output = await this.session.run(feeds);
      results.push(output);
    }
    return results;
  }
  async processParallel(inputs) {
    const promises = inputs.map(async (input) => {
      const tensor = new ort.Tensor('float32', input.data, input.shape);
      const feeds = { input: tensor };
      return await this.session.run(feeds);
    });
    return await Promise.all(promises);
  }
}
// Usage
const session = await ort.InferenceSession.create('./model.onnx');
const processor = new BatchProcessor(session);
const inputs = [
  { data: new Float32Array([1, 2, 3]), shape: [1, 3] },
  { data: new Float32Array([4, 5, 6]), shape: [1, 3] },
  { data: new Float32Array([7, 8, 9]), shape: [1, 3] }
];
const results = await processor.processParallel(inputs);
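Both methods above still call run() once per input. If the model accepts a variable batch dimension (an assumption that depends on how it was exported), the inputs can instead be stacked into one tensor and run in a single call. A minimal sketch with a hypothetical helper named stackInputs:

```javascript
// Hypothetical helper: stack inputs of shape [1, ...rest] into one buffer of
// shape [N, ...rest]. Assumes every input has the same shape and that the
// model's first dimension is a dynamic batch dimension.
function stackInputs(inputs) {
  if (inputs.length === 0) {
    throw new Error('stackInputs needs at least one input');
  }
  const rowShape = inputs[0].shape.slice(1);
  const rowSize = inputs[0].data.length;
  const data = new Float32Array(inputs.length * rowSize);
  inputs.forEach((input, row) => {
    const sameShape = input.shape[0] === 1 &&
      input.shape.slice(1).join(',') === rowShape.join(',') &&
      input.data.length === rowSize;
    if (!sameShape) {
      throw new RangeError(`input ${row} does not match shape [1, ${rowShape.join(', ')}]`);
    }
    data.set(input.data, row * rowSize);
  });
  return { data, dims: [inputs.length, ...rowShape] };
}

// Usage:
//   const { data, dims } = stackInputs(inputs);
//   const batched = await session.run({ input: new ort.Tensor('float32', data, dims) });
```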
Error Handling
// feeds: object mapping input names to tensors, prepared as in Basic Usage
try {
  const session = await ort.InferenceSession.create('./model.onnx', {
    executionProviders: ['webgpu', 'wasm']
  });
  const results = await session.run(feeds);
  console.log('Inference successful:', results);
} catch (error) {
  console.error('Inference error:', error.message);
  if (error.message.includes('model')) {
    console.error('Failed to load model');
  } else if (error.message.includes('input')) {
    console.error('Invalid input tensor');
  }
}
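Matching on words in error.message is fragile because message text can change between releases. One alternative is to wrap the load and run steps separately so the failing stage is known without inspecting the message. A minimal sketch, assuming a hypothetical helper named loadAndRun that receives a session factory:

```javascript
// Hypothetical helper: report which stage failed ('load' or 'run') instead of
// guessing from the error message. `createSession` is any async function that
// resolves to an object with a run(feeds) method.
async function loadAndRun(createSession, feeds) {
  let session;
  try {
    session = await createSession();
  } catch (error) {
    return { ok: false, stage: 'load', error };
  }
  try {
    return { ok: true, results: await session.run(feeds) };
  } catch (error) {
    return { ok: false, stage: 'run', error };
  }
}

// Usage:
//   const outcome = await loadAndRun(
//     () => ort.InferenceSession.create('./model.onnx'),
//     feeds
//   );
//   if (!outcome.ok) console.error(`Failed during ${outcome.stage}:`, outcome.error.message);
```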
Best Practices
- Reuse sessions: Create once, use many times
- Choose the right EP: WebGPU for modern browsers, WASM for compatibility
- Enable optimizations: Use the 'all' graph optimization level
- Batch when possible: Process multiple inputs together
- Pre-allocate tensors: Reuse tensor buffers for repeated inference
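The first item, reusing sessions, can be sketched as a small promise cache so that concurrent callers share one load. The names getSession and sessionCache are hypothetical, and `ort` is passed in as a parameter so the helper works with either onnxruntime-web or onnxruntime-node:

```javascript
// Hypothetical cache: one pending or resolved session per model path.
// Note: the key is the path only, so options passed on later calls are ignored.
const sessionCache = new Map();

function getSession(ort, modelPath, options) {
  if (!sessionCache.has(modelPath)) {
    const pending = ort.InferenceSession.create(modelPath, options).catch((error) => {
      // Evict failed loads so that a later call can retry.
      sessionCache.delete(modelPath);
      throw error;
    });
    sessionCache.set(modelPath, pending);
  }
  return sessionCache.get(modelPath);
}

// Usage:
//   const session = await getSession(ort, './model.onnx', { graphOptimizationLevel: 'all' });
```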
Browser Compatibility
// Check for WebGPU support
let executionProviders;
if ('gpu' in navigator) {
  console.log('WebGPU available');
  executionProviders = ['webgpu', 'wasm'];
} else {
  console.log('Using WebAssembly');
  executionProviders = ['wasm'];
}
const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders
});
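The presence of navigator.gpu only shows that the WebGPU API is exposed; requestAdapter() can still resolve to null when no suitable adapter exists. A slightly stricter sketch, assuming a hypothetical helper named pickExecutionProviders that accepts a navigator-like object:

```javascript
// Hypothetical helper: prefer WebGPU only when an adapter can actually be obtained.
// `nav` defaults to the global navigator and can be replaced for testing.
async function pickExecutionProviders(nav = globalThis.navigator) {
  if (nav && nav.gpu) {
    try {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) {
        return ['webgpu', 'wasm'];
      }
    } catch (error) {
      // Fall through to WebAssembly if the adapter request fails.
    }
  }
  return ['wasm'];
}

// Usage:
//   const executionProviders = await pickExecutionProviders();
//   const session = await ort.InferenceSession.create('./model.onnx', { executionProviders });
```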