
ONNX Runtime mobile – deploy machine learning models across iOS, Android, and web with a single codebase. ONNX Runtime mobile provides the performance, flexibility, and compatibility that modern cross-platform AI applications demand.
This comprehensive tutorial covers ONNX Runtime mobile fundamentals, implementation guides, optimization techniques, performance benchmarks, and real-world deployment strategies for building production-ready mobile AI applications in 2025.
What Is ONNX Runtime Mobile?
ONNX Runtime mobile is Microsoft’s high-performance inference engine for Open Neural Network Exchange (ONNX) format models. ONNX Runtime mobile enables developers to train models in any framework and deploy everywhere.
[Image Alt Text: ONNX Runtime mobile cross-platform architecture diagram]
Why ONNX Runtime Mobile Matters:
Framework Flexibility:
- Train in TensorFlow
- Train in PyTorch
- Train in scikit-learn
- Convert to ONNX
- Deploy with ONNX Runtime
Platform Coverage:
- iOS (CoreML backend)
- Android (NNAPI backend)
- Web (WebAssembly)
- Windows (DirectML)
- Linux (CUDA)
Performance:
- Hardware acceleration
- Optimized inference
- Small binary size
- Efficient memory usage
- Fast startup time
Learn about mobile ML frameworks.
ONNX Runtime Mobile vs Alternatives
vs TensorFlow Lite
ONNX Runtime mobile compared to TFLite:
ONNX Runtime Advantages:
- Framework-agnostic
- Better cross-platform consistency
- Simpler conversion
- Unified API
- Smaller runtime
TensorFlow Lite Advantages:
- Larger community
- More tutorials
- Better Google integration
- Mature ecosystem
- More pre-trained models
[Image Alt Text: ONNX Runtime mobile vs TensorFlow Lite comparison chart]
vs CoreML
ONNX Runtime mobile versus Apple's CoreML framework:
ONNX Runtime Benefits:
- Cross-platform code
- Framework flexibility
- Easier model sharing
- Unified development
- Version control
CoreML Benefits:
- iOS optimization
- Better Apple integration
- Neural Engine priority
- System-level features
- First-party support
See our CoreML tutorial for iOS-specific development.
vs PyTorch Mobile
ONNX Runtime mobile against PyTorch Mobile:
ONNX Runtime Strengths:
- Production-optimized
- Better performance
- Smaller size
- More backends
- Model portability
PyTorch Mobile Strengths:
- Direct PyTorch workflow
- Easier debugging
- Dynamic graphs
- Research-friendly
- Python integration
Getting Started with ONNX Runtime Mobile
Installation
Set up ONNX Runtime mobile development:
iOS (CocoaPods):
# Podfile
platform :ios, '12.0'
target 'YourApp' do
  use_frameworks!
  # Basic ONNX Runtime
  pod 'onnxruntime-objc', '~> 1.16'
  # With CoreML support, declare the CoreML subspec instead:
  # pod 'onnxruntime-objc', '~> 1.16', :subspecs => ['CoreML']
end
Android (Gradle):
// build.gradle
dependencies {
    // Basic ONNX Runtime (the Android package includes NNAPI support)
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
    // Optional: pre/post-processing operators
    implementation 'com.microsoft.onnxruntime:onnxruntime-extensions-android:1.16.0'
}
[Image Alt Text: ONNX Runtime mobile installation process steps]
React Native:
npm install onnxruntime-react-native
# or
yarn add onnxruntime-react-native
Model Conversion
Convert models to ONNX format for use with ONNX Runtime mobile:
From PyTorch:
import torch
import torch.onnx

# Load PyTorch model
model = torch.load('model.pth')
model.eval()

# Example input
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=15,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)
From TensorFlow:
import tensorflow as tf
import tf2onnx

# Load TensorFlow model
model = tf.keras.models.load_model('model.h5')

# Convert to ONNX
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
output_path = "model.onnx"
model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    opset=15,
    output_path=output_path
)
[Image Alt Text: ONNX Runtime mobile model conversion workflow diagram]
Model Optimization
Optimize for ONNX Runtime mobile:
Quantization:
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization (easiest)
model_input = "model.onnx"
model_output = "model_quantized.onnx"
quantize_dynamic(
    model_input,
    model_output,
    weight_type=QuantType.QUInt8  # or QInt8
)
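For intuition: dynamic quantization stores weights as 8-bit integers plus a per-tensor scale and zero point. The sketch below illustrates that affine scheme in plain Python; it is a simplified demonstration, not ONNX Runtime's internal code.

```python
def quantize_uint8(values):
    """Affine-quantize floats to uint8 range with a scale and zero point
    (the scheme behind QuantType.QUInt8, sketched for intuition)."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized representation."""
    return [(x - zero_point) * scale for x in q]

weights = [-1.2, -0.4, 0.0, 0.7, 1.5]
q, scale, zp = quantize_uint8(weights)
restored = dequantize(q, scale, zp)
# Each weight now fits in 1 byte instead of 4; the rounding
# error is on the order of the scale
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

This is why quantization roughly quarters model size (FP32 → INT8) while usually costing little accuracy: only the scale and zero point remain in floating point.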
Graph Optimization:
import onnxruntime as ort
# Optimize graph
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model_optimized.onnx"
session = ort.InferenceSession("model.onnx", sess_options)
Learn about model quantization.
ONNX Runtime Mobile Implementation
iOS Implementation
Build ONNX Runtime mobile iOS app:
Swift Code:
import UIKit
import onnxruntime_objc

class ImageClassifier {
    private var session: ORTSession?

    init() {
        do {
            // Create session options
            let options = try ORTSessionOptions()
            // Enable CoreML (hardware acceleration)
            try options.appendCoreMLExecutionProvider()
            // Load model
            let modelPath = Bundle.main.path(forResource: "model", ofType: "onnx")!
            let env = try ORTEnv(loggingLevel: .warning)
            session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: options)
        } catch {
            print("ONNX Runtime mobile error: \(error)")
        }
    }

    func classify(image: UIImage) throws -> [Float] {
        // Preprocess image
        let inputData = preprocessImage(image)
        // Create input tensor
        let inputName = "input"
        let inputShape: [NSNumber] = [1, 3, 224, 224]
        let inputTensor = try ORTValue(tensorData: NSMutableData(data: inputData),
                                       elementType: .float,
                                       shape: inputShape)
        // Run inference with ONNX Runtime mobile
        let outputs = try session!.run(
            withInputs: [inputName: inputTensor],
            outputNames: ["output"],
            runOptions: nil
        )
        // Extract results
        let outputTensor = outputs["output"]!
        let outputData = try outputTensor.tensorData() as Data
        // Convert to array
        let results = outputData.withUnsafeBytes {
            Array(UnsafeBufferPointer<Float>(
                start: $0.baseAddress!.assumingMemoryBound(to: Float.self),
                count: outputData.count / MemoryLayout<Float>.stride
            ))
        }
        return results
    }

    private func preprocessImage(_ image: UIImage) -> Data {
        // Resize to 224x224
        let size = CGSize(width: 224, height: 224)
        UIGraphicsBeginImageContext(size)
        image.draw(in: CGRect(origin: .zero, size: size))
        let resized = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()
        // Read pixel bytes (assumes an RGBA byte layout; production code
        // should redraw into a known RGBA context first)
        guard let cgImage = resized.cgImage,
              let data = cgImage.dataProvider?.data,
              let bytes = CFDataGetBytePtr(data) else {
            return Data()
        }
        // Normalize to [-1, 1] and lay the data out channel-planar (NCHW)
        // to match the [1, 3, 224, 224] input shape
        let plane = 224 * 224
        var floatArray = [Float](repeating: 0, count: 3 * plane)
        for i in 0..<plane {
            floatArray[i] = (Float(bytes[i * 4]) / 255.0 - 0.5) * 2                  // R plane
            floatArray[plane + i] = (Float(bytes[i * 4 + 1]) / 255.0 - 0.5) * 2      // G plane
            floatArray[2 * plane + i] = (Float(bytes[i * 4 + 2]) / 255.0 - 0.5) * 2  // B plane
        }
        return floatArray.withUnsafeBufferPointer { Data(buffer: $0) }
    }
}
[Image Alt Text: ONNX Runtime mobile iOS Swift code implementation example]
Android Implementation
Implement ONNX Runtime mobile on Android:
Kotlin Code:
import ai.onnxruntime.*
import android.content.Context
import android.graphics.Bitmap
import java.nio.FloatBuffer

class ImageClassifier(context: Context) {
    private val ortEnv = OrtEnvironment.getEnvironment()
    private val session: OrtSession

    init {
        // Create session options
        val sessionOptions = OrtSession.SessionOptions()
        // Enable NNAPI for hardware acceleration
        sessionOptions.addNnapi()
        // Load model from assets
        val modelBytes = context.assets.open("model.onnx").readBytes()
        session = ortEnv.createSession(modelBytes, sessionOptions)
    }

    fun classify(bitmap: Bitmap): FloatArray {
        // Preprocess image
        val inputData = preprocessImage(bitmap)
        // Create input tensor
        val inputName = session.inputNames.iterator().next()
        val shape = longArrayOf(1, 3, 224, 224)
        val inputTensor = OnnxTensor.createTensor(
            ortEnv,
            FloatBuffer.wrap(inputData),
            shape
        )
        // Run inference with ONNX Runtime mobile
        val results = session.run(mapOf(inputName to inputTensor))
        // Extract the first output tensor, shape [1, numClasses]
        val output = results[0].value as Array<FloatArray>
        // Cleanup
        inputTensor.close()
        results.close()
        return output[0]
    }

    private fun preprocessImage(bitmap: Bitmap): FloatArray {
        // Resize to 224x224
        val resized = Bitmap.createScaledBitmap(bitmap, 224, 224, true)
        // Extract pixels
        val pixels = IntArray(224 * 224)
        resized.getPixels(pixels, 0, 224, 0, 0, 224, 224)
        // Normalize to [-1, 1] and lay the data out channel-planar (NCHW)
        // to match the [1, 3, 224, 224] input shape
        val plane = 224 * 224
        val floatArray = FloatArray(3 * plane)
        for (i in pixels.indices) {
            val pixel = pixels[i]
            floatArray[i] = ((pixel shr 16 and 0xFF) / 255f - 0.5f) * 2f          // R plane
            floatArray[plane + i] = ((pixel shr 8 and 0xFF) / 255f - 0.5f) * 2f   // G plane
            floatArray[2 * plane + i] = ((pixel and 0xFF) / 255f - 0.5f) * 2f     // B plane
        }
        return floatArray
    }

    fun close() {
        session.close()
        ortEnv.close()
    }
}
React Native Implementation
ONNX Runtime mobile for React Native:
JavaScript Code:
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

class ImageClassifier {
  constructor() {
    this.session = null;
  }

  async initialize() {
    try {
      // Load model with ONNX Runtime mobile
      this.session = await InferenceSession.create(
        'model.onnx',
        {
          executionProviders: ['coreml'], // or 'nnapi' for Android
          graphOptimizationLevel: 'all'
        }
      );
    } catch (error) {
      console.error('ONNX Runtime mobile init error:', error);
    }
  }

  async classify(imageData) {
    // Preprocess image
    const preprocessed = this.preprocessImage(imageData);
    // Create input tensor
    const inputTensor = new Tensor(
      'float32',
      new Float32Array(preprocessed),
      [1, 3, 224, 224]
    );
    // Run inference
    const feeds = { input: inputTensor };
    const results = await this.session.run(feeds);
    // Extract output
    const output = results.output.data;
    return Array.from(output);
  }

  preprocessImage(imageData) {
    // Image preprocessing logic (stubbed here):
    // resize to 224x224, normalize to [-1, 1], lay out as NCHW
    // ...
    return preprocessedArray;
  }
}

export default ImageClassifier;
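All three implementations share the same preprocessing contract: resize to 224x224, normalize each channel to [-1, 1], and lay the data out channel-planar (NCHW). A small Python reference of the layout and normalization step (resizing omitted) is handy for validating a native port against known pixel values:

```python
def hwc_to_nchw_normalized(rgb_pixels, width, height):
    """Convert interleaved RGB bytes (0-255) to a planar NCHW float list in [-1, 1].

    rgb_pixels: flat list [r0, g0, b0, r1, g1, b1, ...] of length width*height*3.
    Returns a flat list laid out as [all R, all G, all B].
    """
    plane = width * height
    out = [0.0] * (3 * plane)
    for i in range(plane):
        for c in range(3):  # R, G, B planes
            out[c * plane + i] = (rgb_pixels[i * 3 + c] / 255.0 - 0.5) * 2.0
    return out

# 1x2 image: one black pixel, one white pixel
pixels = [0, 0, 0, 255, 255, 255]
print(hwc_to_nchw_normalized(pixels, 2, 1))
# black maps to -1.0 in every plane, white to 1.0
```

Feeding interleaved (HWC) data to a model that declares a [1, 3, 224, 224] input is a classic silent-accuracy bug, so a check like this against a synthetic image is worth the few lines.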
[Image Alt Text: ONNX Runtime mobile React Native code example]
ONNX Runtime Mobile Performance Optimization
Execution Providers
ONNX Runtime mobile backend optimization:
Available Providers:
- CoreML (iOS): Apple Neural Engine
- NNAPI (Android): Neural Networks API
- CPU (Fallback): Cross-platform
- DirectML (Windows): GPU acceleration
- XNNPACK (Mobile): Optimized CPU
Selection Strategy:
# Priority order (ONNX Runtime mobile tries each in turn)
import onnxruntime as ort

providers = [
    'CoreMLExecutionProvider',  # iOS / macOS
    'NNAPIExecutionProvider',   # Android
    'CPUExecutionProvider'      # Fallback
]
session = ort.InferenceSession('model.onnx', providers=providers)
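Requesting a provider that isn't compiled into your build can fail at session creation, so it is safer to filter the preference list against what the runtime actually offers (in Python, `ort.get_available_providers()` supplies that list). The selection logic itself is just an ordered intersection:

```python
def pick_providers(preferred, available):
    """Keep preference order, dropping providers the runtime doesn't offer."""
    chosen = [p for p in preferred if p in set(available)]
    return chosen or ['CPUExecutionProvider']  # CPU is always present

# Example: an Android build without NNAPI compiled in
print(pick_providers(
    ['NNAPIExecutionProvider', 'CPUExecutionProvider'],
    ['CPUExecutionProvider', 'XnnpackExecutionProvider'],
))
# ['CPUExecutionProvider']
```

With a real session you would call `pick_providers(my_order, ort.get_available_providers())` and pass the result as the `providers` argument.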
[Image Alt Text: ONNX Runtime mobile execution providers performance comparison]
Memory Optimization
ONNX Runtime mobile memory management:
Session Options:
// C API example (calls go through the OrtApi function table)
const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);
OrtSessionOptions* session_options;
api->CreateSessionOptions(&session_options);
// Enable memory pattern optimization
api->EnableMemPattern(session_options);
// Enable CPU memory arena
api->EnableCpuMemArena(session_options);
// Set graph optimization level
api->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_ALL);
Model Optimization:
- Reduce precision (FP32 → FP16 → INT8)
- Prune unnecessary operations
- Fuse operations
- Remove unused outputs
- Optimize graph structure
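The first item on that list is worth a concrete look. Halving precision halves weight storage at a small accuracy cost; a quick numpy illustration of what FP32 → FP16 does to individual values (a generic floating-point demo, not ORT-specific):

```python
import numpy as np

w32 = np.array([0.1, 1e-5, 3.14159265, 70000.0], dtype=np.float32)
w16 = w32.astype(np.float16)

print(w16.nbytes, "bytes vs", w32.nbytes)    # half the storage
print(np.abs(w16.astype(np.float32) - w32))  # small rounding error...
print(w16[3])                                # ...but values beyond ~65504 overflow to inf
```

The overflow case is why FP16 conversion tools typically keep numerically sensitive nodes in FP32; INT8 quantization pushes the same trade-off further with the scale/zero-point scheme shown earlier.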
Batch Processing
Improve ONNX Runtime mobile throughput:
Batching Strategy:
import numpy as np
import onnxruntime as ort

# Process multiple inputs efficiently (requires a model exported
# with a dynamic batch dimension)
batch_size = 4
inputs = collect_batch(batch_size)  # your own input-gathering helper

# Prepare batched input
batched_input = np.stack(inputs)  # Shape: (4, 3, 224, 224)

# Single inference call
session = ort.InferenceSession('model.onnx')
outputs = session.run(None, {'input': batched_input})

# Process all results at once
for output in outputs[0]:
    process_result(output)  # your own result handler
Discover NPU acceleration benefits.
ONNX Runtime Mobile Benchmarks
Performance Comparison
Representative ONNX Runtime mobile inference times (expect variation across devices, model exports, and runtime versions):
MobileNet V2 (Image Classification):
- iOS (CoreML): 15ms
- Android (NNAPI): 18ms
- CPU fallback: 45ms
ResNet-50:
- iOS (CoreML): 35ms
- Android (NNAPI): 42ms
- CPU fallback: 180ms
BERT-base (Text):
- iOS: 85ms
- Android: 95ms
- CPU: 320ms
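Because numbers like these are device-dependent, measure on your own hardware. A small harness pattern (warmup runs, then the median over repeats) works with any inference callable; `run_inference` below is a placeholder for your own session call:

```python
import time
import statistics

def benchmark(run_inference, warmup=5, repeats=30):
    """Return the median latency in milliseconds of a zero-argument callable."""
    for _ in range(warmup):  # let caches, JIT, and provider init settle
        run_inference()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

# e.g. benchmark(lambda: session.run(None, {'input': input_array}))
```

Median is preferred over mean here because mobile schedulers produce occasional outlier runs that would otherwise skew the result.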
[Image Alt Text: ONNX Runtime mobile performance benchmarks across platforms]
Memory Footprint
ONNX Runtime mobile resource usage:
Runtime Size:
- iOS: ~8MB
- Android: ~12MB
- React Native: ~15MB
Model Sizes (quantized):
- MobileNet V2: 3.5MB
- ResNet-50: 25MB
- BERT-base: 110MB
Peak Memory:
- MobileNet V2: ~50MB
- ResNet-50: ~200MB
- BERT-base: ~450MB
Real-World ONNX Runtime Mobile Applications
Computer Vision
ONNX Runtime mobile vision apps:
Object Detection:
- YOLO models
- Real-time processing
- Bounding boxes
- Classification
Image Segmentation:
- Semantic segmentation
- Instance segmentation
- Medical imaging
- AR applications
Face Recognition:
- Face detection
- Landmark detection
- Expression analysis
- Identity verification
[Image Alt Text: ONNX Runtime mobile computer vision applications examples]
Natural Language Processing
ONNX Runtime mobile NLP:
Text Classification:
- Sentiment analysis
- Spam detection
- Category prediction
- Language identification
Named Entity Recognition:
- Person names
- Organizations
- Locations
- Dates/times
Question Answering:
- Context-based QA
- Extractive answers
- Mobile assistants
Audio Processing
ONNX Runtime mobile audio:
Speech Recognition:
- Voice commands
- Transcription
- Speaker identification
Audio Classification:
- Sound detection
- Music genre
- Acoustic scenes
Voice Enhancement:
- Noise reduction
- Echo cancellation
- Audio upsampling
Troubleshooting ONNX Runtime Mobile
Common Issues
ONNX Runtime mobile problems:
Model Load Failures:
Error: Failed to load model
Solution:
- Check ONNX opset version
- Verify model path
- Ensure compatible operators
- Update ONNX Runtime
Inference Errors:
Error: Shape mismatch
Solution:
- Verify input dimensions
- Check data type
- Validate preprocessing
- Review model specs
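Shape mismatches are easiest to catch by comparing your array against what the session reports via `session.get_inputs()`, where dynamic dimensions appear as strings (e.g. 'batch_size') or None. A small compatibility check under that convention:

```python
def shapes_compatible(expected, actual):
    """True if actual dims match expected, treating str/None dims as dynamic."""
    if len(expected) != len(actual):
        return False
    return all(
        isinstance(e, str) or e is None or e == a
        for e, a in zip(expected, actual)
    )

# Model declares a dynamic batch dimension:
expected = ['batch_size', 3, 224, 224]
print(shapes_compatible(expected, (8, 3, 224, 224)))  # True
print(shapes_compatible(expected, (1, 224, 224, 3)))  # False: NHWC fed to an NCHW model
```

With a real session, `expected = session.get_inputs()[0].shape`; running this check before inference turns a cryptic runtime error into a readable diagnostic.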
[Image Alt Text: ONNX Runtime mobile troubleshooting decision tree]
Performance Issues:
Problem: Slow inference
Solutions:
- Enable hardware acceleration
- Quantize model
- Optimize graph
- Reduce input size
- Check provider priority
Debugging Tips
ONNX Runtime mobile debugging:
Enable Logging:
import onnxruntime as ort
# Set logging severity (0 = verbose)
ort.set_default_logger_severity(0)
# VLOG detail level (only applies when severity is verbose)
ort.set_default_logger_verbosity(3)
Profile Performance:
# Enable profiling
sess_options = ort.SessionOptions()
sess_options.enable_profiling = True
session = ort.InferenceSession('model.onnx', sess_options)
# After inference
prof_file = session.end_profiling()
print(f"Profiling data: {prof_file}")
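The profiling file is JSON in the Chrome tracing format: a list of events carrying `name`, `cat`, and `dur` (microseconds). Assuming that layout, a small helper can total time per operator to show where inference is spent:

```python
import json
from collections import defaultdict

def op_time_us(profile_path):
    """Sum 'dur' per event name for Node-category events in an ORT profile."""
    with open(profile_path) as f:
        events = json.load(f)
    totals = defaultdict(int)
    for event in events:
        if event.get("cat") == "Node":
            totals[event["name"]] += event.get("dur", 0)
    # Most expensive operators first
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

# op_time_us(prof_file) -> e.g. {'Conv_0_kernel_time': 1234, ...}
# (event names depend on your model)
```

The top entries usually point directly at the layers worth quantizing, fusing, or offloading to a hardware execution provider.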
The Verdict on ONNX Runtime Mobile
ONNX Runtime mobile delivers on its promise of write-once, run-anywhere machine learning. The combination of framework flexibility, platform coverage, and performance optimization makes it an excellent choice for cross-platform mobile AI development.
Choose ONNX Runtime Mobile For:
- ✅ Cross-platform apps
- ✅ Framework flexibility
- ✅ Production optimization
- ✅ Performance requirements
- ✅ Model portability
Consider Alternatives If:
- ❌ Platform-specific optimizations crucial
- ❌ Need largest model library
- ❌ Require extensive tutorials
- ❌ Want first-party framework
- ❌ Prefer native tools
Key Takeaways:
- Framework-agnostic deployment
- Excellent performance
- Consistent cross-platform API
- Active Microsoft support
- Growing adoption
ONNX Runtime mobile bridges the gap between AI research and mobile production. Train in your preferred framework, optimize once, deploy everywhere—exactly as promised.
For teams building cross-platform mobile AI applications, ONNX Runtime mobile significantly reduces development complexity while maintaining performance. It’s mature, well-supported, and production-ready.
The future of mobile ML is cross-platform, and ONNX Runtime mobile leads that future today.
