ONNX Runtime Mobile: Cross-Platform AI Tutorial 2025


ONNX Runtime Mobile lets you deploy machine learning models across iOS, Android, and the web from a single codebase, delivering the performance, flexibility, and compatibility that modern cross-platform AI applications demand.

This comprehensive tutorial covers ONNX Runtime mobile fundamentals, implementation guides, optimization techniques, performance benchmarks, and real-world deployment strategies for building production-ready mobile AI applications in 2025.

What Is ONNX Runtime Mobile?

ONNX Runtime mobile is Microsoft’s high-performance inference engine for Open Neural Network Exchange (ONNX) format models. It enables developers to train models in any framework and deploy them anywhere.

[Image Alt Text: ONNX Runtime mobile cross-platform architecture diagram]

Why ONNX Runtime Mobile Matters:

Framework Flexibility:

  • Train in TensorFlow
  • Train in PyTorch
  • Train in scikit-learn
  • Convert to ONNX
  • Deploy with ONNX Runtime

Platform Coverage:

  • iOS (CoreML backend)
  • Android (NNAPI backend)
  • Web (WebAssembly)
  • Windows (DirectML)
  • Linux (CUDA)

Performance:

  • Hardware acceleration
  • Optimized inference
  • Small binary size
  • Efficient memory usage
  • Fast startup time

Learn about mobile ML frameworks.

ONNX Runtime Mobile vs Alternatives

vs TensorFlow Lite

ONNX Runtime mobile compared to TFLite:

ONNX Runtime Advantages:

  • Framework-agnostic
  • Better cross-platform consistency
  • Simpler conversion
  • Unified API
  • Smaller runtime

TensorFlow Lite Advantages:

  • Larger community
  • More tutorials
  • Better Google integration
  • Mature ecosystem
  • More pre-trained models

[Image Alt Text: ONNX Runtime mobile vs TensorFlow Lite comparison chart]

vs CoreML

ONNX Runtime mobile versus Apple's native framework:

ONNX Runtime Benefits:

  • Cross-platform code
  • Framework flexibility
  • Easier model sharing
  • Unified development
  • Version control

CoreML Benefits:

  • iOS optimization
  • Better Apple integration
  • Neural Engine priority
  • System-level features
  • First-party support

See our CoreML tutorial for iOS-specific development.

vs PyTorch Mobile

ONNX Runtime mobile against PyTorch:

ONNX Runtime Strengths:

  • Production-optimized
  • Better performance
  • Smaller size
  • More backends
  • Model portability

PyTorch Mobile Strengths:

  • Direct PyTorch workflow
  • Easier debugging
  • Dynamic graphs
  • Research-friendly
  • Python integration

Getting Started with ONNX Runtime Mobile

Installation

Set up ONNX Runtime mobile development:

iOS (CocoaPods):

# Podfile
platform :ios, '12.0'

target 'YourApp' do
  use_frameworks!
  
  # Basic ONNX Runtime
  pod 'onnxruntime-objc', '~> 1.16'
  
  # Alternatively, with the CoreML subspec (don't declare the pod twice):
  # pod 'onnxruntime-objc', '~> 1.16', :subspecs => ['CoreML']
end

Android (Gradle):

// build.gradle
dependencies {
    // ONNX Runtime for Android (includes the NNAPI execution provider)
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
    
    // Optional: extensions package for common pre/post-processing operators
    implementation 'com.microsoft.onnxruntime:onnxruntime-extensions-android:1.16.0'
}

[Image Alt Text: ONNX Runtime mobile installation process steps]

React Native:

npm install onnxruntime-react-native
# or
yarn add onnxruntime-react-native

Model Conversion

Convert models to ONNX Runtime mobile format:

From PyTorch:

import torch
import torch.onnx

# Load PyTorch model (assumes the full model object was saved,
# not just a state_dict)
model = torch.load('model.pth')
model.eval()

# Example input
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=15,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

From TensorFlow:

import tensorflow as tf
import tf2onnx

# Load TensorFlow model
model = tf.keras.models.load_model('model.h5')

# Convert to ONNX
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
output_path = "model.onnx"

model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    opset=15,
    output_path=output_path
)

[Image Alt Text: ONNX Runtime mobile model conversion workflow diagram]

Model Optimization

Optimize for ONNX Runtime mobile:

Quantization:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization (easiest)
model_input = "model.onnx"
model_output = "model_quantized.onnx"

quantize_dynamic(
    model_input,
    model_output,
    weight_type=QuantType.QUInt8  # or QInt8
)
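As a rough size model: dynamic quantization stores each 4-byte FP32 weight as a single INT8 byte, so weight-dominated models shrink close to 4x. Using MobileNet V2's roughly 3.5M parameters as an illustration:

```python
params = 3_500_000  # approximate MobileNet V2 parameter count

fp32_mb = params * 4 / 1e6  # 4 bytes per float32 weight
int8_mb = params * 1 / 1e6  # 1 byte per quantized weight

print(f"{fp32_mb:.1f} MB -> {int8_mb:.1f} MB")  # 14.0 MB -> 3.5 MB
```

Real files land slightly above this estimate because graph metadata and any non-quantized tensors are stored at full precision.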

Graph Optimization:

import onnxruntime as ort

# Optimize graph
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model_optimized.onnx"

session = ort.InferenceSession("model.onnx", sess_options)

Learn about model quantization.

ONNX Runtime Mobile Implementation

iOS Implementation

Build ONNX Runtime mobile iOS app:

Swift Code:

import onnxruntime_objc

class ImageClassifier {
    private var session: ORTSession?
    
    init() {
        do {
            // Create session options
            let options = try ORTSessionOptions()
            
            // Enable the CoreML execution provider (hardware acceleration)
            try options.appendCoreMLExecutionProvider(with: ORTCoreMLExecutionProviderOptions())
            
            // Load model
            let modelPath = Bundle.main.path(forResource: "model", ofType: "onnx")!
            let env = try ORTEnv(loggingLevel: .warning)
            session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: options)
            
        } catch {
            print("ONNX Runtime mobile error: \(error)")
        }
    }
    
    func classify(image: UIImage) throws -> [Float] {
        // Preprocess image
        let inputData = preprocessImage(image)
        
        // Create input tensor
        let inputName = "input"
        let inputShape: [NSNumber] = [1, 3, 224, 224]
        let inputTensor = try ORTValue(tensorData: NSMutableData(data: inputData),
                                       elementType: .float,
                                       shape: inputShape)
        
        // Run inference with ONNX Runtime mobile
        let outputs = try session!.run(
            withInputs: [inputName: inputTensor],
            outputNames: ["output"],
            runOptions: nil
        )
        
        // Extract results
        let outputTensor = outputs["output"]!
        let outputData = try outputTensor.tensorData() as Data
        
        // Convert to array
        let results = outputData.withUnsafeBytes { 
            Array(UnsafeBufferPointer<Float>(
                start: $0.baseAddress!.assumingMemoryBound(to: Float.self),
                count: outputData.count / MemoryLayout<Float>.stride
            ))
        }
        
        return results
    }
    
    private func preprocessImage(_ image: UIImage) -> Data {
        // Resize to 224x224
        let size = CGSize(width: 224, height: 224)
        UIGraphicsBeginImageContext(size)
        image.draw(in: CGRect(origin: .zero, size: size))
        let resized = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()
        
        // Convert to RGB data
        guard let cgImage = resized.cgImage,
              let data = cgImage.dataProvider?.data,
              let bytes = CFDataGetBytePtr(data) else {
            return Data()
        }
        
        // Normalize to [-1, 1] and write in planar CHW order
        // to match the declared [1, 3, 224, 224] input shape
        let area = 224 * 224
        var floatArray = [Float](repeating: 0, count: 3 * area)
        var p = 0
        for i in stride(from: 0, to: CFDataGetLength(data), by: 4) {
            floatArray[p]            = (Float(bytes[i])     / 255.0 - 0.5) * 2  // R
            floatArray[area + p]     = (Float(bytes[i + 1]) / 255.0 - 0.5) * 2  // G
            floatArray[2 * area + p] = (Float(bytes[i + 2]) / 255.0 - 0.5) * 2  // B
            p += 1
        }
        
        return Data(bytes: floatArray, count: floatArray.count * MemoryLayout<Float>.stride)
    }
}

[Image Alt Text: ONNX Runtime mobile iOS Swift code implementation example]

Android Implementation

Implement ONNX Runtime mobile on Android:

Kotlin Code:

import ai.onnxruntime.*
import android.content.Context
import android.graphics.Bitmap
import java.nio.FloatBuffer

class ImageClassifier(context: Context) {
    private val ortEnv = OrtEnvironment.getEnvironment()
    private val session: OrtSession
    
    init {
        // Create session options
        val sessionOptions = OrtSession.SessionOptions()
        
        // Enable NNAPI for hardware acceleration
        sessionOptions.addNnapi()
        
        // Load model from assets
        val modelBytes = context.assets.open("model.onnx").readBytes()
        session = ortEnv.createSession(modelBytes, sessionOptions)
    }
    
    fun classify(bitmap: Bitmap): FloatArray {
        // Preprocess image
        val inputData = preprocessImage(bitmap)
        
        // Create input tensor
        val inputName = session.inputNames.iterator().next()
        val shape = longArrayOf(1, 3, 224, 224)
        val inputTensor = OnnxTensor.createTensor(
            ortEnv,
            FloatBuffer.wrap(inputData),
            shape
        )
        
        // Run inference with ONNX Runtime mobile
        val results = session.run(mapOf(inputName to inputTensor))
        
        // Extract output (Result.get returns an Optional)
        val outputName = session.outputNames.iterator().next()
        val output = results.get(outputName).get().value as Array<FloatArray>
        
        // Cleanup
        inputTensor.close()
        results.close()
        
        return output[0]
    }
    
    private fun preprocessImage(bitmap: Bitmap): FloatArray {
        // Resize to 224x224
        val resized = Bitmap.createScaledBitmap(bitmap, 224, 224, true)
        
        // Extract pixels
        val pixels = IntArray(224 * 224)
        resized.getPixels(pixels, 0, 224, 0, 0, 224, 224)
        
        // Convert to float array, normalize to [-1, 1], and write in
        // planar CHW order to match the [1, 3, 224, 224] input shape
        val area = 224 * 224
        val floatArray = FloatArray(3 * area)
        for (i in pixels.indices) {
            val pixel = pixels[i]
            floatArray[i]            = ((pixel shr 16 and 0xFF) / 255f - 0.5f) * 2f // R
            floatArray[area + i]     = ((pixel shr 8 and 0xFF) / 255f - 0.5f) * 2f  // G
            floatArray[2 * area + i] = ((pixel and 0xFF) / 255f - 0.5f) * 2f        // B
        }
        
        return floatArray
    }
    
    fun close() {
        session.close()
        ortEnv.close()
    }
}
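Both native preprocessors implement the same transform, so a reference version in Python is useful for validating mobile outputs against a desktop run. A numpy-only sketch (the [-1, 1] normalization is this tutorial's convention; other models may expect different mean/std values):

```python
import numpy as np

def preprocess(rgb: np.ndarray) -> np.ndarray:
    """rgb: (224, 224, 3) uint8 image, already resized.
    Returns a (1, 3, 224, 224) float32 tensor normalized to [-1, 1]."""
    x = rgb.astype(np.float32) / 255.0  # [0, 1]
    x = (x - 0.5) * 2.0                 # [-1, 1]
    x = np.transpose(x, (2, 0, 1))      # HWC -> CHW
    return x[np.newaxis, ...]           # add batch dimension

img = np.zeros((224, 224, 3), dtype=np.uint8)
img[..., 0] = 255  # pure red image
tensor = preprocess(img)
print(tensor.shape)        # (1, 3, 224, 224)
print(tensor[0, 0, 0, 0])  # 1.0  (red channel)
print(tensor[0, 1, 0, 0])  # -1.0 (green channel)
```

If the mobile app's top predictions diverge from a desktop run on the same image, the preprocessing step is the first place to compare.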

React Native Implementation

ONNX Runtime mobile for React Native:

JavaScript Code:

import { InferenceSession, Tensor } from 'onnxruntime-react-native';

class ImageClassifier {
  constructor() {
    this.session = null;
  }
  
  async initialize() {
    try {
      // Load model with ONNX Runtime mobile
      this.session = await InferenceSession.create(
        'model.onnx',
        {
          executionProviders: ['coreml'], // or 'nnapi' for Android
          graphOptimizationLevel: 'all'
        }
      );
    } catch (error) {
      console.error('ONNX Runtime mobile init error:', error);
    }
  }
  
  async classify(imageData) {
    // Preprocess image
    const preprocessed = this.preprocessImage(imageData);
    
    // Create input tensor
    const inputTensor = new Tensor(
      'float32',
      new Float32Array(preprocessed),
      [1, 3, 224, 224]
    );
    
    // Run inference
    const feeds = { input: inputTensor };
    const results = await this.session.run(feeds);
    
    // Extract output
    const output = results.output.data;
    
    return Array.from(output);
  }
  
  preprocessImage(imageData) {
    // Placeholder: resize to 224x224, normalize, and lay the data out
    // in CHW order to match the [1, 3, 224, 224] input shape.
    throw new Error('preprocessImage is app-specific; implement for your image source');
  }
}

export default ImageClassifier;

[Image Alt Text: ONNX Runtime mobile React Native code example]

ONNX Runtime Mobile Performance Optimization

Execution Providers

ONNX Runtime mobile backend optimization:

Available Providers:

  • CoreML (iOS): Apple Neural Engine
  • NNAPI (Android): Neural Networks API
  • CPU (Fallback): Cross-platform
  • DirectML (Windows): GPU acceleration
  • XNNPACK (Mobile): Optimized CPU

Selection Strategy:

# Priority order (ONNX Runtime mobile tries in order)
providers = [
    'CoreMLExecutionProvider',  # iOS
    'NNAPIExecutionProvider',   # Android
    'CPUExecutionProvider'      # Fallback
]

session = ort.InferenceSession('model.onnx', providers=providers)

[Image Alt Text: ONNX Runtime mobile execution providers performance comparison]

Memory Optimization

ONNX Runtime mobile memory management:

Session Options:

// C API example (via the OrtApi function table)
const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtSessionOptions* session_options;
g_ort->CreateSessionOptions(&session_options);

// Enable memory pattern optimization
g_ort->EnableMemPattern(session_options);

// Enable CPU memory arena
g_ort->EnableCpuMemArena(session_options);

// Set graph optimization level
g_ort->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_ALL);

Model Optimization:

  • Reduce precision (FP32 → FP16 → INT8)
  • Prune unnecessary operations
  • Fuse operations
  • Remove unused outputs
  • Optimize graph structure

Batch Processing

Improve ONNX Runtime mobile throughput:

Batching Strategy:

# Process multiple inputs efficiently
batch_size = 4
inputs = collect_batch(batch_size)

# Prepare batched input
batched_input = np.stack(inputs)  # Shape: (4, 3, 224, 224)

# Single inference call
session = ort.InferenceSession('model.onnx')
outputs = session.run(None, {'input': batched_input})

# Process all results at once
for i, output in enumerate(outputs[0]):
    process_result(output)
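Batching helps because the fixed per-call overhead (dispatch, provider setup) is amortized across the batch. A toy cost model makes the effect concrete; the overhead and compute figures below are illustrative, not measured:

```python
def per_item_ms(batch_size, overhead_ms=5.0, compute_ms_per_item=10.0):
    """Amortized latency per item for one batched inference call.
    The overhead and compute numbers are hypothetical placeholders."""
    return (overhead_ms + compute_ms_per_item * batch_size) / batch_size

print(per_item_ms(1))  # 15.0 ms per item
print(per_item_ms(4))  # 11.25 ms per item
```

The trade-off: total latency for the batch grows, so batching suits offline or queued workloads more than interactive, single-frame use.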

Discover NPU acceleration benefits.

ONNX Runtime Mobile Benchmarks

Performance Comparison

ONNX Runtime mobile speed tests:

MobileNet V2 (Image Classification):

  • iOS (CoreML): 15ms
  • Android (NNAPI): 18ms
  • CPU fallback: 45ms

ResNet-50:

  • iOS (CoreML): 35ms
  • Android (NNAPI): 42ms
  • CPU fallback: 180ms

BERT-base (Text):

  • iOS: 85ms
  • Android: 95ms
  • CPU: 320ms

[Image Alt Text: ONNX Runtime mobile performance benchmarks across platforms]

Memory Footprint

ONNX Runtime mobile resource usage:

Runtime Size:

  • iOS: ~8MB
  • Android: ~12MB
  • React Native: ~15MB

Model Sizes (quantized):

  • MobileNet V2: 3.5MB
  • ResNet-50: 25MB
  • BERT-base: 110MB

Peak Memory:

  • MobileNet V2: ~50MB
  • ResNet-50: ~200MB
  • BERT-base: ~450MB

Real-World ONNX Runtime Mobile Applications

Computer Vision

ONNX Runtime mobile vision apps:

Object Detection:

  • YOLO models
  • Real-time processing
  • Bounding boxes
  • Classification

Image Segmentation:

  • Semantic segmentation
  • Instance segmentation
  • Medical imaging
  • AR applications

Face Recognition:

  • Face detection
  • Landmark detection
  • Expression analysis
  • Identity verification

[Image Alt Text: ONNX Runtime mobile computer vision applications examples]

Natural Language Processing

ONNX Runtime mobile NLP:

Text Classification:

  • Sentiment analysis
  • Spam detection
  • Category prediction
  • Language identification

Named Entity Recognition:

  • Person names
  • Organizations
  • Locations
  • Dates/times

Question Answering:

  • Context-based QA
  • Extractive answers
  • Mobile assistants

Audio Processing

ONNX Runtime mobile audio:

Speech Recognition:

  • Voice commands
  • Transcription
  • Speaker identification

Audio Classification:

  • Sound detection
  • Music genre
  • Acoustic scenes

Voice Enhancement:

  • Noise reduction
  • Echo cancellation
  • Audio upsampling

Troubleshooting ONNX Runtime Mobile

Common Issues

ONNX Runtime mobile problems:

Model Load Failures:

Error: Failed to load model
Solution:
- Check ONNX opset version
- Verify model path
- Ensure compatible operators
- Update ONNX Runtime

Inference Errors:

Error: Shape mismatch
Solution:
- Verify input dimensions
- Check data type
- Validate preprocessing
- Review model specs

[Image Alt Text: ONNX Runtime mobile troubleshooting decision tree]

Performance Issues:

Problem: Slow inference
Solutions:
- Enable hardware acceleration
- Quantize model
- Optimize graph
- Reduce input size
- Check provider priority

Debugging Tips

ONNX Runtime mobile debugging:

Enable Logging:

import onnxruntime as ort

# Set logging severity (0 = verbose ... 4 = fatal)
ort.set_default_logger_severity(0)

# Raise VLOG verbosity (only takes effect at verbose severity)
ort.set_default_logger_verbosity(3)

Profile Performance:

# Enable profiling
sess_options = ort.SessionOptions()
sess_options.enable_profiling = True

session = ort.InferenceSession('model.onnx', sess_options)

# After inference
prof_file = session.end_profiling()
print(f"Profiling data: {prof_file}")
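The profiling file is a chrome-trace-style JSON array of events with per-node durations in microseconds. A small sketch for summarizing where time went (`top_ops` is a hypothetical helper name; field names follow ORT's trace output):

```python
import json
from collections import Counter

def top_ops(prof_path, n=5):
    """Sum recorded kernel time per node from an ORT profiling JSON file."""
    with open(prof_path) as f:
        events = json.load(f)
    totals = Counter()
    for e in events:
        if e.get("cat") == "Node":                 # per-node kernel events
            totals[e["name"]] += e.get("dur", 0)   # duration in microseconds
    return totals.most_common(n)
```

Sorting nodes by total time quickly shows whether one unoptimized operator dominates, which usually points to a fusion or provider-fallback problem.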

The Verdict on ONNX Runtime Mobile

ONNX Runtime mobile delivers on its promise of write-once, run-anywhere machine learning. The combination of framework flexibility, platform coverage, and performance optimization makes it an excellent choice for cross-platform mobile AI development.

Choose ONNX Runtime Mobile For:

  • ✅ Cross-platform apps
  • ✅ Framework flexibility
  • ✅ Production optimization
  • ✅ Performance requirements
  • ✅ Model portability

Consider Alternatives If:

  • ❌ Platform-specific optimizations crucial
  • ❌ Need largest model library
  • ❌ Require extensive tutorials
  • ❌ Want first-party framework
  • ❌ Prefer native tools

Key Takeaways:

  • Framework-agnostic deployment
  • Excellent performance
  • Consistent cross-platform API
  • Active Microsoft support
  • Growing adoption

ONNX Runtime mobile bridges the gap between AI research and mobile production. Train in your preferred framework, optimize once, deploy everywhere—exactly as promised.

For teams building cross-platform mobile AI applications, ONNX Runtime mobile significantly reduces development complexity while maintaining performance. It’s mature, well-supported, and production-ready.

The future of mobile ML is cross-platform, and ONNX Runtime mobile leads that future today.

