ONNX Runtime Mobile: Cross-Platform AI Tutorial 2025


ONNX Runtime Mobile lets you deploy machine learning models across iOS, Android, and the web from a single codebase, delivering the performance, flexibility, and compatibility that modern cross-platform AI applications demand.

This comprehensive tutorial covers ONNX Runtime mobile fundamentals, implementation guides, optimization techniques, performance benchmarks, and real-world deployment strategies for building production-ready mobile AI applications in 2025.

What Is ONNX Runtime Mobile?

ONNX Runtime mobile is Microsoft’s high-performance inference engine for Open Neural Network Exchange (ONNX) format models. It enables developers to train models in any framework and deploy them anywhere.

[Image Alt Text: ONNX Runtime mobile cross-platform architecture diagram]

Why ONNX Runtime Mobile Matters:

Framework Flexibility:

  • Train in TensorFlow
  • Train in PyTorch
  • Train in scikit-learn
  • Convert to ONNX
  • Deploy with ONNX Runtime

Platform Coverage:

  • iOS (CoreML backend)
  • Android (NNAPI backend)
  • Web (WebAssembly)
  • Windows (DirectML)
  • Linux (CUDA)

Performance:

  • Hardware acceleration
  • Optimized inference
  • Small binary size
  • Efficient memory usage
  • Fast startup time

Learn about mobile ML frameworks.

ONNX Runtime Mobile vs Alternatives

vs TensorFlow Lite

ONNX Runtime mobile compared to TFLite:

ONNX Runtime Advantages:

  • Framework-agnostic
  • Better cross-platform consistency
  • Simpler conversion
  • Unified API
  • Smaller runtime

TensorFlow Lite Advantages:

  • Larger community
  • More tutorials
  • Better Google integration
  • Mature ecosystem
  • More pre-trained models

[Image Alt Text: ONNX Runtime mobile vs TensorFlow Lite comparison chart]

vs CoreML

ONNX Runtime mobile versus Apple's native framework:

ONNX Runtime Benefits:

  • Cross-platform code
  • Framework flexibility
  • Easier model sharing
  • Unified development
  • Version control

CoreML Benefits:

  • iOS optimization
  • Better Apple integration
  • Neural Engine priority
  • System-level features
  • First-party support

See our CoreML tutorial for iOS-specific development.

vs PyTorch Mobile

ONNX Runtime mobile against PyTorch:

ONNX Runtime Strengths:

  • Production-optimized
  • Better performance
  • Smaller size
  • More backends
  • Model portability

PyTorch Mobile Strengths:

  • Direct PyTorch workflow
  • Easier debugging
  • Dynamic graphs
  • Research-friendly
  • Python integration

Getting Started with ONNX Runtime Mobile

Installation

Set up ONNX Runtime mobile development:

iOS (CocoaPods):

# Podfile
platform :ios, '12.0'

target 'YourApp' do
  use_frameworks!
  
  # Basic ONNX Runtime
  pod 'onnxruntime-objc', '~> 1.16'
  
  # Alternatively, with the CoreML subspec (don't declare the pod twice):
  # pod 'onnxruntime-objc', '~> 1.16', :subspecs => ['CoreML']
end

Android (Gradle):

// build.gradle
dependencies {
    // ONNX Runtime for Android (includes the NNAPI execution provider)
    implementation 'com.microsoft.onnxruntime:onnxruntime-android:1.16.0'
    
    // Optional: extensions package for common pre/post-processing operators
    implementation 'com.microsoft.onnxruntime:onnxruntime-extensions-android:1.16.0'
}

[Image Alt Text: ONNX Runtime mobile installation process steps]

React Native:

npm install onnxruntime-react-native
# or
yarn add onnxruntime-react-native

Model Conversion

Convert models to ONNX Runtime mobile format:

From PyTorch:

import torch
import torch.onnx

# Load PyTorch model (assumes the full model object was saved,
# not just a state_dict)
model = torch.load('model.pth')
model.eval()

# Example input
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=15,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

From TensorFlow:

import tensorflow as tf
import tf2onnx

# Load TensorFlow model
model = tf.keras.models.load_model('model.h5')

# Convert to ONNX
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)
output_path = "model.onnx"

model_proto, _ = tf2onnx.convert.from_keras(
    model,
    input_signature=spec,
    opset=15,
    output_path=output_path
)

[Image Alt Text: ONNX Runtime mobile model conversion workflow diagram]

Model Optimization

Optimize for ONNX Runtime mobile:

Quantization:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization (easiest)
model_input = "model.onnx"
model_output = "model_quantized.onnx"

quantize_dynamic(
    model_input,
    model_output,
    weight_type=QuantType.QUInt8  # or QInt8
)
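As a rough size model: dynamic quantization stores each 4-byte FP32 weight as a single INT8 byte, so weight-dominated models shrink close to 4x. Using MobileNet V2's roughly 3.5M parameters as an illustration:

```python
params = 3_500_000  # approximate MobileNet V2 parameter count

fp32_mb = params * 4 / 1e6  # 4 bytes per float32 weight
int8_mb = params * 1 / 1e6  # 1 byte per quantized weight

print(f"{fp32_mb:.1f} MB -> {int8_mb:.1f} MB")  # 14.0 MB -> 3.5 MB
```

Real files land slightly above this estimate because graph metadata and any non-quantized tensors are stored at full precision.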

Graph Optimization:

import onnxruntime as ort

# Optimize graph
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model_optimized.onnx"

session = ort.InferenceSession("model.onnx", sess_options)

Learn about model quantization.

ONNX Runtime Mobile Implementation

iOS Implementation

Build ONNX Runtime mobile iOS app:

Swift Code:

import onnxruntime_objc

class ImageClassifier {
    private var session: ORTSession?
    
    init() {
        do {
            // Create session options
            let options = try ORTSessionOptions()
            
            // Enable the CoreML execution provider (hardware acceleration)
            try options.appendCoreMLExecutionProvider(with: ORTCoreMLExecutionProviderOptions())
            
            // Load model
            let modelPath = Bundle.main.path(forResource: "model", ofType: "onnx")!
            let env = try ORTEnv(loggingLevel: .warning)
            session = try ORTSession(env: env, modelPath: modelPath, sessionOptions: options)
            
        } catch {
            print("ONNX Runtime mobile error: \(error)")
        }
    }
    
    func classify(image: UIImage) throws -> [Float] {
        // Preprocess image
        let inputData = preprocessImage(image)
        
        // Create input tensor
        let inputName = "input"
        let inputShape: [NSNumber] = [1, 3, 224, 224]
        let inputTensor = try ORTValue(tensorData: NSMutableData(data: inputData),
                                       elementType: .float,
                                       shape: inputShape)
        
        // Run inference with ONNX Runtime mobile
        let outputs = try session!.run(
            withInputs: [inputName: inputTensor],
            outputNames: ["output"],
            runOptions: nil
        )
        
        // Extract results
        let outputTensor = outputs["output"]!
        let outputData = try outputTensor.tensorData() as Data
        
        // Convert to array
        let results = outputData.withUnsafeBytes { 
            Array(UnsafeBufferPointer<Float>(
                start: $0.baseAddress!.assumingMemoryBound(to: Float.self),
                count: outputData.count / MemoryLayout<Float>.stride
            ))
        }
        
        return results
    }
    
    private func preprocessImage(_ image: UIImage) -> Data {
        // Resize to 224x224
        let size = CGSize(width: 224, height: 224)
        UIGraphicsBeginImageContext(size)
        image.draw(in: CGRect(origin: .zero, size: size))
        let resized = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()
        
        // Convert to RGB data
        guard let cgImage = resized.cgImage,
              let data = cgImage.dataProvider?.data,
              let bytes = CFDataGetBytePtr(data) else {
            return Data()
        }
        
        // Normalize to [-1, 1] and write in planar CHW order
        // to match the declared [1, 3, 224, 224] input shape
        let area = 224 * 224
        var floatArray = [Float](repeating: 0, count: 3 * area)
        var p = 0
        for i in stride(from: 0, to: CFDataGetLength(data), by: 4) {
            floatArray[p]            = (Float(bytes[i])     / 255.0 - 0.5) * 2  // R
            floatArray[area + p]     = (Float(bytes[i + 1]) / 255.0 - 0.5) * 2  // G
            floatArray[2 * area + p] = (Float(bytes[i + 2]) / 255.0 - 0.5) * 2  // B
            p += 1
        }
        
        return Data(bytes: floatArray, count: floatArray.count * MemoryLayout<Float>.stride)
    }
}

[Image Alt Text: ONNX Runtime mobile iOS Swift code implementation example]

Android Implementation

Implement ONNX Runtime mobile on Android:

Kotlin Code:

import ai.onnxruntime.*
import android.content.Context
import android.graphics.Bitmap
import java.nio.FloatBuffer

class ImageClassifier(context: Context) {
    private val ortEnv = OrtEnvironment.getEnvironment()
    private val session: OrtSession
    
    init {
        // Create session options
        val sessionOptions = OrtSession.SessionOptions()
        
        // Enable NNAPI for hardware acceleration
        sessionOptions.addNnapi()
        
        // Load model from assets
        val modelBytes = context.assets.open("model.onnx").readBytes()
        session = ortEnv.createSession(modelBytes, sessionOptions)
    }
    
    fun classify(bitmap: Bitmap): FloatArray {
        // Preprocess image
        val inputData = preprocessImage(bitmap)
        
        // Create input tensor
        val inputName = session.inputNames.iterator().next()
        val shape = longArrayOf(1, 3, 224, 224)
        val inputTensor = OnnxTensor.createTensor(
            ortEnv,
            FloatBuffer.wrap(inputData),
            shape
        )
        
        // Run inference with ONNX Runtime mobile
        val results = session.run(mapOf(inputName to inputTensor))
        
        // Extract output (Result.get returns an Optional)
        val outputName = session.outputNames.iterator().next()
        val output = results.get(outputName).get().value as Array<FloatArray>
        
        // Cleanup
        inputTensor.close()
        results.close()
        
        return output[0]
    }
    
    private fun preprocessImage(bitmap: Bitmap): FloatArray {
        // Resize to 224x224
        val resized = Bitmap.createScaledBitmap(bitmap, 224, 224, true)
        
        // Extract pixels
        val pixels = IntArray(224 * 224)
        resized.getPixels(pixels, 0, 224, 0, 0, 224, 224)
        
        // Convert to float array, normalize to [-1, 1], and write in
        // planar CHW order to match the [1, 3, 224, 224] input shape
        val area = 224 * 224
        val floatArray = FloatArray(3 * area)
        for (i in pixels.indices) {
            val pixel = pixels[i]
            floatArray[i]            = ((pixel shr 16 and 0xFF) / 255f - 0.5f) * 2f // R
            floatArray[area + i]     = ((pixel shr 8 and 0xFF) / 255f - 0.5f) * 2f  // G
            floatArray[2 * area + i] = ((pixel and 0xFF) / 255f - 0.5f) * 2f        // B
        }
        
        return floatArray
    }
    
    fun close() {
        session.close()
        ortEnv.close()
    }
}
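Both native preprocessors implement the same transform, so a reference version in Python is useful for validating mobile outputs against a desktop run. A numpy-only sketch (the [-1, 1] normalization is this tutorial's convention; other models may expect different mean/std values):

```python
import numpy as np

def preprocess(rgb: np.ndarray) -> np.ndarray:
    """rgb: (224, 224, 3) uint8 image, already resized.
    Returns a (1, 3, 224, 224) float32 tensor normalized to [-1, 1]."""
    x = rgb.astype(np.float32) / 255.0  # [0, 1]
    x = (x - 0.5) * 2.0                 # [-1, 1]
    x = np.transpose(x, (2, 0, 1))      # HWC -> CHW
    return x[np.newaxis, ...]           # add batch dimension

img = np.zeros((224, 224, 3), dtype=np.uint8)
img[..., 0] = 255  # pure red image
tensor = preprocess(img)
print(tensor.shape)        # (1, 3, 224, 224)
print(tensor[0, 0, 0, 0])  # 1.0  (red channel)
print(tensor[0, 1, 0, 0])  # -1.0 (green channel)
```

If the mobile app's top predictions diverge from a desktop run on the same image, the preprocessing step is the first place to compare.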

React Native Implementation

ONNX Runtime mobile for React Native:

JavaScript Code:

import { InferenceSession, Tensor } from 'onnxruntime-react-native';

class ImageClassifier {
  constructor() {
    this.session = null;
  }
  
  async initialize() {
    try {
      // Load model with ONNX Runtime mobile
      this.session = await InferenceSession.create(
        'model.onnx',
        {
          executionProviders: ['coreml'], // or 'nnapi' for Android
          graphOptimizationLevel: 'all'
        }
      );
    } catch (error) {
      console.error('ONNX Runtime mobile init error:', error);
    }
  }
  
  async classify(imageData) {
    // Preprocess image
    const preprocessed = this.preprocessImage(imageData);
    
    // Create input tensor
    const inputTensor = new Tensor(
      'float32',
      new Float32Array(preprocessed),
      [1, 3, 224, 224]
    );
    
    // Run inference
    const feeds = { input: inputTensor };
    const results = await this.session.run(feeds);
    
    // Extract output
    const output = results.output.data;
    
    return Array.from(output);
  }
  
  preprocessImage(imageData) {
    // Placeholder: resize to 224x224, normalize, and lay the data out
    // in CHW order to match the [1, 3, 224, 224] input shape.
    throw new Error('preprocessImage is app-specific; implement for your image source');
  }
}

export default ImageClassifier;

[Image Alt Text: ONNX Runtime mobile React Native code example]

ONNX Runtime Mobile Performance Optimization

Execution Providers

ONNX Runtime mobile backend optimization:

Available Providers:

  • CoreML (iOS): Apple Neural Engine
  • NNAPI (Android): Neural Networks API
  • CPU (Fallback): Cross-platform
  • DirectML (Windows): GPU acceleration
  • XNNPACK (Mobile): Optimized CPU

Selection Strategy:

# Priority order (ONNX Runtime mobile tries in order)
providers = [
    'CoreMLExecutionProvider',  # iOS
    'NNAPIExecutionProvider',   # Android
    'CPUExecutionProvider'      # Fallback
]

session = ort.InferenceSession('model.onnx', providers=providers)

[Image Alt Text: ONNX Runtime mobile execution providers performance comparison]

Memory Optimization

ONNX Runtime mobile memory management:

Session Options:

// C API example (via the OrtApi function table)
const OrtApi* g_ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

OrtSessionOptions* session_options;
g_ort->CreateSessionOptions(&session_options);

// Enable memory pattern optimization
g_ort->EnableMemPattern(session_options);

// Enable CPU memory arena
g_ort->EnableCpuMemArena(session_options);

// Set graph optimization level
g_ort->SetSessionGraphOptimizationLevel(session_options, ORT_ENABLE_ALL);

Model Optimization:

  • Reduce precision (FP32 → FP16 → INT8)
  • Prune unnecessary operations
  • Fuse operations
  • Remove unused outputs
  • Optimize graph structure

Batch Processing

Improve ONNX Runtime mobile throughput:

Batching Strategy:

# Process multiple inputs efficiently
batch_size = 4
inputs = collect_batch(batch_size)

# Prepare batched input
batched_input = np.stack(inputs)  # Shape: (4, 3, 224, 224)

# Single inference call
session = ort.InferenceSession('model.onnx')
outputs = session.run(None, {'input': batched_input})

# Process all results at once
for i, output in enumerate(outputs[0]):
    process_result(output)
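Batching helps because the fixed per-call overhead (dispatch, provider setup) is amortized across the batch. A toy cost model makes the effect concrete; the overhead and compute figures below are illustrative, not measured:

```python
def per_item_ms(batch_size, overhead_ms=5.0, compute_ms_per_item=10.0):
    """Amortized latency per item for one batched inference call.
    The overhead and compute numbers are hypothetical placeholders."""
    return (overhead_ms + compute_ms_per_item * batch_size) / batch_size

print(per_item_ms(1))  # 15.0 ms per item
print(per_item_ms(4))  # 11.25 ms per item
```

The trade-off: total latency for the batch grows, so batching suits offline or queued workloads more than interactive, single-frame use.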

Discover NPU acceleration benefits.

ONNX Runtime Mobile Benchmarks

Performance Comparison

ONNX Runtime mobile speed tests:

MobileNet V2 (Image Classification):

  • iOS (CoreML): 15ms
  • Android (NNAPI): 18ms
  • CPU fallback: 45ms

ResNet-50:

  • iOS (CoreML): 35ms
  • Android (NNAPI): 42ms
  • CPU fallback: 180ms

BERT-base (Text):

  • iOS: 85ms
  • Android: 95ms
  • CPU: 320ms

[Image Alt Text: ONNX Runtime mobile performance benchmarks across platforms]

Memory Footprint

ONNX Runtime mobile resource usage:

Runtime Size:

  • iOS: ~8MB
  • Android: ~12MB
  • React Native: ~15MB

Model Sizes (quantized):

  • MobileNet V2: 3.5MB
  • ResNet-50: 25MB
  • BERT-base: 110MB

Peak Memory:

  • MobileNet V2: ~50MB
  • ResNet-50: ~200MB
  • BERT-base: ~450MB

Real-World ONNX Runtime Mobile Applications

Computer Vision

ONNX Runtime mobile vision apps:

Object Detection:

  • YOLO models
  • Real-time processing
  • Bounding boxes
  • Classification

Image Segmentation:

  • Semantic segmentation
  • Instance segmentation
  • Medical imaging
  • AR applications

Face Recognition:

  • Face detection
  • Landmark detection
  • Expression analysis
  • Identity verification

[Image Alt Text: ONNX Runtime mobile computer vision applications examples]

Natural Language Processing

ONNX Runtime mobile NLP:

Text Classification:

  • Sentiment analysis
  • Spam detection
  • Category prediction
  • Language identification

Named Entity Recognition:

  • Person names
  • Organizations
  • Locations
  • Dates/times

Question Answering:

  • Context-based QA
  • Extractive answers
  • Mobile assistants

Audio Processing

ONNX Runtime mobile audio:

Speech Recognition:

  • Voice commands
  • Transcription
  • Speaker identification

Audio Classification:

  • Sound detection
  • Music genre
  • Acoustic scenes

Voice Enhancement:

  • Noise reduction
  • Echo cancellation
  • Audio upsampling

Troubleshooting ONNX Runtime Mobile

Common Issues

ONNX Runtime mobile problems:

Model Load Failures:

Error: Failed to load model
Solution:
- Check ONNX opset version
- Verify model path
- Ensure compatible operators
- Update ONNX Runtime

Inference Errors:

Error: Shape mismatch
Solution:
- Verify input dimensions
- Check data type
- Validate preprocessing
- Review model specs

[Image Alt Text: ONNX Runtime mobile troubleshooting decision tree]

Performance Issues:

Problem: Slow inference
Solutions:
- Enable hardware acceleration
- Quantize model
- Optimize graph
- Reduce input size
- Check provider priority

Debugging Tips

ONNX Runtime mobile debugging:

Enable Logging:

import onnxruntime as ort

# Set logging severity (0 = verbose ... 4 = fatal)
ort.set_default_logger_severity(0)

# Raise VLOG verbosity (only takes effect at verbose severity)
ort.set_default_logger_verbosity(3)

Profile Performance:

# Enable profiling
sess_options = ort.SessionOptions()
sess_options.enable_profiling = True

session = ort.InferenceSession('model.onnx', sess_options)

# After inference
prof_file = session.end_profiling()
print(f"Profiling data: {prof_file}")
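The profiling file is a chrome-trace-style JSON array of events with per-node durations in microseconds. A small sketch for summarizing where time went (`top_ops` is a hypothetical helper name; field names follow ORT's trace output):

```python
import json
from collections import Counter

def top_ops(prof_path, n=5):
    """Sum recorded kernel time per node from an ORT profiling JSON file."""
    with open(prof_path) as f:
        events = json.load(f)
    totals = Counter()
    for e in events:
        if e.get("cat") == "Node":                 # per-node kernel events
            totals[e["name"]] += e.get("dur", 0)   # duration in microseconds
    return totals.most_common(n)
```

Sorting nodes by total time quickly shows whether one unoptimized operator dominates, which usually points to a fusion or provider-fallback problem.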

The Verdict on ONNX Runtime Mobile

ONNX Runtime mobile delivers on its promise of write-once, run-anywhere machine learning. The combination of framework flexibility, platform coverage, and performance optimization makes it an excellent choice for cross-platform mobile AI development.

Choose ONNX Runtime Mobile For:

  • ✅ Cross-platform apps
  • ✅ Framework flexibility
  • ✅ Production optimization
  • ✅ Performance requirements
  • ✅ Model portability

Consider Alternatives If:

  • ❌ Platform-specific optimizations crucial
  • ❌ Need largest model library
  • ❌ Require extensive tutorials
  • ❌ Want first-party framework
  • ❌ Prefer native tools

Key Takeaways:

  • Framework-agnostic deployment
  • Excellent performance
  • Consistent cross-platform API
  • Active Microsoft support
  • Growing adoption

ONNX Runtime mobile bridges the gap between AI research and mobile production. Train in your preferred framework, optimize once, deploy everywhere—exactly as promised.

For teams building cross-platform mobile AI applications, ONNX Runtime mobile significantly reduces development complexity while maintaining performance. It’s mature, well-supported, and production-ready.

The future of mobile ML is cross-platform, and ONNX Runtime mobile leads that future today.

