How to Deploy ONNX Models (almost) Anywhere, with PlaidML

Apr 27, 2018 | By: Rob Earhart

At Vertex.AI, we’ve been running ONNX machine-learning models on various GPUs via our own PlaidML high-performance machine-learning backend. What makes this happen is ONNX-PlaidML, a bridge from ONNX to PlaidML’s Tile computation language.


What’s This All About?

ONNX provides a high-level framework-independent file format for ML models; you might train a model on one machine learning platform, export it to ONNX, and use the model for inference with a completely different framework.

ONNX is built on Protocol Buffers, making models easy to manipulate from any language. The developers have thought carefully about versioning, documented the ONNX operator set well, and provided hundreds of tests covering all sorts of specification edge cases.

Together, these make ONNX a solid way to publish a model with confidence that it’ll run the same way on any ONNX backend implementation, even in the face of updates to the ONNX specification.

Actually running ONNX models can be a bit tricky, though—you need an ML backend, and you need one that’s general enough to work with your model. That’s where PlaidML comes in.

PlaidML supports a wide variety of processors: it has direct support for CPUs, and can use most GPUs via its OpenCL backend (we regularly test on AMD, Intel, and NVIDIA GPUs). PlaidML does this by generating optimized GPU kernels on the fly, specific to the exact operation being performed, the dimensions of the input and output data, and the computational abilities of the processor.

Getting Started

For reference: we’re running this on an original-generation Surface Book, with Windows 10. PlaidML also supports Linux and macOS.

We’re also running this using Anaconda. ONNX can be a little tricky to install on Windows; Anaconda makes the installation easy. If you’re not using Anaconda, you’ll want to use the installation instructions at the ONNX Code Repository.

Installation itself is straightforward:

C:\Users\rob> conda install -q -y -c conda-forge onnx
C:\Users\rob> pip install -q onnx-plaidml

You’ll also need to set up PlaidML, so it knows which device to use:

C:\Users\rob> plaidml-setup

PlaidML Setup (0.3.2)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues:
  * Questions:
  * Say hello:!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3

Default Config Devices:
   geforce_gpu.0 : NVIDIA Corporation GeForce GPU

Experimental Config Devices:
   geforce_gpu.0 : NVIDIA Corporation GeForce GPU
   opencl_cpu.0 : Intel(R) Corporation OpenCL CPU

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:n

Selected device:

PlaidML sends anonymous usage statistics to help guide improvements.
We'd love your help making it better.

Enable telemetry reporting? (y,n)[y]:y

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

Save settings to C:\Users\rob\.plaidml? (y,n)[y]:
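The Tile function printed during setup is an ordinary matrix multiply. In NumPy terms, the contraction `A[x,y : X,Y] = +(B[x,z] * C[z,y])` reads as follows (the dimension names are taken from the Tile code above; the sizes are arbitrary):

```python
import numpy as np

# A[x, y] = sum over z of B[x, z] * C[z, y], for x in [0, X), y in [0, Y).
X, Z, Y = 3, 4, 5
B = np.random.rand(X, Z)
C = np.random.rand(Z, Y)

A = np.zeros((X, Y))
for x in range(X):
    for y in range(Y):
        A[x, y] = sum(B[x, z] * C[z, y] for z in range(Z))

# The explicit loops match NumPy's built-in matrix product.
assert np.allclose(A, B @ C)
```

PlaidML compiles contractions like this one into device-specific kernels rather than interpreting the loops directly.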

Once PlaidML’s been set up, your code can use ONNX-PlaidML to run ONNX models:

from __future__ import print_function
import os

import numpy as np
import onnx
import onnx_plaidml.backend

# Use ONNX to read the model:
model_dir = os.path.join('.onnx', 'models', 'resnet50')
model = onnx.load(os.path.join(model_dir, 'model.onnx'))

# Use PlaidML to compile the model:
rep = onnx_plaidml.backend.prepare(model)

# Read some test data:
data = np.load(os.path.join(model_dir, 'test_data_0.npz'), encoding='bytes')

# Run inference over the test data using the model:
outputs = rep.run(data['inputs'])

# Make sure it worked:
np.testing.assert_allclose(data['outputs'], outputs, rtol=1e-3, atol=1e-7)

print('ResNet50 works!')
C:\Users\rob> python
INFO:plaidml:Opening device "geforce_gpu.0"
ResNet50 works!
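For reference, the `test_data_0.npz` files that ship with the model-zoo networks are plain NumPy archives holding an `inputs` array and the matching `outputs` array, as the snippet above assumes. A file with the same layout can be written and read back like this (the shapes are ResNet-50's ImageNet input and 1000-class output, used purely for illustration):

```python
import io
import numpy as np

# Write an .npz archive with the same keys the inference script reads.
buf = io.BytesIO()
inputs = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = np.random.rand(1, 1000).astype(np.float32)
np.savez(buf, inputs=inputs, outputs=outputs)

# Read it back the way the inference script does.
buf.seek(0)
data = np.load(buf)
print(data['inputs'].shape)    # -> (1, 3, 224, 224)
print(data['outputs'].shape)   # -> (1, 1000)
```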

We’ve also integrated ONNX-PlaidML into our machine-learning benchmarking tool, PlaidBench. Here’s an example:

C:\Users\rob> pip install -q plaidbench

C:\Users\rob> plaidbench onnx shufflenet
Running 1024 examples with shufflenet, batch size 1
Compiling network...
INFO:plaidml:Opening device "geforce_gpu.0"
Warming up ...
Main timing
Example finished, elapsed: 4.371005058288574 (compile), 18.001006603240967 (execution), 0.017579108010977507 (execution per example)
Correctness: PASS, max_error: 3.838554221147206e-06, max_abs_error: 1.1548399925231934e-07, fail_ratio: 0.0

C:\Users\rob> plaidbench --batch-size 32 onnx shufflenet
Running 1024 examples with shufflenet, batch size 32
Compiling network...
INFO:plaidml:Opening device "geforce_gpu.0"
INFO:plaidml:Analyzing Ops: 878 of 922 operations complete
Warming up ...
Main timing
Example finished, elapsed: 4.580990552902222 (compile), 0.5449995994567871 (execution), 0.0005322261713445187 (execution per example)
Correctness: PASS, max_error: 3.838554221147206e-06, max_abs_error: 1.1548399925231934e-07, fail_ratio: 0.0
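A quick sanity check on those reports: the "execution per example" figure is just total execution time divided by the example count, and comparing the two runs shows how much throughput batching buys:

```python
# Numbers reported by plaidbench above (1024 examples each run).
examples = 1024
exec_bs1 = 18.001006603240967    # batch size 1
exec_bs32 = 0.5449995994567871   # batch size 32

per_example_bs1 = exec_bs1 / examples
per_example_bs32 = exec_bs32 / examples

print(round(per_example_bs1, 6))    # -> 0.017579, matching the report
print(round(per_example_bs32, 6))   # -> 0.000532
print(round(exec_bs1 / exec_bs32))  # -> 33: roughly a 33x throughput gain
```

Larger batches amortize per-kernel launch overhead and keep the GPU busier, which is why the per-example time drops so sharply.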

The code’s Open Source:

C:\Users\rob> git clone
Cloning into 'onnx-plaidml'...
remote: Counting objects: 74, done.
remote: Total 74 (delta 0), reused 0 (delta 0), pack-reused 74
Unpacking objects: 100% (74/74), done.

C:\Users\rob> cd onnx-plaidml

C:\Users\rob\onnx-plaidml> python test

# Installation, test running, &c
---------- onnx coverage: ----------
Operators (passed/loaded/total): 63/64/109
Operator            Attributes
                    (name: #values)
Abs                 No attributes
Add                 axis: 1
                    broadcast: 1
And                 axis: 4
                    broadcast: 1
Conv                dilations: 4
                    group: 7
                    kernel_shape: 9
                    pads: 7
                    strides: 5
                    auto_pad: 0
Cast                to: 3
Ceil                No attributes
Clip                max: 3
                    min: 3
Concat              axis: 3
Constant            value: 2
Pad                 mode: 1
                    pads: 2
                    value: 3
Div                 broadcast: 1
                    axis: 0
Elu                 alpha: 1
Equal               broadcast: 1
                    axis: 0
Exp                 No attributes
Flatten             axis: 4
Floor               No attributes
Gather              axis: 1
GlobalAveragePool   No attributes
GlobalMaxPool       No attributes
Greater             broadcast: 1
                    axis: 0
Hardmax             axis: 3
HardSigmoid         alpha: 1
                    beta: 1
LeakyRelu           alpha: 3
Less                broadcast: 1
                    axis: 0
Log                 No attributes
LogSoftmax          axis: 3
MatMul              No attributes
Max                 No attributes
Mean                No attributes
Min                 No attributes
Mul                 axis: 1
                    broadcast: 1
Neg                 No attributes
Not                 No attributes
Or                  axis: 4
                    broadcast: 1
Pow                 axis: 1
                    broadcast: 1
Reciprocal          No attributes
Relu                No attributes
Reshape             shape: 16
Selu                alpha: 1
                    gamma: 1
Shape               No attributes
Sigmoid             No attributes
Size                No attributes
Slice               axes: 2
                    ends: 4
                    starts: 5
Softmax             axis: 5
Softplus            No attributes
Softsign            No attributes
Sqrt                No attributes
Squeeze             axes: 1
Sub                 broadcast: 1
                    axis: 0
Sum                 No attributes
Tanh                No attributes
ThresholdedRelu     alpha: 1
Transpose           perm: 8
Unsqueeze           axes: 1
Xor                 axis: 4
                    broadcast: 1
LRN                 alpha: 2
                    beta: 1
                    bias: 2
                    size: 1
MaxPool             kernel_shape: 5
                    pads: 6
                    strides: 5
                    auto_pad: 0
Gemm                alpha: 1
                    beta: 1
                    broadcast: 1
                    transB: 1
                    transA: 0
Dropout             is_test: 1
                    ratio: 2
BatchNormalization  consumed_inputs: 1
                    epsilon: 3
                    is_test: 1
                    momentum: 3
                    spatial: 0
AveragePool         kernel_shape: 5
                    pads: 4
                    strides: 4
                    auto_pad: 0
Split               axis: 3
                    split: 1
PRelu               No attributes
================== 266 passed, 8 skipped in 1383.43 seconds ===================


Caveat: On the ONNX backend test suite, ONNX-PlaidML passes 266 tests and skips eight; a few operations are a little tricky to implement, and we haven't needed them yet for our own work. We've implemented every operation used by AlexNet, DenseNet, Inception-v1, Inception-v2, ResNet-50, ShuffleNet, SqueezeNet, VGG-16, and VGG-19, and we're definitely open to contributions if there's something you need.

We’ve found ONNX to be a useful tool, and look forward to seeing more frameworks support it. ONNX-PlaidML makes it easy to use ONNX in production settings, and we look forward to seeing what people do with it.

And if you’re looking for a team to help you deploy production machine-learning networks, or to help you get custom AI hardware running in modern machine-learning frameworks, drop us a note; we’d love to help make your products come to life.


© 2018 Intel Corporation