Accelerated Deep Learning on macOS with PlaidML's new Metal support

May 10, 2018 | By: Frank Laub

For the 0.3.3 release of PlaidML, support for running deep learning networks on macOS has improved with the addition of a backend based on Apple’s native Metal API. Metal offers “near-direct access to the graphics processing unit (GPU)”, allowing machine learning tasks to run faster on any Mac where Metal is supported.

As previously announced, Mac users have been able to accelerate their PlaidML workloads with the OpenCL backend. In our internal testing, switching from OpenCL to Metal yields speedups of up to 5x in some cases.

To try out this new functionality on your Mac, first install the latest pre-release of PlaidML:

# Setting up a virtualenv to keep things tidy (this step is optional)
$ virtualenv env
$ source env/bin/activate

# Install the latest pre-release of PlaidML (with Keras support)
$ pip install --pre plaidml-keras

# Install plaidbench to compare benchmarks between different backends
$ pip install plaidbench
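
Once installed (and once you’ve run plaidml-setup as shown below), PlaidML can serve as the Keras backend in your own scripts. Here is a minimal sketch; the dummy inference at the end is only there to confirm that the selected device opens, and install_backend() must be called before Keras is imported:

# Route Keras through PlaidML instead of the default backend.
# install_backend() must run before the first `import keras`.
import plaidml.keras
plaidml.keras.install_backend()

import numpy as np
from keras.applications.mobilenet import MobileNet

# Build MobileNet with random weights (no download) and run a single
# inference on a dummy image; this should log the PlaidML device opening.
model = MobileNet(weights=None)
print(model.predict(np.zeros((1, 224, 224, 3), dtype="float32")).shape)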

Next, run plaidml-setup and select the desired Metal-based device. Be sure to enable experimental device support when prompted:

$ plaidml-setup

PlaidML Setup (0.3.3rc1)

Thanks for using PlaidML!

Some Notes:
  * Bugs and other issues: https://github.com/plaidml/plaidml
  * Questions: https://stackoverflow.com/questions/tagged/plaidml
  * Say hello: https://groups.google.com/forum/#!forum/plaidml-dev
  * PlaidML is licensed under the GNU AGPLv3

Default Config Devices:
   No devices.

Experimental Config Devices:
   llvm_preview_cpu.0 : LLVM_preview_CPU
   geforce_gtx_680mx.0 : NVIDIA GeForce GTX 680MX
   metal_nvidia_geforce_gtx_680mx.0 : NVIDIA GeForce GTX 680MX

Using experimental devices can cause poor performance, crashes, and other nastiness.

Enable experimental device support? (y,n)[n]:y

Multiple devices detected (You can override by setting PLAIDML_DEVICE_IDS).
Please choose a default device:

   1 : llvm_preview_cpu.0
   2 : geforce_gtx_680mx.0
   3 : metal_nvidia_geforce_gtx_680mx.0

Default device? (1,2,3)[1]:3

Selected device:
    metal_nvidia_geforce_gtx_680mx.0

PlaidML sends anonymous usage statistics to help guide improvements.
We'd love your help making it better.

Enable telemetry reporting? (y,n)[y]:

Almost done. Multiplying some matrices...
Tile code:
  function (B[X,Z], C[Z,Y]) -> (A) { A[x,y : X,Y] = +(B[x,z] * C[z,y]); }
Whew. That worked.

Save settings to /Users/flaub/.plaidml? (y,n)[y]:
Success!
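
The same choices can also be made per process with environment variables instead of the saved settings, which is handy when scripting comparisons like the benchmarks below. Here is a sketch, assuming PLAIDML_EXPERIMENTAL mirrors the experimental-device prompt above (PLAIDML_DEVICE_IDS is the override plaidml-setup itself mentions):

# Select the PlaidML device for this process only. Both variables must be
# set before plaidml is imported, since they are read when the backend loads.
import os
os.environ["PLAIDML_EXPERIMENTAL"] = "1"  # assumed equivalent of the prompt above
os.environ["PLAIDML_DEVICE_IDS"] = "metal_nvidia_geforce_gtx_680mx.0"

import plaidml.keras
plaidml.keras.install_backend()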

Now let’s run a few benchmarks to see how Metal’s performance compares with OpenCL’s.

# Metal is the default device as configured by plaidml-setup previously
$ plaidbench keras mobilenet
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "metal_nvidia_geforce_gtx_680mx.0"
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 3.37273907661438 (compile), 11.541990041732788 (execution), 0.011271474650129676 (execution per example)
Correctness: PASS, max_error: 8.841450835461728e-06, max_abs_error: 4.172325134277344e-07, fail_ratio: 0.0

# Override the device ID to determine how well OpenCL performs
$ PLAIDML_DEVICE_IDS=geforce_gtx_680mx.0 plaidbench keras mobilenet
Running 1024 examples with mobilenet, batch size 1
INFO:plaidml:Opening device "geforce_gtx_680mx.0"
Model loaded.
Compiling network...
Warming up ...
Main timing
Example finished, elapsed: 1.640672206878662 (compile), 57.07654571533203 (execution), 0.05573881417512894 (execution per example)
Correctness: PASS, max_error: 1.0121882951352745e-05, max_abs_error: 6.556510925292969e-07, fail_ratio: 0.0

In the above example, MobileNet runs almost 5x faster on Metal than on OpenCL (0.0113 s vs. 0.0557 s per example)!
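
For reference, that figure comes directly from the per-example execution times reported above:

# Per-example execution times (seconds) copied from the plaidbench output above.
metal = 0.011271474650129676
opencl = 0.05573881417512894
print("speedup: {:.1f}x".format(opencl / metal))  # prints "speedup: 4.9x"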

© 2018 Vertex.AI