Tensor Compilers: Comparing PlaidML, Tensor Comprehensions, and TVM

May 19, 2018. | By: Brian Retford and Jeremy Bruestle


One of the most complex and performance-critical parts of any machine learning framework is its support for device-specific acceleration. Indeed, without efficient GPU acceleration, much of modern ML research and deployment would not be possible. This acceleration support is also a critical bottleneck, both for adding support for a wider range of hardware targets (including mobile) and for writing new research kernels. Much of NVIDIA’s dominance in machine learning can be attributed to its greater level of software support, largely in the form of the cuDNN acceleration library.

We wrote PlaidML to overcome this bottleneck. PlaidML automatically generates efficient GPU acceleration kernels for a wide range of hardware, for both existing machine learning operations and new research kernels. Because kernel generation is a complex process, GPU kernels have historically been written by hand. Alongside PlaidML, two other projects, Tensor Comprehensions and TVM, are attempting to change this paradigm. The Tensor Comprehensions team makes the case for the importance of these technologies in their very well written announcement.

In this post, we compare PlaidML, Tensor Comprehensions, and TVM along multiple dimensions, including performance and feature set. We begin with performance.


Automatic Kernel Generation in PlaidML

May 19, 2018. | By: Jeremy Bruestle

Historically, an engineer-intensive aspect of developing a machine learning backend was producing efficient device kernels from mathematical expressions of tensor operations. Practitioners wishing to utilize cutting edge research kernels in their networks needed to either wait for this development cycle to complete or rely on order-of-magnitude slower CPU implementations. Now, PlaidML, NNVM/TVM, and Tensor Comprehensions generate efficient kernels automatically from tensor expressions, bypassing this costly bottleneck. Below we’ll give an overview of how PlaidML transforms an operation requested by a graph in a frontend (such as Keras or ONNX) into an optimized OpenCL kernel.

We’ll also take a detailed look at the unique ways that PlaidML automatically performs several key aspects of generating efficiently parallelized kernels, including streamlining cache performance and minimizing edge-case conditionals.
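As a rough illustration of the cache-streamlining idea, here is loop tiling in plain Python. This is not PlaidML's implementation — PlaidML reasons about tile sizes with a cost model and emits OpenCL — but it sketches the transformation a kernel generator applies: the tiled version does exactly the same work as the naive one, just reordered into blocks small enough to stay cache-resident.

```python
# Naive matrix multiply: walks all of B for every row of A,
# so for large n the working set blows past the cache.
def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

# Tiled (cache-blocked) multiply: process t-by-t blocks so each block
# of A and B can stay cache-resident while it is reused.
def matmul_tiled(A, B, n, t):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            for kk in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for j in range(jj, min(jj + t, n)):
                        for k in range(kk, min(kk + t, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

n = 6
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
B = [[float((i + j) % n) for j in range(n)] for i in range(n)]
assert matmul_naive(A, B, n) == matmul_tiled(A, B, n, 4)
```

Picking the tile size `t` is where the hard work lives: a real generator must balance it against register counts, cache sizes, and work-group shapes per device, which is exactly the search PlaidML automates.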


Fully Automatic Differentiation for Tensor Expressions

May 17, 2018. | By: Tim Zerrell


[Figure: formula for the derivative of a convolution]

Deep learning advances routinely require the construction of new neural network operations. Adding such operations has proven to be a labor-intensive step that can introduce delays on the order of months. PlaidML enables researchers to add ops in hours through sophisticated code generation algorithms built on the Tile language. This post explains a portion of this process, unique to PlaidML, that automatically generates gradient kernels, and compares it to related projects such as Tensor Comprehensions and NNVM/TVM.
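To see the chain-rule bookkeeping that gradient generation automates, here is a toy scalar reverse-mode sketch. PlaidML differentiates whole Tile tensor expressions symbolically, so this is illustrative only: each node records its inputs and local derivatives, and a backward pass accumulates gradients.

```python
# Toy reverse-mode autodiff: each Var records (parent, local_gradient)
# pairs, and backward() pushes seeds down the expression graph.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_var, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        # d(xy)/dx = y, d(xy)/dy = x
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

    def backward(self, seed=1.0):
        # Accumulate the incoming gradient, then propagate it to parents
        # scaled by each edge's local derivative (the chain rule).
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Var(3.0)
y = Var(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

A tensor-level differentiator does the same accumulation, but per Tile contraction rather than per scalar — which is why the gradient of an op like convolution can be emitted as another Tile expression and compiled with the same kernel generator.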


Accelerated Deep Learning on macOS with PlaidML's new Metal support

May 10, 2018. | By: Frank Laub


For the 0.3.3 release of PlaidML, support for running deep learning networks on macOS has improved with the ability to use Apple’s native Metal API. Metal offers “near-direct access to the graphics processing unit (GPU)”, allowing machine learning tasks to run faster on any Mac where Metal is supported.

As previously announced, Mac users have been able to accelerate their PlaidML workloads using the OpenCL backend. In our internal testing, we have seen speedups of up to 5x in some cases when using Metal instead of OpenCL.


How to Deploy ONNX Models (almost) Anywhere, with PlaidML

Apr 27, 2018. | By: Rob Earhart

At Vertex.AI, we’ve been running ONNX machine-learning models on various GPUs via our own PlaidML high-performance machine-learning backend. What makes this happen is ONNX-PlaidML, a bridge from ONNX to PlaidML’s Tile computation language.


What’s This All About?

ONNX provides a high-level framework-independent file format for ML models; you might train a model on one machine learning platform, export it to ONNX, and use the model for inference with a completely different framework.

ONNX is built on Protocol Buffers, making the model easy to manipulate from any language. The developers have thought carefully about versioning, provided good documentation for the ONNX operator set, and provided hundreds of tests to cover all sorts of specification edge cases.

Together, these make ONNX a solid way to publish a model with confidence that it’ll run the same way on any ONNX backend implementation, even in the face of updates to the ONNX specification.

Actually running ONNX models can be a bit tricky, though—you need an ML backend, and you need one that’s general enough to work with your model. That’s where PlaidML comes in.
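In rough outline, running an exported model through the bridge looks like the following. This is a sketch, not a full guide: it assumes the `onnx` and `onnx-plaidml` packages are installed, `my_model.onnx` is a placeholder filename, and `input_array` stands in for your prepared NumPy input; onnx-plaidml follows the standard ONNX backend API.

```
import onnx
import onnx_plaidml.backend as backend  # the ONNX-PlaidML bridge

# "my_model.onnx" is a placeholder for any exported ONNX model file.
model = onnx.load("my_model.onnx")
rep = backend.prepare(model)       # compile the graph via PlaidML/Tile
outputs = rep.run([input_array])   # input_array: a NumPy array
```

Because the backend API is framework-independent, the same two calls work whether the model was exported from PyTorch, CNTK, or any other ONNX-capable trainer.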


Deep learning with LLVM using PlaidML

Mar 28, 2018. | By: Mars Saxman


In PlaidML 0.3.0, we have integrated LLVM as a new option for CPU execution.

We’ve kept things simple for now by supporting execution only on CPUs, but the wide array of instruction sets available through LLVM means that this new hardware interface module offers exciting possibilities for future targets. For example, we could take advantage of LLVM’s support for NVPTX and AMDGPU to run Tile code directly on the GPU, bypassing OpenCL; alternatively, we could provide device-independent GPU binaries by compiling out to SPIR-V. In the embedded world, generating ARM code would be an obvious win, and we could make use of Qualcomm DSPs through LLVM’s Hexagon support.


Practical Embedded Object Detection with PlaidML

Jan 23, 2018. | By: Brian Retford


[Image: smart camera]

PlaidML allows GPU-accelerated applications to be deployed on almost any hardware. We introduce microplaid – an open source set of tools for developing accelerated object detection applications on embedded devices. We provide a parts list and outline how to use microplaid to build a mobile object detector based on the UP Squared board.


Deep Learning for Everyone: PlaidML for Windows

Nov 22, 2017. | By: Brian Retford

When we first announced PlaidML we promised to bring deep learning to every platform. With today’s release of preliminary Windows support we’re moving much closer to that goal – PlaidML now supports all the common desktop and server platforms.


Tile: A New Language for Machine Learning

Nov 10, 2017. | By: Jeremy Bruestle


With the release of the PlaidML machine learning framework, Vertex.AI is helping make accelerated machine learning on every platform a reality. Historically, the key obstacle to acceleration on a wide range of platforms has been software support, which is constrained by the need to laboriously implement libraries of hand-crafted software “kernels” for each processor. PlaidML takes a different approach, using a tensor manipulation language we’ve developed called Tile to automatically generate the kernels, making it many times easier to add support for GPUs and new types of processors. Our benchmarks show that this approach is competitive with existing frameworks on NVIDIA GPUs, while also extending compatibility to other common GPUs such as those from AMD and Intel.
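For a concrete sense of what Tile looks like, here is a matrix multiply adapted from the Tile documentation (treat the exact spelling as illustrative). The index expressions on the left define the output dimensions, and `+( ... )` denotes a sum over the index `k`, which appears only on the right-hand side:

```
function (A[M, L], B[L, N]) -> (C) {
    C[i, j: M, N] = +(A[i, k] * B[k, j]);
}
```

Because the whole operation is a single declarative contraction with no explicit loops, the compiler is free to choose loop order, tiling, and parallelization per device — which is what makes automatic kernel generation tractable.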


GPU-accelerated Deep Learning on Mac with Intel, AMD, and NVIDIA

Oct 27, 2017. | By: Choong Ng

Last week we announced the release of PlaidML, an open source software framework designed to enable deep learning on every device. Our goal with PlaidML is to make deep learning accessible by supporting the most popular hardware and software already in the hands of developers, researchers, and students. Last week’s release supported Python 2.7 on Linux. We received immediate requests for Mac and Python 3; today we’re pleased to announce preliminary support for both.


Announcing PlaidML: Open Source Deep Learning for Every Platform

Oct 20, 2017. | By: Choong Ng


We’re pleased to announce the next step towards deep learning for every device and platform. Today Vertex.AI is releasing PlaidML, our open source portable deep learning engine. Our mission is to make deep learning accessible to every person on every device, and we’re building PlaidML to help make that a reality. We’re starting by supporting the most popular hardware and software already in the hands of developers, researchers, and students. The initial version of PlaidML runs on most existing PC hardware with OpenCL-capable GPUs from NVIDIA, AMD, or Intel. Additionally, we’re including support for running the widely popular Keras framework on top of Plaid to allow existing code and tutorials to run unchanged.
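For Keras users, enabling PlaidML is a small change at the top of an existing script. This sketch follows the pattern from the project's published instructions; it assumes the `plaidml-keras` package is installed and `plaidml-setup` has been run once to pick a device:

```
# Install the backend shim before importing Keras itself.
import plaidml.keras
plaidml.keras.install_backend()

import keras  # existing Keras code now runs on the PlaidML backend
```

From that point on, unmodified Keras models and tutorials execute through PlaidML's kernel generator instead of TensorFlow.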


Benchmarking Deep Neural Nets for Real-Time Vision

Aug 29, 2017. | By: Choong Ng

Recently we posted early results from our work to bring deep learning to more people through OpenCL support, including initial benchmarks on AMD and NVIDIA hardware. As a business, we are building on this technology to bring real-time computer vision to every device. In this post we discuss the key issue of processing speed, open source a tool we use to measure speed on real workloads, and share our performance progress. Through careful optimization, our runtime software, code-named Plaid, is now up to 1.4x faster than TensorFlow 1.3 + cuDNN 6 for real-time vision tasks.


Open Source Deep Learning on AMD and Beyond

Aug 17, 2017. | By: Choong Ng

Earlier this week, we posted a first look at our work to bring deep learning to more people on more platforms. Today, we’re adding details on our plan to open source our software and an update on our development progress. With our support for the OpenCL open standard, people with a GPU from any manufacturer, including NVIDIA, AMD, and Intel, will soon be able to get started with real datasets in minutes. Users won’t need to sacrifice speed for that freedom: our software is as fast as TensorFlow + cuDNN in some cases, and it will continue to improve.


Bringing Deep Learning to OpenCL

Aug 14, 2017. | By: Choong Ng

I’m excited to announce Vertex.AI’s work to bring deep learning to OpenCL and share a first look at our results so far. This work is intended to make deep learning accessible to more people and speed up progress across the field. Read on for the details and what’s coming next.


Hello World

Dec 7, 2016. | By: Choong Ng

We’re working to bring the power of neural nets to every application, using new technology invented and built in-house, to make applications that weren’t possible, possible. There’s a large gap between the capabilities neural networks show in research and the practical challenges in actually getting them to run on the platforms where most applications run. Making these algorithms work in your app requires fast enough hardware paired with precisely tuned software compatible with your platform and language. Efficient plus compatible plus portable is a huge challenge—we can help.


© 2018 Intel Corporation