Tile: A New Language for Machine Learning

Nov 10, 2017 | By: Jeremy Bruestle


With the release of the PlaidML machine learning framework, Vertex.AI is helping make accelerated machine learning on every platform a reality. Historically, the key obstacle to acceleration on a wide range of platforms has been software support, which is constrained by the need to laboriously implement libraries of hand-crafted software “kernels” for each processor. PlaidML takes a different approach: it uses Tile, a tensor manipulation language we’ve developed, to generate the kernels automatically, making it many times easier to add support for GPUs and new types of processors. Our benchmarks show that this approach is competitive with existing frameworks on NVIDIA GPUs, while also extending compatibility to other common GPUs such as those from AMD and Intel.

Recently, we were asked how we wrote our accelerated kernels for the new platforms. The short answer is that we didn’t write the kernels; they are machine generated. Our backend produces custom kernels for each specific operation on each GPU. It does this through an intermediate language called Tile. Tile is a simple, compact language for describing machine learning operations in a way that can be efficiently implemented on parallel computing architectures. For example, a Tile matrix multiply can be written as follows:

function (A[M, L], B[L, N]) -> (C) {
    C[i, j: M, N] = +(A[i, k] * B[k, j]);
}

This syntax balances expressiveness and suitability for optimization to achieve broad coverage of the operations necessary to build deep neural networks. It closely resembles mathematical notation for describing linear algebra operations, and it also fully supports automatic differentiation. Additionally, it was designed to be highly parallelizable and to allow for analysis of issues such as cache coherency, use of shared memory (programmer-managed L1 on GPUs and other devices), and memory bank conflicts.
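To make the notation concrete, here is a rough plain-Python reading of the matrix multiply above. It is only a sketch of what the contraction means, not of how PlaidML generates kernels: the function name `tile_matmul` is hypothetical, and the loop order is arbitrary. The output indices `i` and `j` range over the declared sizes `M` and `N`, and the `+( ... )` aggregation sums over `k`, the index that appears only on the right-hand side.

```python
def tile_matmul(A, B):
    """Reference reading of: C[i, j: M, N] = +(A[i, k] * B[k, j]).

    A is an M x L nested list and B is an L x N nested list.
    This is an illustrative sketch, not PlaidML's implementation.
    """
    M, L = len(A), len(A[0])
    if len(B) != L:
        raise ValueError("inner dimensions of A and B must match")
    N = len(B[0])

    # Output shape comes from the "M, N" after the colon in the Tile code.
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            # "+(...)" sums the product over every valid k.
            for k in range(L):
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(tile_matmul(A, B))  # [[58, 64], [139, 154]]
```

Because each output element is an independent sum, the three loops can in principle be reordered or split freely, which gives a sense of why a description like this lends itself to parallel code generation.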

Tile is characterized by:

PlaidML’s Keras integration uses Tile as the intermediate representation, allowing the entire Keras backend to be written in fewer than 3000 lines of Python. This makes implementing new ops much easier than with traditional approaches. For example, we recently received a ticket pointing out that PlaidML didn’t support dilated convolutions. Thanks to Tile, it was easy to add support with some very minor changes. This prompted us to write tutorials describing how to add a new Keras backend function as well as how to write Tile code.
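As a rough illustration of why dilation can be a small change in an index-based description, the sketch below writes a one-dimensional dilated convolution in plain Python. It is not PlaidML's actual change: the function name `dilated_conv1d` and its parameters are hypothetical, and it covers only a single channel with no padding. The only difference from an ordinary convolution is that the input index is `x + dilation * k` rather than `x + k`.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Single-channel, unpadded 1D convolution with a dilation factor.

    Computes out[x] = sum over k of signal[x + dilation * k] * kernel[k].
    As is usual in deep learning, the kernel is not flipped.
    Illustrative sketch only; names and signature are assumptions.
    """
    if dilation < 1:
        raise ValueError("dilation must be at least 1")
    # Distance between the first and last input element one output touches.
    span = dilation * (len(kernel) - 1)
    out_len = len(signal) - span
    if out_len <= 0:
        raise ValueError("signal is too short for this kernel and dilation")

    return [
        sum(signal[x + dilation * k] * kernel[k] for k in range(len(kernel)))
        for x in range(out_len)
    ]


signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4]
```

With `dilation=1` this reduces to an ordinary convolution, so one index expression covers both cases.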

While the language is still subject to change, and we haven’t finished the formal specification, we hope this is a helpful look at what’s to come. We plan to use a similar approach to make PlaidML compatible with more frameworks, such as TensorFlow and PyTorch, to further expand each person’s freedom to use the hardware and tools most convenient for their work.


© 2018 Intel Corporation