Autodiff Workshop

Abstract

The calculation of gradients and other forms of derivatives is a core part of machine learning, computer vision, and physical simulation. But deriving them by hand is error-prone and imposes a high "mental overhead" on practitioners in these fields. Taking derivatives, however, is a highly mechanical application of the chain rule, and it can be carried out with formal techniques such as automatic or symbolic differentiation. A family of "autodiff" approaches exists, each with its own particular strengths and trade-offs.
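To make the "mechanical application of the chain rule" concrete, here is a minimal sketch of forward-mode automatic differentiation using dual numbers. It is an illustration only, not the approach of any particular system discussed at the workshop; the names `Dual`, `derivative`, and the example function `f` are hypothetical and chosen for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def sin(x):
    # Chain rule: sin(u)' = cos(u) * u'
    x = _lift(x)
    return Dual(math.sin(x.value), math.cos(x.value) * x.deriv)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(float(x), 1.0)).deriv


def f(x):
    # Hypothetical example function: f(x) = x^2 + 3 sin(x)
    return x * x + 3 * sin(x)
```

Calling `derivative(f, 1.5)` propagates the derivative alongside the value through every operation, so the result agrees with the hand-derived expression `2 * 1.5 + 3 * math.cos(1.5)` without anyone writing that expression down. The trade-offs mentioned above show up even here: this sketch handles arbitrary Python control flow, but it does so one scalar operation at a time.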

In the ideal case, automatically generated derivatives should be competitive with manually generated ones and run at near-peak performance on modern hardware. In practice, the most expressive autodiff systems, which can handle arbitrary, Turing-complete programs, are unsuited to performance-critical applications such as large-scale machine learning or physical simulation. Conversely, the most performant systems are not designed for use outside their designated application space, e.g. graphics or neural networks.

This workshop will bring together developers and researchers of state-of-the-art solutions for generating derivatives automatically and discuss ways in which these solutions can be evolved to be both more expressive and more performant. Topics for discussion will include:

Is it feasible to create a single differentiable programming language, or will we always have separate solutions for different fields such as vision and ML?
What are the primitive data types of a differentiable language? N-dimensional arrays are useful for many machine learning applications, but other domains make use of graph types and sparse matrices.
What are the challenges in elevating an expressive autodiff implementation from just a “prototyping language” to one used directly in performance-critical industrial settings?
A shared representation of programs like LLVM IR has transformed programming language and compiler research. Is there any benefit to a common representation of differentiable programs that would enable shared tooling amongst autodiff libraries and implementations?

Schedule

The videos for the talks are now available; see this playlist or the individual links below.

Time	Activity
9:00-9:10 AM	Introduction and opening remarks
9:10-9:40 AM	Barak A. Pearlmutter – Automatic Differentiation: History and Headroom [slides] [video]
9:40-9:50 AM	Discussion
9:50-10:20 AM	Jeff Dean – TensorFlow: Future Directions for Simplifying Large-Scale Machine Learning [slides] [video]
10:20-10:30 AM	Discussion
10:30-11:00 AM	Coffee break
11:00-11:30 AM	David Duvenaud – No more mini-languages: The power of autodiffing full-featured Python [slides] [video]
11:30-11:40 AM	Discussion
11:40 AM-2:30 PM	Lunch break
2:30-3:00 PM	Yoshua Bengio – Credit assignment: beyond backpropagation [slides] [video]
3:00-3:10 PM	Discussion
3:10-3:40 PM	Matthew Johnson – Autodiff writes your exponential family inference code [slides] [video]
3:40-3:50 PM	Discussion
3:50-4:30 PM	Coffee break
4:30-5:00 PM	Jeffrey M. Siskind – The tension between convenience and performance in automatic differentiation [slides] [video]
5:00-5:10 PM	Discussion
5:10-6:10 PM	Panel discussion
6:10 PM	End

About the speakers

Matthew Johnson, Google Brain
David Duvenaud, University of Toronto
Yoshua Bengio, MILA, Université de Montréal
Barak A. Pearlmutter, Maynooth University
Jeff Dean, Google Brain
Jeffrey M. Siskind, Purdue University

About us

Alex Wiltschko (@alexbw) is a former research engineer at Twitter, and a core developer of Torch Autograd, an automatic differentiation library used for both research and production at Twitter. Previously, he completed his PhD in Neurobiology at Harvard, focusing on quantifying behavior and body language using depth cameras and nonparametric time-series modeling.

Zach DeVito (@zdevito) is a postdoc at Stanford. His work applies techniques from programming languages and compilers to make high-performance programming easier and usable by a wider audience. He has worked on domain-specific languages for physical simulation, statistics, and image processing. In addition, he created the Terra meta-programming language, which makes building high-performance domain-specific languages easier.

Frédéric Bastien (@nouiz) is team lead for software infrastructure at the Montreal Institute for Learning Algorithms (MILA), Canada, and lead developer of the Theano library. In 2007, he finished an M.S. in computer architecture at Université de Montréal and has since worked at MILA (formerly the LISA lab).

Pascal Lamblin (@lamblin) is a software analyst at MILA (Montreal Institute for Learning Algorithms). After completing an engineering degree at École Centrale Paris, he did research under the supervision of Yoshua Bengio at Université de Montréal, and he is now working on the development of Theano.

This workshop generally stems from prior workshops on tooling in machine learning, such as:

The Big Learning workshops of 2011, 2012, and 2013 (http://biglearn.org/)
Their successor, the Machine Learning Systems workshop of 2015 (http://learningsys.org/)

However, our focus shifts from specific infrastructural and engineering challenges towards the most enabling programming abstractions in machine learning.