PyTorch: fast and simple
I recently came across PyTorch, a new technology primed for optimization and machine learning. The docs make it look attractive, so I immediately wondered “how does it compare with NumPy?”
Turns out it’s a pretty nice framework that’s fast and straightforward to use. I’ll detail the speed before talking about ease of use.
Speed
The largest difference between the two, and the largest potential slowdown, is gradient computation. PyTorch automatically computes the gradient from past computations, whereas in NumPy gradients have to be computed explicitly. Computing gradients is part of my daily workflow, and slowness here would mean that I could not use PyTorch.
I expected NumPy to be faster at computing gradients. How could it not be? It’s been around for a long time and has been heavily optimized. It’s a mature piece of software and widely used. Because of this, I expected NumPy to be at least 2x faster than PyTorch.
PyTorch is faster. Not by a small margin like 10% (which would still be significant!) but by a whopping 8x. That’s right – an explicit NumPy gradient computation takes 8 times longer than automatically computing the gradient with PyTorch, even though autograd is doing more work. This is because PyTorch uses parallelism by default, something that’s not obvious how to get with NumPy (even with Anaconda).
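To make the difference concrete, here is a minimal sketch (my own toy example, not the benchmark): the gradient of $f(x) = \sum_i x_i^2$ written out by hand in NumPy versus derived automatically by PyTorch.

```python
import numpy as np
import torch

x_np = np.array([1.0, 2.0, 3.0])

# NumPy: the gradient of f(x) = sum(x_i^2) must be derived and coded by hand.
grad_np = 2 * x_np  # d/dx_i sum(x_i^2) = 2 x_i

# PyTorch: autograd derives the same gradient from the forward computation.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
f = (x ** 2).sum()
f.backward()  # fills in x.grad

assert np.allclose(x.grad.numpy(), grad_np)
```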
That convinced me that PyTorch is a serious contender. I decided to verify that other, less fancy computations (e.g., `svd` or `sqrt`) were as fast as NumPy.
The most important result from my timing comparisons is the figure below, which shows the time to compute the least squares gradient (the gradient with respect to $x$ of $\|y - Ax\|_2^2$ when $A \in \mathbb{R}^{10d \times d}$).
This test can be run in parallel. My machine has 8 virtual cores, which is why PyTorch is 8x faster than NumPy for large $d$. If I had a GPU on my local machine this would be even faster. I could have made NumPy faster by using Numba’s CUDA GPU support and my earlier post “NumPy GPU acceleration”, but I wanted to test Anaconda’s default configuration^{1}.
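For reference, the least squares gradient from that benchmark can be written both ways. A minimal sketch (sizes illustrative, timing code omitted), using the closed form $\nabla_x \|y - Ax\|_2^2 = 2A^\top(Ax - y)$:

```python
import numpy as np
import torch

d = 50
rng = np.random.default_rng(0)
A_np = rng.standard_normal((10 * d, d))
x_np = rng.standard_normal(d)
y_np = rng.standard_normal(10 * d)

# NumPy: the explicit, hand-derived formula 2 A^T (Ax - y)
grad_np = 2 * A_np.T @ (A_np @ x_np - y_np)

# PyTorch: write the loss and let autograd produce the same gradient.
A = torch.tensor(A_np)
y = torch.tensor(y_np)
x = torch.tensor(x_np, requires_grad=True)
loss = ((y - A @ x) ** 2).sum()
loss.backward()

assert np.allclose(x.grad.numpy(), grad_np)
```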
There are other libraries^{2} with these same speed benefits – so what else does PyTorch offer?
Extending PyTorch
PyTorch is not a Python binding to a monolithic C++ framework. Instead, most of the functionality is implemented as Python classes. This means it’s easy to subclass them to write the code you want while keeping the functionality of PyTorch, and it’s easy to compare against other methods implemented in PyTorch. They even have a page titled “Extending PyTorch” in their docs!
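As a rough sketch of the pattern their “Extending PyTorch” page describes: subclass `torch.autograd.Function` and supply `forward` and `backward`. The toy op here (a custom `exp`) is my own example.

```python
import torch

class Exp(torch.autograd.Function):
    """A custom exp with a hand-written backward pass."""

    @staticmethod
    def forward(ctx, x):
        y = x.exp()
        ctx.save_for_backward(y)  # stash the output for the backward pass
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        return grad_output * y  # d/dx exp(x) = exp(x)

x = torch.randn(3, requires_grad=True)
Exp.apply(x).sum().backward()
assert torch.allclose(x.grad, x.exp())
```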
NumPy/SciPy integration
Conversion between PyTorch tensors and NumPy arrays is simple: the NumPy `ndarray` and the PyTorch `Tensor` can share the same memory locations (source). This can lead to significant time savings, especially when large arrays are used.
This means that it’s easy and fast to extend PyTorch with NumPy and SciPy. In the docs, they step through creating an extension with SciPy.
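A quick sanity check of that memory sharing (my own sketch, assuming a CPU tensor; `torch.from_numpy` avoids a copy):

```python
import numpy as np
import torch

a = np.zeros(3)
t = torch.from_numpy(a)  # no copy: tensor and array view the same buffer

a[0] = 1.0                 # mutate through NumPy...
assert t[0].item() == 1.0  # ...and the tensor sees it

t[1] = 2.0                 # mutate through PyTorch...
assert a[1] == 2.0         # ...and the array sees it
```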
This is significant, and there are large speed benefits to this! When I compare converting to a NumPy $n \times n$ array from a TensorFlow or PyTorch tensor, I see this timing comparison:
That’s right – PyTorch is over 1000x faster than TensorFlow when converting to a 1000 $\times$ 1000 NumPy array!
This means we can use all of NumPy and SciPy without any fear of slowing our program down.
Dynamic computation graph
The biggest difference between PyTorch and other ML frameworks (TensorFlow, CNTK, MXNet, etc.) is that PyTorch has a dynamic computational graph, not a static computational graph. This allows for significant ease of use.
One benefit of this is that code executes when you expect. TensorFlow uses function definitions backed by an asynchronous C++ library, meaning that the computational graph is defined before running. In PyTorch, the graph is defined by running. As a result, PyTorch tracebacks are easy to follow – they’re not an additional asynchronous traceback on top of the traceback of interest.
In my experience, it’s necessary to hold more mental state for TensorFlow models than for PyTorch models. PyTorch’s function arguments are clear because the code executes when expected. It’s not necessary to link the input data to the model, and (in my experience) there are fewer global variables.
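One way to see “defined by running” (a minimal sketch of my own): ordinary Python control flow shapes the graph, and autograd differentiates whatever actually ran.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x
for _ in range(3):   # a plain Python loop; its length could depend on data
    if y < 100:      # a plain Python conditional, evaluated as we run
        y = y * x
# Here y = x**4, so dy/dx = 4 * x**3 = 32 at x = 2.
y.backward()
assert x.grad.item() == 32.0
```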
Further benefits
- `torch.multiprocessing`. Similar to the standard Python `multiprocessing`, but “with magical memory sharing of torch Tensors across processes.”
- They even have an example Hogwild implementation!
- `torch.distributed` to communicate between distributed machines.
- GPU access, which can speed up code as exemplified above.
- PyTorch is memory efficient: “The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives”, according to pytorch.org.
PyTorch is already an attractive package, but they also offer
- Datasets and pretrained models at pytorch/vision
- Many examples and implementations, with a subset available at
- A strong community with a discussion board and an SO tag
Notes
- O’Reilly podcast on PyTorch, part of my motivation for checking out PyTorch
- PyTorch’s core development team has 4 members
- I think PyTorch performs reverse-mode (a.k.a. reverse accumulation) automatic differentiation.
- Other autograd implementations (and inspiration for PyTorch): HIPS/autograd, twitter/torch-autograd, Chainer.
- PyTorch can work with Tensorboard via tensorboard-pytorch
- A good overview of Theano+Lasagne, PyTorch and TensorFlow on Reddit’s /r/machinelearning by /u/ajmooch
- Inspired by Chainer and similar to TensorFlow, Theano, Caffe and CNTK
- [added 2017-09-08] fast.ai announced they’re “Introducing PyTorch for fast.ai”. Their motivation includes
  - “in a recent Kaggle competition [PyTorch] was used by nearly all of the top 10 finishers”
  - “Much to our surprise, we also found that many models trained quite a lot faster on pytorch than they had on Tensorflow.”
  - “Because Pytorch allowed us, and our students, to use all of the flexibility and capability of regular python code to build and train neural networks, we were able to tackle a much wider range of problems.”

Which includes MKL and has other optimizations (maybe Intel’s TBB?) ↩

like Tensorflow, MXNet and Theano (google trends). ↩