PyTorch: fast and simple
I recently came across PyTorch, a new technology primed for optimization and machine learning. The docs make it look attractive, so immediately I wondered “how does it compare with NumPy?”
Turns out it’s a pretty nice framework that’s fast and straightforward to use. I’ll detail the speed before talking about ease of use.
Speed
The largest difference, and the largest potential slowdown, is gradient computation. PyTorch automatically computes the gradient given past computations, whereas in NumPy gradients have to be explicitly computed. Computing gradients is part of my daily workflow, and slowness here would mean that I could not use PyTorch.
I expected NumPy to be faster while computing gradients. How could it not be? It’s been around for a long time and has been heavily optimized. It’s a mature piece of software and widely used. Because of this, I expected NumPy to be at least 2x faster than PyTorch.
PyTorch is faster. Not by a small margin like 10% (which would still be significant!) but by a whopping 8x. That’s right – an explicit NumPy gradient computation takes 8 times longer than doing more work to automatically compute the gradient with PyTorch. This is because by default PyTorch uses parallelism, something that’s not obvious how to get with NumPy (even with Anaconda).
That convinced me that PyTorch is a serious contender. I decided to verify that other, less fancy computations (e.g., svd or sqrt) were as fast as in NumPy.
The most important result from my timing comparisons is

[Figure: timing comparison of NumPy and PyTorch]

which shows the time to compute the least squares gradient (the gradient with respect to $x$ of $\norm{y - Ax}^2_2$ when $A \in \R^{10d \times d}$).
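To make the comparison concrete, here is a minimal sketch of the two ways of getting that least squares gradient. It is not the benchmark code behind the timings; the size `d = 50` and the random data are assumptions chosen only for illustration. NumPy needs the gradient $2A^T(Ax - y)$ written out by hand, while PyTorch derives it from the forward computation.

```python
import numpy as np
import torch

d = 50  # illustrative size (an assumption, not the size used in the timings)
rng = np.random.default_rng(0)
A = rng.standard_normal((10 * d, d))
x = rng.standard_normal(d)
y = rng.standard_normal(10 * d)

# NumPy: the gradient of ||y - Ax||_2^2 has to be derived and coded by hand.
grad_numpy = 2 * A.T @ (A @ x - y)

# PyTorch: write only the forward computation and let autograd do the rest.
A_t = torch.from_numpy(A)
y_t = torch.from_numpy(y)
x_t = torch.from_numpy(x).requires_grad_()
loss = torch.sum((y_t - A_t @ x_t) ** 2)
loss.backward()
grad_torch = x_t.grad.numpy()

# Both routes should agree up to floating point error.
print(np.allclose(grad_numpy, grad_torch))
```

Timing each half separately (for example with `timeit`) for growing `d` is one way to reproduce a comparison like this on your own machine.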
This test can be run in parallel. My machine has 8 virtual cores, which is why PyTorch is 8x faster than NumPy for large $d$. If I had a GPU on my local machine this would be even faster. I could have made NumPy faster by using Numba’s CUDA GPU support, as described in my earlier post “NumPy GPU acceleration”, but I wanted to test Anaconda’s default configuration^{1}.
There are other libraries^{2} that have these same speed results – what else does PyTorch offer?
Extending PyTorch
PyTorch is not a Python binding to a monolithic C++ framework. Instead, most of the functionality is implemented as Python classes. This means that it’s easy to subclass these classes to write the code you want while keeping the functionality of PyTorch, and it’s easy to compare against other methods implemented in PyTorch. They even have a page titled “Extending PyTorch” in their docs!
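As a rough sketch of what subclassing looks like, here is a toy optimizer built on `torch.optim.Optimizer`. The class name `SignSGD`, its update rule, the learning rate and the toy problem are all hypothetical choices made for illustration; only the base class and the tensor calls come from PyTorch.

```python
import torch


class SignSGD(torch.optim.Optimizer):
    """Hypothetical optimizer: step against the sign of the gradient."""

    def __init__(self, params, lr=0.01):
        super().__init__(params, defaults=dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # p <- p - lr * sign(grad)
                    p.add_(torch.sign(p.grad), alpha=-group["lr"])


# Toy problem (an assumption): pull w toward a fixed target.
w = torch.zeros(2, requires_grad=True)
target = torch.tensor([1.0, -2.0])
opt = SignSGD([w], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = torch.sum((w - target) ** 2)
    loss.backward()
    opt.step()
```

Because the subclass exposes the same `zero_grad`/`step` interface, swapping in `torch.optim.SGD([w], lr=0.1)` in the loop above is enough to compare the two methods.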
NumPy/SciPy integration
The conversion between PyTorch tensors and NumPy arrays is simple, as the NumPy ndarray and PyTorch Tensor share the same memory locations (source). This can lead to significant time savings, especially when large arrays are used.
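A small sketch of what sharing memory means in practice, using `torch.from_numpy` and `Tensor.numpy` on a CPU tensor (the variable names are only for illustration). Writing through one object is visible through the other because no copy is made.

```python
import numpy as np
import torch

array = np.zeros(3)

# NumPy -> PyTorch: the tensor is a view on the array's memory.
tensor = torch.from_numpy(array)
tensor[0] = 1.0  # this write is visible through `array`

# PyTorch -> NumPy: again a view, not a copy (CPU tensors only).
back = tensor.numpy()
back[1] = 2.0  # this write is visible through `tensor`

print(array, tensor)
```

The flip side is that an in-place change on either side silently changes the other, so copy explicitly (for example with `tensor.clone()`) when you need independent data.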
This means that it’s easy and fast to extend PyTorch with NumPy and SciPy. In the docs, they step through creating an extension with SciPy.
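To give a flavor of such an extension, here is a minimal sketch that wraps a NumPy call in a `torch.autograd.Function`. It is not the SciPy example from the docs: the class name `NumpyExp` is hypothetical, and `np.exp` stands in for whatever NumPy or SciPy routine you actually want. The only requirement is that you supply the backward pass yourself, since autograd cannot see inside NumPy.

```python
import numpy as np
import torch


class NumpyExp(torch.autograd.Function):
    """Hypothetical extension: exp computed by NumPy, usable in autograd."""

    @staticmethod
    def forward(ctx, x):
        # Leave PyTorch, compute with NumPy, and come back (CPU tensors only).
        result = torch.from_numpy(np.exp(x.detach().numpy()))
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        # d/dx exp(x) = exp(x), which is the saved forward result.
        (result,) = ctx.saved_tensors
        return grad_output * result


x = torch.tensor([0.0, 1.0], dtype=torch.float64, requires_grad=True)
y = NumpyExp.apply(x).sum()
y.backward()
print(x.grad)  # should match exp(x)
```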
This is significant, and there are large speed benefits to this! When I compare converting to a NumPy $n\times n$ array from a Tensorflow or PyTorch tensor, I see this timing comparison:
That’s right – PyTorch is over 1000x faster than TensorFlow when converting to a 1000 $\times$ 1000 NumPy array!
This means we can use all of NumPy and SciPy without any fear of slowing our program down.
Dynamic computation graph
The biggest difference between PyTorch and other ML frameworks (Tensorflow, CNTK, MXNet, etc) is that PyTorch has a dynamic computational graph, not a static computational graph. This allows for significant ease of use.
One benefit of this is that code executes when you expect. With dynamic computation graphs, tracebacks are easy to follow and you can use ordinary Python control flow as expected. Libraries that have a static computation graph have to define and implement their own control flow. For example, see Tensorflow’s control flow docs or an SO question on the difficulty of timing in Tensorflow.
In my experience, it’s required to hold more mental state for Tensorflow models than with PyTorch. PyTorch has clear function arguments because the code executes when expected. It’s not necessary to link together the input data to the model and (in my experience) there are fewer global variables.
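Here is a small sketch of what ordinary control flow means with a dynamic graph. The function `repeated_halving` and its threshold are made up for illustration. The number of loop iterations depends on the data, yet the gradient still flows through exactly the operations that ran.

```python
import torch


def repeated_halving(x, threshold=1.0):
    """Hypothetical example: halve x until its norm drops below threshold."""
    steps = 0
    while x.norm() > threshold:  # plain Python loop, data-dependent length
        x = x / 2
        steps += 1
    return x, steps


x = torch.tensor([3.0, 4.0], requires_grad=True)  # norm is 5
y, steps = repeated_halving(x)
y.sum().backward()

print(steps)   # 3 halvings: 5 -> 2.5 -> 1.25 -> 0.625
print(x.grad)  # each entry is (1/2)**3 = 0.125
```

A different input could take a different number of iterations, and no special looping construct from the framework is needed to handle that.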
Further benefits
 torch.multiprocessing. Similar to the standard Python multiprocessing, but “with magical memory sharing of torch Tensors across processes.”
 They even have an example Hogwild implementation!
 torch.distributed to communicate between distributed machines.
 GPU access which can speed up code as exemplified above.
 PyTorch is memory efficient: “The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives”, according to pytorch.org.
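As a sketch of the GPU point in the list above: the same least squares gradient can run on a GPU when one is present, and on the CPU otherwise. The sizes here are arbitrary assumptions, and I have not timed this snippet.

```python
import torch

# Use a GPU if PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

A = torch.randn(1000, 100, device=device)
x = torch.randn(100, device=device, requires_grad=True)
y = torch.randn(1000, device=device)

loss = torch.sum((y - A @ x) ** 2)
loss.backward()  # the gradient is computed on the same device

print(device, x.grad.shape)
```

Nothing else in the code changes between the two devices; only the `device` argument decides where the tensors live.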
PyTorch is already an attractive package, but they also offer
 Datasets and pretrained models at pytorch/vision
 Many examples and implementations, with a subset available at
 A strong community with a discussion board and an SO tag
Notes
 O’Reilly podcast on PyTorch, part of my motivation for checking out PyTorch
 PyTorch’s core development team has 4 members
 I think PyTorch performs reverse-mode auto-differentiation.
 Other autograd implementations (and inspiration for PyTorch): HIPS/autograd, twitter/torch-autograd, Chainer.
 It looks like it performs reverse accumulation automatic differentiation
 PyTorch can work with TensorBoard via tensorboard-pytorch
 A good overview of Theano+Lasagne, PyTorch and Tensorflow on Reddit’s /r/machinelearning by /u/ajmooch
 Inspired by Chainer and similar to TensorFlow, Theano, Caffe and CNTK
 fast.ai announced they’re “Introducing PyTorch for fast.ai” (added 2017-09-08). Their motivation includes
 “in a recent Kaggle competition [PyTorch] was used by nearly all of the top 10 finishers”
 “Much to our surprise, we also found that many models trained quite a lot faster on pytorch than they had on Tensorflow.”
 “Because Pytorch allowed us, and our students, to use all of the flexibility and capability of regular python code to build and train neural networks, we were able to tackle a much wider range of problems.”
 Chainer has a good comparison of many deep learning frameworks including PyTorch, Tensorflow and MXNet: http://chainer.readthedocs.io/en/latest/comparison.html (added 2017-09-19)
 If you care about speed (added 2017-09-19)…
 A speed comparison between many different frameworks can be found at soumith/convnet-benchmarks. This measures many different frameworks, but notably Torch and not PyTorch (but this tweet by a PyTorch core dev says they both call the same C libraries).
 Some anecdotal evidence (tensorflow#9322, tensorflow#7065, PyTorch forum thread) points to PyTorch being faster than Tensorflow.

^{1} Which includes MKL and has other optimizations (maybe Intel’s TBB?)

^{2} Like Tensorflow, MXNet and Theano (google trends).