I recently came across PyTorch, a new technology primed for optimization and machine learning. The docs make it look attractive, so I immediately wondered “how does it compare with NumPy?”

Turns out it’s a pretty nice framework that’s fast and straightforward to use. I’ll detail the speed before talking about ease-of-use.


The largest difference, and the largest potential slow-down, is gradient computation. PyTorch automatically computes the gradient from past computations, whereas in NumPy gradients have to be computed explicitly. Computing gradients is part of my daily workflow, and slowness here would mean that I could not use PyTorch.

I expected NumPy to be faster while computing gradients. How could it not be? It’s been around for a long time and has been heavily optimized. It’s a mature piece of software and widely used. Because of this, I expected NumPy to be at least 2x faster than PyTorch.

PyTorch is faster. Not by a small margin like 10% (which would still be significant!) but by a whopping 8x. That’s right – an explicit NumPy gradient computation takes 8 times longer than doing more work to automatically compute the gradient with PyTorch. This is because PyTorch uses parallelism by default, and it’s not obvious how to get the same parallelism with NumPy (even with Anaconda).

That convinced me that PyTorch is a serious contender. I decided to verify that other, less fancy computations (e.g., svd or sqrt) were as fast as in NumPy. The most important result from my timing comparisons is the time to compute the least squares gradient (the gradient with respect to $x$ of $\norm{y - Ax}^2_2$ when $A \in \R^{10d~\times~d}$).
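To make the comparison concrete, here is a minimal sketch of the two ways to get the least squares gradient: the explicit formula $2A^T(Ax - y)$ in NumPy, and automatic differentiation in PyTorch. The function names `numpy_grad` and `torch_grad` and the small problem size are my own choices for illustration, not the code used for the timings in this post.

```python
import numpy as np
import torch


def numpy_grad(A, x, y):
    # Explicit gradient of ||y - A x||_2^2 with respect to x.
    return 2 * A.T @ (A @ x - y)


def torch_grad(A, x, y):
    # Let autograd derive the gradient from the forward computation.
    x = x.clone().detach().requires_grad_(True)
    loss = torch.sum((y - A @ x) ** 2)
    loss.backward()
    return x.grad


d = 5  # small, hypothetical problem size; A has shape (10d, d)
rng = np.random.default_rng(0)
A = rng.standard_normal((10 * d, d))
x = rng.standard_normal(d)
y = rng.standard_normal(10 * d)

g_np = numpy_grad(A, x, y)
g_torch = torch_grad(
    torch.from_numpy(A), torch.from_numpy(x), torch.from_numpy(y)
).numpy()

print(np.allclose(g_np, g_torch))
```

Either function can be wrapped in `timeit` to reproduce a timing comparison on your own machine.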

This test can be run in parallel. My machine has 8 virtual cores, which is why PyTorch is 8x faster than NumPy for large $d$. If I had a GPU on my local machine this would be even faster. I could have made NumPy faster by using Numba’s CUDA GPU support (see my earlier post “NumPy GPU acceleration”), but I wanted to test Anaconda’s default configuration [1].
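If you want to check how much CPU parallelism PyTorch is using on your own machine, something like the following sketch should work. It only inspects and temporarily changes the thread count; it is not the benchmark from this post.

```python
import torch

# Number of threads PyTorch uses for CPU intra-op parallelism.
n_threads = torch.get_num_threads()
print(f"PyTorch is using {n_threads} CPU thread(s)")

# Restrict PyTorch to one thread, e.g. to time a single-threaded baseline.
torch.set_num_threads(1)
single = torch.get_num_threads()

# Restore the original setting.
torch.set_num_threads(n_threads)
```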

There are other libraries [2] that have these same speed results. What else does PyTorch offer?

Extending PyTorch

PyTorch is not a Python binding to a monolithic C++ framework. Instead, most of the functionality is implemented as Python classes. This means that it’s easy to subclass these classes to write the code you want while keeping the functionality of PyTorch, and it’s easy to compare against other methods implemented in PyTorch. They even have a page titled “Extending PyTorch” in their docs!
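As a small illustration of subclassing, here is a sketch that wraps the least squares objective in a `torch.nn.Module`. The class name `LeastSquares` and its `x` parameter are hypothetical names chosen for this example.

```python
import torch


class LeastSquares(torch.nn.Module):
    """Least squares objective ||y - A x||_2^2 with x as a learnable parameter."""

    def __init__(self, d):
        super().__init__()
        # Registering x as a Parameter lets optimizers and autograd find it.
        self.x = torch.nn.Parameter(torch.zeros(d))

    def forward(self, A, y):
        return torch.sum((y - A @ self.x) ** 2)


model = LeastSquares(3)
loss = model(torch.eye(3), torch.ones(3))
loss.backward()  # fills model.x.grad with the gradient of the loss
print(loss.item(), model.x.grad)
```

Because the model is an ordinary Python class, it can be inspected, subclassed again, or passed to any optimizer in `torch.optim`.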

NumPy/SciPy integration

The conversion between PyTorch tensors and NumPy arrays is simple, as the NumPy ndarray and the PyTorch Tensor share the same memory locations (source). This can lead to significant time savings, especially when large arrays are used.
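A quick way to see the shared memory is to modify one object and look at the other. This sketch assumes a CPU tensor; the variable names are only for illustration.

```python
import numpy as np
import torch

a = np.zeros(3)
t = torch.from_numpy(a)  # no copy: t views the same buffer as a

t[0] = 1.0  # writing through the tensor changes the array
b = t.numpy()  # again no copy: b views the same buffer
b[1] = 2.0  # writing through the array changes the tensor

print(a, t, np.shares_memory(a, b))
```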

This means that it’s easy and fast to extend PyTorch with NumPy and SciPy. In the docs, they step through creating an extension with SciPy.
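The general pattern for such an extension is a `torch.autograd.Function` whose forward pass calls into NumPy (or SciPy) and whose backward pass supplies the gradient by hand. The sketch below uses `np.sqrt` as a stand-in for a real SciPy routine; `NumpySqrt` is a hypothetical name, and this is not the example from the PyTorch docs.

```python
import numpy as np
import torch


class NumpySqrt(torch.autograd.Function):
    """Elementwise square root computed by NumPy, with a hand-written gradient."""

    @staticmethod
    def forward(ctx, inp):
        # Leave autograd, compute with NumPy, and wrap the result as a tensor.
        result = torch.from_numpy(np.sqrt(inp.detach().numpy()))
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        (result,) = ctx.saved_tensors
        # d/dx sqrt(x) = 1 / (2 sqrt(x))
        return grad_output / (2 * result)


x = torch.tensor([4.0, 9.0], requires_grad=True)
y = NumpySqrt.apply(x)
y.sum().backward()
print(y, x.grad)
```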

This is significant, and there are large speed benefits to this! When I compare converting to a NumPy $n\times n$ array from a TensorFlow or PyTorch tensor, I see this timing comparison:

That’s right – PyTorch is over 1000x faster than TensorFlow when converting to a 1000 $\times$ 1000 NumPy array!

This means we can use all of NumPy and SciPy without any fear of slowing our program down.

Dynamic computation graph

The biggest difference between PyTorch and other ML frameworks (TensorFlow, CNTK, MXNet, etc.) is that PyTorch has a dynamic computational graph, not a static computational graph. This allows for significant ease of use.

One benefit of this is that code executes when you expect. TensorFlow uses function definitions with an asynchronous C++ library, meaning that the computational graph is defined before running. In PyTorch, the graph is defined by running. As a result, PyTorch tracebacks are easy to follow: they’re not an additional asynchronous traceback on top of the traceback of interest.
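Because the graph is built as the code runs, ordinary Python control flow can depend on the data. In this sketch the number of loop iterations is decided at run time, and autograd still differentiates through whatever was executed. The function `repeated_halving` is a made-up example.

```python
import torch


def repeated_halving(x):
    # The loop length depends on the values in x, not on a predefined graph.
    steps = 0
    while x.norm() > 1:
        x = x / 2
        steps += 1
    return x.sum(), steps


x = torch.tensor([3.0, 4.0], requires_grad=True)
out, steps = repeated_halving(x)
out.backward()
print(steps, x.grad)
```

An error inside the loop would raise at the offending line, with a normal Python traceback.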

In my experience, TensorFlow models require holding more mental state than PyTorch models do. PyTorch has clear function arguments because the code executes when expected. It’s not necessary to link together the input data to the model and (in my experience) there are fewer global variables.

Further benefits

  • torch.multiprocessing. Similar to the standard Python multiprocessing, but “with magical memory sharing of torch Tensors across processes.”
  • torch.distributed to communicate between distributed machines.
  • GPU access which can speed up code as exemplified above.
  • PyTorch is memory efficient: “The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives”, according to pytorch.org
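For GPU access, a common pattern is to pick a device once and create tensors on it. The sketch below falls back to the CPU when no CUDA device is available, so it runs either way; the shapes are arbitrary.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

A = torch.randn(100, 10, device=device)
x = torch.randn(10, device=device, requires_grad=True)
y = torch.randn(100, device=device)

loss = torch.sum((y - A @ x) ** 2)
loss.backward()

# Move the gradient back to the CPU before handing it to NumPy.
grad = x.grad.cpu().numpy()
print(device, grad.shape)
```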

PyTorch is already an attractive package, but they also offer


  1. Which includes MKL and has other optimizations (maybe Intel’s TBB?).

  2. Like TensorFlow, MXNet and Theano (Google Trends).