You want to pick up Python. But it’s hard and confusing to pick up a whole new framework. You want to try and switch, but it’s too much effort and takes too much time, so you stick with MATLAB. I essentially grew up on Python, meaning I can guide you to some solid resources and hand over tips and tricks I’ve learned.
This guide aims to ease that process a bit by showing tips and tricks within Python. This guide is not a full switch-to-Python guide. There are plenty of resources for that, including some wonderful SciPy lectures, detailed guides to the same material, and Python for MATLAB users. Those links are all useful and worth reading.
For an intro to Python, including types, the scope, functions and optional keywords and syntax (string addition, etc), look at the Python docs.
But, that said, I’ll share the most valuable tips and tricks I learned from looking at the resources above. These do not serve as a complete replacement for those resources! I want to emphasize that.
This would be easiest if you’re familiar with the command line. The basics are cd to navigate directories, bash <command> to run files and man <command> to find help, but more of the basics can be found with this tutorial.
The land of Python has many interpreters, aligning with the Unix philosophy. But at first, it can seem confusing: you’re presented with the default python shell, bpython, IPython’s shell, notebook and QtConsole.
I most recommend IPython; it seems to be more connected with scientific computing. But which one of IPython’s shells should you use? They all have their pros and cons, but the QtConsole wins for plain interpreters. Spyder is an alternative (and an IDE, meaning I haven’t used it much) that tries to present a MATLAB-like GUI. I do know it’s possible to have IPython’s QtConsole in Spyder.
EDIT: Apparently Spyder includes IPython’s QtConsole by default.
The QtConsole is what I most highly recommend. It allows you to see plots inline. Let me repeat that: you can plot inline. To see what I mean, here’s an example:
I’ve only found one area where it’s lacking. The issue is so small, I won’t mention it.
The IPython notebook is great for sharing results. It provides an interface friendly to reading code, LaTeX, markdown and images side-by-side. However, it’s not so great to develop code in.
Normally in Python 2, you have to call execfile(filename) to run a script (Python 3 needs exec(open(filename).read()) instead). If you use IPython, you have access to %run. I’ve found it most useful for inspecting global variables after the script has run. IPython even has other useful tools, including %debug (debug after an error occurred) and function?? for help on a function. The docs on magics are handy.
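Outside IPython, a rough analogue of %run is the standard library’s runpy.run_path, which executes a file and hands back its global variables as a dictionary. The sketch below only illustrates that idea and is not how %run is implemented; the script contents and the names script_path and result are made up for the example.

```python
import os
import runpy
import tempfile

# Hypothetical two-line script, written to a temporary file for the demo.
source = "x = [1, 2, 3]\ntotal = sum(x)\n"

with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "script.py")
    with open(script_path, "w") as f:
        f.write(source)

    # Run the file and collect its globals as a dict, similar in spirit
    # to inspecting variables after %run script.py in IPython.
    result = runpy.run_path(script_path)

print(result["x"])      # the list defined by the script
print(result["total"])  # the sum computed by the script
```

In IPython itself none of this is needed: after %run script.py the variables are simply available at the prompt.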
My personal setup
I typically have MacVim and IPython’s QtConsole (using a special applescript to open; saves opening up iTerm.app) visible and open with an external monitor to look at documentation. A typical script looks like
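The original script is not reproduced here. As a stand-in, here is a minimal hypothetical script (the arrays x and z and their contents are invented for illustration, and NumPy is assumed to be installed) that leaves a two-dimensional array in the global scope:

```python
import numpy as np

# Hypothetical stand-in for script.py: build a small 2-D array at module
# level so it can be inspected once the script has finished running.
N = 5
x = np.linspace(0, 1, N)            # N evenly spaced samples in [0, 1]
z = np.outer(np.arange(1, 4), x)    # 3 x N array; row i holds (i + 1) * x

print(z.shape)
```

Because z lives at module level, it is exactly the kind of variable that stays available for inspection after the script has been run.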
I can then run this script in IPython’s QtConsole with %run script.py (using a handy Keyboard Maestro shortcut to switch windows and enter %run) and then can query the program, meaning I can type z in the QtConsole and see what z is or even plot(z[0,:]). This is a simple script, but this functionality is priceless in larger and more complicated scripts.
Pylab’s goal is to bring a MATLAB-like interface to Python. They largely succeed. With pylab, Python can almost serve as a drop-in replacement for MATLAB. You call x = ones(N) in MATLAB; you can do the same with pylab.
One area where it isn’t a drop-in replacement is with division. In Python 2, 1/2 == 0 through integer division and in MATLAB (and the way it should be), 1/2 == 0.5. In Python, if int/int-->int is wanted, you can use the floor-division operator: 1//2 == 0.
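To make the two kinds of division concrete, here is a small sketch using Python 3 semantics, where / is always true division. In Python 2, placing from __future__ import division at the top of the file gives the same behaviour. The variable names are only illustrative.

```python
# Python 3 semantics: / is true division, // is floor division.
true_quotient = 1 / 2      # 0.5, matching MATLAB
floor_quotient = 1 // 2    # 0, the int/int --> int behaviour
negative_floor = -1 // 2   # -1, because // rounds toward negative infinity

print(true_quotient, floor_quotient, negative_floor)
```

The last line is worth remembering: floor division is not the same as truncating toward zero when the operands have opposite signs.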
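To present a nearly drop-in MATLAB interface, start your script with from pylab import * (in Python 2, add from __future__ import division before it to get MATLAB-style division).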
from pylab import * is frowned upon. The Zen of Python says
Namespaces are one honking great idea -- let’s do more of those!
from package import * shouldn’t be used with any package. It’s best to use import pylab as p, but that’s kinda annoying and gets messy in long lines with lots of function calls. I use from pylab import *; I’m guessing you’ll do the same. If I’m wondering if a function exists, I try calling it and see what happens; oftentimes, I’m surprised.
Parallelism is a big deal to the scientific community: the code we have takes hours to run and we want to speed it up. Since for-loops can be sped up a ton by parallelism if each iteration is independent, there are plenty of tools out there to parallelize code, including IPython’s parallelization toolbox.
But, this is still slightly confusing and seems like a pain to execute. I recently stumbled across a method to parallelize a function in one line. Basically, all you do is the following:
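The original one-liner is not reproduced here, so what follows is a sketch of one common pattern rather than necessarily the method referred to above: replacing a loop over independent iterations with a single pool.map call. It uses the standard library’s ThreadPool so it runs anywhere; the function slow_square is a made-up stand-in for real work.

```python
from multiprocessing.pool import ThreadPool


def slow_square(n):
    """Stand-in for an expensive, independent loop iteration."""
    return n * n


inputs = list(range(10))

# Serial version: an ordinary loop over independent iterations.
serial = [slow_square(n) for n in inputs]

# Parallel version: the whole loop becomes one pool.map call.
# Results come back in the same order as the inputs.
with ThreadPool(4) as pool:
    parallel = pool.map(slow_square, inputs)

print(parallel == serial)
```

Threads mostly help when the work releases the interpreter lock (I/O, many NumPy routines). For CPU-bound pure-Python code, multiprocessing.Pool offers the same map method using processes; in that case the pool should be created under an if __name__ == "__main__": guard.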
UPDATE: see my blog post on how to parallelize easily
SymPy (+LaTeX printing!)
SymPy serves as a replacement for Mathematica (or at least it’s a close race). With SymPy, you have access to symbolic variables and can perform almost any function on them: differentiation, integration, etc. They support matrices of these symbolic variables and functions on them; it seems like a complete library.
Perhaps most attractive, you can even pretty print LaTeX or ASCII.
I haven’t used this library much, but I’m sure there are good tutorials out there.
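As a quick taste, and assuming SymPy is installed, the sketch below differentiates, integrates, takes the determinant of a symbolic matrix and prints results as LaTeX and ASCII. The particular expressions and variable names are arbitrary choices for illustration.

```python
import sympy as sp

x, a, b, c, d = sp.symbols("x a b c d")

# Differentiate and integrate symbolic expressions.
derivative = sp.diff(x**2 * sp.sin(x), x)
gaussian_integral = sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo))

# Matrices of symbols are supported as well.
M = sp.Matrix([[a, b], [c, d]])
determinant = M.det()

# Print one result as LaTeX source and another as ASCII art.
print(sp.latex(gaussian_integral))
sp.pprint(derivative, use_unicode=False)
```

In the IPython notebook or QtConsole, calling sp.init_printing() first makes results render as typeset math instead of plain text.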
When indexing a two-dimensional numpy array, you often use something like array[y, x] (reversed for good reason!). The first index y selects the row while the second selects the column. This makes sense because you’d normally use x[0][1] to select the element in the 1st row and 2nd column. x[0,1] does the same thing but drops the unnecessary brackets. This is because Python selects the first object with the first index. For a two-dimensional array, the first object is another array (the first row), and the second index then selects an element within it.
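A small sketch of this row-then-column indexing, assuming NumPy is installed; the 2 x 3 array here is an invented example:

```python
import numpy as np

# A 2 x 3 array: two rows, three columns.
x = np.array([[1, 2, 3],
              [4, 5, 6]])

first_row = x[0]     # the first index alone returns an array (row 0)
chained = x[0][1]    # index the row, then the element within it
direct = x[0, 1]     # same element, row 0 and column 1, in one step

print(first_row, chained, direct)
```

Both forms return the same element; the comma form avoids creating the intermediate row.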
In MATLAB, indexing is 1-based, while Python is 0-based and, perhaps most confusingly, orders the indices as array[y,x]. MATLAB also has a feature that allows you to select an element based on the total number of elements in an array. This is useful for the Kronecker product. MATLAB stacks the columns when doing this, which is exactly the method kron relies on. To use Kronecker indexing in Python, I
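One way to get MATLAB-style column-stacked linear indexing in NumPy (not necessarily the approach the author uses) is to flatten in Fortran order. The array and the index k below are invented for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

# MATLAB's x(k) counts down the columns (column-major, 1-based).
# Flattening in Fortran order reproduces that ordering, 0-based here.
column_major = x.ravel(order="F")   # [1, 4, 2, 5, 3, 6]
k = 3                               # 0-based linear index
element = column_major[k]           # 5, which is MATLAB's x(4)

# Convert the linear index back to a (row, column) pair.
row, col = np.unravel_index(k, x.shape, order="F")

print(element, row, col)
```

Transposing first and flattening in the default order, x.T.flat[k], gives the same element.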
@: Dot product operator
In any Python version <= 3.4, there’s no dot product operator, unlike MATLAB’s *. It’s possible to multiply an array element-wise easily through * in Python (and .* in MATLAB). But coming in Python 3.5 is a new dot product operator! The choices behind @ and the rationale are detailed in this PEP.
Until the scientific community slowly progresses towards Python 3.5, we’ll be stuck on lowly Python 2.7. Instinct would tell you to call dot(dot(x, y), z) to perform the dot product of $X \cdot Y \cdot Z$. But instead, you can use x.dot(y).dot(z). Much easier and much cleaner.
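The options side by side, assuming NumPy and, for the @ line, Python 3.5 or newer. The matrices are arbitrary small examples.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[0, 1], [1, 0]])
z = np.array([[2, 0], [0, 2]])

elementwise = x * y                 # element-wise, like MATLAB's .*
nested = np.dot(np.dot(x, y), z)    # nested function calls
chained = x.dot(y).dot(z)           # method chaining
with_operator = x @ y @ z           # matrix product operator, Python 3.5+

print(chained)
```

All three matrix-product spellings give the same result; only the readability differs.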
This is not really related to the scientific programming process; it applies to any file, whether it be in a programming language or not (a good example: LaTeX files).
Stealing from this list, if you’ve ever
- made a change to code, realised it was a mistake and wanted to revert back?
- lost code or had a backup that was too old?
- had to maintain multiple versions of a product?
- wanted to see the difference between two (or more) versions of your code?
- wanted to prove that a particular change broke or fixed a piece of code?
- wanted to review the history of some code?
- wanted to submit a change to someone else’s code?
- wanted to share your code, or let other people work on your code?
- wanted to see how much work is being done, and where, when and by whom?
- wanted to experiment with a new feature without interfering with working code?
then you need version control. Personally, I can’t imagine doing anything significant without source control. Whenever I’m writing a paper and working on almost any programming project, I use git commit -am "description" all the time. Source control is perhaps my biggest piece of advice.
Version control is normally a bit of a pain: you normally have to be familiar with the command line and (with CVS/etc) it can be an even bigger pain. Git (and its brother GitHub) is considered among the easiest-to-use versioning tools.
GitHub has a GUI to make version control simple. It’s simple to commit changes, roll back changes and even branch to work on different features. It’s available for Mac and Windows, and many more GUIs are available.
They even offer private licenses for users in academia. This allows you to have up to five free private code repositories online. This allows for easy collaboration and sharing (another plus: access to GitHub Pages). There’s a list of useful guides to getting started with Git/GitHub.
(shameless plug) MATLAB has a great feature that allows you to call drawnow to have a figure update (after calling a series of plot commands). I searched high and low for a similar syntax in Python. I couldn’t find anything but matplotlib’s animation frameworks, which didn’t jibe with the global scope ease I wanted to use. After a long and arduous search, I did find draw(). This is simple once you know about it, but it’s a pain to find.
So, I created python-drawnow to make this functionality easily accessible. It easily allows you to view the results of an iterative (aka for-loop) process.
As I stressed in the introduction, this guide is not meant to be a full introduction to Python; there are plenty of other tools to do that. There are many other tutorials on learning Python. These all cover the basics: syntax, scope, function definitions, etc. And of course, the documentation is also a great place to look (NumPy, SciPy, matplotlib). Failing that, a Google/Stack Overflow search will likely solve your problem. Perhaps the best part: if you find a problem in a package and fix it, you can commit your changes and make them accessible globally!