I list all posts below, but also include a list of favorite posts

Launching tasks from workers with Dask
Jan 4, 2018 in systems distributed dask
Dask surprised me with its flexibility in a recent use case, even more than the basic use detailed in a previous post. I’ll walk through my use case and the interesting problem it highlights. I’ll show a toy solution and point to the relevant parts of the Dask documentation.
Read on →

PyTorch: fast and simple
Sep 7, 2017 in optimization computing speed framework
I recently came across PyTorch, a new technology primed for optimization and machine learning. The docs make it look attractive, so immediately I wondered “how does it compare with NumPy?”
Turns out it’s a pretty nice framework that’s fast and straightforward to use. I’ll detail the speed before talking about ease of use.
Read on →

Holoviews interactive visualization
Jul 22, 2017 in datavisualization web
I often want to provide some simple interactive visualizations for this blog. I like to include visualization to give some sense of how the data change as various parameters are changed. Examples can be found in Finding sparse solutions to linear systems, Least squares and regularization, and Computer color is only kinda broken.
I have discovered a new tool, Holoviews, to create these widgets. I want to create these interactive widgets for my blog, meaning I want to embed them in a static HTML page. Previously, I used Jake Vanderplas’s ipywidgets-static, but in this post I’ll walk through creating a widget with Holoviews.
Read on →

Apple CoreML model conversion
Jun 11, 2017 in apple ios machinelearning
Apple has created a new file format for machine learning models. These files can be used easily to predict, regardless of the creation process, which is why “Apple Introduces Core ML” draws an analogy between these files and PDFs. It’s possible to generate predictions with only this file and none of the creation libraries.
Generating predictions is a pain point faced by data scientists today and often involves the underlying math. At best, this involves training the model in Python and then calling the underlying C library in the production app.
This file format will only become widely used if easy conversion from popular machine learning libraries is possible and predictions are simple to generate. Apple made these claims during their WWDC 2017 keynote. I want to investigate their claim.
Read on →

Atmosphere and entropy
Apr 9, 2017 in probability informationtheory
I recently learned an abstract mathematical theorem, and stumbled across a remarkably direct measure. I’ll give background to this theorem before introducing it, then I’ll show the direct measure of this theorem with physical data.
This theorem has to do with entropy, which is shrouded in mystery. There are several types of entropy and, during the naming of one type, Von Neumann suggested the name “entropy” to Claude Shannon in 1948 because:
In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.

Motivation for sexual reproduction
Mar 11, 2017 in reproduction informationtheory
Of course, the purpose of sexual reproduction is to perpetuate our species by having offspring. Combined with natural selection, it enables our genes to fit our environment quickly. But why is it required to have two mates to produce a single offspring? Would asexual reproduction or having 3+ parents be more advantageous?
Read on →

Easy powerful parallel code execution and use on a UW cluster
Sep 9, 2016 in python numpy speed parallel dask
I often have highly optimized code that I want to run independently for different parameters. For example, I might want to see how reconstruction quality varies as I change two parameters. My code takes a moderate amount of time to run, maybe 1 minute. This isn’t huge, but if I want to average performance over 5 random runs for $20^2$ different input combinations, using a naïve for-loop means about 1.5 days. Using dask.distributed, I distribute these independent jobs across different machines and different cores for a significant speedup.
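The arithmetic and the "independent jobs" pattern above can be sketched with the standard library alone. This is only an illustration: `reconstruction_quality` is a hypothetical stand-in for the slow function, and a thread pool replaces dask.distributed here (whose `Client` offers a similar `map`/`gather` interface) so the sketch runs without a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RUNS = 5            # random runs per parameter combination (from the text)
GRID = 20           # 20 values for each of the two parameters (from the text)
MINUTES_PER_RUN = 1

# Cost of the naive for-loop: 5 * 20^2 runs at one minute each.
total_minutes = RUNS * GRID**2 * MINUTES_PER_RUN
total_days = total_minutes / (60 * 24)


def reconstruction_quality(a, b, seed):
    """Hypothetical stand-in for the real (slow) computation."""
    return a * b + seed


# Every (a, b, seed) job is independent, so the jobs can be mapped over workers.
jobs = [(a, b, seed)
        for a, b in product(range(GRID), repeat=2)
        for seed in range(RUNS)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda job: reconstruction_quality(*job), jobs))
```

Because no job depends on another, the same list of jobs could be handed to any pool of workers, whether threads, processes or machines.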
Read on →

NumPy GPU acceleration
Jul 1, 2016 in python numpy gpu speed parallel
I recently had to compute many inner products with a given matrix $\Ab$ for many different vectors $\xb_i$, or $\xb_i^T \Ab \xb_i$. Each vector $\xb_i$ represents a shoe from Zappos and there are 50k vectors $\xb_i \in \R^{1000}$. This computation took place behind a user-facing web interface and during testing had a delay of 5 minutes. This is clearly unacceptable; how can we make it faster?
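As a rough illustration of the computation described above, here is a NumPy sketch that evaluates every $\xb_i^T \Ab \xb_i$ at once. The sizes are shrunk and the data is random (both are assumptions for the sake of a quick example), and this is plain CPU vectorization, not the GPU approach the post's title refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vectors, dim = 500, 100                 # hypothetical sizes, far smaller than 50k x 1000
X = rng.standard_normal((n_vectors, dim))  # row i holds the vector x_i
A = rng.standard_normal((dim, dim))

# One quadratic form at a time, in a Python loop.
looped = np.array([x @ A @ x for x in X])

# All quadratic forms at once: row i of (X @ A) is x_i^T A, and the
# elementwise product with X summed over each row gives x_i^T A x_i.
vectorized = ((X @ A) * X).sum(axis=1)
```

Both arrays hold the same values; the second form replaces the Python-level loop with one matrix product.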
Read on →

Probability of a powder day
May 22, 2016 in skiing powder probability
This last spring break, I had a ton of fun! Why?
I had the good fortune of catching a powder day with powder skis this spring break! While riding the Born Free chair at Vail, I wondered: what are the chances of this happening on a given trip?^{1}

The source for this post is available on GitHub at stsievert/powderdayprobability ↩


A Bayesian analysis of Clinton’s 6 heads
Feb 26, 2016 in probability statistics bayesian
Clinton recently won 6 coin flips during an Iowa caucus. On Facebook and in the news, I’ve only seen information about how unlikely this is – the chances of 6 heads are 1.56% with a fair coin.
Yes, 6 heads is unlikely, but these coin flips could have occurred by chance. I mean, on the Washington Post coin flip demo, I got all heads on my 5th try. Instead, it makes more sense to ask a different question: given we observed these 6 heads, what are the chances this coin wasn’t fair?^{1}

If we were really testing to see if the coin was unfair, it’d make more sense to do hypothesis testing ↩
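To make the two questions above concrete, here is a small sketch. The first number is the 1.56% quoted in the text. The second uses a deliberately simple two-hypothesis model that is an assumption for illustration only (the coin is either fair or always lands heads, with a made-up 1% prior on the latter) and is not necessarily the model used in the post.

```python
from fractions import Fraction

# Chance of 6 heads in 6 flips with a fair coin: (1/2)^6 = 1/64, about 1.56%.
p_six_heads_fair = Fraction(1, 2) ** 6


def posterior_unfair(prior_unfair, p_heads_unfair, n_heads):
    """Bayes' rule for P(unfair | n_heads heads in a row).

    Hypothetical model: the coin is either fair or lands heads with
    probability p_heads_unfair.
    """
    evidence_unfair = prior_unfair * p_heads_unfair ** n_heads
    evidence_fair = (1 - prior_unfair) * Fraction(1, 2) ** n_heads
    return evidence_unfair / (evidence_unfair + evidence_fair)


# Assumed prior: 1% chance the coin is two-headed.
posterior = posterior_unfair(Fraction(1, 100), Fraction(1), 6)
```

Under these assumed numbers the posterior is 64/163, roughly 39%, which shows how strongly the answer depends on the prior.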


Gradient descent and physical intuition for heavy-ball acceleration with visualization
Jan 30, 2016 in machinelearning optimization linearalgebra
This post is part 3 of a 3 part series: Part I, Part II, Part III.
We often make observations from some system and would like to infer something about the system parameters, and many practical problems such as the Netflix Prize can be reformulated this way. Typically, this involves making observations of the form $y = f(x)$ or $\yb = \Ab \cdot \xb$^{1} where $y$ is observed, $f/\Ab$ is known and $x$ is the unknown variable of interest.
Finding the true $x$ that gave us our observations $y$ involves inverting a function/matrix, which can be costly timewise and in the matrix case is often impossible. Instead, methods such as gradient descent are often used, a technique common in machine learning and optimization.
In this post, I will try to provide calculus-level intuition for gradient descent. I will also introduce and show the heavy-ball acceleration method for gradient descent^{2} and provide a physical interpretation.

I’ll use plain font/bold font for scalars/vectors (respectively) as per my notation sheet. ↩
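As a minimal sketch of the two methods named above, the code below compares plain gradient descent, $x_{k+1} = x_k - \alpha \nabla f(x_k)$, with the heavy-ball update, $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1})$. The quadratic objective, the step sizes and the momentum value are all hand-picked assumptions for this toy problem, not values taken from the post.

```python
# Toy objective f(x) = 0.5 * (1 * x1^2 + 100 * x2^2), minimized at (0, 0).
CURVATURE = (1.0, 100.0)  # hypothetical; makes the problem ill-conditioned


def gradient(x):
    """Gradient of the toy quadratic."""
    return [c * xi for c, xi in zip(CURVATURE, x)]


def gradient_descent(x0, step, iterations):
    x = list(x0)
    for _ in range(iterations):
        g = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def heavy_ball(x0, step, momentum, iterations):
    x_prev, x = list(x0), list(x0)
    for _ in range(iterations):
        g = gradient(x)
        # Gradient step plus a momentum term that remembers the previous move.
        x_next = [xi - step * gi + momentum * (xi - xpi)
                  for xi, gi, xpi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x


def distance_to_minimum(x):
    return sum(xi ** 2 for xi in x) ** 0.5


start = [1.0, 1.0]
gd_error = distance_to_minimum(gradient_descent(start, step=0.01, iterations=100))
hb_error = distance_to_minimum(
    heavy_ball(start, step=0.03, momentum=0.6, iterations=100))
```

On this toy problem the heavy-ball iterate ends closer to the minimum than plain gradient descent after the same number of iterations.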


Vim syntax highlighting for Markdown, Liquid and MathJax
Jan 6, 2016 in vim syntaxhighlighting jekyll mathjax
I write my Jekyll blog in Markdown with Vim. I often include LaTeX equations (via MathJax) in my posts and/or Liquid tags like {% include %}. MathJax equations and Liquid tags aren’t included in tpope/vim-markdown which meant that the LaTeX can mess with the syntax highlighting, as illustrated by the image below. I’ll describe a quick fix for this, resulting in the image on the right.
Read on →

Communicating in secret, even while being watched
Dec 13, 2015 in math communication images
How might two people communicate without others even knowing they’re communicating? They could be communicating to harm some entity and are being observed by that entity.^{1} Because of this, they want to send a message whose very presence others can’t detect.

I’m sure you can imagine more situations where other more nefarious people are communicating and know they’re being watched. ↩


Using the uniform random variable to generate other random variables
Dec 9, 2015 in math probability
Since computers were invented we have spent a lot of time generating uniform random numbers. A quick search on Google Scholar for “Generating a uniform random variable” gives 850,000 results. But what if we want to generate another random variable? Maybe a Gaussian random variable or a binomial random variable? These are both extremely useful.^{1}

I won’t cover this here, but the Gaussian random variable is useful almost everywhere and the binomial random variable can represent performing many tests that can either pass or fail. ↩
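A minimal sketch of the idea above, using only uniform draws from the standard library. The Box-Muller transform for the Gaussian and a sum of Bernoulli trials for the binomial are two standard constructions chosen here for illustration; the post may use different ones, and the function names are hypothetical.

```python
import math
import random


def gaussian_from_uniform(rng):
    """Standard normal sample via the Box-Muller transform of two uniforms."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def binomial_from_uniform(rng, n, p):
    """Binomial(n, p) sample: count how many of n uniforms fall below p."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(0)  # seeded so the sketch is reproducible
gaussian_samples = [gaussian_from_uniform(rng) for _ in range(20000)]
binomial_samples = [binomial_from_uniform(rng, n=10, p=0.3) for _ in range(5000)]

gaussian_mean = sum(gaussian_samples) / len(gaussian_samples)
binomial_mean = sum(binomial_samples) / len(binomial_samples)
```

The sample means should land near 0 for the Gaussian and near $n p = 3$ for the binomial.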


Finding sparse solutions to linear systems
Dec 9, 2015 in math machinelearning optimization
This post is part 2 of a 3 part series: Part I, Part II, Part III.
We often have fewer measurements than unknowns, which happens all the time in genomics and medical imaging. For example, we might be collecting 8,000 gene measurements in 300 patients and we’d like to determine which ones are most important in determining cancer.
This means that we typically have an underdetermined system because we’re collecting fewer measurements than unknowns. This is an unfavorable situation – there are infinitely many solutions to this problem. However, in the case of breast cancer, biological intuition might tell us that most of the 8,000 genes aren’t important and have zero importance in cancer expression.
How do we enforce that most of the variables are 0? This post will try and give intuition for the problem formulation and dig into the algorithm to solve the posed problem. I’ll use a real-world cancer dataset^{1} to predict which genes are important for cancer expression. It should be noted that we’re more concerned with the type of solution we obtain rather than how well it performs.

This data set is detailed in the section titled Predicting Breast Cancer ↩


Least squares and regularization
Nov 19, 2015 in math machinelearning optimization
This post is part 1 of a 3 part series: Part I, Part II, Part III.
Imagine that we have a bunch of points that we want to fit a line to:
In the plot above, $y_i = a\cdot x_i + b + n_i$ where $n_i$ is random noise. We know every point $(x_i, y_i)$ and would like to estimate $a$ and $b$ in the presence of noise.
What method can we use to find an estimation for $a$ and $b$? What constraints does this method have and when does it fail? This is a classic problem in signal processing – we are given some noisy observations and would like to determine how the data was generated.
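One common answer to the question above is an ordinary least squares fit, sketched here with NumPy. The true slope, intercept and noise level are made-up values for illustration, and this is not necessarily the exact method the post develops.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, b_true = 2.0, -1.0  # hypothetical ground truth, used only to simulate data

# Noisy observations y_i = a * x_i + b + n_i.
x = np.linspace(0, 10, 200)
y = a_true * x + b_true + 0.1 * rng.standard_normal(x.size)

# Stack the unknowns as [a, b]: each row of the design matrix is [x_i, 1].
design = np.column_stack([x, np.ones_like(x)])

# Least squares picks the a and b minimizing the sum of (y_i - a * x_i - b)^2.
(a_hat, b_hat), *_ = np.linalg.lstsq(design, y, rcond=None)
```

With this little noise the estimates `a_hat` and `b_hat` land close to the simulated slope and intercept.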
Read on →

Stepping from Matlab to Python
Sep 1, 2015 in python
It’s not a big leap; it’s one small step. There’s only a little to pick up and there’s not a huge difference in use or functionality. The difference is so small you can switch and just google any conversion issues you have: they’re so small you’ll have no trouble finding the appropriate functions/syntax.
There is a wrapper package in Python with the aim of providing a Matlab-like interface that is well suited for numerical linear algebra. This package is called pylab and wraps NumPy, SciPy and matplotlib. When I use pylab, this is how similar my Python and Matlab code is:
Python even has a matrix multiplication operator! Python 3.5 introduces the matrix multiplication operator @ detailed in PEP 465. Python is remarkably well suited for developing numerical algorithms – what else does Python offer?
Read on →
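For a quick taste of the `@` operator from PEP 465, here is a tiny NumPy example; the matrix and vector are arbitrary values chosen for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([1, 1])

# Matrix-vector product with the @ operator (Python 3.5+).
y = A @ x

# The older spelling of the same product.
y_dot = A.dot(x)
```

Both `y` and `y_dot` hold the same product, much like `A * x` would in Matlab.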

Computer color is only kinda broken
Apr 23, 2015 in images imagecompression dsp

Applying eigenvalues to the Fibonacci problem
Jan 31, 2015 in math linearalgebra
The Fibonacci problem is a well-known mathematical problem that models population growth and was conceived in the 1200s. Leonardo of Pisa aka Fibonacci decided to use a recursive equation: $x_{n} = x_{n-1} + x_{n-2}$ with the seed values $x_0 = 0$ and $x_1 = 1$. Implementing this recursive function is straightforward:
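A minimal sketch of that recursion, which may differ in detail from the version in the full post (the function name `fib` is chosen here for illustration):

```python
def fib(n):
    """Return x_n, where x_n = x_{n-1} + x_{n-2}, x_0 = 0 and x_1 = 1."""
    if n < 2:
        return n  # seed values: fib(0) = 0, fib(1) = 1
    return fib(n - 1) + fib(n - 2)


first_ten = [fib(n) for n in range(10)]
```

Each call spawns two more calls, so this direct translation gets slow quickly as `n` grows.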
Read on →

Common mathematical misconceptions
Jul 31, 2014 in math
When I heard course names for higher mathematical classes during high school and even part of college, it seemed as if they were teaching something simple that I learned back in middle school. I knew that couldn’t be the case, and three years of college have taught me otherwise.
Read on →

Simple Python Parallelism
Jul 30, 2014 in python parallel speed
In the scientific community, executing functions in parallel can mean hours or even days less execution time. There’s a whole array of Python parallelization toolkits, probably partially due to the competing standards issue.
Update: I’ve found joblib, a library that does the same thing as this post. Another blog post compares with Matlab and R.
Read on →

Fourier transforms and optical lenses
The Fourier transform and its closely related cousin, the discrete Fourier transform (computed by the FFT), are powerful mathematical concepts. They break an input signal down into its frequency components. The best example is lifted from Wikipedia.
Read on →

Speckle and lasers
We know that lasers are very accurate instruments that emit a very precise wavelength, and hence are used in an array of precision applications including bloodless surgery, eye surgery and fingerprint detection. So why do we see random light/dark spots when we shine a laser on anything? Shouldn’t it all be the same color since lasers are deterministic (read: not random)? To answer that question, we need to delve into optical theory.
Read on →

Scientific Python tips and tricks
May 15, 2014 in python
You want to pick up Python. But it’s hard and confusing to pick up a whole new framework. You want to try and switch, but it’s too much effort and takes too much time, so you stick with MATLAB. I essentially grew up on Python, meaning I can guide you to some solid resources and hand over tips and tricks I’ve learned.
Read on →

Predicting the weather
Nov 14, 2013 in math probability
Let’s say that we’re accurately measuring the temperature in both Madison and Minneapolis, but then our temperature sensor in Minneapolis breaks. We could easily install a new sensor, but we would prefer to estimate the temperature in Minneapolis based on the temperature in Madison.
Read on →