I list all posts below, but also include a list of favorite posts.

All posts

• 2018

Dask recently surprised me with its flexibility, going beyond the basic use detailed in a previous post. I’ll walk through my use case and the interesting problem it highlights, show a toy solution, and point to the relevant parts of the Dask documentation.

• 2017

• PyTorch: fast and simple

I recently came across PyTorch, a new technology primed for optimization and machine learning. The docs make it look attractive, so I immediately wondered “how does it compare with NumPy?”

It turns out it’s a pretty nice framework that’s fast and straightforward to use. I’ll detail the speed before talking about ease of use.

• Holoviews interactive visualization

I often want to provide simple interactive visualizations for this blog. I like to include visualizations to give some sense of how the data change as various parameters are changed. Examples can be found in Finding sparse solutions to linear systems, Least squares and regularization, and Computer color is only kinda broken.

I have discovered a new tool, Holoviews, to create these widgets. Since these widgets are for my blog, I need to embed them in a static HTML page. Previously, I used Jake Vanderplas’s ipywidgets-static, but in this post I’ll walk through creating a widget with Holoviews.

• Apple CoreML model conversion

Apple has created a new file format for machine learning models. These files can be used to generate predictions regardless of how the model was created, which is why “Apple Introduces Core ML” draws an analogy between these files and PDFs. It’s possible to generate predictions with only this file and none of the libraries that created it.

Generating predictions is a pain point faced by data scientists today and often involves the underlying math. At best, this involves training the model in Python and then calling the underlying C library in the production app.

This file format will only become widely used if easy conversion from popular machine learning libraries is possible and predictions are simple to generate. Apple made these claims during their WWDC 2017 keynote. I want to investigate their claim.

• Atmosphere and entropy

I recently learned an abstract mathematical theorem and stumbled across a remarkably direct measurement of it. I’ll give some background before introducing the theorem, then show this direct measurement with physical data.

This theorem has to do with entropy, which is clouded in mystery. There are several types of entropy and, during the naming of one type, von Neumann suggested the name “entropy” to Claude Shannon in 1948 because

In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.
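As a concrete sketch of the uncertainty function von Neumann is referring to, Shannon entropy is simple to compute (the coin distributions below are made up for illustration):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries a full bit of uncertainty...
fair = shannon_entropy([0.5, 0.5])    # 1.0 bit
# ...while a heavily biased coin carries much less.
biased = shannon_entropy([0.9, 0.1])  # ~0.47 bits
```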

• Motivation for sexual reproduction

Of course, the purpose of sexual reproduction is to perpetuate our species by having offspring. Combined with natural selection, it enables fitting our genes to our environment quickly. But why does it require two parents to produce a single offspring? Would asexual reproduction or having 3+ parents be more advantageous?

• 2016

• Easy powerful parallel code execution and use on a UW cluster

I often have highly optimized code that I want to run independently for different parameters. For example, I might want to see how reconstruction quality varies as I change two parameters. My code takes a moderate amount of time to run, maybe 1 minute. This isn’t huge, but if I want to average performance over 5 random runs for $20^2$ different input combinations, using a naïve for-loop means about 1.5 days. Using dask.distributed, I distribute these independent jobs across different machines and different cores for a significant speedup.
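dask.distributed’s Client mirrors the standard library’s concurrent.futures interface, so the pattern can be sketched with only the standard library; `simulate` and the parameter grid below are made-up stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def simulate(params):
    """Stand-in for one expensive, independent job (~1 minute in the real case)."""
    a, b = params
    return a * b  # hypothetical "reconstruction quality"

grid = list(product(range(20), range(20)))  # 20**2 parameter combinations

# With dask.distributed the shape is the same:
#   client = Client(scheduler_address); futures = client.map(simulate, grid)
# Threads keep this sketch dependency-free; for CPU-bound work you'd want
# processes or dask workers spread across machines.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(simulate, grid))
```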

• NumPy GPU acceleration

I recently had to compute many inner products with a given matrix $\Ab$ for many different vectors $\xb_i$, or $\xb_i^T \Ab \xb_i$. Each vector $\xb_i$ represents a shoe from Zappos and there are 50k vectors $\xb_i \in \R^{1000}$. This computation took place behind a user-facing web interface and during testing had a delay of 5 minutes. This is clearly unacceptable; how can we make it faster?
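Before reaching for a GPU, the baseline loop can be vectorized on the CPU; a sketch with NumPy’s einsum (the sizes below are scaled-down stand-ins for the 50k Zappos vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 1000))
X = rng.standard_normal((50, 1000))  # one vector x_i per row (50k rows in the real data)

# Naive loop: one x_i^T A x_i at a time.
slow = np.array([x @ A @ x for x in X])

# Vectorized: all inner products in a single einsum call.
fast = np.einsum("ij,jk,ik->i", X, A, X)
```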

• Probability of a powder day

This last spring break, I had a ton of fun! Why?

I had the good fortune of catching a powder day on powder skis this spring break! While riding the Born Free chair at Vail, I wondered: what are the chances of this happening on a given trip?1

1. The source for this post is available on GitHub at stsievert/powder-day-probability
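The back-of-the-envelope version of that question can be sketched with the binomial distribution; the per-day probability and trip length below are made-up placeholders, not the post’s actual estimates:

```python
from math import comb

def prob_exactly(k, p_day, n_days):
    """Binomial pmf: P(exactly k powder days in n_days)."""
    return comb(n_days, k) * p_day**k * (1 - p_day) ** (n_days - k)

def prob_at_least_one(p_day, n_days):
    """P(at least one powder day) = 1 - P(zero powder days)."""
    return 1 - prob_exactly(0, p_day, n_days)

# e.g., a hypothetical 10% daily chance of powder over a 5-day trip:
p = prob_at_least_one(0.10, 5)  # ~0.41
```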

• A Bayesian analysis of Clinton’s 6 heads

Clinton recently won 6 coin flips during an Iowa caucus. On Facebook and in the news, I’ve only seen information about how unlikely this is – the chances of 6 heads with a fair coin are 1.56%.

Yes, 6 heads is unlikely, but these coin flips could have occurred by chance. I mean, on the Washington Post coin flip demo, I got all heads on my 5th try. Instead, it makes more sense to ask a different question: given that we observed these 6 heads, what are the chances this coin wasn’t fair?1

1. If we were really testing to see if the coin was unfair, it’d make more sense to do hypothesis testing
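A minimal sketch of that Bayesian question, assuming a uniform prior over the coin’s heads-probability and a binomial likelihood (the grid discretization is just for illustration):

```python
def posterior(heads, flips, grid_size=1001):
    """Posterior over theta = P(heads), from a uniform prior on a discrete grid."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Uniform prior means the posterior is proportional to the likelihood.
    likelihood = [t**heads * (1 - t) ** (flips - heads) for t in thetas]
    total = sum(likelihood)
    return thetas, [l / total for l in likelihood]

thetas, post = posterior(6, 6)
# Posterior mass on "biased toward heads" (theta > 1/2):
p_biased = sum(p for t, p in zip(thetas, post) if t > 0.5)  # ~0.99
```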

• Gradient descent and physical intuition for heavy-ball acceleration with visualization

This post is a part 3 of a 3 part series: Part I, Part II, Part III.

We often make observations from some system and would like to infer something about the system parameters, and many practical problems such as the Netflix Prize can be reformulated this way. Typically, this involves making observations of the form $y = f(x)$ or $\yb = \Ab \cdot \xb$1 where $y$ is observed, $f/\Ab$ is known and $x$ is the unknown variable of interest.

Finding the true $x$ that gave us our observations $y$ involves inverting a function/matrix, which can be costly time-wise and, in the matrix case, often impossible. Instead, methods such as gradient descent – a technique common in machine learning and optimization – are often used.

In this post, I will try to provide calculus-level intuition for gradient descent. I will also introduce and show the heavy-ball acceleration method for gradient descent2 and provide a physical interpretation.

1. I’ll use plain font/bold font for scalars/vectors (respectively) as per my notation sheet

2. With a video
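A minimal sketch of gradient descent with the heavy-ball (momentum) update on a toy one-dimensional quadratic; the step size and momentum below are made-up illustrative values, not tuned constants from the post:

```python
def grad(x):
    """Gradient of f(x) = (x - 3)**2, which is minimized at x = 3."""
    return 2 * (x - 3)

def heavy_ball(x0, step=0.1, momentum=0.8, iters=100):
    """Gradient step plus a push along the previous move (the 'heavy ball')."""
    x_prev, x = x0, x0
    for _ in range(iters):
        x_new = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_new
    return x

x = heavy_ball(0.0)  # converges toward the minimizer at 3
```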

• Vim syntax highlighting for Markdown, Liquid and MathJax

I write my Jekyll blog in Markdown with Vim. I often include LaTeX equations (via MathJax) in my posts and/or Liquid tags like {% include %}. MathJax equations and Liquid tags aren’t included in tpope/vim-markdown, which means the LaTeX can mess with the syntax highlighting, as illustrated by the image below. I’ll describe a quick fix for this, resulting in the image on the right.

• 2015

• Communicating in secret, even while being watched

How might two people communicate without others even knowing they’re communicating? They could be communicating to harm some entity while being observed by that entity.1 Because of this, they want to send a message whose very presence others can’t detect.

1. I’m sure you can imagine more situations where other, more nefarious people are communicating and know they’re being watched.
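One classic way to hide a message’s very existence is least-significant-bit steganography; a toy sketch (this is an illustrative technique, not necessarily the post’s method):

```python
def hide(cover, message):
    """Embed message bits into the least-significant bits of cover bytes."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    assert len(bits) <= len(cover), "cover too small for message"
    return bytes((c & ~1) | b for c, b in zip(cover, bits)) + cover[len(bits):]

def reveal(stego, length):
    """Read `length` bytes back out of the least-significant bits."""
    bits = [b & 1 for b in stego[: length * 8]]
    return bytes(
        sum(bit << i for i, bit in enumerate(bits[k : k + 8]))
        for k in range(0, len(bits), 8)
    )

cover = bytes(range(200))  # stand-in for pixel data
stego = hide(cover, b"hi")
```

Only the lowest bit of each byte changes, so the altered “pixels” look identical to a casual observer.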

• Using the uniform random variable to generate other random variables

Since computers were invented, we have spent a lot of time generating uniform random numbers. A quick search on Google Scholar for “Generating a uniform random variable” gives 850,000 results. But what if we want to generate another random variable? Maybe a Gaussian random variable or a binomial random variable? These are both extremely useful.1

1. I won’t cover this here, but the Gaussian random variable is useful almost everywhere and the binomial random variable can represent performing many tests that can either pass or fail.
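One standard recipe is inverse-transform sampling: push a uniform sample through the inverse CDF of the target distribution. A sketch for the exponential distribution (chosen here because its inverse CDF has a closed form; the Gaussian case needs a different trick, such as Box–Muller):

```python
import math
import random

def exponential_from_uniform(rate):
    """If U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exponential(rate),
    since the exponential CDF F(x) = 1 - exp(-rate * x) inverts cleanly."""
    u = random.random()
    return -math.log(1 - u) / rate

random.seed(0)
samples = [exponential_from_uniform(2.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)  # should approach 1 / rate = 0.5
```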

• Finding sparse solutions to linear systems

This post is a part 2 of a 3 part series: Part I, Part II, Part III

We often have fewer measurements than unknowns, which happens all the time in genomics and medical imaging. For example, we might collect 8,000 gene measurements from 300 patients and want to determine which genes matter most for cancer expression.

This means that we typically have an underdetermined system because we’re collecting fewer measurements than unknowns. This is an unfavorable situation – there are infinitely many solutions to this problem. However, in the case of breast cancer, biological intuition might tell us that most of the 8,000 genes aren’t important and have zero importance in cancer expression.

How do we enforce that most of the variables are 0? This post will try to give intuition for the problem formulation and dig into the algorithm that solves the posed problem. I’ll use a real-world cancer dataset1 to predict which genes are important for cancer expression. Note that we’re more concerned with the type of solution we obtain than with how well it performs.

1. This data set is detailed in the section titled Predicting Breast Cancer
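One standard algorithm for sparsity-promoting formulations like this is iterative soft-thresholding (ISTA); a minimal NumPy sketch on synthetic data (the problem sizes and penalty below are toy stand-ins, not the post’s cancer data):

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink toward zero: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0)

def ista(A, y, lam=0.1, iters=2000):
    """Iterative soft-thresholding for min ||Ax - y||^2 / 2 + lam * ||x||_1."""
    step = 1 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))  # 40 measurements, 100 unknowns
x_true = np.zeros(100)
x_true[:4] = 3.0                    # only 4 "genes" actually matter
x_hat = ista(A, A @ x_true)         # recovers a sparse estimate
```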

• Least squares and regularization

This post is part 1 of a 3 part series: Part I, Part II, Part III.

Imagine that we have a bunch of points that we want to fit a line to:

In the plot above, $y_i = a\cdot x_i + b + n_i$ where $n_i$ is random noise. We know every point $(x_i, y_i)$ and would like to estimate $a$ and $b$ in the presence of noise.

What method can we use to find an estimation for $a$ and $b$? What constraints does this method have and when does it fail? This is a classic problem in signal processing – we are given some noisy observations and would like to determine how the data was generated.
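The classic answer is least squares; a sketch of the closed-form slope/intercept fit on synthetic data (the true $a = 2$, $b = 1$ below are made-up values):

```python
import random

def fit_line(xs, ys):
    """Least-squares estimates of a and b in y = a*x + b."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    # Slope: covariance of (x, y) divided by variance of x.
    a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)
    return a, y_mean - a * x_mean

random.seed(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]  # noisy observations
a_hat, b_hat = fit_line(xs, ys)  # close to (2, 1)
```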

• Stepping from Matlab to Python

It’s not a big leap; it’s one small step. There’s only a little to pick up and there’s not a huge difference in use or functionality. The difference is so small you can switch and just google any conversion issues you have: they’re so small you’ll have no trouble finding the appropriate functions/syntax.

There is a wrapper package in Python with the aim of providing a Matlab-like interface that is well suited for numerical linear algebra. This package is called pylab and wraps NumPy, SciPy and matplotlib. When I use pylab, this is how similar my Python and Matlab code is:

Python even has a matrix multiplication operator! Python 3.5 introduced the matrix multiplication operator @, detailed in PEP 465. Python is remarkably well suited for developing numerical algorithms – what else does Python offer?
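A quick illustration of how @ keeps NumPy code close to the Matlab equivalent:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])

y = A @ x  # matrix-vector product, like Matlab's A * x
B = A @ A  # matrix-matrix product, no np.dot needed
```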

• Computer color is only kinda broken

When we blur red and green, we get this:

Why? We would not expect this brownish color.
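The short version: naive blurs average the gamma-encoded sRGB values instead of linear light. A sketch of the difference for a single channel, using the standard sRGB transfer functions:

```python
def srgb_to_linear(c):
    """Standard sRGB decoding, for c in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Standard sRGB encoding, for c in [0, 1]."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

full, empty = 1.0, 0.0  # a fully-on channel blurred with a fully-off one

naive = (full + empty) / 2  # average the encoded values: 0.5
correct = linear_to_srgb((srgb_to_linear(full) + srgb_to_linear(empty)) / 2)
# `correct` is ~0.735 - noticeably brighter - which is why naive blurs look
# dark and muddy.
```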

• Applying eigenvalues to the Fibonacci problem

The Fibonacci problem is a well-known mathematical problem that models population growth and was conceived in the 1200s. Leonardo of Pisa, a.k.a. Fibonacci, modeled it with a recursive equation: $x_{n} = x_{n-1} + x_{n-2}$ with the seed values $x_0 = 0$ and $x_1 = 1$. Implementing this recursive function is straightforward:
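```python
def fib(n):
    """Direct translation of x_n = x_{n-1} + x_{n-2}, x_0 = 0, x_1 = 1."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

The eigenvalue analysis the title hints at leads to Binet’s closed form, $x_n = (\varphi^n - (-\varphi)^{-n})/\sqrt{5}$ with $\varphi = (1+\sqrt{5})/2$, which produces the same values without the exponential blowup of the recursion.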

• 2014

• Common mathematical misconceptions

When I heard course names for higher mathematical classes during high school and even part of college, it seemed as if they were teaching something simple that I learned back in middle school. I knew that couldn’t be the case, and three years of college have taught me otherwise.

• Simple Python Parallelism

In the scientific community, executing functions in parallel can mean hours or even days less execution time. There’s a whole array of Python parallelization toolkits, probably partially due to the competing standards issue.

Update: I’ve found joblib, a library that does the same thing as this post. Another blog post compares with Matlab and R.

• Fourier transforms and optical lenses

The Fourier transform and its closely related cousin, the discrete Fourier transform (computed by the FFT), are powerful mathematical tools. They break an input signal down into its frequency components. The best example is lifted from Wikipedia.
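As a quick sketch of that frequency decomposition, the FFT picks a sine’s frequency straight out of its samples (the sample count and frequency are arbitrary choices):

```python
import numpy as np

n, freq = 256, 12  # 256 samples of a 12-cycle sine wave
t = np.arange(n) / n
signal = np.sin(2 * np.pi * freq * t)

spectrum = np.abs(np.fft.rfft(signal))
peak = int(np.argmax(spectrum))  # the dominant frequency bin: 12
```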

• Speckle and lasers

We know that lasers are very accurate instruments that emit a very precise wavelength, and hence they appear in an array of precision applications including bloodless surgery, eye surgery and fingerprint detection. So why do we see random light/dark spots when we shine a laser on anything? Shouldn’t it all be the same color, since lasers are deterministic (read: not random)? To answer that question, we need to delve into optical theory.

• Scientific Python tips and tricks

You want to pick up Python. But it’s hard and confusing to pick up a whole new framework. You want to try and switch, but it’s too much effort and takes too much time, so you stick with MATLAB. I essentially grew up on Python, meaning I can guide you to some solid resources and hand over tips and tricks I’ve learned.