When I heard course names for higher mathematical classes during high school and even part of college, it seemed as if they were teaching something simple that I learned back in middle school. I knew that couldn’t be the case, and three years of college have taught me otherwise.

N dimensions

Generalizing to $N$ dimensions is often seen as a pointless mathematical exercise because of a language confusion. Space exists in three dimensions; we just need three elements or a three dimensional vector to describe any point. Because of this, we say we live in three dimensions. So isn’t generalizing to an $N$ dimensions pointless?

Let’s go back to vectors. An $N$ dimensional vector is just one with $N$ components.1 Thinking about a grid or 2D graph, we need two numbers to describe any point on that grid, $x$ and $y$. That’s a two dimensional vector. For the four dimensions we live in, we need four: $x, y, z, t$.

But wait. Why can’t we define a five dimensional vector that includes my age or a six dimensional vector that also includes my year in school? Thinking about this in the real world, we have data that has $N$ components all the time. Images on my computer have $N$ pixels and my GPS records $N$ locations. Almost anything discrete is an $N$ dimensional vector.

This pops up all the time in image processing, machine learning and almost anywhere that uses linear algebra or a computer. All the vectors and matrices I’ve seen have $N$ components, a discrete number. Computers only have a finite number of bits2 meaning computers must also be discrete. That means if we want to do anything fancy with a computer, we need to use linear algebra.

Linear algebra

At first glance, linear algebra seems like middle school material. In middle school you might have seen $y=m\cdot x+b$ (more on this linear function later). In linear algebra, you might see $\bm{y}=\bm{A}\cdot \bm{x}+\bm{b}$ where $\bm{y}$, $\bm{x}$, and $\bm{b}$ are vectors and $\bm{A}$ is a matrix.3

A matrix is just nothing more than a collection of vectors and 2D grid of numbers. For example, if you have $M$ students with $N$ features (age, weight, GPA, etc), you can collect them by simply stacking the vectors and make an $M\times N$ matrix.

We’ve defined both vectors and matrices, but what about basic operations? How do we define addition and multiplication? Addition works element-wise, but multiplication is a bit more interesting.

Matrix multiplication is defined rather simply, but that doesn’t give you any intuition. You can think of matrix multiplication as linear combinations of the rows and columns, but that still doesn’t tell what matrix multiplication means.

Intuitively, we know that we can transform any matrix into any other matrix because we can arbitrarily choose numbers. But wait. In middle school we learned that a function could transform any number into any other number. Matrices are then analogous to functions!

For example, let’s say we have a set of inputs $\bm{x} = [1,~2,~3,~4]^T$ and we want to perform the function $y = f(x) = 2\cdot x + 1$ on each element in our vector. In matrix notation, we can just do

$$ \bm{y} = \begin{bmatrix} 2&0&0&0 \\ 0&2&0&0 \\ 0&0&2&0 \\ 0&0&0&2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} $$

With these functions, we can perform even more powerful actions. We can easily swap elements or perform different actions on different elements. It gets even more powerful but still relatively simple to an experienced mathematician. We can find when a matrix or function simply scales a vector by finding the eigenvalues and eigenvectors or use the inverse to find a discrete analog to $f^{-1}(x)$.

Linear functions

A linear function seems like the most boring function. It’s just a straight line, right? It is, but English got confused and it also means when some function obeys the superposition principle. Concisely in math, it’s when $f(\alpha x+\beta y)=\alpha\cdot f(x)+\beta\cdot f(y)$.

This means that a general linear function is not linear. If $f(x) = mx+b$, $f(x+y) = f(x)+f(y)-b$ which doesn’t satisfy the definition of a linear function. But there’s much more interesting functions out there. Integration and differentiation respect scalar multiplication and addition, meaning they’re linear.

Since a host of other functions depend on integration and differentiation, so many of these functions (but not all!) are also linear.4 The discrete Fourier transform (computed by fft), wavelet transform and a host of other functions are linear.

Addition and scalar multiplication are defined element-wise for matrices, so any linear function can be represented by a matrix. There are matrices for computing the fft and wavelet transform. While unused as they require $O(n^2)$ operations, they exist. Exploiting specific properties, mathematicians are often able to push the number of operations down to $O(n\log(n))$.

Linear functions are perceived as boring. If you know $\bm{y}$ in $\bm{y}=\bm{Ax}$ for some known5 matrix $\bm{A}$, you can easily find $\bm{x}$ by left-multipying with $\bm{A}^{-1}$. This might be expensive time-wise, but an exact solution is computable.

Nonlinear functions have a unique property that an exact and closed form solution often can’t be found. This means that no combination of elementary function like $\sin(\cdot), \exp(\cdot), \sqrt{\cdot},\int \cdot~dx, \frac{d~\cdot}{dx}$ along with respective operators and the infinitely many real numbers can describe every solution.6 Instead, those elementary functions can only describe the equation that needs to be solved.

Even a “simple” problem such as describing the motion of a pendulum is nonlinear and an exact solution can’t be found, even for the most simple case. Getting even more complex, there’s a whole list of nonlinear parital differential equations that solve important problems.7

This is why simulation is such a big deal. No closed form solution can be found meaning that you have to find a solution numerically. Supercomputers don’t exist to for mild speed boosts or storage requirements that machine learning or “big data”8 seems to require. No, supercomputers exist because with a nonlinear problem, a simulation has to be run to get any result. Running one of these simulations on my personal computer would take years if not decades.

When you hand this type of problem to an experienced mathematician, they won’t look for a solution. Instead, they’ll look for bounds and guarantees on a solution they work out. If they just came up with a solution, they wouldn’t even know how good it is! So, they’ll try to bound the error and look for ways to lower that error.

When I started college, I was under the impression that my later career would involve long and messy exact solutions. In my second and third years, I came to realize that those messy solutions didn’t exist and instead we used extremely elegant math to find exact solutions. In the last couple months9, I have come to realize that while simple math might describe the problem, there’s no closed form solution and the only way to get a solution to “interesting” problems is with a computer.

  1. And hence an ndarray or “N dimensional array” in NumPy 

  2. When computers can have an uncountably infinite number of bits they can be continuous and I’ll eat my hat 

  3. For the rest of this post, vectors will be described with bold mathematical notation, matrices like vectors but capitalized, and scalars with regular math font. 

  4. That is if they only depend on the simplest forms of integration and differentiation 

  5. And matrix completion handles when $A$ isn’t known. This is analgous to finding an unknown scalar function 

  6. It should be noted that occaisonally a closed form solution can be found assuming certain conditions apply 

  7. And these problems are wayyy over my head 

  8. Yeah it’s a buzz word. For example, satellites that collect 2 billion measurements a day. That’s a lot of data, but analyzing it and using it is not “big data.” 

  9. Or even as I write this blog post. I’ve known this for a while but this is a new light