
We've Been Taught Matrices Wrong: Relearning Matrices as Linear Functions


Many of us have probably encountered matrices at some point in math. They're these tables with seemingly byzantine rules for combining them. Take the first element of the first row, first element of the first column, multiply them together, then add, then spin three times fast… And that's to say nothing of the rules for inverting them.

(Figure: How did someone come up with this?)

We all probably memorized these rules at some point, but what's really going on underneath it all? How did someone actually come up with these rules and the very idea of matrices? In this post, we'll answer these questions by deriving matrices ourselves and seeing the intuition behind them.

Representing Functions

Before we jump into matrices, we need to start by understanding functions and how we represent them. Functions can be thought of as rules that take in an input and return an output.

One simple way to represent a function is to have a table that takes every input and displays the function's output. For instance, for a function $f$ we could write:

| $x$ | $f(x)$ |
| --- | --- |
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |

This is a perfectly fine representation, although very verbose (the table would include every input and output combination). Can we create a simpler representation?

Yes: $f(x) = 2x$. This representation, a polynomial, is not only concise but powerful.

How? Let's say I have another function $g(x) = 10x$, and someone asks, "what happens if I apply $f$, followed by $g$?"

Before I had this representation, I would have had to make a full table where I write out $f(x)$ and then apply $g$ to the result for each input, like below:

| $x$ | $f(x)$ | $g(f(x))$ |
| --- | --- | --- |
| 1 | 2 | 20 |
| 2 | 4 | 40 |
| 3 | 6 | 60 |

Yikes. This is tedious.

But now that we have the polynomial representation, we can just do:

$$g(f(x)) = 10 \cdot f(x) = 10 \cdot 2x = 20x$$

Using simple steps, we can figure out $g(f(x))$ and describe the composition function for any arbitrary input. The same is true of $g(x) + f(x)$, etc.

In short, the right representation allows us to quickly understand, combine, and process functions.
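
To make this concrete, here's a small Python sketch (the function names are just illustrative) contrasting the table approach with the formula approach for composing $f$ and $g$:

```python
def f(x):
    return 2 * x   # f(x) = 2x

def g(x):
    return 10 * x  # g(x) = 10x

# Table representation: enumerate g(f(x)) input by input.
table = {x: g(f(x)) for x in [1, 2, 3]}
print(table)          # {1: 20, 2: 40, 3: 60}

# Formula representation: compose once (g(f(x)) = 10 * 2x = 20x),
# then apply it to any input without building a table.
def g_of_f(x):
    return 20 * x

print(g_of_f(7))      # 140 -- works for inputs the table never listed
```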

Linear Maps

Let's now focus on a specific type of function that forms the foundation of matrices: Linear Maps. Linear Maps are functions with a few special properties. Here's a simple example of a linear map:

  • $f(x) = 3x$ for real numbers $x$.

The special properties of these functions are:

  1. $f(c \cdot x) = c \cdot f(x)$
    • You can multiply by a scalar before or after applying the function and get the same result.
    • Our example function:
      • $f(5 \cdot x) = 3 \cdot 5x = 15x = 5 \cdot f(x)$.
  2. $f(x + y) = f(x) + f(y)$
    • You can add before or after applying the function and get the same result.
    • Our example function:
      • $f(x + y) = 3 \cdot (x + y) = 3x + 3y = f(x) + f(y)$.
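
As a quick sanity check, here's a tiny Python sketch (with arbitrarily chosen values) verifying both properties for $f(x) = 3x$:

```python
def f(x):
    return 3 * x  # our example linear map

c, x, y = 5, 2.0, 7.5

# Property 1: scaling before or after applying f gives the same result.
assert f(c * x) == c * f(x)

# Property 2: adding before or after applying f gives the same result.
assert f(x + y) == f(x) + f(y)

print("both linearity properties hold for these values")
```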

Early mathematicians realized that such functions can model a lot of real-world phenomena, from physics to statistics. In fact, geometrically, all linear maps can be thought of as combinations of rotations and scalings:

  • Rotations
    • Functions that rotate shapes around a point.
  • Scalings
    • Functions that take a 2D image and scale its width and/or height.

Representing Linear Maps

Now that we've understood linear maps, how can we represent them succinctly, in a way that makes them easy to combine and use?

1-Dimension

Let’s start with linear maps that take in real numbers and return real numbers (all in 1 dimension - the real line).

(Figure: Linear maps that take inputs on the real line and return outputs on the real line.)

These linear transforms can always be written as $f(x) = mx$ for some $m$. Pretty simple!

2-Dimensions

But what about in 2 dimensions (i.e. the input is a vector such as $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and the output is also a vector in 2D)?

Let's say $f(x)$ is a scaling function that stretches its input width by 3x and its height by 5x. You can see it visually below:

(Figure: The linear map $f$ scales the purple rectangle into the black rectangle.)

Let's just start by describing what the function does to two very simple vectors, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

We define:

$$f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 3 \\ 0 \end{bmatrix}, \qquad f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 5 \end{bmatrix}$$

Based on this, can we figure out what $f\left(\begin{bmatrix} 10 \\ 9 \end{bmatrix}\right)$ is? Amazingly, yes!

The diagram below shows our approach visually:

(Figure: We decompose $\begin{bmatrix} 10 \\ 9 \end{bmatrix}$ into a combination of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, the two vectors for which we know the value of $f$.)

More precisely, here’s how we find the final value:

$$f\left(\begin{bmatrix} 10 \\ 9 \end{bmatrix}\right) = f\left(\begin{bmatrix} 10 \\ 0 \end{bmatrix}\right) + f\left(\begin{bmatrix} 0 \\ 9 \end{bmatrix}\right)$$

$$= 10 \cdot f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) + 9 \cdot f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) = 10 \cdot \begin{bmatrix} 3 \\ 0 \end{bmatrix} + 9 \cdot \begin{bmatrix} 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 30 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 45 \end{bmatrix} = \begin{bmatrix} 30 \\ 45 \end{bmatrix}$$

So we're able to find $f\left(\begin{bmatrix} 10 \\ 9 \end{bmatrix}\right)$ by just knowing $f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$ and $f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$.

In fact, this is true more generally - we can represent the function entirely by what it does to the vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ and use that to find all potential outputs!
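
Here's a minimal numpy sketch of that idea, assuming all we know about $f$ is its output on the two basis vectors (the helper name `apply_f` is just illustrative):

```python
import numpy as np

# All we know about f: its outputs on the two basis vectors.
f_e1 = np.array([3, 0])  # f([1, 0])
f_e2 = np.array([0, 5])  # f([0, 1])

def apply_f(v):
    # Decompose v = v[0]*[1, 0] + v[1]*[0, 1], then use linearity.
    return v[0] * f_e1 + v[1] * f_e2

print(apply_f(np.array([10, 9])))  # [30 45]
```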

Matrices

So is there a way I can quickly say what a linear map does to $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$?

Yes! A simple 2x2 matrix!

$f(x)$ (as we defined it in the previous section) can be represented by the notation

$$\begin{bmatrix} f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) & f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}$$

This is extremely cool - we can describe the entire function, and how it operates on an infinite number of points, with a little four-value table.

So just like whenever you see $f(x) = 3x$ you immediately know you're dealing with a function, when you see a matrix, know that you are dealing with a linear map.
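
In code, that little table falls out directly by stacking the map's outputs on the basis vectors as columns. Here's a minimal numpy sketch (the helper name `matrix_of` is just illustrative, not a standard API):

```python
import numpy as np

def matrix_of(linear_map):
    # The columns of the matrix are the map's outputs on [1, 0] and [0, 1].
    e1 = np.array([1, 0])
    e2 = np.array([0, 1])
    return np.column_stack([linear_map(e1), linear_map(e2)])

def f(v):
    # Scale the x component by 3 and the y component by 5.
    return np.array([3 * v[0], 5 * v[1]])

print(matrix_of(f))
# [[3 0]
#  [0 5]]
```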

Example of Creating a Matrix Representation

Let's quickly see an example of how we find the matrix representation for a linear map. Imagine we have the linear map $g$ that rotates vectors 90º counterclockwise. What would the matrix $G$ for this function $g$ be?

As before, let's start by just understanding what $g$ does to our two simple vectors:

$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

  1. $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ turned 90º counterclockwise is $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

  2. $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ turned 90º counterclockwise is $\begin{bmatrix} -1 \\ 0 \end{bmatrix}$.

Putting it all together, we see that $G$, the matrix representation of $g$, is $G = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.
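
The same recipe translates directly to code. Here's a small numpy sketch (illustrative, not a library API) that builds $G$ by rotating the two basis vectors and stacking the results as columns:

```python
import numpy as np

def g(v):
    # Rotate v by 90 degrees counterclockwise: (x, y) -> (-y, x).
    return np.array([-v[1], v[0]])

e1 = np.array([1, 0])
e2 = np.array([0, 1])

G = np.column_stack([g(e1), g(e2)])
print(G)
# [[ 0 -1]
#  [ 1  0]]
```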

Applying the Function

Ok so we know we can represent a linear map as a matrix - but how do we apply this function on an input?

Specifically, say I have the same function $g$. I want to apply this function on the vector $\begin{bmatrix} x \\ y \end{bmatrix}$. How do we do it?

Let’s start with what we know:

  1. The first column of the matrix tells us $g\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$.
  2. The second column tells us $g\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$.

To make use of that, let's break $\begin{bmatrix} x \\ y \end{bmatrix}$ into a combination of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

$$\begin{bmatrix} x \\ y \end{bmatrix} = x \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + y \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

$$g\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = g\left(x \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} + y \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$$

Thanks to the amazing properties of linear maps ($g(a+b) = g(a) + g(b)$), we can now simplify this to:

$$= g\left(x \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) + g\left(y \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$$

And then using another property of linear maps ($g(c \cdot a) = c \cdot g(a)$),

$$= x \cdot g\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) + y \cdot g\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$$

And now we apply the knowledge we have (from the columns of the matrix)!

$$= x \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} + y \cdot \begin{bmatrix} -1 \\ 0 \end{bmatrix}$$

$$g\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 0x - 1y \\ x + 0y \end{bmatrix}$$

Does this look familiar? Because it is exactly what you would get by carrying out the matrix multiplication $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix}$!

The matrix multiplication rule is just a shorthand for applying the function on a vector! If we ever want to find the result of rotating some vector $v$ 90º counterclockwise, we just have to compute $Gv$.
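
As a quick numerical check (with an arbitrarily chosen vector), the matrix-vector product and the "scale each column and add" computation from the derivation above agree:

```python
import numpy as np

G = np.array([[0, -1],
              [1,  0]])   # matrix for the 90-degree counterclockwise rotation g
v = np.array([2, 7])      # an arbitrary vector [x, y]

# Shorthand: matrix-vector multiplication.
via_matmul = G @ v

# Longhand from the derivation: x * (first column) + y * (second column).
via_columns = v[0] * G[:, 0] + v[1] * G[:, 1]

print(via_matmul)   # [-7  2]
print(via_columns)  # [-7  2]
```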

Matrix Multiplication as Function Composition

Ok so we've seen that multiplying a 2x2 matrix by a vector is just applying the function to the vector. But what does matrix multiplication mean when we are multiplying two 2x2 matrices?

To answer this, we're going to take a small detour. We know how to apply a function $f$ to an input vector $v$. What if I ask you to find the composition $h(v) = g(f(v))$, where we know $f$ and $g$?

Let’s work through this now:

  1. Let $G$ be the matrix for the function $g$.
  2. Let $F$ be the matrix for the function $f$.

We want to find the matrix $H$ that represents the function $h(v) = g(f(v))$.

This is going to be:

$$H = \begin{bmatrix} g\left(f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)\right) & g\left(f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)\right) \end{bmatrix}$$

Let $F_{col1}$ be the first column of $F$ and let $F_{col2}$ be the second column. Note that $f\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$ is exactly $F_{col1}$ and $f\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$ is exactly $F_{col2}$.

Then we can simplify to:

$$H = \begin{bmatrix} g(F_{col1}) & g(F_{col2}) \end{bmatrix}$$

So what's $g(F_{col1})$?

Well, we just saw earlier that this is just $G \cdot F_{col1}$. So we can now write:

$$H = \begin{bmatrix} G \cdot F_{col1} & G \cdot F_{col2} \end{bmatrix}$$

Seem familiar? It's exactly the formula for the 2x2 matrix multiplication $H = G \cdot F$!

So $H$, the matrix representing $g(f(x))$, is just $G \cdot F$.

Thus, matrix multiplication is just a way to compose linear maps.

An Example of Function Composition

Let’s work through a full example to see this linear map composition in action.

  1. Let $f$ be our function as earlier (represented by the matrix $F = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}$).
  2. Let $g$ be the function that rotates a vector 90º counterclockwise as before (represented by $G = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$).

Finding $g$ Composed With $f$

Let's work through $h = g \circ f$ applied on the vector $v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$. We can use just the properties of $f$ and $g$ to figure out this result (see the diagram below).

(Figure: applying $f$, then $g$, to $v$, step by step.)

  1. We start with $v$.
  2. We know $f$ stretches its input by 3x in the x direction and by 5x in the y direction, leading to $f(v) = \begin{bmatrix} 6 \\ 5 \end{bmatrix}$.
  3. We then just rotate the resulting vector 90º counterclockwise to get $g(f(v)) = \begin{bmatrix} -5 \\ 6 \end{bmatrix}$.

$g$ Composed With $f$ Using Matrices

Do we get the same result when using function composition through matrices? Let’s find out:

  1. The matrix for $g \circ f$ is $H = G \cdot F$.
  2. $H = G \cdot F = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 0 & -5 \\ 3 & 0 \end{bmatrix}$
  3. To apply $H$ on the vector $v$ we just do $H \cdot v$.
  4. $H \cdot v = \begin{bmatrix} 0 & -5 \\ 3 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -5 \\ 6 \end{bmatrix}$.

This is exactly what we got when visually applying $g(f(v))$. So we were able to get the same result without applying each function one at a time - we just used matrix multiplication to find the final function and applied it directly. While this already saves us time, imagine how much time this would save if we had to compose 10 or 20 different functions!
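
Here's the same check as a small numpy sketch, confirming that composing via $G \cdot F$ matches applying $f$ and then $g$ one step at a time:

```python
import numpy as np

F = np.array([[3, 0],
              [0, 5]])    # scale x by 3, y by 5
G = np.array([[0, -1],
              [1,  0]])   # rotate 90 degrees counterclockwise
v = np.array([2, 1])

H = G @ F                 # matrix for the composition g ∘ f
print(H)                  # [[ 0 -5]
                          #  [ 3  0]]

print(H @ v)              # [-5  6]
print(G @ (F @ v))        # [-5  6] -- same as applying f, then g, step by step
```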

Big Picture: Matrices Give Us Power

Remember how much power we gained by being able to represent a function as a polynomial? We were able to combine polynomials, compose them, multiply them - we now have all that same power for linear maps!

We can compose them (matrix multiplication), add them (matrix addition), and invert them (matrix inversion) all by following some simple rules.

A more sophisticated way of saying this is that we have an algebra on these functions (just like we have an algebra on the integers, etc.).

This is what linear algebra broadly means - the study of how we can combine and compose linear maps just as we combine and compose numbers etc.

Looking Forward

So we’ve seen pretty clearly that matrices are just functions and that linear algebra is the study of combining these functions.

Understanding this helps us perceive some of the key ideas of linear algebra in a new way. In the next post, we’re going to focus on eigenvectors and see how they provide valuable insights into these linear maps.

