In a previous video, I talked about linear systems of equations, and I sort of brushed aside the discussion of actually computing solutions to these systems. And while it's true that number-crunching is something we typically leave to the computers, digging into some of these computational methods is a good litmus test for whether or not you actually understand what's going on, since this is really where the rubber meets the road. Here I want to describe the geometry behind a certain method for computing solutions to these systems, known as Cramer's rule. The relevant background needed here is an understanding of determinants, dot products, and linear systems of equations, so be sure to watch the relevant videos on those topics if you're unfamiliar or rusty.

But first, I should say up front that Cramer's rule is not the best way to compute solutions to linear systems of equations; Gaussian elimination, for example, will always be faster. So why learn it? Think of this as a sort of cultural excursion: it's a helpful exercise in deepening your knowledge of the theory of these systems. Wrapping your mind around this concept will help consolidate ideas from linear algebra, like the determinant and linear systems, by seeing how they relate to each other. Also, from a purely artistic standpoint, the ultimate result is just really pretty to think about, much more so than Gaussian elimination.

Alright, so the setup here will be some linear system of equations, say with two unknowns, x and y, and two equations. In principle, everything we're talking about works for systems with a larger number of unknowns and the same number of equations, but for simplicity, a smaller example is nicer to hold in our heads.

So, as I talked about in a previous video, you can think of this setup geometrically as a certain known matrix transforming an unknown vector, [x; y], where you know what the output is going to be, in this case [-4; -2]. Remember, the columns of this matrix tell you how the matrix acts as a transformation, each one telling you where the basis vectors of the input space land. So this is a sort of puzzle: what input [x; y] is going to give you this output [-4; -2]?
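
Written out, with stand-in entries a, b, c, d for the specific matrix shown on screen (the transcript doesn't record the entries themselves), the puzzle is:

```latex
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} -4 \\ -2 \end{bmatrix}
```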

Remember, the type of answer you get here can depend on whether or not the transformation squishes all of space into a lower dimension, that is, on whether it has a zero determinant. In that case, either none of the inputs land on our given output, or a whole bunch of inputs land on that output. But for this video we'll limit our view to the case of a non-zero determinant, meaning the outputs of this transformation still span the full n-dimensional space it started in: every input lands on one and only one output, and every output has one and only one input.

One way to think about our puzzle is that we know the given output vector is some linear combination of the columns of the matrix, x times (the vector where i-hat lands) plus y times (the vector where j-hat lands), but we wish to compute what exactly x and y are.
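
In the same stand-in symbols, that linear-combination view reads:

```latex
x \begin{bmatrix} a \\ c \end{bmatrix}
+ y \begin{bmatrix} b \\ d \end{bmatrix}
=
\begin{bmatrix} -4 \\ -2 \end{bmatrix}
```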

As a first pass, let me show an idea that is wrong, but in the right direction. The x-coordinate of this mystery input vector is what you get by taking its dot product with the first basis vector, [1; 0]. Likewise, the y-coordinate is what you get by dotting it with the second basis vector, [0; 1]. So maybe you hope that, after the transformation, the dot products of the transformed version of the mystery vector with the transformed versions of the basis vectors will also be these coordinates x and y. That'd be fantastic, because we know the transformed versions of each of these vectors.

There's just one problem with this: it's not at all true! For most linear transformations, the dot product before and after the transformation will be very different. For example, you could have two vectors generally pointing in the same direction, with a positive dot product, which get pulled away from each other during the transformation in such a way that they then have a negative dot product. Likewise, if things start off perpendicular, with dot product zero, like the two basis vectors, there's no guarantee that they will stay perpendicular after the transformation, preserving that zero dot product. In the example we were looking at, dot products certainly aren't preserved; they tend to get bigger, since most vectors are getting stretched.
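
To see that failure concretely, here's a small check; the shear matrix is my own example, not one from the video:

```python
import numpy as np

# A shear: i-hat stays at [1; 0], j-hat moves over to [1; 1].
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])

u = np.array([1.0, 0.0])   # i-hat
v = np.array([0.0, 1.0])   # j-hat, perpendicular to u

print(u @ v)               # 0.0: dot product before the transformation
print((S @ u) @ (S @ v))   # 1.0: not preserved afterwards
```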

In fact, transformations which do preserve dot products are special enough to have their own name: orthonormal transformations. These are the ones which leave all the basis vectors perpendicular to each other, each with unit length. You often think of these as rotation matrices; they correspond to rigid motion, with no stretching, squishing, or morphing.

Solving a linear system with an orthonormal matrix is very easy: since dot products are preserved, taking the dot product between the output vector and each of the columns of your matrix will be the same as taking the dot product between the input vector and each of the basis vectors, which is the same as finding the coordinates of the input vector. So, in that very special case, x would be the dot product of the first column with the output vector, and y would be the dot product of the second column with the output vector.
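
Here's a minimal numerical sketch of that special case; the rotation matrix and the output vector are made up for illustration, not taken from the video:

```python
import numpy as np

# An orthonormal matrix: rotation by 30 degrees.
theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

output = np.array([1.0, 2.0])   # a known output vector

# Dotting the output with each column reads off the input's coordinates.
x = A[:, 0] @ output
y = A[:, 1] @ output

print(np.array([x, y]))        # the recovered input vector
print(A @ np.array([x, y]))    # applying A lands back on the output
```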

Now, even though this idea breaks down for most linear systems, it points us in the direction of something to look for: is there an alternate geometric understanding of the coordinates of our input vector which remains unchanged after the transformation?

If your mind has been mulling over determinants, you might think of this clever idea: take the parallelogram defined by the first basis vector, i-hat, and the mystery input vector [x; y]. The area of this parallelogram is its base, 1, times the height perpendicular to that base, which is the y-coordinate of our input vector. So, the area of this parallelogram is a sort of screwy, roundabout way to describe the vector's y-coordinate; it's a wacky way to talk about coordinates, but run with me. Actually, to be more accurate, you should think of the signed area of this parallelogram, in the sense described in the determinant video. That way, a vector with a negative y-coordinate would correspond to a negative area for this parallelogram. Symmetrically, if you look at the parallelogram spanned by the vector and the second basis vector, j-hat, its signed area will be the x-coordinate of the vector. Again, it's a strange way to represent the x-coordinate, but you'll see what it buys us in a moment.
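
In determinant notation, with the spanning vectors as columns, those two observations about a vector [x; y] read:

```latex
\det\!\begin{bmatrix} 1 & x \\ 0 & y \end{bmatrix} = y,
\qquad
\det\!\begin{bmatrix} x & 0 \\ y & 1 \end{bmatrix} = x
```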

Here's what this would look like in three dimensions. Ordinarily, the way you might think of one of a vector's coordinates, say its z-coordinate, would be to take its dot product with the third standard basis vector, k-hat. But instead, consider the parallelepiped it creates with the other two basis vectors, i-hat and j-hat. If you think of the square with area 1 spanned by i-hat and j-hat as the base of this guy, then its volume is the same as its height, which is the third coordinate of our vector. Likewise, the wacky way to think about any other coordinate of this vector is to form the parallelepiped between this vector and all the basis vectors other than the one you're looking for, and take its volume. Or, rather, we should talk about the signed volume of these parallelepipeds, in the sense described in the determinant video, where the order in which you list the three vectors matters and you're using the right-hand rule. That way, negative coordinates still make sense.
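
For instance, for a vector [x; y; z], the signed volume of the parallelepiped spanned by i-hat, j-hat, and that vector, in that order, is exactly its z-coordinate:

```latex
\det\!\begin{bmatrix} 1 & 0 & x \\ 0 & 1 & y \\ 0 & 0 & z \end{bmatrix} = z
```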

Okay, so why think of coordinates as areas and volumes like this? As you apply some matrix transformation, the areas of these parallelograms don't stay the same; they may get scaled up or down. But(!), and this is a key idea of determinants, all of these areas get scaled by the same amount, namely the determinant of our transformation matrix.

For example, if you look at the parallelogram spanned by the vector where your first basis vector lands, which is the first column of the matrix, and the transformed version of [x; y], what is its area? Well, this is the transformed version of the parallelogram we were looking at earlier, whose area was the y-coordinate of the mystery input vector. So its area will be the determinant of the transformation multiplied by that value. In other words, the y-coordinate of our mystery input vector is the area of this parallelogram, spanned by the first column of the matrix and the output vector, divided by the determinant of the full transformation.

And how do you get this area? Well, we know the coordinates for where the mystery input vector lands; that's the whole point of a linear system of equations. So, create a matrix whose first column is the same as that of our matrix, and whose second column is the output vector, and take its determinant. So look at that: just using data from the output of the transformation, namely the columns of the matrix and the coordinates of our output vector, we can recover the y-coordinate of our mystery input vector.
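
With the stand-in entries from before, that recipe for the y-coordinate reads:

```latex
y = \frac{\det\!\begin{bmatrix} a & -4 \\ c & -2 \end{bmatrix}}
         {\det\!\begin{bmatrix} a & b \\ c & d \end{bmatrix}}
```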

Likewise, the same idea can get you the x-coordinate. Look at the parallelogram we defined earlier which encodes the x-coordinate of the mystery input vector, spanned by the input vector and j-hat. The transformed version of this guy is spanned by the output vector and the second column of the matrix, and its area will have been multiplied by the determinant of the matrix. So the x-coordinate of our mystery input vector is this area divided by the determinant of the transformation. Symmetric to what we did before, you can compute the area of that output parallelogram by creating a new matrix whose first column is the output vector, and whose second column is the same as the second column of the original matrix. So again, just using data from the output space, the numbers we see in our original linear system, we can recover the x-coordinate of our mystery input vector.
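
Putting both coordinates side by side, still with stand-in entries, the full recipe for the 2×2 case is:

```latex
x = \frac{\det\!\begin{bmatrix} -4 & b \\ -2 & d \end{bmatrix}}
         {\det\!\begin{bmatrix} a & b \\ c & d \end{bmatrix}},
\qquad
y = \frac{\det\!\begin{bmatrix} a & -4 \\ c & -2 \end{bmatrix}}
         {\det\!\begin{bmatrix} a & b \\ c & d \end{bmatrix}}
```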

This formula for finding the solutions to a linear system of equations is known as Cramer's rule. Here, just to sanity check ourselves, plug in the numbers: the determinant of that top altered matrix is 4+2, which is 6, and the bottom determinant is 2, so the x-coordinate should be 3. And indeed, looking back at the input vector we started with, its x-coordinate is 3. Likewise, Cramer's rule suggests the y-coordinate should be 4/2, or 2, and that is indeed the y-coordinate of the input vector we started with here.
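
The transcript doesn't record the matrix entries themselves (they live on screen), but one matrix consistent with every number quoted here, the output [-4; -2], the determinants 6, 4, and 2, and the solution (3, 2), is [[-2, 1], [0, -1]]; treating that reconstruction as an assumption, the same sanity check in code:

```python
import numpy as np

def det2(m):
    """Determinant of a 2x2 matrix: ad - bc."""
    return m[0, 0] * m[1, 1] - m[0, 1] * m[1, 0]

# Hypothetical reconstruction of the on-screen matrix.
A = np.array([[-2.0,  1.0],
              [ 0.0, -1.0]])
output = np.array([-4.0, -2.0])

A_x = A.copy(); A_x[:, 0] = output   # first column swapped for the output
A_y = A.copy(); A_y[:, 1] = output   # second column swapped for the output

x = det2(A_x) / det2(A)              # 6 / 2 = 3
y = det2(A_y) / det2(A)              # 4 / 2 = 2

print(x, y)                  # 3.0 2.0
print(A @ np.array([x, y]))  # [-4. -2.]
```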

The case with three dimensions is similar, and I highly recommend you pause to think it through yourself. Here, I'll give you a little momentum: we have this known transformation, given by a 3×3 matrix, and a known output vector, given by the right side of our linear system, and we want to know what input vector lands on this output vector. If you think of, say, the z-coordinate of the input vector as the volume of the parallelepiped spanned by i-hat, j-hat, and the mystery input vector, what happens to the volume of this parallelepiped after the transformation? How can you compute that new volume?

Really, pause and take a moment to think through the details of generalizing this to higher dimensions, finding an expression for each coordinate of the solution to larger linear systems. Thinking through more general cases and convincing yourself that it works is where all the learning will happen, much more so than listening to some dude on YouTube walk through the reasoning again.
