How and why rotation matrices work

If you've done any sort of work with graphics, you're no doubt familiar with matrix operations — scaling, translating and rotating. In the matrix model of graphics operations, each operation is parameterized by a specific matrix, and the individual coordinates of the shapes to be manipulated are represented as column vectors. Scaling (resizing) and translating (moving) are easy enough to understand (but I'll go ahead and review them briefly anyway), but I've never seen the rotation matrix adequately explained. Most books about linear algebra try to frame rotation matrices around the context of changes of basis of orthogonal vectors (say that five times fast!) — I'm sure that's a useful explanation to somebody, somewhere, but I prefer a geometrical explanation.

To start with, shapes are represented as polygons. Each point in the polygon assigned an n-element column vector, where n is the number of dimensions of the polygon. In two dimensions, then, a five pointed star might be represented as the column vectors:

0
30
7
7
30
7
10
-5
20
-30
0
-13
-20
-30
-10
-5
-30
7
-7
7

And be displayed like this:

Figure 1: Five-pointed star

If you wanted to stretch the star in the x direction, you would multiply each x coordinate by the stretching or scaling factor. So, to double the width of the star, you'd have:

0
30
14
7
60
7
20
-5
40
-30
0
-13
-30
-30
-20
-5
-60
7
-14
7

Figure 2: Five-pointed star, stretched along the x-axis

Notice above that the second column (the y coordinates) don't change, but the values of the first column double. This operation can be represented in matrix form as:

sx0
01
x
y
=
sx * x + 0 * y
0 * x + 1 * y
=
sx * x
y

More generally, the figure can be stretched in either direction, or both simultaneously:

sx0
0sy
x
y

Translating — moving a figure from one place in the grid to another — is a separate operation. Conceptually, it's just a matter of adding the x offset and the y offset to each point in the polygon. To make this matrix friendly, though, matrix graphics add a third coordinate, which is always just a 1. So the five-pointed star from above becomes:

0
30
1
7
7
1
30
7
1
10
-5
1
20
-30
1
0
-13
1
-20
-30
1
-10
-5
1
-30
7
1
-7
7
1

In this way, the translation matrix can be implemented as a 3x3 matrix in the form:

10tx
01ty
001
x
y
1
=
1 * x + 0 * y + 1 * tx
0 * x + 1 * y + 1 * ty
0 * x + 0 * y + 1 * 1
=
x + tx
y + ty
1

So far, all I've done is overcomplicate things. For simple scaling operations, I've replaced two multiplications with four multiplication and two add operations, and actually slowed things down a teeny fraction. Translation, which was just a couple of adds, is now nine multiplications and six adds! The real benefit comes from the combinability of the transformation matrices. The star in figure 1 is centered on the origin; this isn't particularly useful in a computer graphics scenario, since the origin is the upper-left corner of the screen. To display this star, you want to translate it to screen coordinates. But once you do, it's no longer stretchable; simple scaling operations (multiplying the x and y coordinates by the scaling factors) occur relative to the origin. So, if the star was in the upper right as in figure 3:

Figure 3: Five-pointed star, centered at (50,50)

and you tried to apply a scale by 2 operation to it by simply multiplying each component by 2, you'd end up with figure 4:

Figure 4: Scaling gone wrong

The star has, indeed, double its size, but it's also moved twice as far away from the origin — not necessarily what you wanted (and if that was what you wanted, you probably wanted more control over it anyway). Working out how to double the size of the star while keeping it in place is a daunting task; first, you need to find the centroid of the polygon, and then you need to halve each x and y coordinate less than the centroid, but double each x and y coordinate greater than it. A conceptually much simpler approach is to move the star to the origin, apply a common scale factor, and then move it back where it came from.

Still; that's a lot of mathematical operations. This is where the matrix format earns its keep. If you represent the translation and scale operations as matrices, you can "pre-multiply" them into a single uber matrix that can be applied as one to each point in the polygon. So (working right-to-left):

10cx
01cy
001
sx00
0sy0
001
10-cx
01-cy
001

10cx
01cy
001
(sx * 1 + 0 * 0 + 0 * 0) (sx * 0 + 0 * 1 + 0 * 0) (sx * -cx + 0 * -cy + 0 * 1)
(0 * 1 + sy * 0 + 0 * 0) (0 * 0 + sy * 1 + 0 * 0) (0 * -cx + sy * -cy + 0 * 1)
(0 * 1 + 0 * 0 + 1 * 0) (0 * 0 + 0 * 1 + 1 * 0) (0 * -cx + 0 * -cy + 1 * 1)

10cx
01cy
001
sx 0 (sx * -cx)
0 sy (sy * -cy)
0 0 1

(1 * sx + 0 * 0 + cx * 0) (1 * 0 + 0 * sy + cx * 0) (1 * (sx * -cx) + 0 * (sy * -cy) + cx * 1)
(0 * sx + 1 * 0 + cy * 0) (0 * 0 + 1 * sy + cy * 0) (0 * (sx * -cx) + 1 * (sy * -cy) + cy * 1)
(0 * sx + 0 * 0 + 1 * 0) (0 * 0 + 0 * sy + 1 * 0) (0 * (sx * -cx) + 0 * (sy * -cy) + 1 * 1)

sx 0 (sx * -cx) + cx
0 sy (sy * -cy) + cy
0 0 1

Where (cx,cy) is the centroid of the figure and sx and sy are the scale factors to apply. Of course, you don't need to work out the algebra like I did here; your GPU does all of that for you. You just plug in the three matrices and apply them. So, given the polygon in figure 3 with centroid (50, 50), to double it, you'd multiply the matrix:

2 0 (2 * -50) + 50
0 2 (2 * -50) + 50
0 0 1
=
2 0 -50
0 2 -50
0 0 1

Applied to each point in the figure gives you figure 5:

Figure 5: Scaling done correctly

Click the display to see each suboperation, but recognize that the end result is computed all at once.

Where matrix representation really derives its power — and what I promised to talk about at the beginning of this post — is when you start rotating shapes around arbitrary points. Since you can translate and "untranslate" to and from the origin, all you have to figure out is how to represent rotation about the origin in matrix form, and you can rotate around any arbitrary point in the figure.

As you can probably guess, this involves trigonometry. But how, exactly? The sine of an angle is the ratio of its opposite side to its hypotenuse; the y coordinate is the height of the opposite side. So, to rotate a point (x,y) about the origin by an angle of θ, you could find the distance of the point from the center using the pythagorean theorem, divide its y coordinate by that distance, compute the arcsine of that ratio, add the desired angle to it, and then multiply back out by the distance:

d = √x2+y2
y = d * sin( arcsin( y / d ) + θ )
x = d * cos( arccos( x / d ) + θ )

This would work, but would be very computationally expensive — square roots and trignometric formulas are slow, and you'd have to apply this formula individually to each point. A better, faster (and, of course, more matrix-friendly) way to do this takes advantage of the trigonometric identities:

cos( α + β ) = cos α cos β - sin α sin β
sin( α + β ) = sin α cos β + cos α sin β

If you scale the polygon so that it's centered around the origin, then the known values (x,y) = (cos α, sin α), so this reduces to:

cos( α + β ) = x cos β - y sin β
sin( α + β ) = y cos β + x sin β

So, to rotate a point (x,y) about the origin by angle θ, you can apply the rotation matrix:

cos θ-sin θ
sin θcos θ
x
y
=
(x cos θ - y sin θ)
(x sin θ + y cos θ)

The beauty of this is that the matrix itself is a constant; cos θ and sin θ only need to be computed once per rotation and the multiplied through each point. And, since it's a matrix, it can be combines with other arbitrary scale and translation operations into a single point operation; if you want to rotate about a point other than the origin, apply a transformation matrix so that that point becomes the origin, rotate, and "untranslate" just as you did in the case of scaling. This extends out to three dimensions, too — this is how 3D graphics represent virtual cameras. Figure 6 shows the original star from figure 1 rotated through 30, 45 & 60 degrees about the origin and then the offset star from figure 3 rotated by the same angles, again about the origin and finally figure 3 rotated about its lower-left corner (click to see each step).

100
010
001

Figure 6: Rotation about various points

As you see, the rotation matrices vary by angle when the rotation point is the origin, but they vary also by rotation point when it isn't. It would have been extremely difficult to compute the correct translation values "by hand", but matrix representation makes it simple to calculate.

Add a comment:

Completely off-topic or spam comments will be removed at the discretion of the moderator.

You may preserve formatting (e.g. a code sample) by indenting with four spaces preceding the formatted line(s)

Name: Name is required
Email (will not be displayed publicly):
Comment:
Comment is required
Vilu, 2013-02-19
That's a mold-breaker. Great thiknnig!
Matt, 2014-04-09
Hey, awesome explanation! Can you just clarify why we can substitute (x,y) with (cos a, tan a)? I understand that for a *unit circle* the x value is cos a / 1 = cos a , and likewise for sin, but wouldn't you have to divide by d for any point that's not d = 1?
Matt, 2014-04-09
Sorry, I mean why we can substitute (x,y) with (cos a, sin a), not (cos a, tan a)
Josh, 2014-04-14

Ah, fair point - I suppose I wasn't being as mathematically rigorous as I ought to have been. Consider a point d units away from the origin, coincident with angle α from the X axis. Now, the x and y coordinates of that point are given by:

x = d cos α

y = d sin α

So to find the x and y coordinates of the point after it has been rotated through β degrees from its starting point, you want to compute:

x' = d cos (α+β)

y' = d sin (α+β)

which expands to:

x' = d (cos α cos β - sin α sin β)

y' = d (sin α cos β + cos α sin β)

or:

x' = d cos α cos β - d sin α sin β

y' = d sin α cos β + d cos α sin β

Substituting x and y above:

x' = x cos β - y sin β

y' = y cos β + x sin β

So you see that the d drops out (or, it's part of the original x and y) and you end up computing the correct point.

Chris, 2016-11-21
Thanks for this and the follow up comment for getting rid of d. I'm pretty sure my students won't care, but I feel like I can have some authority on it now.
pirloForPresident, 2017-06-14
thanks a lot for this explanation! Will surely help me with my attempt to understand image processing for camera pose estimation.
My Book

I'm the author of the book "Implementing SSL/TLS Using Cryptography and PKI". Like the title says, this is a from-the-ground-up examination of the SSL protocol that provides security, integrity and privacy to most application-level internet protocols, most notably HTTP. I include the source code to a complete working SSL implementation, including the most popular cryptographic algorithms (DES, 3DES, RC4, AES, RSA, DSA, Diffie-Hellman, HMAC, MD5, SHA-1, SHA-256, and ECC), and show how they all fit together to provide transport-layer security.

My Picture

Joshua Davies

Past Posts