How and why rotation matrices work
If you've done any sort of work with graphics, you're no doubt familiar with matrix operations — scaling, translating and rotating. In the matrix model of graphics operations, each operation is parameterized by a specific matrix, and the individual coordinates of the shapes to be manipulated are represented as column vectors. Scaling (resizing) and translating (moving) are easy enough to understand (but I'll go ahead and review them briefly anyway), but I've never seen the rotation matrix adequately explained. Most books about linear algebra try to frame rotation matrices around the context of changes of basis of orthogonal vectors (say that five times fast!) — I'm sure that's a useful explanation to somebody, somewhere, but I prefer a geometrical explanation.
To start with, shapes are represented as polygons. Each point in the polygon assigned an n-element column vector, where n is the number of dimensions of the polygon. In two dimensions, then, a five pointed star might be represented as the column vectors:
0 |
30 |
7 |
7 |
30 |
7 |
10 |
-5 |
20 |
-30 |
0 |
-13 |
-20 |
-30 |
-10 |
-5 |
-30 |
7 |
-7 |
7 |
And be displayed like this:
If you wanted to stretch the star in the x direction, you would multiply each x coordinate by the stretching or scaling factor. So, to double the width of the star, you'd have:
0 |
30 |
14 |
7 |
60 |
7 |
20 |
-5 |
40 |
-30 |
0 |
-13 |
-30 |
-30 |
-20 |
-5 |
-60 |
7 |
-14 |
7 |
Notice above that the second column (the y coordinates) don't change, but the values of the first column double. This operation can be represented in matrix form as:
sx | 0 |
0 | 1 |
x |
y |
sx * x + 0 * y |
0 * x + 1 * y |
sx * x |
y |
More generally, the figure can be stretched in either direction, or both simultaneously:
sx | 0 |
0 | sy |
x |
y |
Translating — moving a figure from one place in the grid to another — is a separate operation. Conceptually, it's just a matter of adding the x offset and the y offset to each point in the polygon. To make this matrix friendly, though, matrix graphics add a third coordinate, which is always just a 1. So the five-pointed star from above becomes:
0 |
30 |
1 |
7 |
7 |
1 |
30 |
7 |
1 |
10 |
-5 |
1 |
20 |
-30 |
1 |
0 |
-13 |
1 |
-20 |
-30 |
1 |
-10 |
-5 |
1 |
-30 |
7 |
1 |
-7 |
7 |
1 |
In this way, the translation matrix can be implemented as a 3x3 matrix in the form:
1 | 0 | tx |
0 | 1 | ty |
0 | 0 | 1 |
x |
y |
1 |
1 * x + 0 * y + 1 * tx |
0 * x + 1 * y + 1 * ty |
0 * x + 0 * y + 1 * 1 |
x + tx |
y + ty |
1 |
So far, all I've done is overcomplicate things. For simple scaling operations, I've replaced two multiplications with four multiplication and two add operations, and actually slowed things down a teeny fraction. Translation, which was just a couple of adds, is now nine multiplications and six adds! The real benefit comes from the combinability of the transformation matrices. The star in figure 1 is centered on the origin; this isn't particularly useful in a computer graphics scenario, since the origin is the upper-left corner of the screen. To display this star, you want to translate it to screen coordinates. But once you do, it's no longer stretchable; simple scaling operations (multiplying the x and y coordinates by the scaling factors) occur relative to the origin. So, if the star was in the upper right as in figure 3:
and you tried to apply a scale by 2 operation to it by simply multiplying each component by 2, you'd end up with figure 4:
The star has, indeed, double its size, but it's also moved twice as far away from the origin — not necessarily what you wanted (and if that was what you wanted, you probably wanted more control over it anyway). Working out how to double the size of the star while keeping it in place is a daunting task; first, you need to find the centroid of the polygon, and then you need to halve each x and y coordinate less than the centroid, but double each x and y coordinate greater than it. A conceptually much simpler approach is to move the star to the origin, apply a common scale factor, and then move it back where it came from.
Still; that's a lot of mathematical operations. This is where the matrix format earns its keep. If you represent the translation and scale operations as matrices, you can "pre-multiply" them into a single uber matrix that can be applied as one to each point in the polygon. So (working right-to-left):
1 | 0 | cx |
0 | 1 | cy |
0 | 0 | 1 |
sx | 0 | 0 |
0 | sy | 0 |
0 | 0 | 1 |
1 | 0 | -cx |
0 | 1 | -cy |
0 | 0 | 1 |
1 | 0 | cx |
0 | 1 | cy |
0 | 0 | 1 |
(sx * 1 + 0 * 0 + 0 * 0) | (sx * 0 + 0 * 1 + 0 * 0) | (sx * -cx + 0 * -cy + 0 * 1) |
(0 * 1 + sy * 0 + 0 * 0) | (0 * 0 + sy * 1 + 0 * 0) | (0 * -cx + sy * -cy + 0 * 1) |
(0 * 1 + 0 * 0 + 1 * 0) | (0 * 0 + 0 * 1 + 1 * 0) | (0 * -cx + 0 * -cy + 1 * 1) |
1 | 0 | cx |
0 | 1 | cy |
0 | 0 | 1 |
sx | 0 | (sx * -cx) |
0 | sy | (sy * -cy) |
0 | 0 | 1 |
(1 * sx + 0 * 0 + cx * 0) | (1 * 0 + 0 * sy + cx * 0) | (1 * (sx * -cx) + 0 * (sy * -cy) + cx * 1) |
(0 * sx + 1 * 0 + cy * 0) | (0 * 0 + 1 * sy + cy * 0) | (0 * (sx * -cx) + 1 * (sy * -cy) + cy * 1) |
(0 * sx + 0 * 0 + 1 * 0) | (0 * 0 + 0 * sy + 1 * 0) | (0 * (sx * -cx) + 0 * (sy * -cy) + 1 * 1) |
sx | 0 | (sx * -cx) + cx |
0 | sy | (sy * -cy) + cy |
0 | 0 | 1 |
Where (cx,cy) is the centroid of the figure and sx and sy are the scale factors to apply. Of course, you don't need to work out the algebra like I did here; your GPU does all of that for you. You just plug in the three matrices and apply them. So, given the polygon in figure 3 with centroid (50, 50), to double it, you'd multiply the matrix:
2 | 0 | (2 * -50) + 50 |
0 | 2 | (2 * -50) + 50 |
0 | 0 | 1 |
2 | 0 | -50 |
0 | 2 | -50 |
0 | 0 | 1 |
Applied to each point in the figure gives you figure 5:
Click the display to see each suboperation, but recognize that the end result is computed all at once.
Where matrix representation really derives its power — and what I promised to talk about at the beginning of this post — is when you start rotating shapes around arbitrary points. Since you can translate and "untranslate" to and from the origin, all you have to figure out is how to represent rotation about the origin in matrix form, and you can rotate around any arbitrary point in the figure.
As you can probably guess, this involves trigonometry. But how, exactly? The sine of an angle is the ratio of its opposite side to its hypotenuse; the y coordinate is the height of the opposite side. So, to rotate a point (x,y) about the origin by an angle of θ, you could find the distance of the point from the center using the pythagorean theorem, divide its y coordinate by that distance, compute the arcsine of that ratio, add the desired angle to it, and then multiply back out by the distance:
d = √x2+y2
y = d * sin( arcsin( y / d ) + θ )
x = d * cos( arccos( x / d ) + θ )
This would work, but would be very computationally expensive — square roots and trignometric formulas are slow, and you'd have to apply this formula individually to each point. A better, faster (and, of course, more matrix-friendly) way to do this takes advantage of the trigonometric identities:
cos( α + β ) = cos α cos β - sin α sin β
sin( α + β ) = sin α cos β + cos α sin β
If you scale the polygon so that it's centered around the origin, then the known values (x,y) = (cos α, sin α), so this reduces to:
cos( α + β ) = x cos β - y sin β
sin( α + β ) = y cos β + x sin β
So, to rotate a point (x,y) about the origin by angle θ, you can apply the rotation matrix:
cos θ | -sin θ |
sin θ | cos θ |
x |
y |
(x cos θ - y sin θ) |
(x sin θ + y cos θ) |
The beauty of this is that the matrix itself is a constant; cos θ and sin θ only need to be computed once per rotation and the multiplied through each point. And, since it's a matrix, it can be combines with other arbitrary scale and translation operations into a single point operation; if you want to rotate about a point other than the origin, apply a transformation matrix so that that point becomes the origin, rotate, and "untranslate" just as you did in the case of scaling. This extends out to three dimensions, too — this is how 3D graphics represent virtual cameras. Figure 6 shows the original star from figure 1 rotated through 30, 45 & 60 degrees about the origin and then the offset star from figure 3 rotated by the same angles, again about the origin and finally figure 3 rotated about its lower-left corner (click to see each step).
1 | 0 | 0 |
0 | 1 | 0 |
0 | 0 | 1 |
As you see, the rotation matrices vary by angle when the rotation point is the origin, but they vary also by rotation point when it isn't. It would have been extremely difficult to compute the correct translation values "by hand", but matrix representation makes it simple to calculate.
Add a comment:
Ah, fair point - I suppose I wasn't being as mathematically rigorous as I ought to have been. Consider a point d units away from the origin, coincident with angle α from the X axis. Now, the x and y coordinates of that point are given by:
x = d cos α
y = d sin α
So to find the x and y coordinates of the point after it has been rotated through β degrees from its starting point, you want to compute:
x' = d cos (α+β)
y' = d sin (α+β)
which expands to:
x' = d (cos α cos β - sin α sin β)
y' = d (sin α cos β + cos α sin β)
or:
x' = d cos α cos β - d sin α sin β
y' = d sin α cos β + d cos α sin β
Substituting x and y above:
x' = x cos β - y sin β
y' = y cos β + x sin β
So you see that the d drops out (or, it's part of the original x and y) and you end up computing the correct point.