How and why rotation matrices work
If you've done any sort of work with graphics, you're no doubt familiar
with matrix operations — scaling, translating and rotating. In the
matrix model of graphics operations, each operation is parameterized by a
specific matrix, and the individual coordinates of the shapes to be manipulated
are represented as column vectors. Scaling (resizing) and translating
(moving) are easy enough to understand (but I'll go ahead and review
them briefly anyway), but I've never seen the rotation matrix adequately
explained. Most books about linear algebra try to frame rotation matrices
around the context of changes of basis of orthogonal vectors (say that
five times fast!) — I'm sure that's a useful explanation to somebody,
somewhere, but I prefer a geometrical explanation.
To start with, shapes are represented as polygons. Each point in the
polygon assigned an n-element column vector, where n is the
number of dimensions of the polygon. In two dimensions, then, a five pointed
star might be represented as the column vectors:
And be displayed like this:
Figure 1: Five-pointed star
If you wanted to stretch the star in the x direction, you would multiply
each x coordinate by the stretching or scaling factor. So, to
double the width of the star, you'd have:
Figure 2: Five-pointed star, stretched along the x-axis
Notice above that the second column (the y coordinates) don't change, but
the values of the first column double.
This operation can be represented in matrix form as:
=
| sx * x + 0 * y |
| 0 * x + 1 * y |
=
More generally, the figure can be stretched in either direction, or
both simultaneously:
Translating — moving a figure from one place in the grid to
another — is a separate operation. Conceptually, it's just a matter
of adding the x offset and the y offset to each point in the polygon. To
make this matrix friendly, though, matrix graphics add a third coordinate,
which is always just a 1. So the five-pointed star from above becomes:
In this way, the translation matrix can be implemented as a 3x3 matrix in the
form:
=
| 1 * x + 0 * y + 1 * tx |
| 0 * x + 1 * y + 1 * ty |
| 0 * x + 0 * y + 1 * 1 |
=
So far, all I've done is overcomplicate things. For simple scaling
operations, I've replaced two multiplications with four multiplication
and two add operations, and actually slowed things down a teeny fraction.
Translation, which was just a couple of adds, is now nine multiplications
and six adds! The real benefit comes from the combinability of
the transformation matrices. The star in figure 1 is centered on the origin;
this isn't particularly useful in a computer graphics scenario, since the
origin is the upper-left corner of the screen. To display this star, you
want to translate it to screen coordinates. But once you do, it's
no longer stretchable; simple scaling operations (multiplying the x and y
coordinates by the scaling factors) occur relative to the origin. So, if
the star was in the upper right as in figure 3:
Figure 3: Five-pointed star, centered at (50,50)
and you tried to apply a scale by 2 operation to it by simply multiplying
each component by 2, you'd end up with figure 4:
Figure 4: Scaling gone wrong
The star has, indeed, double its size, but it's also moved twice as far
away from the origin — not necessarily what you wanted (and if that
was what you wanted, you probably wanted more control over it anyway).
Working out how to double the size of the star while keeping it in place
is a daunting task; first, you need to find the centroid of the polygon,
and then you need to halve each x and y coordinate less than the centroid,
but double each x and y coordinate greater than it. A conceptually much
simpler approach is to move the star to the origin, apply a common scale
factor, and then move it back where it came from.
Still; that's a lot of mathematical operations. This is where the matrix
format earns its keep. If you represent the translation and scale operations as
matrices, you can "pre-multiply" them into a single uber matrix that can
be applied as one to each point in the polygon. So (working right-to-left):
| (sx * 1 + 0 * 0 + 0 * 0) |
(sx * 0 + 0 * 1 + 0 * 0) |
(sx * -cx + 0 * -cy + 0 * 1) |
| (0 * 1 + sy * 0 + 0 * 0) |
(0 * 0 + sy * 1 + 0 * 0) |
(0 * -cx + sy * -cy + 0 * 1) |
| (0 * 1 + 0 * 0 + 1 * 0) |
(0 * 0 + 0 * 1 + 1 * 0) |
(0 * -cx + 0 * -cy + 1 * 1) |
| sx |
0 |
(sx * -cx) |
| 0 |
sy |
(sy * -cy) |
| 0 |
0 |
1 |
| (1 * sx + 0 * 0 + cx * 0) |
(1 * 0 + 0 * sy + cx * 0) |
(1 * (sx * -cx) + 0 * (sy * -cy) + cx * 1) |
| (0 * sx + 1 * 0 + cy * 0) |
(0 * 0 + 1 * sy + cy * 0) |
(0 * (sx * -cx) + 1 * (sy * -cy) + cy * 1) |
| (0 * sx + 0 * 0 + 1 * 0) |
(0 * 0 + 0 * sy + 1 * 0) |
(0 * (sx * -cx) + 0 * (sy * -cy) + 1 * 1) |
| sx |
0 |
(sx * -cx) + cx |
| 0 |
sy |
(sy * -cy) + cy |
| 0 |
0 |
1 |
Where (cx,cy) is the centroid of the figure
and sx and sy are the scale factors
to apply. Of course, you don't need to work out the algebra like I did here;
your GPU does all of
that for you. You just plug in the three matrices and apply them. So,
given the polygon in figure 3 with centroid (50, 50), to double it,
you'd multiply the matrix:
| 2 |
0 |
(2 * -50) + 50 |
| 0 |
2 |
(2 * -50) + 50 |
| 0 |
0 |
1 |
=
Applied to each point in the figure gives you figure 5:
Figure 5: Scaling done correctly
Click the display to see each suboperation, but recognize that the end
result is computed all at once.
Where matrix representation really derives its power — and what
I promised to talk about at the beginning of this post — is when you
start rotating shapes around arbitrary points. Since you can translate
and "untranslate" to and from the origin, all you have to figure out is
how to represent rotation about the origin in matrix form, and you can
rotate around any arbitrary point in the figure.
As you can probably guess, this involves trigonometry. But how, exactly?
The sine of an angle is the ratio of its opposite side to its hypotenuse;
the y coordinate is the height of the opposite side. So, to rotate a point
(x,y) about the origin by an angle of θ, you could find the
distance of the point from the center using the pythagorean theorem, divide
its y coordinate by that distance, compute the arcsine of that ratio, add
the desired angle to it, and then multiply back out by the distance:
d = √x2+y2
y = d * sin( arcsin( y / d ) + θ )
x = d * cos( arccos( x / d ) + θ )
This would work, but would be very computationally expensive — square
roots and trignometric formulas are slow, and you'd have to apply this
formula individually to each point. A better, faster (and, of course,
more matrix-friendly) way to do this takes advantage of the
trigonometric identities:
cos( α + β ) = cos α cos β - sin α sin β
sin( α + β ) = sin α cos β + cos α sin β
If you scale the polygon so that it's centered around the origin, then
the known values (x,y) = (cos α, sin α), so this
reduces to:
cos( α + β ) = x cos β - y sin β
sin( α + β ) = y cos β + x sin β
So, to rotate a point (x,y) about the origin by angle θ, you
can apply the rotation matrix:
=
| (x cos θ - y sin θ) |
| (x sin θ + y cos θ) |
The beauty of this is that the matrix itself is a constant; cos θ and
sin θ only need to be computed once per rotation and the multiplied
through each point. And, since it's a matrix, it can be combines with other
arbitrary scale and translation operations into a single point operation;
if you want to rotate about a point other than the origin, apply a transformation
matrix so that that point becomes the origin, rotate, and "untranslate" just
as you did in the case of scaling.
This extends out to three dimensions, too — this is how 3D graphics
represent virtual cameras. Figure 6 shows the original star from figure 1
rotated through 30, 45 & 60 degrees about the origin and then the offset
star from figure 3 rotated by the same angles, again about the origin and
finally figure 3 rotated about its lower-left corner (click to see each
step).
Figure 6: Rotation about various points
As you see, the rotation matrices vary by angle when the rotation point
is the origin, but they vary also by rotation point when it isn't. It
would have been extremely difficult to compute the correct translation
values "by hand", but matrix representation makes it simple to calculate.
Add a comment:
Completely off-topic or spam comments will be removed at the discretion of the moderator.