This article explains where the camera View Matrix comes from in computer graphics and addresses a common pain point: why camera transformation is equivalent to applying the inverse transformation to the scene. It connects three main threads—2D rotation, homogeneous coordinates, and the 3D camera coordinate system. Keywords: View Matrix, homogeneous coordinates, orthogonal matrices.
Technical specifications at a glance
| Parameter | Details |
|---|---|
| Domain | Computer Graphics / Linear Algebra / Rendering Math |
| Core topics | View Matrix, rotation matrices, homogeneous coordinates |
| Applicable languages | C++ / Python / GLSL concepts apply universally |
| Related license | Original article declares CC 4.0 BY-SA |
| Core dependencies | Vector dot product, cross product, normalization, 4×4 matrix operations |
2D and 3D transformations share the same mathematical backbone
Before understanding the View Matrix, first lock in one fact: transformations in graphics fundamentally describe mappings between coordinate systems. 2D rotation, 3D rotation, translation, and camera transforms differ only in dimension and representation.
In 2D, the matrix for rotating around the origin by an angle θ is:
[ cosθ -sinθ ]
[ sinθ cosθ ]
This matrix represents rigid-body rotation in the plane.
The most important property of a rotation matrix is that its inverse equals its transpose. This is not a trick; it follows directly from the definition of an orthogonal matrix. That property is exactly why the rotational part of the camera transform can be derived efficiently by writing the inverse first and then taking the transpose.
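This property is easy to verify numerically. The following sketch (with an arbitrary illustrative angle) checks that the 2D rotation matrix above satisfies the orthogonality condition:

```python
import numpy as np

theta = 0.7  # arbitrary angle in radians (illustrative value)
R = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

# For an orthogonal matrix, R^T R = I, so the inverse equals the transpose.
assert np.allclose(R.T @ R, np.eye(2))
assert np.allclose(np.linalg.inv(R), R.T)
```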
Homogeneous coordinates unify translation and linear transforms
A translation cannot be represented by ordinary matrix-vector multiplication alone, so graphics introduces homogeneous coordinates. A 2D point becomes (x, y, 1), and a 3D point becomes (x, y, z, 1). With that extension, translation, scaling, and rotation all fit into one unified matrix framework.
```python
import numpy as np

def translate(tx, ty, tz):
    return np.array([
        [1, 0, 0, tx],  # translation along the x-axis; the last column stores the translation
        [0, 1, 0, ty],  # translation along the y-axis
        [0, 0, 1, tz],  # translation along the z-axis
        [0, 0, 0, 1],   # constant row for homogeneous coordinates
    ], dtype=float)
```
This code shows the standard form of a 3D translation matrix in 4×4 homogeneous space.
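The homogeneous w-component also distinguishes points from directions: a point carries w = 1 and is translated, while a direction carries w = 0 and is unaffected. A small sketch with illustrative values:

```python
import numpy as np

# A 4x4 translation matrix with illustrative offsets (5, -2, 3)
T = np.array([
    [1, 0, 0, 5],
    [0, 1, 0, -2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
], dtype=float)

p = np.array([1.0, 1.0, 1.0, 1.0])  # point: w = 1, so it gets translated
d = np.array([0.0, 0.0, 1.0, 0.0])  # direction: w = 0, translation has no effect

assert np.allclose(T @ p, [6.0, -1.0, 4.0, 1.0])
assert np.allclose(T @ d, [0.0, 0.0, 1.0, 0.0])
```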
3D rotation provides the directional foundation for the View Matrix
Once we move into 3D, scaling and translation simply expand from 3×3 to 4×4 matrices. The more important change is rotation. In 3D, rotation can be expressed around the X, Y, and Z axes, or represented using Euler angles, Rodrigues’ formula, or quaternions.
For the View Matrix, you do not need to get lost in complex rotation formulas at the start. What actually matters is this: how to construct a standard orthonormal basis from the camera parameters—in other words, the camera’s own right, up, and backward directions.
Camera pose is defined by three inputs
A usable camera typically needs three quantities:
- e (eye): the camera position
- g (gaze/look): the viewing direction
- t (up): the up direction
The goal is not to move the camera directly. Instead, we transform the entire world into a standard camera pose where the camera sits at the origin, looks toward -Z, and uses +Y as the upward direction. That is the engineering definition of the View transform.
The View Matrix can be decomposed into translation and rotation
The first step is to move the camera position e to the origin. Since we are moving the scene rather than the camera, we use -e as the translation.
```python
import numpy as np

def view_translation(e):
    ex, ey, ez = e
    return np.array([
        [1, 0, 0, -ex],  # translate the scene opposite to the camera position
        [0, 1, 0, -ey],  # cancel the camera's y offset
        [0, 0, 1, -ez],  # cancel the camera's z offset
        [0, 0, 0, 1],
    ], dtype=float)
```
This step only answers where the camera is. It does not yet answer which direction the camera is facing.
The second step is to build the camera coordinate system. Define:
- w = -normalize(g): the camera’s backward direction
- u = normalize(t × w): the camera’s right direction
- v = w × u: the corrected up direction
The resulting u, v, and w are mutually orthogonal and form a right-handed coordinate system.
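A worked example makes the construction concrete. With the canonical inputs g = (0, 0, -1) and t = (0, 1, 0) (illustrative values), the basis comes out as the standard axes:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

g = np.array([0.0, 0.0, -1.0])  # gaze straight down -Z
t = np.array([0.0, 1.0, 0.0])   # up hint along +Y

w = -normalize(g)               # backward: (0, 0, 1)
u = normalize(np.cross(t, w))   # right:    (1, 0, 0)
v = np.cross(w, u)              # up:       (0, 1, 0)

# Mutually orthogonal unit vectors forming a right-handed system
assert np.allclose([u @ v, v @ w, u @ w], 0.0)
assert np.allclose(np.cross(u, v), w)
```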
The rotation matrix derivation relies on the transpose property of an orthonormal basis
Start by writing the inverse rotation matrix that rotates from the standard coordinate system into the camera coordinate system. Its column vectors are u, v, and w. Because it is an orthogonal matrix, the actual view rotation matrix is simply its transpose.
```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def view_rotation(g, up):
    w = -normalize(np.array(g, dtype=float))                # camera backward direction
    u = normalize(np.cross(np.array(up, dtype=float), w))   # camera right direction
    v = np.cross(w, u)                                      # camera up direction
    return np.array([
        [u[0], u[1], u[2], 0],  # first row: the right direction
        [v[0], v[1], v[2], 0],  # second row: the up direction
        [w[0], w[1], w[2], 0],  # third row: the backward direction
        [0, 0, 0, 1],
    ], dtype=float)
```
This code directly constructs the view rotation matrix.
The final View Matrix is the product of two matrices
The full formula is: M_view = R_view · T_view. Note the multiplication order: apply translation first, then rotation, because under the column-vector convention matrices act from right to left.
Expanded into its common engineering form, the first three rows are u, v, and w, while the translation terms in the last column are -u·e, -v·e, and -w·e. This shows that the translation part is fundamentally the projection of the camera position onto the new basis.
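This identity can be checked numerically. The following sketch uses illustrative camera parameters and confirms that the translation column of R·T equals (-u·e, -v·e, -w·e):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

e = np.array([1.0, 2.0, 3.0])                 # illustrative camera position
g = normalize(np.array([-1.0, -1.0, -1.0]))   # illustrative gaze direction
t = np.array([0.0, 1.0, 0.0])

w = -normalize(g)
u = normalize(np.cross(t, w))
v = np.cross(w, u)

R = np.eye(4); R[0, :3], R[1, :3], R[2, :3] = u, v, w  # rotation rows
T = np.eye(4); T[:3, 3] = -e                            # translation by -e
M = R @ T

# The translation column is the camera position projected onto the new basis.
assert np.allclose(M[:3, 3], [-u @ e, -v @ e, -w @ e])
```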
```python
def look_at(e, g, up):
    T = view_translation(e)
    R = view_rotation(g, up)
    return R @ T  # translation is applied first, then rotation
```
This code gives the cleanest LookAt-style implementation skeleton.
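Two sanity checks follow directly from the derivation: a camera already in the canonical pose yields the identity matrix, and any camera position maps to the origin in camera space. A self-contained sketch (restating the pieces above in one function):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def look_at(e, g, up):
    # Minimal restatement of view_rotation and view_translation combined
    w = -normalize(np.asarray(g, dtype=float))
    u = normalize(np.cross(np.asarray(up, dtype=float), w))
    v = np.cross(w, u)
    R = np.eye(4); R[0, :3], R[1, :3], R[2, :3] = u, v, w
    T = np.eye(4); T[:3, 3] = -np.asarray(e, dtype=float)
    return R @ T

# Canonical pose: at the origin, looking down -Z, +Y up -> identity
assert np.allclose(look_at([0, 0, 0], [0, 0, -1], [0, 1, 0]), np.eye(4))

# The camera position always maps to the origin in camera space
M = look_at([3.0, 1.0, 2.0], [0, 0, -1], [0, 1, 0])
assert np.allclose(M @ np.array([3.0, 1.0, 2.0, 1.0]), [0, 0, 0, 1])
```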
This derivation transfers directly to OpenGL and game engines
Whether you are working with OpenGL teaching code, Unity camera math, or Unreal view transforms, the core logic remains the same: first construct the camera’s orthonormal basis, then generate the transformation matrix from world space to camera space.
If you can never remember the View Matrix formula, remember just one sentence: Fix the camera in the canonical pose, then let the world undergo the inverse translation and inverse rotation. The formula is simply the matrix form of that statement.
FAQ
1. Why does the View Matrix transform the scene instead of “moving the camera”?
Because rendering calculations work more cleanly when every object is transformed into camera space in a unified way. Mathematically, the two approaches are equivalent, but “fix the camera and transform the world” makes it easier to build a standardized rendering pipeline.
2. Why is the fact that a rotation matrix’s inverse equals its transpose so important in View Matrix derivation?
Because directly solving for the rotation from an arbitrary camera direction to the canonical direction is not intuitive. If you first construct the inverse rotation matrix and then take its transpose, both the derivation and the implementation become much simpler.
3. Why can’t the up vector be used directly as the final camera up direction?
Because the input up vector only provides a reference upward direction and is not guaranteed to be orthogonal to gaze. You must rebuild u, v, and w using cross products to obtain a strictly orthonormal camera coordinate system.
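This correction is easy to see numerically. The sketch below uses a deliberately tilted up hint (an illustrative value not orthogonal to the gaze) and shows that the rebuilt v is exactly orthogonal to w:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

g = np.array([0.0, 0.0, -1.0])             # gaze (illustrative)
t = normalize(np.array([0.0, 1.0, -0.5]))  # tilted up hint, NOT orthogonal to g

w = -normalize(g)
u = normalize(np.cross(t, w))
v = np.cross(w, u)  # corrected up direction

assert not np.isclose(t @ g, 0.0)      # the raw hint is not orthogonal to gaze
assert np.isclose(v @ w, 0.0)          # the rebuilt up is exactly orthogonal
assert np.isclose(np.linalg.norm(v), 1.0)
```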
AI Readability Summary
This article reconstructs the GAMES 101 View transform material and systematically explains 2D rotation, homogeneous coordinates, the 3D camera coordinate system, and the derivation of the View Matrix. It clarifies why the View transform is fundamentally “moving the scene in reverse” and provides matrix formulas and code that you can use directly in practice.