This article explains where the camera View Matrix comes from in computer graphics and addresses a common pain point: why camera transformation is equivalent to applying the inverse transformation to the scene. It connects three main threads—2D rotation, homogeneous coordinates, and the 3D camera coordinate system. Keywords: View Matrix, homogeneous coordinates, orthogonal matrices.
Technical specifications at a glance
| Parameter | Details |
|---|---|
| Domain | Computer Graphics / Linear Algebra / Rendering Math |
| Core topics | View Matrix, rotation matrices, homogeneous coordinates |
| Applicable languages | C++ / Python / GLSL concepts apply universally |
| Related license | Original article declares CC 4.0 BY-SA |
| Core dependencies | Vector dot product, cross product, normalization, 4×4 matrix operations |
2D and 3D transformations share the same mathematical backbone
Before understanding the View Matrix, first lock in one fact: transformations in graphics fundamentally describe mappings between coordinate systems. 2D rotation, 3D rotation, translation, and camera transforms differ only in dimension and representation.
In 2D, the matrix for rotating around the origin by an angle θ is:
[ cosθ -sinθ ]
[ sinθ cosθ ]
This matrix represents rigid-body rotation in the plane.
The most important property of a rotation matrix is that its inverse equals its transpose. This is not a trick; it follows directly from the definition of an orthogonal matrix. That property is exactly why the rotational part of the camera transform can be derived efficiently by writing the inverse first and then taking the transpose.
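This property is easy to verify numerically. The following sketch (with an arbitrary illustrative angle) checks that the 2D rotation matrix above satisfies the orthogonality condition:

```python
import numpy as np

theta = 0.7  # arbitrary angle in radians (illustrative value)
R = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

# For an orthogonal matrix, R^T R = I, so the inverse equals the transpose.
assert np.allclose(R.T @ R, np.eye(2))
assert np.allclose(np.linalg.inv(R), R.T)
```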
Homogeneous coordinates unify translation and linear transforms
A translation cannot be represented by ordinary matrix-vector multiplication alone, so graphics introduces homogeneous coordinates. A 2D point becomes (x, y, 1), and a 3D point becomes (x, y, z, 1). With that extension, translation, scaling, and rotation all fit into one unified matrix framework.
```python
import numpy as np

def translate(tx, ty, tz):
    return np.array([
        [1, 0, 0, tx],  # translation along the x-axis; the last column stores the translation
        [0, 1, 0, ty],  # translation along the y-axis
        [0, 0, 1, tz],  # translation along the z-axis
        [0, 0, 0, 1],   # constant row for homogeneous coordinates
    ], dtype=float)
```
This code shows the standard form of a 3D translation matrix in 4×4 homogeneous space.
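The homogeneous w-component also distinguishes points from directions: a point carries w = 1 and is translated, while a direction carries w = 0 and is unaffected. A small sketch with illustrative values:

```python
import numpy as np

# A 4x4 translation matrix with illustrative offsets (5, -2, 3)
T = np.array([
    [1, 0, 0, 5],
    [0, 1, 0, -2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
], dtype=float)

p = np.array([1.0, 1.0, 1.0, 1.0])  # point: w = 1, so it gets translated
d = np.array([0.0, 0.0, 1.0, 0.0])  # direction: w = 0, translation has no effect

assert np.allclose(T @ p, [6.0, -1.0, 4.0, 1.0])
assert np.allclose(T @ d, [0.0, 0.0, 1.0, 0.0])
```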
3D rotation provides the directional foundation for the View Matrix
Once we move into 3D, scaling and translation simply expand from 3×3 to 4×4 matrices. The more important change is rotation. In 3D, rotation can be expressed around the X, Y, and Z axes, or represented using Euler angles, Rodrigues’ formula, or quaternions.
For the View Matrix, you do not need to get lost in complex rotation formulas at the start. What actually matters is this: how to construct a standard orthonormal basis from the camera parameters—in other words, the camera’s own right, up, and backward directions.
Camera pose is defined by three inputs
A usable camera typically needs three quantities:
- e (eye): the camera position
- g (gaze/look): the viewing direction
- t (up): the up direction
The goal is not to move the camera directly. Instead, we transform the entire world into a standard camera pose where the camera sits at the origin, looks toward -Z, and uses +Y as the upward direction. That is the engineering definition of the View transform.
The View Matrix can be decomposed into translation and rotation
The first step is to move the camera position e to the origin. Since we are moving the scene rather than the camera, we use -e as the translation.
```python
import numpy as np

def view_translation(e):
    ex, ey, ez = e
    return np.array([
        [1, 0, 0, -ex],  # translate the scene opposite to the camera position
        [0, 1, 0, -ey],  # cancel the camera's y offset
        [0, 0, 1, -ez],  # cancel the camera's z offset
        [0, 0, 0, 1],
    ], dtype=float)
```
This step only answers where the camera is. It does not yet answer which direction the camera is facing.
The second step is to build the camera coordinate system. Define:
- w = -normalize(g): the camera’s backward direction
- u = normalize(t × w): the camera’s right direction
- v = w × u: the corrected up direction
The resulting u, v, and w are mutually orthogonal and form a right-handed coordinate system.
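A worked example makes the construction concrete. With the canonical inputs g = (0, 0, -1) and t = (0, 1, 0) (illustrative values), the basis comes out as the standard axes:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

g = np.array([0.0, 0.0, -1.0])  # gaze straight down -Z
t = np.array([0.0, 1.0, 0.0])   # up hint along +Y

w = -normalize(g)               # backward: (0, 0, 1)
u = normalize(np.cross(t, w))   # right:    (1, 0, 0)
v = np.cross(w, u)              # up:       (0, 1, 0)

# Mutually orthogonal unit vectors forming a right-handed system
assert np.allclose([u @ v, v @ w, u @ w], 0.0)
assert np.allclose(np.cross(u, v), w)
```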
The rotation matrix derivation relies on the transpose property of an orthonormal basis
Start by writing the inverse rotation matrix that rotates from the standard coordinate system into the camera coordinate system. Its column vectors are u, v, and w. Because it is an orthogonal matrix, the actual view rotation matrix is simply its transpose.
```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def view_rotation(g, up):
    w = -normalize(np.array(g, dtype=float))                # camera backward direction
    u = normalize(np.cross(np.array(up, dtype=float), w))   # camera right direction
    v = np.cross(w, u)                                      # camera up direction
    return np.array([
        [u[0], u[1], u[2], 0],  # first row: the right direction
        [v[0], v[1], v[2], 0],  # second row: the up direction
        [w[0], w[1], w[2], 0],  # third row: the backward direction
        [0, 0, 0, 1],
    ], dtype=float)
```
This code directly constructs the view rotation matrix.
The final View Matrix is the product of two matrices
The full formula is: M_view = R_view · T_view. Note the multiplication order: apply translation first, then rotation, because under the column-vector convention matrices act from right to left.
Expanded into its common engineering form, the first three rows are u, v, and w, while the translation terms in the last column are -u·e, -v·e, and -w·e. This shows that the translation part is fundamentally the projection of the camera position onto the new basis.
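This identity can be checked numerically. The following sketch uses illustrative camera parameters and confirms that the translation column of R·T equals (-u·e, -v·e, -w·e):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

e = np.array([1.0, 2.0, 3.0])                 # illustrative camera position
g = normalize(np.array([-1.0, -1.0, -1.0]))   # illustrative gaze direction
t = np.array([0.0, 1.0, 0.0])

w = -normalize(g)
u = normalize(np.cross(t, w))
v = np.cross(w, u)

R = np.eye(4); R[0, :3], R[1, :3], R[2, :3] = u, v, w  # rotation rows
T = np.eye(4); T[:3, 3] = -e                            # translation by -e
M = R @ T

# The translation column is the camera position projected onto the new basis.
assert np.allclose(M[:3, 3], [-u @ e, -v @ e, -w @ e])
```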
```python
def look_at(e, g, up):
    T = view_translation(e)
    R = view_rotation(g, up)
    return R @ T  # translation is applied first, then rotation
```
This code gives the cleanest LookAt-style implementation skeleton.
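Two sanity checks follow directly from the derivation: a camera already in the canonical pose yields the identity matrix, and any camera position maps to the origin in camera space. A self-contained sketch (restating the pieces above in one function):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def look_at(e, g, up):
    # Minimal restatement of view_rotation and view_translation combined
    w = -normalize(np.asarray(g, dtype=float))
    u = normalize(np.cross(np.asarray(up, dtype=float), w))
    v = np.cross(w, u)
    R = np.eye(4); R[0, :3], R[1, :3], R[2, :3] = u, v, w
    T = np.eye(4); T[:3, 3] = -np.asarray(e, dtype=float)
    return R @ T

# Canonical pose: at the origin, looking down -Z, +Y up -> identity
assert np.allclose(look_at([0, 0, 0], [0, 0, -1], [0, 1, 0]), np.eye(4))

# The camera position always maps to the origin in camera space
M = look_at([3.0, 1.0, 2.0], [0, 0, -1], [0, 1, 0])
assert np.allclose(M @ np.array([3.0, 1.0, 2.0, 1.0]), [0, 0, 0, 1])
```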
This derivation transfers directly to OpenGL and game engines
Whether you are working with OpenGL teaching code, Unity camera math, or Unreal view transforms, the core logic remains the same: first construct the camera’s orthonormal basis, then generate the transformation matrix from world space to camera space.
If you can never remember the View Matrix formula, remember just one sentence: Fix the camera in the canonical pose, then let the world undergo the inverse translation and inverse rotation. The formula is simply the matrix form of that statement.
FAQ
1. Why does the View Matrix transform the scene instead of “moving the camera”?
Because rendering calculations work more cleanly when every object is transformed into camera space in a unified way. Mathematically, the two approaches are equivalent, but “fix the camera and transform the world” makes it easier to build a standardized rendering pipeline.
2. Why is the fact that a rotation matrix’s inverse equals its transpose so important in View Matrix derivation?
Because directly solving for the rotation from an arbitrary camera direction to the canonical direction is not intuitive. If you first construct the inverse rotation matrix and then take its transpose, both the derivation and the implementation become much simpler.
3. Why can’t the up vector be used directly as the final camera up direction?
Because the input up vector only provides a reference upward direction and is not guaranteed to be orthogonal to gaze. You must rebuild u, v, and w using cross products to obtain a strictly orthonormal camera coordinate system.
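This correction is easy to see numerically. The sketch below uses a deliberately tilted up hint (an illustrative value not orthogonal to the gaze) and shows that the rebuilt v is exactly orthogonal to w:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

g = np.array([0.0, 0.0, -1.0])             # gaze (illustrative)
t = normalize(np.array([0.0, 1.0, -0.5]))  # tilted up hint, NOT orthogonal to g

w = -normalize(g)
u = normalize(np.cross(t, w))
v = np.cross(w, u)  # corrected up direction

assert not np.isclose(t @ g, 0.0)      # the raw hint is not orthogonal to gaze
assert np.isclose(v @ w, 0.0)          # the rebuilt up is exactly orthogonal
assert np.isclose(np.linalg.norm(v), 1.0)
```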
AI Readability Summary
This article reconstructs the GAMES 101 View transform material and systematically explains 2D rotation, homogeneous coordinates, the 3D camera coordinate system, and the derivation of the View Matrix. It clarifies why the View transform is fundamentally “moving the scene in reverse” and provides matrix formulas and code that you can use directly in practice.