Camera Matrix - Derivation

Derivation

The mapping from the coordinates of a 3D point P to the 2D image coordinates of the point's projection onto the image plane, according to the pinhole camera model, is given by

$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where $(x_1, x_2, x_3)$ are the 3D coordinates of P relative to a camera-centered coordinate system, $(y_1, y_2)$ are the resulting image coordinates, and $f$ is the camera's focal length, for which we assume $f > 0$. Furthermore, we also assume that $x_3 > 0$.
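As a minimal sketch of this mapping, the pinhole projection can be written as a short Python function. The name `project_pinhole` and the choice to raise an error when the assumptions f > 0 and x₃ > 0 are violated are illustrative choices, not part of the model itself.

```python
def project_pinhole(point, f):
    """Project a 3D point (x1, x2, x3), given in camera-centered
    coordinates, onto the image plane of an ideal pinhole camera."""
    x1, x2, x3 = point
    if f <= 0:
        raise ValueError("focal length f must be positive")
    if x3 <= 0:
        raise ValueError("point must lie in front of the camera (x3 > 0)")
    scale = f / x3  # the common factor f / x3 from the projection equation
    return (scale * x1, scale * x2)


# Example: a point at depth 8 seen by a camera with focal length 2.
print(project_pinhole((2.0, 4.0, 8.0), f=2.0))  # (0.5, 1.0)
```

Doubling the depth x₃ while keeping x₁ and x₂ fixed halves both image coordinates, which is the perspective foreshortening that the factor f / x₃ expresses.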

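To derive the camera matrix, this expression is rewritten in terms of homogeneous coordinates. Instead of the 2D vector $(y_1, y_2)$ we consider the projective element (a 3D vector) $\mathbf{y} = (y_1, y_2, 1)$, and instead of equality we consider equality up to scaling by a non-zero number, denoted $\sim$. First, we write the homogeneous image coordinates as expressions in the usual 3D coordinates:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} = \frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \\ x_3 / f \end{pmatrix} \sim \begin{pmatrix} x_1 \\ x_2 \\ x_3 / f \end{pmatrix}$$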
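Equality up to scale can be made concrete with a small numerical check. The sketch below is an illustration under stated assumptions: the helper names `dehomogenize` and `equal_up_to_scale` and the tolerance `tol` are hypothetical, and the parallelism test via 2×2 minors is just one of several reasonable ways to compare projective elements.

```python
import math


def dehomogenize(v):
    """Divide a homogeneous vector by its last component and drop it."""
    w = v[-1]
    if w == 0:
        raise ValueError("last component is zero: no finite point")
    return tuple(c / w for c in v[:-1])


def equal_up_to_scale(a, b, tol=1e-9):
    """Return True if a = s * b for some non-zero scalar s."""
    if len(a) != len(b):
        return False
    norm_a = math.sqrt(sum(c * c for c in a))
    norm_b = math.sqrt(sum(c * c for c in b))
    if norm_a == 0 or norm_b == 0:
        return False  # the zero vector is not a projective element
    n = len(a)
    # Two non-zero vectors are parallel exactly when every 2x2 minor vanishes.
    return all(
        abs(a[i] * b[j] - a[j] * b[i]) <= tol * norm_a * norm_b
        for i in range(n)
        for j in range(i + 1, n)
    )


f = 2.0
x1, x2, x3 = 2.0, 4.0, 8.0
y_h = (f * x1 / x3, f * x2 / x3, 1.0)  # homogeneous image point (y1, y2, 1)
unscaled = (x1, x2, x3 / f)            # same element without the factor f / x3

print(equal_up_to_scale(y_h, unscaled))  # True
print(dehomogenize(unscaled))            # (0.5, 1.0)
```

Dropping the factor f / x₃ changes the vector but not the projective element it represents, so dividing by the last component recovers the same image coordinates.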

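Finally, the 3D coordinates are also expressed in a homogeneous representation $\mathbf{x} = (x_1, x_2, x_3, 1)$, and this is how the camera matrix appears:

$$\begin{pmatrix} y_1 \\ y_2 \\ 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 1 \end{pmatrix} \quad \text{or} \quad \mathbf{y} \sim C \, \mathbf{x}$$

where $C$ is the camera matrix, which here is given by

$$C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix},$$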

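and, after multiplying by $f$, the corresponding camera matrix becomes

$$C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{f} & 0 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

The last step holds because $C$ is itself a projective element: multiplying it by the non-zero scalar $f$ yields an equivalent camera matrix.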
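The equivalence of the two forms of the camera matrix can be checked numerically. The following is a minimal sketch in plain Python; the function names (`camera_matrix`, `camera_matrix_scaled`, `apply_matrix`, `project`) are hypothetical and chosen only for this illustration.

```python
def camera_matrix(f):
    """3x4 camera matrix with 1/f in the third row."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0 / f, 0.0],
    ]


def camera_matrix_scaled(f):
    """The same projective element, multiplied through by f."""
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def apply_matrix(C, x_h):
    """Multiply a 3x4 matrix with a homogeneous 4-vector."""
    return tuple(sum(c * x for c, x in zip(row, x_h)) for row in C)


def project(C, point):
    """Map a 3D point to 2D image coordinates via y ~ C x."""
    x_h = (*point, 1.0)         # homogeneous 3D coordinates (x1, x2, x3, 1)
    y_h = apply_matrix(C, x_h)  # homogeneous image coordinates, up to scale
    return (y_h[0] / y_h[2], y_h[1] / y_h[2])


point = (2.0, 4.0, 8.0)
print(project(camera_matrix(2.0), point))         # (0.5, 1.0)
print(project(camera_matrix_scaled(2.0), point))  # (0.5, 1.0)
```

Both matrices give the same image point because the overall factor f cancels in the final division by the third homogeneous component.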

The camera matrix derived here may appear trivial in the sense that it contains very few non-zero elements. This is largely a consequence of the particular coordinate systems chosen for the 3D and 2D points. In practice, however, other forms of camera matrices are common.
