Matrix Calculus - Layout Conventions

Layout Conventions

This section discusses the similarities and differences between the notational conventions used in the various fields that make use of matrix calculus. Although there are largely two consistent conventions, some authors find it convenient to mix the two in forms that are discussed below. After this section, equations will be listed separately in both competing forms.

The fundamental issue is that the derivative of a vector with respect to a vector, i.e. $\partial\mathbf{y}/\partial\mathbf{x}$, is often written in two competing ways. If the numerator $\mathbf{y}$ is of size $m$ and the denominator $\mathbf{x}$ of size $n$, then the result can be laid out as either an $m\times n$ matrix or an $n\times m$ matrix, i.e. the elements of $\mathbf{y}$ laid out in columns and the elements of $\mathbf{x}$ laid out in rows, or vice versa. This leads to the following possibilities:

Numerator layout, i.e. lay out according to $\mathbf{y}$ and $\mathbf{x}^\mathsf{T}$ (i.e. contrarily to $\mathbf{x}$), giving the $m\times n$ matrix. This is sometimes known as the Jacobian formulation.
Denominator layout, i.e. lay out according to $\mathbf{y}^\mathsf{T}$ and $\mathbf{x}$ (i.e. contrarily to $\mathbf{y}$), giving the $n\times m$ matrix. This is sometimes known as the Hessian formulation. Some authors term this layout the gradient, in distinction to the Jacobian (numerator layout), which is its transpose. (However, "gradient" more commonly means the derivative $\partial y/\partial\mathbf{x}$ of a scalar with respect to a vector, regardless of layout.)
A third possibility sometimes seen is to insist on writing the derivative as $\partial\mathbf{y}/\partial\mathbf{x}^\mathsf{T}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.
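To make the two layouts concrete, here is a minimal sketch in plain Python (nested lists, no external libraries) that approximates the vector-by-vector derivative by central differences and returns it in either layout. The function name `jacobian`, the step size `h` and the example map are illustrative assumptions, not a standard API. For a linear map y = Ax with A of size 2×3, numerator layout should recover A itself and denominator layout its transpose.

```python
def jacobian(f, x, layout, h=1e-6):
    """Central-difference derivative dy/dx of a vector-valued f at the point x.

    numerator layout:   m x n nested list, entry [i][j] = dy_i/dx_j
    denominator layout: n x m nested list, entry [j][i] = dy_i/dx_j
    """
    if layout not in ("numerator", "denominator"):
        raise ValueError("layout must be 'numerator' or 'denominator'")
    n = len(x)
    m = len(f(x))
    # cols[j] holds the partial derivatives of all m outputs with respect to x_j
    cols = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        cols.append([(yp[i] - ym[i]) / (2 * h) for i in range(m)])
    if layout == "numerator":
        return [[cols[j][i] for j in range(n)] for i in range(m)]  # rows follow y
    return [list(c) for c in cols]                                # rows follow x


# Example: the linear map y = A x, with A of size 2 x 3 (so m = 2, n = 3).
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]


def linear_map(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


J_num = jacobian(linear_map, [0.5, -1.0, 2.0], "numerator")    # 2 x 3, approximately A
J_den = jacobian(linear_map, [0.5, -1.0, 2.0], "denominator")  # 3 x 2, approximately A^T
```

The third possibility needs no separate branch: writing the denominator as a transpose gives the same numbers as the numerator layout.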

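When handling the gradient $\partial y/\partial\mathbf{x}$ and the opposite case $\partial\mathbf{y}/\partial x$, we have the same issues. To be consistent, we should do one of the following:

If we choose numerator layout for $\partial\mathbf{y}/\partial\mathbf{x}$, we should lay out the gradient $\partial y/\partial\mathbf{x}$ as a row vector, and $\partial\mathbf{y}/\partial x$ as a column vector.
If we choose denominator layout for $\partial\mathbf{y}/\partial\mathbf{x}$, we should lay out the gradient $\partial y/\partial\mathbf{x}$ as a column vector, and $\partial\mathbf{y}/\partial x$ as a row vector.
In the third possibility above, we write $\partial y/\partial\mathbf{x}^\mathsf{T}$ and $\partial\mathbf{y}/\partial x$ and use numerator layout.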
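The row/column rules above can be checked with a short sketch, again in plain Python with central differences. Results are returned as 2-D nested lists so that a row vector (1×k) and a column vector (k×1) are distinguishable; the names `gradient` and `vector_by_scalar` and the example functions are hypothetical, chosen only for illustration.

```python
def _check(layout):
    if layout not in ("numerator", "denominator"):
        raise ValueError("layout must be 'numerator' or 'denominator'")


def gradient(f, x, layout, h=1e-6):
    """Central-difference dy/dx for scalar y = f(x), as a 2-D list.

    numerator layout -> 1 x n row vector; denominator layout -> n x 1 column vector.
    """
    _check(layout)
    g = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return [g] if layout == "numerator" else [[gj] for gj in g]


def vector_by_scalar(y, t, layout, h=1e-6):
    """Central-difference dy/dt for vector y(t), as a 2-D list.

    numerator layout -> m x 1 column vector; denominator layout -> 1 x m row vector.
    """
    _check(layout)
    d = [(a - b) / (2 * h) for a, b in zip(y(t + h), y(t - h))]
    return [[di] for di in d] if layout == "numerator" else [d]


# f(x) = x0^2 + 3*x1 has gradient (2*x0, 3); y(t) = (t, t^2, t^3) has derivative (1, 2t, 3t^2).
g_num = gradient(lambda x: x[0] ** 2 + 3 * x[1], [1.0, 2.0], "numerator")    # 1 x 2
g_den = gradient(lambda x: x[0] ** 2 + 3 * x[1], [1.0, 2.0], "denominator")  # 2 x 1
v_num = vector_by_scalar(lambda t: [t, t ** 2, t ** 3], 1.0, "numerator")     # 3 x 1
v_den = vector_by_scalar(lambda t: [t, t ** 2, t ** 3], 1.0, "denominator")   # 1 x 3
```

In each layout, the gradient and the vector-by-scalar derivative take opposite orientations, which is what keeps the two conventions internally consistent.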

Not all math textbooks and papers are consistent in this respect throughout. That is, sometimes different conventions are used in different contexts within the same book or paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\partial\mathbf{y}/\partial\mathbf{x}$.
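One place where such mixing becomes visible is the chain rule. The sketch below assumes a hypothetical scalar z that depends on x through y, with hand-picked values for dy/dx (numerator layout, 2×3) and dz/dy. Under consistent numerator layout the chain rule reads (dz/dy)(dy/dx), a row times a matrix; an author who writes gradients as columns but keeps the Jacobian in numerator layout must instead write Jᵀ times the column gradient. The helpers `matmul` and `transpose` are written out here only to keep the example free of dependencies.

```python
def matmul(P, Q):
    """Multiply nested-list matrices, refusing mismatched inner dimensions."""
    if len(P[0]) != len(Q):
        raise ValueError(f"cannot multiply {len(P)}x{len(P[0])} by {len(Q)}x{len(Q[0])}")
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(r) for r in zip(*P)]


J = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # dy/dx in numerator layout (m x n = 2 x 3)
dz_dy_row = [[2.0, -1.0]]               # dz/dy in numerator layout (1 x m)
dz_dy_col = transpose(dz_dy_row)        # dz/dy in denominator layout (m x 1)

# Consistent numerator layout: dz/dx = (dz/dy)(dy/dx), a 1 x n row.
dz_dx_row = matmul(dz_dy_row, J)
# Mixed usage (column gradients, numerator-layout Jacobian): dz/dx = J^T (dz/dy), an n x 1 column.
dz_dx_col = matmul(transpose(J), dz_dy_col)
```

Both results hold the same numbers, one as a row and one as a column. Multiplying `J` by the column gradient directly raises `ValueError`, since a 2×3 matrix cannot multiply a 2×1 vector; a shape mismatch of this kind is often the first hint that two layouts have been mixed.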

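Similarly, when it comes to scalar-by-matrix derivatives $\partial y/\partial\mathbf{X}$ and matrix-by-scalar derivatives $\partial\mathbf{Y}/\partial x$, consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^\mathsf{T}$, while consistent denominator layout lays out according to $\mathbf{Y}^\mathsf{T}$ and $\mathbf{X}$. In practice, however, following a denominator layout for $\partial\mathbf{Y}/\partial x$ and laying the result out according to $\mathbf{Y}^\mathsf{T}$ is rarely seen, because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:

Consistent numerator layout, which lays out $\partial\mathbf{Y}/\partial x$ according to $\mathbf{Y}$ and $\partial y/\partial\mathbf{X}$ according to $\mathbf{X}^\mathsf{T}$.
Mixed layout, which lays out $\partial\mathbf{Y}/\partial x$ according to $\mathbf{Y}$ and $\partial y/\partial\mathbf{X}$ according to $\mathbf{X}$.
Use of the notation $\partial y/\partial\mathbf{X}^\mathsf{T}$, with results the same as consistent numerator layout.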
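For the scalar-by-matrix case, a similar sketch (plain Python; the helper name `scalar_by_matrix` and the matrices are assumptions for illustration) differentiates the trace y = tr(AX), whose partial derivative with respect to X_ij is A_ji. Numerator layout (according to Xᵀ) should therefore return A, and mixed layout (according to X) should return Aᵀ.

```python
def scalar_by_matrix(f, X, layout, h=1e-6):
    """Central-difference dy/dX for scalar y = f(X), with X a p x q nested list.

    "mixed" layout:     p x q, laid out according to X
    "numerator" layout: q x p, laid out according to X^T
    """
    p, q = len(X), len(X[0])
    D = [[0.0] * q for _ in range(p)]  # D[i][j] = dy/dX_ij
    for i in range(p):
        for j in range(q):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            D[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    if layout == "mixed":
        return D
    if layout == "numerator":
        return [list(r) for r in zip(*D)]
    raise ValueError("layout must be 'numerator' or 'mixed'")


# Example: y = tr(A X) with A of size q x p = 3 x 2 and X of size p x q = 2 x 3.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]


def trace_AX(X):
    # tr(A X) = sum over i, k of A[i][k] * X[k][i]
    return sum(A[i][k] * X[k][i] for i in range(len(A)) for k in range(len(X)))


G_num = scalar_by_matrix(trace_AX, X, "numerator")  # 3 x 2, approximately A
G_mix = scalar_by_matrix(trace_AX, X, "mixed")      # 2 x 3, approximately A^T
```

The mixed-layout result has the same shape as X, which is why it is convenient when the derivative is used to update X directly.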

In the following formulas, we handle the five possible combinations $\partial y/\partial\mathbf{x}$, $\partial\mathbf{y}/\partial x$, $\partial\mathbf{y}/\partial\mathbf{x}$, $\partial y/\partial\mathbf{X}$ and $\partial\mathbf{Y}/\partial x$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.
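The parametric-curve case mentioned above can be sketched as follows. Assumed for illustration: the curve is the unit circle x(t) = (cos t, sin t) and the scalar function is f(x) = x0² + 2·x1 (0-based indices as in the code). In numerator layout the chain rule multiplies the row df/dx by the column dx/dt; in denominator layout the row dx/dt multiplies the column df/dx. Either way the 1×1 result is the same dot product, so scalar-by-scalar results do not depend on the layout.

```python
import math


def curve(t):   # x(t): a parametric curve in R^2
    return [math.cos(t), math.sin(t)]


def f(x):       # scalar function of a point on the curve
    return x[0] ** 2 + 2.0 * x[1]


def grad_f(x):  # components of df/dx
    return [2.0 * x[0], 2.0]


def dcurve(t):  # components of dx/dt
    return [-math.sin(t), math.cos(t)]


def dz_dt_chain(t, layout):
    g, d = grad_f(curve(t)), dcurve(t)
    if layout == "numerator":
        # (df/dx as a 1 x 2 row) times (dx/dt as a 2 x 1 column)
        return sum(gi * di for gi, di in zip(g, d))
    if layout == "denominator":
        # (dx/dt as a 1 x 2 row) times (df/dx as a 2 x 1 column)
        return sum(di * gi for di, gi in zip(d, g))
    raise ValueError("layout must be 'numerator' or 'denominator'")


def dz_dt_numeric(t, h=1e-6):
    # central difference of z(t) = f(x(t)), for comparison
    return (f(curve(t + h)) - f(curve(t - h))) / (2 * h)
```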

Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.

When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum with respect to that aggregate, keep in mind that numerator layout produces results that are transposed relative to the aggregate. For example, in finding the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a k×1 column vector, then the result in numerator layout will be a 1×k row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.
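The transposition issue can be seen in a small gradient-ascent sketch for the mean of a multivariate normal. To keep it short it assumes an identity covariance, so the log-likelihood is -½ Σᵢ ‖xᵢ − μ‖² plus a constant and its derivative with respect to μ has components Σᵢ (xᵢ − μ). The function names, step count and learning rate are hypothetical. The numerator-layout gradient comes back as a 1×k row, so it must be transposed before it can update the k×1 column μ.

```python
def transpose(P):
    return [list(r) for r in zip(*P)]


def loglik_grad_numerator(mu, data):
    """d(ell)/d(mu) in numerator layout (a 1 x k row) for a k x 1 column mu.

    Assumes identity covariance: ell(mu) = -1/2 * sum_i ||x_i - mu||^2 + const.
    """
    k = len(mu)
    return [[sum(x[j] - mu[j][0] for x in data) for j in range(k)]]


def fit_mean(data, steps=200, lr=0.1):
    """Gradient ascent on ell; mu is kept as a k x 1 column vector."""
    k = len(data[0])
    mu = [[0.0] for _ in range(k)]
    for _ in range(steps):
        g = transpose(loglik_grad_numerator(mu, data))  # 1 x k row -> k x 1 column
        mu = [[mu[j][0] + lr * g[j][0] / len(data)] for j in range(k)]
    return mu


data = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mu_hat = fit_mean(data)  # approaches the sample mean (3, 5), as a 2 x 1 column
```

Setting the gradient to zero gives the sample mean directly, which is the maximum likelihood estimate of the mean; the iterative version is shown only to make the row-versus-column mismatch explicit.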

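Result of differentiating various kinds of aggregates with other kinds of aggregates

| | Scalar $y$ | Vector $\mathbf{y}$ (size $m$) | Matrix $\mathbf{Y}$ (size $m\times n$) |
|---|---|---|---|
| Scalar $x$ | $\partial y/\partial x$: scalar | $\partial\mathbf{y}/\partial x$: size-$m$ column vector (numerator layout); size-$m$ row vector (denominator layout) | $\partial\mathbf{Y}/\partial x$: $m\times n$ matrix (numerator layout) |
| Vector $\mathbf{x}$ (size $n$) | $\partial y/\partial\mathbf{x}$: size-$n$ row vector (numerator layout); size-$n$ column vector (denominator layout) | $\partial\mathbf{y}/\partial\mathbf{x}$: $m\times n$ matrix (numerator layout); $n\times m$ matrix (denominator layout) | $\partial\mathbf{Y}/\partial\mathbf{x}$: ? |
| Matrix $\mathbf{X}$ (size $p\times q$) | $\partial y/\partial\mathbf{X}$: $q\times p$ matrix (numerator layout); $p\times q$ matrix (denominator layout) | $\partial\mathbf{y}/\partial\mathbf{X}$: ? | $\partial\mathbf{Y}/\partial\mathbf{X}$: ? |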
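The table can also be encoded as a small lookup, which may be handy as a shape check when implementing these derivatives. This is a minimal sketch: the dictionary `SHAPES` and the function `derivative_shape` are hypothetical names, dimensions follow the table (y of size m, Y of size m×n, x of size n, X of size p×q), and a missing entry (returned as `None`) stands for a "?" cell or for the rarely used denominator layout of the matrix-by-scalar derivative.

```python
# Shapes from the table, keyed by (kind of numerator, kind of denominator).
SHAPES = {
    ("scalar", "scalar"): {"numerator": lambda d: (1, 1), "denominator": lambda d: (1, 1)},
    ("vector", "scalar"): {"numerator": lambda d: (d["m"], 1), "denominator": lambda d: (1, d["m"])},
    ("matrix", "scalar"): {"numerator": lambda d: (d["m"], d["n"])},
    ("scalar", "vector"): {"numerator": lambda d: (1, d["n"]), "denominator": lambda d: (d["n"], 1)},
    ("vector", "vector"): {"numerator": lambda d: (d["m"], d["n"]),
                           "denominator": lambda d: (d["n"], d["m"])},
    ("scalar", "matrix"): {"numerator": lambda d: (d["q"], d["p"]),
                           "denominator": lambda d: (d["p"], d["q"])},
}


def derivative_shape(y_kind, x_kind, layout, **dims):
    """Return (rows, cols) of the derivative, or None where the table gives no layout."""
    rule = SHAPES.get((y_kind, x_kind), {}).get(layout)
    return rule(dims) if rule else None


print(derivative_shape("vector", "vector", "numerator", m=2, n=3))    # (2, 3)
print(derivative_shape("vector", "vector", "denominator", m=2, n=3))  # (3, 2)
```

Every pair of entries for the same cell differs by a transpose, matching the statement that switching layouts transposes the result.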

The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.

