This article describes the row-orientation or “derivative” convention; see here for the column-orientation or “gradient” convention most often used in statistics.

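Definition: For a function $f: \mathbb{R}^d \to \mathbb{R}$, its derivative is the $1 \times d$ row vector

$$\dot{f}(x) = \left[ \frac{\partial f}{\partial x_1} \;\cdots\; \frac{\partial f}{\partial x_d} \right].$$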

For a function $f: \mathbb{R}^d \to \mathbb{R}^k$, its derivative is the $k \times d$ matrix with $ij$th element

$$\dot{f}(x)_{ij} = \frac{\partial f_i(x)}{\partial x_j}.$$
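
The definition can be checked numerically. The following is a minimal sketch, not a production routine: the helper name `jacobian`, the step size `h`, and the two example functions are assumptions chosen for illustration. It approximates each partial derivative by a central difference and stores the result with one row per output component and one column per input coordinate, so a scalar-valued function gives a $1 \times d$ row.

```python
def jacobian(f, x, h=1e-6):
    """Approximate the k x d derivative of f at x by central differences.

    f takes a list of d floats and returns a list of k floats.
    Entry [i][j] approximates the partial derivative of f_i with respect to x_j.
    """
    d = len(x)
    k = len(f(x))
    J = [[0.0] * d for _ in range(k)]
    for j in range(d):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(k):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


# Illustrative vector-valued function from R^3 to R^2.
def f(x):
    return [x[0] * x[1], x[1] + x[2] ** 2]


# Illustrative scalar-valued function from R^3 to R, returned as a length-1 list.
def s(x):
    return [x[0] ** 2 + 3 * x[1] * x[2]]


point = [1.0, 2.0, 3.0]
J = jacobian(f, point)    # 2 x 3; analytically [[2, 1, 0], [0, 1, 6]]
row = jacobian(s, point)  # 1 x 3; analytically [[2, 9, 6]]

print(len(J), "x", len(J[0]))
print(len(row), "x", len(row[0]))
```

Returning a list of rows keeps the row orientation explicit: the derivative of a scalar function is a matrix with a single row, not a flat vector.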

Identities

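$$
\begin{aligned}
\text{Inner product:} &\quad D_x(Ax) = A \\
\text{Quadratic form:} &\quad D_x(x^\top A x) = x^\top (A + A^\top) \\
\text{Chain rule:} &\quad D_x f(y) = D_y f \, D_x y \\
\text{Product rule:} &\quad D(f^\top g) = g^\top \dot{f} + f^\top \dot{g} \\
\text{Inverse function theorem:} &\quad D_x y = (D_y x)^{-1}
\end{aligned}
$$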

Note that for the inverse function theorem to apply, the derivative must be invertible.
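
Two of these identities, the quadratic form and the chain rule, can be spot-checked numerically. The sketch below is only a sanity check under stated assumptions: the matrix `A`, the point `x`, the functions `inner` and `outer`, and the helpers `jacobian`, `matmul`, and `max_abs_diff` are all made up for illustration, and derivatives are approximated by central differences rather than computed exactly.

```python
def jacobian(f, x, h=1e-6):
    """Central-difference approximation of the k x d derivative of f at x."""
    d, k = len(x), len(f(x))
    J = [[0.0] * d for _ in range(k)]
    for j in range(d):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(k):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two same-shaped matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


A = [[1.0, 2.0], [3.0, 4.0]]  # illustrative, deliberately not symmetric
x = [0.5, -1.5]


# Quadratic form: D_x(x^T A x) = x^T (A + A^T), a 1 x 2 row vector.
def quad(x):
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return [sum(xi * axi for xi, axi in zip(x, Ax))]


A_plus_At = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
quad_numeric = jacobian(quad, x)
quad_formula = matmul([x], A_plus_At)


# Chain rule: D_x f(y) = D_y f  D_x y, with y = inner(x) and f = outer.
def inner(x):  # R^2 -> R^3
    return [x[0] * x[1], x[0] ** 2, x[1]]


def outer(y):  # R^3 -> R^2
    return [y[0] + y[1] * y[2], y[2] ** 2]


chain_direct = jacobian(lambda x: outer(inner(x)), x)                    # 2 x 2
chain_product = matmul(jacobian(outer, inner(x)), jacobian(inner, x))    # (2 x 3)(3 x 2)

print("quadratic form error:", max_abs_diff(quad_numeric, quad_formula))
print("chain rule error:", max_abs_diff(chain_direct, chain_product))
```

In the chain rule check the factors multiply in the order written, $(k \times m)(m \times d)$, with no transposes needed; this is one practical convenience of the row-orientation convention.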

Relation to gradient form

The derivatives given above are the transposes of the gradients defined here:

$$\nabla f(x) = \dot{f}(x)^\top.$$
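
As a small worked example (the function is chosen purely for illustration), take $f(x) = x_1^2 + 3x_2$ on $\mathbb{R}^2$. The derivative is a $1 \times 2$ row and the gradient is the corresponding $2 \times 1$ column:

$$
\dot{f}(x) = \begin{bmatrix} 2x_1 & 3 \end{bmatrix},
\qquad
\nabla f(x) = \begin{bmatrix} 2x_1 \\ 3 \end{bmatrix} = \dot{f}(x)^\top.
$$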

Most authors stick to one convention and use the terms “gradient” and “derivative” interchangeably, although some authors reserve “derivative” specifically for the above convention and “gradient” for the column-oriented convention.