Definition: For a function \(f: \real^d \to \real\), its gradient is the \(d \times 1\) column vector

\[\nabla f(\x) = \left[\tfrac{\partial f}{\partial x_1} \cdots \tfrac{\partial f}{\partial x_d} \right] \Tr.\]

For a function \(\f: \real^d \to \real^k\), its gradient is the \(d \times k\) matrix with \(ij\)th element

\[\nabla \f(\x)_{ij} = \frac{\partial f_j(\x)}{\partial x_i}.\]
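As a sanity check on this layout, here is a small finite-difference sketch (an illustration, not part of the source; the helper `gradient_fd` is mine) that builds the \(d \times k\) gradient entry by entry as \(\partial f_j / \partial x_i\):

```python
import numpy as np

def gradient_fd(f, x, eps=1e-6):
    """Finite-difference gradient of f: R^d -> R^k, laid out as the
    d x k matrix with (i, j) entry  df_j / dx_i  (derivatives as columns)."""
    x = np.asarray(x, dtype=float)
    d = x.size
    k = np.atleast_1d(f(x)).size
    G = np.empty((d, k))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        # Central difference along coordinate i fills row i of the gradient.
        G[i, :] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * eps)
    return G

# Example: f(x) = (x1^2, x1*x2). At x = (1, 2) the gradient is
# [[df1/dx1, df2/dx1], [df1/dx2, df2/dx2]] = [[2, 2], [0, 1]].
f = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
print(gradient_fd(f, [1.0, 2.0]))  # approx [[2., 2.], [0., 1.]]
```

Note that this matrix is the transpose of the Jacobian as it is usually written (rows indexed by outputs).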

Identities

\[\begin{alignat*}{2}
&\text{Inner product:} \hspace{8em} & \nabla_{\x} (\A \Tr \x) &= \A \\
&\text{Quadratic form:} & \nabla_{\x} (\x \Tr \A \Tr \x) &= (\A + \A \Tr)\x \\
&\text{Chain rule:} & \nabla_{\x} \f(\y) &= \nabla_{\x}\y \, \nabla_{\y} \f \\
&\text{Product rule:} & \nabla (\f \Tr \g) &= (\nabla\f)\g + (\nabla\g)\f \\
&\text{Inverse function theorem:} & \nabla_{\x}\y &= \left(\nabla_{\y}\x\right)^{-1}
\end{alignat*}\]

Note that for the inverse function theorem to apply, the gradient \(\nabla_{\y}\x\) must be invertible; in particular, it must be square, so \(\x\) and \(\y\) must have the same dimension.
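The first two identities are easy to verify numerically. The sketch below (my own check, assuming NumPy; it uses a vector \(\a\) for the inner-product case, i.e. the \(k = 1\) special case of the identity) compares each closed-form gradient against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.standard_normal((d, d))   # general (non-symmetric) matrix
a = rng.standard_normal(d)        # vector for the inner-product case
x = rng.standard_normal(d)

def grad_fd(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar-valued f."""
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

# Inner product: grad_x(a^T x) = a
print(np.allclose(grad_fd(lambda x: a @ x, x), a, atol=1e-5))

# Quadratic form: grad_x(x^T A^T x) = (A + A^T) x
print(np.allclose(grad_fd(lambda x: x @ A.T @ x, x), (A + A.T) @ x, atol=1e-4))
```

Both comparisons should report `True`; the quadratic-form identity holds for any \(\A\), symmetric or not.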

Usage

The above convention (that taking derivatives produces column vectors) is the one most often encountered in the statistics literature. The choice is arbitrary, however, and it is also possible to adopt the alternative convention that derivatives produce row vectors. Most authors stick to one convention and use the terms “gradient” and “derivative” interchangeably, although some reserve “gradient” for the above convention and “derivative” for the alternative one.
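The two conventions differ only by a transpose. A short sketch (my illustration; the helper name `jacobian_fd` is an assumption, not from the source) of the alternative, row-vector convention, which yields the \(k \times d\) Jacobian:

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Row-vector convention: the k x d Jacobian, with row j holding the
    partials of f_j. This is the transpose of the d x k gradient above."""
    x = np.asarray(x, dtype=float)
    k = np.atleast_1d(f(x)).size
    J = np.empty((k, x.size))
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = eps
        J[:, i] = (np.atleast_1d(f(x + e)) - np.atleast_1d(f(x - e))) / (2 * eps)
    return J

# Same example as before: f(x) = (x1^2, x1*x2) at x = (1, 2).
f = lambda x: np.array([x[0] ** 2, x[0] * x[1]])
J = jacobian_fd(f, [1.0, 2.0])
print(J)  # approx [[2., 0.], [2., 1.]] -- the transpose of the d x k gradient
```

Under the row-vector convention the chain rule composes in the opposite order, which is one practical reason authors are careful to fix a single convention.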