This article describes the row-orientation, or “derivative”, convention; the column-orientation, or “gradient”, convention is the one most often used in statistics and is covered separately.
Definition
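For a function $f:\mathbb{R}^n \to \mathbb{R}^m$, the derivative at $x$ is the $m \times n$ Jacobian matrix whose $i$th row holds the partial derivatives of the component $f_i$:
$$Df(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}.$$
In particular, the derivative of a scalar function ($m = 1$) is a $1 \times n$ row vector.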
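As a rough illustration of the row orientation, the sketch below estimates the derivative of a vector-valued function by central differences and fills one column per input variable, so row $i$ ends up holding the partials of $f_i$. The helper name `jacobian_fd`, the step size `eps`, and the example function are assumptions made for this illustration, not part of any particular library.

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-6):
    """Central-difference estimate of Df(x) in the row convention.

    Hypothetical helper: for f: R^n -> R^m the result has shape (m, n),
    and row i holds the partial derivatives of the output component f_i.
    """
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        # Column j: how every output changes when only x_j moves.
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2 * eps)
    return J

# Example map R^2 -> R^3: f(x1, x2) = (x1 * x2, x1 + x2, x2**2).
def f(x):
    return np.array([x[0] * x[1], x[0] + x[1], x[1] ** 2])

J = jacobian_fd(f, [2.0, 3.0])
print(J.shape)  # (3, 2)
```

The analytic derivative at $(2, 3)$ is $\begin{pmatrix} 3 & 2 \\ 1 & 1 \\ 0 & 6 \end{pmatrix}$, which the estimate reproduces up to rounding: three outputs give three rows, two inputs give two columns.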
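For a function $f:\mathbb{R}^{m \times n} \to \mathbb{R}$ of a matrix $X$, the derivative is the $n \times m$ matrix
$$\left(\frac{\partial f}{\partial X}\right)_{ij} = \frac{\partial f}{\partial X_{ji}},$$
whose shape is the transpose of the shape of $X$.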
Identities
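Note that for the inverse function theorem to apply, the derivative must be invertible: $f$ must map $\mathbb{R}^n$ to $\mathbb{R}^n$, so that $Df(x)$ is square, and $Df(x)$ must be nonsingular. In that case $D(f^{-1})(f(x)) = \big(Df(x)\big)^{-1}$.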
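A small numerical check can make the ordering in the row convention concrete. It assumes the standard chain rule $D(g \circ f)(x) = Dg(f(x))\,Df(x)$, with the outer derivative on the left, and the inverse-function identity $D(f^{-1})(f(x)) = (Df(x))^{-1}$. The functions $f(x) = (e^{x_1}, x_1 + x_2)$ and $g(y) = y_1 y_2$ are hypothetical examples chosen because their derivatives and the inverse $f^{-1}(y) = (\log y_1, y_2 - \log y_1)$ have simple closed forms.

```python
import numpy as np

def f(x):
    # Example map R^2 -> R^2 with a closed-form inverse.
    return np.array([np.exp(x[0]), x[0] + x[1]])

def Df(x):
    # Row i = partial derivatives of f_i, so Df(x) is 2 x 2.
    return np.array([[np.exp(x[0]), 0.0],
                     [1.0, 1.0]])

def Df_inv(y):
    # Derivative of f^{-1}(y) = (log y1, y2 - log y1), computed by hand.
    return np.array([[1.0 / y[0], 0.0],
                     [-1.0 / y[0], 1.0]])

def Dg(y):
    # g(y) = y1 * y2 maps R^2 -> R, so its derivative is a 1 x 2 row vector.
    return np.array([[y[1], y[0]]])

def Dh_direct(x):
    # h(x) = g(f(x)) = exp(x1) * (x1 + x2), differentiated by hand.
    e = np.exp(x[0])
    return np.array([[e * (x[0] + x[1]) + e, e]])

x = np.array([0.5, -1.2])

# Chain rule in the row convention: (1 x 2) @ (2 x 2) -> 1 x 2.
chain = Dg(f(x)) @ Df(x)
print(np.allclose(chain, Dh_direct(x)))  # True

# Inverse function theorem: D(f^-1) at f(x) is the inverse of Df(x).
print(np.allclose(Df_inv(f(x)), np.linalg.inv(Df(x))))  # True
```

Reversing the product to `Df(x) @ Dg(f(x))` would fail on shapes here, which is one practical reason the order of factors matters in this convention.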
Relation to gradient form
The derivatives given above are the transposes of the gradients used in the column-orientation convention:
$$Df(x) = \big(\nabla f(x)\big)^{\mathsf T}.$$
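For a scalar function the relationship reduces to a row vector versus a column vector holding the same numbers. The sketch below assumes the quadratic form $f(x) = x^{\mathsf T} A x$ with an arbitrarily chosen matrix `A`, whose derivative is $x^{\mathsf T}(A + A^{\mathsf T})$ and whose gradient is $(A + A^{\mathsf T})x$, and then cross-checks the row form with central differences.

```python
import numpy as np

# Quadratic form f(x) = x^T A x on R^3; A is an arbitrary example matrix.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, -1.0],
              [4.0, 0.0, 1.0]])

def f(x):
    return float(x @ A @ x)

def derivative(x):
    # Row convention: Df(x) is a 1 x n row vector, x^T (A + A^T).
    return (x @ (A + A.T)).reshape(1, -1)

def gradient(x):
    # Column convention: the gradient is an n x 1 column vector, (A + A^T) x.
    return ((A + A.T) @ x).reshape(-1, 1)

x = np.array([1.0, -2.0, 0.5])
print(derivative(x).shape, gradient(x).shape)  # (1, 3) (3, 1)
print(np.allclose(derivative(x), gradient(x).T))  # True

# Sanity check: central differences along each basis vector give the same row.
eps = 1e-6
fd = np.array([[(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)]])
print(np.allclose(fd, derivative(x), atol=1e-5))  # True
```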
Most authors stick to one convention and use the terms “gradient” and “derivative” interchangeably, although some authors reserve “derivative” specifically for the above convention and “gradient” for the column-oriented convention.