Definition:
For a function
For a function
Identities
Note that for the inverse function theorem to apply, the gradient must be an invertible (nonsingular) matrix at the point in question.
Usage
The above convention (that taking derivatives produces column vectors) is the one most often encountered in the statistics literature. However, this convention is arbitrary, and it is also possible to adopt the alternative convention that derivatives produce row vectors. Most authors stick to one convention and use the terms “gradient” and “derivative” interchangeably, although some authors reserve “gradient” specifically for the above convention and “derivative” for the alternative convention.
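The difference between the two conventions is only one of layout, which a short numerical sketch can illustrate. The example below is an illustration under stated assumptions, not part of the definition: the function `f` and the helper names `numerical_gradient`, `as_column`, and `as_row` are hypothetical, and the partial derivatives are estimated by central differences rather than computed analytically.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the partial derivatives of f at x."""
    partials = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        partials.append((f(forward) - f(backward)) / (2 * h))
    return partials


def as_column(partials):
    # Column-vector convention: an n x 1 layout, one partial per row.
    return [[p] for p in partials]


def as_row(partials):
    # Row-vector convention: a 1 x n layout, all partials in one row.
    return [list(partials)]


def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


partials = numerical_gradient(f, [1.0, 2.0])
column = as_column(partials)  # shape 2 x 1
row = as_row(partials)        # shape 1 x 2
```

Both layouts hold the same two partial derivatives (analytically 2*x1 + 3*x2 and 3*x1 for this `f`), and one is the transpose of the other. The choice of convention matters only when the result is combined with other vectors and matrices, where the shapes must agree.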