The Kullback-Leibler divergence (or relative entropy) measures the distance between two probability distributions. Specifically, for two distributions \(F\) and \(G\) with densities \(f\) and \(g\), the distance from distribution \(F\) to distribution \(G\) is

\[\as{ \KL(F|G) &= \Ex_F(\log f - \log g) \\ &= \int \left\{\log f(x) - \log g(x)\right\} f(x) dx }\]

provided the distributions have the same support. Note that this distance is not symmetric; it treats \(F\) as the reference distribution in the sense of taking expectation with respect to \(F\). For this reason, some authors prefer the term “divergence” to “distance”.
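As a concrete illustration (the normal case is standard, though not part of the definition above), suppose \(F = \mathrm{N}(\mu_1, \sigma_1^2)\) and \(G = \mathrm{N}(\mu_2, \sigma_2^2)\). Carrying out the integral gives

\[\KL(F|G) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}.\]

Interchanging the roles of \(F\) and \(G\) generally yields a different value; for example, with \(\mu_1 = \mu_2 = 0\), \(\sigma_1 = 1\), and \(\sigma_2 = 2\), the divergence is approximately 0.32 in one direction and 0.81 in the other, which makes the asymmetry explicit.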

Important properties:

Remark on support: In the above definition, we required \(F\) and \(G\) to have the same support; if this is not the case, the KL divergence is ambiguous, as it involves \(\log(0)\). In such situations, we define the integrand to be \(+\infty\) where \(g(x)=0\) and \(f(x)>0\), and 0 where \(f(x)=0\). In other words, the divergence between two distributions is infinite if \(F\) places positive probability on any set where \(G\) does not. In practice, this tends to limit the utility of the KL divergence; other measures, such as the Wasserstein distance, are typically more informative in such situations.
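For instance (a simple example chosen here for illustration), if \(F\) is uniform on \((0,2)\) and \(G\) is uniform on \((0,1)\), then \(\KL(F|G) = \infty\), since \(F\) assigns probability \(1/2\) to the interval \((1,2)\), where \(g = 0\). The 1-Wasserstein distance between these two distributions, by contrast, is finite (it equals \(1/2\)) and changes gradually as the supports are moved apart, which is what makes it more informative in this setting.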