## Monday, December 13, 2010

### [Basic] Cross-Entropy Criterion

The cross-entropy criterion is closely tied to the Kullback-Leibler (KL) divergence: the cross-entropy H(P, Q) equals the entropy H(P) plus D_KL(P||Q), so minimizing the cross-entropy with respect to the model Q is equivalent to minimizing the KL divergence.
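A minimal sketch of that decomposition, using made-up three-outcome distributions (the helper names `entropy`, `cross_entropy`, and `kl_divergence` are my own, not from any particular library):

```python
import math

def entropy(p):
    """Shannon entropy H(P) in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL divergence D_KL(P || Q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution P
q = [0.4, 0.4, 0.2]     # model / approximating distribution Q

# H(P, Q) = H(P) + D_KL(P || Q): since H(P) does not depend on Q,
# minimizing the cross-entropy in Q also minimizes the KL divergence.
assert abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12
```

This is why training a classifier with a cross-entropy loss can equally be read as driving the model distribution Q toward the data distribution P in the KL sense.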

KL divergence is a non-symmetric measure of the difference between two probability distributions P and Q. It measures the expected number of extra bits required to encode samples from P when using a code optimized for Q rather than one optimized for P. Here P typically represents the "true" distribution (the data, the observations, or a precisely calculated theoretical distribution), while Q represents a theory, model, description, or approximation of P.

Although it is often intuited as a distance metric, the KL divergence is not a true metric. For example, it is not symmetric: the KL divergence from P to Q is not in general equal to the KL divergence from Q to P.

For probability distributions P and Q of a discrete random variable, the KL divergence is defined as:

D_KL(P||Q) = \sum_i P(i) log[ P(i) / Q(i) ]

In words, it is the average of the logarithmic difference between the probabilities P and Q, where the average is taken using the probabilities P. The KL divergence is only defined if P and Q both sum to 1 and Q(i) > 0 for every i such that P(i) > 0. Whenever the quantity 0 log 0 appears in the formula, it is interpreted as zero.
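The definition above can be sketched directly in code. The example distributions below are made up to illustrate two points from the text: the two directions of the divergence generally differ, and a term with P(i) = 0 is dropped under the 0 log 0 convention:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in bits; terms with P(i) = 0 are treated as 0 log 0 = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                # The divergence is undefined when Q(i) = 0 but P(i) > 0.
                raise ValueError("D_KL undefined: Q(i) = 0 where P(i) > 0")
            total += pi * math.log2(pi / qi)
    return total

p = [0.75, 0.25]
q = [0.50, 0.50]

# Non-symmetric: the two directions disagree for these distributions.
print(kl_divergence(p, q))   # ≈ 0.189 bits
print(kl_divergence(q, p))   # ≈ 0.208 bits

# The 0 log 0 = 0 convention: the first term of P is simply skipped.
print(kl_divergence([0.0, 1.0], [0.5, 0.5]))   # 1.0 bit
```

Using base-2 logarithms makes the "extra bits per sample" reading literal; natural logarithms (nats) are equally common.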

The attached document is the Wikipedia explanation of KL divergence.