Why Use Cross Entropy in Classification Task?

2021-10-16 · 366 words · 2 mins read

In classification tasks, the de facto loss to use is the cross entropy loss.

Suppose we have 10 classes. We would like the network to predict the probability that the current sample belongs to each of the 10 classes. However, the raw output from a neural network is just unbounded floating point values (logits), so the softmax function is used to normalize the output into the range (0, 1).

After softmax, all output values are between 0 and 1 and they sum to 1, so the output can now be treated as a probability distribution over the classes. The class with the largest probability is the prediction.
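As a minimal sketch (using NumPy, with made-up logits for one sample and 10 classes), softmax exponentiates each raw score and divides by the sum, so the outputs are positive and sum to 1:

```python
import numpy as np

# Hypothetical raw network outputs (logits) for one sample and 10 classes.
logits = np.array([1.2, -0.3, 0.5, 2.1, 0.0, -1.7, 0.8, 0.3, -0.9, 1.5])

# Softmax: subtract the max first for numerical stability (does not change the result).
exp_scores = np.exp(logits - logits.max())
probs = exp_scores / exp_scores.sum()

print(probs)           # all values in (0, 1)
print(probs.sum())     # 1.0
print(probs.argmax())  # index of the predicted class
```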

Now suppose we have a batch of N data samples with their class labels. From the point of view of maximum likelihood estimation (MLE), we want to find the network parameters that maximize the product of the probabilities each sample receives for its ground-truth class.

For example, if we have data samples $x_1$, $x_2$ and $x_3$ with class labels 1, 3 and 5, then we want to find the network parameters that maximize $p_{1,1} \cdot p_{2,3} \cdot p_{3,5}$, where $p_{i,c}$ is the predicted probability that sample $i$ belongs to class $c$.
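More generally (a sketch using the notation from the example above, where $p_{i, y_i}$ is the probability the network assigns to sample $i$ for its ground-truth class $y_i$, and $\theta$ denotes the network parameters), the MLE objective over the batch is:

$$
\max_{\theta} \; \prod_{i=1}^{N} p_{i, y_i}
$$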

Then, following MLE, we take derivatives of this objective with respect to the parameters and optimize. However, the product form is inconvenient to differentiate, which is why the log() function is applied here.

Why the log function? Because:

(1) log is monotonically increasing, so the objective is effectively the same: maximizing the original likelihood is equivalent to maximizing its log. (2) log(x * y) = log(x) + log(y), which turns the product into a sum and greatly simplifies the calculation of derivatives, as shown below.
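Concretely, taking the log turns the product into a sum, and by monotonicity the maximizer is unchanged (a sketch, continuing the notation above):

$$
\log \prod_{i=1}^{N} p_{i, y_i} = \sum_{i=1}^{N} \log p_{i, y_i}
$$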

Another question: why do we use the minus sign?

Because in machine learning we always talk about minimizing a loss/cost, and minimizing the negative log likelihood is equivalent to maximizing the log likelihood. It is just a convention: by adding a minus sign, we transform the problem of maximizing the likelihood into minimizing a loss function. They are essentially the same, but "loss function" is the more familiar term to machine learning practitioners.

The cross entropy loss is also called log loss.
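As a rough sketch in PyTorch (the logits and labels are made up), the cross entropy loss computed by `torch.nn.functional.cross_entropy` on raw logits matches the negative mean log probability of the ground-truth classes computed by hand:

```python
import torch
import torch.nn.functional as F

# Hypothetical logits for a batch of 3 samples and 5 classes, plus their labels.
logits = torch.randn(3, 5)
labels = torch.tensor([0, 2, 4])

# Built-in cross entropy (applies log-softmax internally).
loss_builtin = F.cross_entropy(logits, labels)

# Manual version: softmax -> pick ground-truth probability -> -log -> mean.
probs = F.softmax(logits, dim=1)
loss_manual = -torch.log(probs[torch.arange(3), labels]).mean()

print(loss_builtin.item(), loss_manual.item())  # the two values agree (up to float error)
```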

To be continued…


Author jdhao

LastMod 2021-10-25

License CC BY-NC-ND 4.0
