$$\begin{align*} y_1&=\frac{e^2}{e^2+e^{1}+e^{0.1}}\approx0.66\\ y_2&=\frac{e^1}{e^2+e^{1}+e^{0.1}}\approx0.24\\ y_3&=\frac{e^{0.1}}{e^2+e^{1}+e^{0.1}}\approx0.10\\ \end{align*}$$

Therefore, we have that:

$$\boldsymbol{y}=\begin{pmatrix} 0.66\\ 0.24\\ 0.10\\ \end{pmatrix}$$

Note the following:

The entries of the output sum to $1$, which means you can interpret them as probabilities.
The inputs to the Softmax function $\boldsymbol{x}$ are sometimes referred to as logits.
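
As a quick check of the hand calculation, the sketch below recomputes the three values using only Python's standard `math` module. The variable names are illustrative and are not part of the original text.

```python
import math

# Input vector from the worked example
x = [2.0, 1.0, 0.1]

# Exponentiate each entry, then divide by the total
exps = [math.exp(v) for v in x]
total = sum(exps)
y = [e / total for e in exps]

print([round(v, 2) for v in y])  # [0.66, 0.24, 0.1]
print(sum(y))                    # 1 up to floating-point rounding
```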

Application to neural networks

When modelling with neural networks, we often run into the Softmax function. Suppose we wanted to build a neural network that classifies whether an image is a cat, a dog or a bird. In such a case, we often use the Softmax function as the activation function for the final layer, so that the network outputs one probability per class. For example, an output of $(0.7, 0.2, 0.1)$ says that the model is:

70% sure the image is a cat
20% sure the image is a dog
10% sure the image is a bird

If you only need the predicted class and not the probabilities, then the Softmax function is not necessary: the exponential is an increasing function, so the largest input always produces the largest output.
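
To illustrate this, here is a minimal sketch showing that the most likely class is the same whether or not Softmax is applied. The scores in `logits` are made-up values for the cat, dog and bird classes, not outputs of a real model.

```python
import numpy as np

# Hypothetical raw scores for the classes cat, dog, bird
logits = np.array([3.2, 1.9, 1.2])
probs = np.exp(logits) / np.sum(np.exp(logits))

# The exponential is increasing, so the ranking of the classes is unchanged
print(np.argmax(logits) == np.argmax(probs))  # True
```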

Comparison with Sigmoid function

Both the Softmax and sigmoid functions map inputs to a range of 0 to 1. However, the difference is that the outputs of the sigmoid do not, in general, sum to one as probabilities should.
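
The difference can be seen on the example vector used earlier. This is a small sketch; the names `sigmoid_out` and `softmax_out` are chosen here for illustration.

```python
import numpy as np

x = np.array([2.0, 1.0, 0.1])

sigmoid_out = 1 / (1 + np.exp(-x))           # applied to each entry independently
softmax_out = np.exp(x) / np.sum(np.exp(x))  # normalised across all entries

print(sigmoid_out.sum())  # about 2.14, not 1
print(softmax_out.sum())  # 1 up to floating-point rounding
```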

Implementing Softmax function using Python's NumPy

Basic implementation

We can easily implement the Softmax function as described by equation \eqref{eq:yHWjjyQou5VFcDGhpZV} using NumPy like so:

import numpy as np

def softmax(x):
    """ x: 1D NumPy array of inputs """
    return np.exp(x) / np.sum(np.exp(x))

Let's use this function to compute the Softmax of vector \eqref{eq:ei1XptgAwaMk77sju1d}:

softmax([2, 1, 0.1])
array([0.65900114, 0.24243297, 0.09856589])

Notice how the output matches what we calculated by hand, up to rounding.

Optimised implementation

Our basic implementation of the Softmax function is based directly on the definition of the Softmax function as described by \eqref{eq:yHWjjyQou5VFcDGhpZV}:

$$y_i=\frac{e^{x_i}}{\sum^N_{j=1}e^{x_j}}$$

The problem with this implementation is that the exponential function $e^x$ quickly becomes large as the value of $x$ increases. For instance, consider $\exp(100)$:

np.exp(100)
2.6881171418161356e+43

Notice how even a moderate input of $x=100$ results in an extremely large number. In fact, if we try $\exp(800)$, the value is so large that it overflows:

np.exp(800)
inf

This happens because computers represent numerical values using a fixed number of bytes (e.g. 8 bytes for NumPy's default float64 type). The caveat is that numbers beyond a certain magnitude cannot be represented simply because there aren't enough bits: the largest finite float64 value is roughly $1.8\times10^{308}$. If a result is larger than this, then NumPy returns inf.
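
The limit can be inspected directly. The sketch below assumes the default 8-byte float64 type and uses `np.finfo` to look up the largest representable value; taking its logarithm gives the largest input that `np.exp` can handle without overflowing.

```python
import numpy as np

max_float = np.finfo(np.float64).max
print(max_float)          # about 1.8e308
print(np.log(max_float))  # about 709.78

with np.errstate(over="ignore"):  # silence the overflow warning
    print(np.exp(709.0))  # still finite
    print(np.exp(710.0))  # inf
```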

This limitation of our basic implementation means that large inputs will fail:

softmax([800, 500, 600])
array([nan,  0.,  0.])

Here, nan stands for not-a-number. The first entry is nan because both $\exp(800)$ and the sum in the denominator overflow to inf, and inf divided by inf is undefined. The other two entries are wrongly reported as exactly zero, since a finite number divided by inf is zero. For this reason, the basic implementation is rarely used in practice.
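
To see where the nan comes from, the basic computation can be broken into steps. This is only an illustrative sketch: the intermediate names are chosen here, and `np.errstate` is used purely to silence the warnings NumPy would otherwise print.

```python
import numpy as np

x = np.array([800.0, 500.0, 600.0])

with np.errstate(over="ignore", invalid="ignore"):
    exps = np.exp(x)       # the first entry overflows to inf
    total = np.sum(exps)   # inf, because one of the terms is inf
    result = exps / total  # inf / inf is undefined, giving nan

print(np.isinf(exps[0]), np.isinf(total))  # True True
print(np.isnan(result[0]))                 # True
```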

The way to overcome this limitation is to reformulate the Softmax function like so:

$$\begin{equation}\label{eq:u0SjbfEiloxYNtGxR2o} \begin{aligned}[b] y_i&=\frac{\exp(x_i)}{\sum^N_{j=1}\exp(x_j)}\\ &=\frac{C\cdot\exp(x_i)}{C\cdot\sum^N_{j=1}\exp(x_j)}\\ &=\frac{\exp(\ln(C))\cdot\exp(x_i)}{\exp(\ln(C))\cdot\sum^N_{j=1}\exp(x_j)}\\ &=\frac{\exp(\ln(C)+x_i)}{\sum^N_{j=1}\exp(\ln(C)+x_j)}\\ &=\frac{\exp(C'+x_i)}{\sum^N_{j=1}\exp(C'+x_j)}\\ \end{aligned} \end{equation}$$

Note that all we have done is multiply the numerator and denominator by some positive constant $C$ and write $C'=\ln(C)$, and hence \eqref{eq:u0SjbfEiloxYNtGxR2o} is equivalent to the original equation of the Softmax function \eqref{eq:yHWjjyQou5VFcDGhpZV}.

Let's now understand why \eqref{eq:u0SjbfEiloxYNtGxR2o} is better for numerical computation. $C'$ can be any constant value, so we can choose $C'$ such that the exponent ($C'+x_i$) is small. This is how we can avoid large uncomputable numbers.
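
The claim that the shift does not change the result can be checked numerically. In this sketch, `softmax_basic` is a hypothetical helper name for the direct implementation of the definition, and the shift values are arbitrary.

```python
import numpy as np

def softmax_basic(x):
    """Direct translation of the definition (hypothetical helper name)."""
    return np.exp(x) / np.sum(np.exp(x))

x = np.array([2.0, 1.0, 0.1])

# Adding any constant C' to every input leaves the output unchanged
for c_prime in (-2.0, 0.0, 5.0):
    print(np.allclose(softmax_basic(x), softmax_basic(x + c_prime)))  # True
```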

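Now, what is a good value of $C'$? If our goal is to keep the exponents $C'+x_j$ from becoming large, we could set $C'$ to be the negative of the maximum of our input vector $\boldsymbol{x}$. The largest exponent is then exactly $0$, so no term can overflow.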

For instance, consider the following input vector:

$$ \boldsymbol{x}=\begin{pmatrix} 800\\500\\600\\ \end{pmatrix} $$

The negative of the maximum of $\boldsymbol{x}$ is:

$$\begin{align*} C'&=-\max(\boldsymbol{x})\\ &=-800 \end{align*}$$

Substituting into \eqref{eq:u0SjbfEiloxYNtGxR2o}, the first entry $y_1$ becomes:

$$\begin{align*} y_1=\frac{\exp(-800+800)}{\exp(-800+800)+\exp(-800+500)+\exp(-800+600)}= \frac{\exp(0)}{\exp(0)+\exp(-300)+\exp(-200)} \end{align*}$$

Notice how we now avoid $\exp(800)$, and our exponents are much smaller!

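The implementation of \eqref{eq:u0SjbfEiloxYNtGxR2o} in NumPy is as follows:

def softmax(x):
    """ x: 1D NumPy array (or list) of inputs """
    x = np.asarray(x, dtype=float)
    # Shift by C' = -max(x) so that the largest exponent is 0.
    # x - np.max(x) creates a new array, so the caller's input is not modified.
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)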

Now, we can use the function like so:

softmax([800, 500, 600])
array([1.00000000e+000, 5.14820022e-131, 1.38389653e-087])

Notice how we do not have any nan this time.
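
As a sanity check, the shifted version should agree with the direct formula on small inputs and stay finite on large ones. The sketch below is self-contained, so it defines its own copy of the function under the hypothetical name `softmax_stable`.

```python
import numpy as np

def softmax_stable(x):
    """Shifted Softmax (hypothetical name); does not modify x."""
    x = np.asarray(x, dtype=float)
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

small = np.array([2.0, 1.0, 0.1])
basic = np.exp(small) / np.sum(np.exp(small))
print(np.allclose(softmax_stable(small), basic))  # True

large = softmax_stable([800, 500, 600])
print(np.all(np.isfinite(large)), large.sum())    # True 1.0
```

After the shift, very negative exponents may underflow to zero instead, which is harmless here because those terms contribute essentially nothing to the sum.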

Published by Isshin Inada
