CS Notebook

1st Derivative of the Sigmoid Function

In neural networks, an activation function defines a threshold that determines whether a node of the network activates. One example of such an activation function is the sigmoid function

$$ \sigma(x) = \dfrac{1}{1 + e^{-x}}. $$
(Aggarwal, 2023; Rojas, 1996).
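
As a minimal sketch of this definition (using NumPy; the name `sigmoid` is just an illustrative choice, not taken from the references), the function translates directly into code:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))                          # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # [0.1192..., 0.5, 0.8807...]
```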

When training a neural network, the derivative of the activation function is needed. But what is it for the sigmoid function?

Consider the derivative

$$ \dfrac{d}{dx} \sigma(x) = \dfrac{d}{dx} \dfrac{1}{1 + e^{-x}}. \tag{1} $$

An obvious approach is to apply the quotient rule of differentiation (see, e.g., Grossman, 1986) to $(1)$, i.e., for two functions $f : \mathbb{R} \rightarrow \mathbb{R}$ and $g : \mathbb{R} \rightarrow \mathbb{R}$,

$$ \dfrac{d}{dx}\left( \dfrac{f}{g} \right) = \dfrac{g(x)(df/dx) - f(x)(dg/dx)}{g^2(x)} $$

Letting $f(x) = 1$ and $g(x) = 1 + e^{-x}$, this gives

$$ \dfrac{d}{dx} \sigma(x) = \dfrac{ (1 + e^{-x})\left( \dfrac{d}{dx}1 \right) - (1)\left( \dfrac{d}{dx}\left(1 + e^{-x}\right) \right) }{(1 + e^{-x})^2}. \tag{2} $$

Because

$$ \dfrac{d}{dx} 1 = 0 $$

and

$$ \begin{array}{r c l} \dfrac{d}{dx}\left(1 + e^{-x}\right) & = & \dfrac{d}{dx} 1 + \dfrac{d}{dx} e^{-x} \\[1em] & = & 0 - e^{-x} \\[1em] & = & -e^{-x} \ , \end{array} $$

expression $(2)$ simplifies to

$$ \dfrac{d}{dx} \sigma(x) = \dfrac{e^{-x}}{(1 + e^{-x})^2}. $$
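
As a sanity check on this intermediate result (a SymPy sketch, not part of the derivation itself), the expression can be compared against a symbolically computed derivative:

```python
import sympy as sp

x = sp.symbols('x')
sigma = 1 / (1 + sp.exp(-x))                 # sigmoid as a symbolic expression
derived = sp.exp(-x) / (1 + sp.exp(-x))**2   # the expression obtained above

# The difference simplifies to 0, so the two agree.
print(sp.simplify(sp.diff(sigma, x) - derived))
```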

Continuing from the simplified form of $(2)$, the remaining steps are standard algebraic manipulations:

$$ \begin{array}{r c l} \dfrac{d}{dx} \sigma(x) & = & \dfrac{0 + e^{-x}}{(1 + e^{-x})^2} \\[1em] & = & \dfrac{(1 - 1) + e^{-x}}{(1 + e^{-x})^2} \\[1em] & = & \dfrac{1 + e^{-x}}{(1 + e^{-x})^2} - \dfrac{1}{(1 + e^{-x})^2} \\[1em] & = & \dfrac{1}{1 + e^{-x}} - \dfrac{1}{(1 + e^{-x})^2} \\[1em] & = & \dfrac{1}{(1 + e^{-x})} \left( 1 - \dfrac{1}{1 + e^{-x}} \right) . \end{array} $$
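
The equivalence between the first and last lines of this chain can also be confirmed symbolically (again a SymPy sketch, included purely as a check):

```python
import sympy as sp

x = sp.symbols('x')
unfactored = sp.exp(-x) / (1 + sp.exp(-x))**2
factored = (1 / (1 + sp.exp(-x))) * (1 - 1 / (1 + sp.exp(-x)))

# Expected output: 0 (the two forms are identical)
print(sp.simplify(unfactored - factored))
```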

Since

$$ \dfrac{1}{(1 + e^{-x})} = \sigma(x) \ , $$

the last step of the derivation above shows that

$$ \dfrac{d}{dx} \sigma(x) = \dfrac{1}{1 + e^{-x}} \left( 1 - \dfrac{1}{1 + e^{-x}} \right) = \sigma(x)(1 - \sigma(x)). $$
$\square$
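
A quick numerical check (a sketch; `sigmoid_grad` is an illustrative name, not an established API) compares the closed form $\sigma(x)(1 - \sigma(x))$ against a central finite-difference approximation of the derivative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Closed form derived above: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)   # central difference

# Maximum absolute discrepancy; expected to be on the order of 1e-10 or smaller
print(np.max(np.abs(numeric - sigmoid_grad(x))))
```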

References

Aggarwal, C. C. (2023). Neural Networks and Deep Learning: A Textbook (2nd ed.).
Grossman, S. I. (1986). Calculus of One Variable (2nd ed.).
Rojas, R. (1996). Neural Networks: A Systematic Introduction. Springer.