11.4. Sigmoid & Approximation

Warning

This activation function's approximation diverges towards negative and positive \(y\) infinity outside of a very small safe band of input \(x\) values. It will produce NaNs and extremely large numbers unless you are especially careful to keep all values passed into this activation function within the range -4 to 4, which is its golden range. Think especially carefully about your initial weights, and whether or not they will push values outside this band into the danger zone. See: \(\sigma_a(x)\)
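As a concrete illustration of the warning above, here is a minimal plain-numpy check for whether values fall inside the safe band (a sketch only; in_golden_range is a hypothetical helper, not part of the fhez API):

import numpy as np

def in_golden_range(x, low=-4.0, high=4.0):
    """Return True if every element of x lies within the approximation's safe band."""
    x = np.asarray(x)
    return bool(np.all((x > low) & (x < high)))

# Small-variance initial weights are far more likely to keep
# pre-activations inside the band than large ones.
rng = np.random.default_rng(42)
safe = rng.normal(0, 0.1, size=100)    # likely within (-4, 4)
risky = rng.normal(0, 10.0, size=100)  # likely to exceed the band
print(in_golden_range(safe), in_golden_range(risky))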

To be able to use (fully homomorphically encrypted) ciphertexts with deep learning we need to ensure our activation functions are composed of abelian-compatible operations, i.e. additions and multiplications, which means they must be polynomials. Sigmoid (1) is not a polynomial, thus we approximate it (3). Similarly, since we used an approximation for the forward activations, we use the derivative of the sigmoid approximation (4) for the backward pass to calculate the local gradient, in the hope of descending towards the global optimum (gradient descent).
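To make this concrete, the approximation in (3) can be evaluated using nothing but additions and multiplications (shown here in Horner form). This is a plain-numpy sketch of the idea, not the fhez implementation:

import numpy as np

def sigmoid_approx_horner(x):
    """Evaluate 0.5 + 0.197x - 0.004x^3 using only additions and multiplications."""
    x = np.asarray(x, dtype=float)
    # Horner-style factoring: 0.5 + x * (0.197 + (x * x) * -0.004)
    return 0.5 + x * (0.197 + (x * x) * -0.004)

print(sigmoid_approx_horner([-2.0, 0.0, 2.0]))  # approx [0.138, 0.5, 0.862]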

11.4.1. Sigmoid \(\sigma(x)\)

11.4.1.1. \(\sigma(x)\)

(1) Sigmoid

(1)\[\sigma(x)=\frac{1}{1+e^{-x}}\]

Graph of sigmoid plotted on 2D axes
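For reference, equation (1) in plain numpy (a standalone sketch, not the fhez implementation):

import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid, equation (1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # approx [0.018, 0.5, 0.982]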

11.4.1.2. \(\frac{d\sigma(x)}{dx}\)

(2) Sigmoid derivative (Andrej Karpathy CS231n lecture)

(2)\[\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2} = (\frac{1+e^{-x}-1}{1+e^{-x}})(\frac{1}{1+e^{-x}}) = (1-\sigma(x))\sigma(x)\]

Graph of the sigmoid's derivative plotted on 2D axes, showing a curve close to y=0 with a bump peaking near x=0
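The closed form in (2) can be checked numerically against a finite-difference estimate; a small plain-numpy sketch:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """Equation (2): d(sigma)/dx = (1 - sigma(x)) * sigma(x)."""
    s = sigmoid(x)
    return (1.0 - s) * s

# Central finite difference as an independent sanity check
x = np.linspace(-6.0, 6.0, 13)
h = 1e-5
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.allclose(sigmoid_derivative(x), numeric, atol=1e-8))  # True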

11.4.2. Sigmoid-Approximation \(\sigma_a(x)\)

11.4.2.1. \(\sigma_a(x)\)

(3) Sigmoid-approximation

(3)\[\sigma(x) \approx \sigma_a(x) = 0.5 + 0.197x - 0.004x^3, \quad \text{where } x \in (-4, 4) \subset \mathbb{R}\]

Graph of the sigmoid-approximation plotted on 2D axes, overlapping the sigmoid within the range -4 to 4 but diverging significantly outside this range
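A plain-numpy sketch of equation (3), comparing it to the true sigmoid inside the golden range and showing the divergence outside it:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_approx(x):
    """Equation (3): 0.5 + 0.197x - 0.004x^3, intended for x in (-4, 4)."""
    return 0.5 + 0.197 * x - 0.004 * x ** 3

inside = np.linspace(-4.0, 4.0, 9)
outside = np.array([-10.0, 10.0])
print(np.max(np.abs(sigmoid(inside) - sigmoid_approx(inside))))  # roughly 0.05, worst at the band edges
print(sigmoid_approx(outside))  # diverges: approx [ 2.53 -1.53]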

11.4.2.2. \(\frac{d\sigma_a(x)}{dx}\)

(4) Sigmoid-approximation derivative

(4)\[\frac{d\sigma(x)}{dx} \approx \frac{d\sigma_a(x)}{dx} = 0.197 + 3(-0.004)x^2 = 0.197 - 0.012x^2, \quad \text{where } x \in (-4, 4) \subset \mathbb{R}\]

Graph of the sigmoid-approximation derivative plotted on 2D axes, showing significant overlap with the true sigmoid derivative within the range -4 to 4 (where the true derivative "bumps"), but dropping rapidly towards negative infinity on both sides outside this range
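Equation (4) as a plain-numpy sketch; note that at x = 0 the approximation's slope is 0.197, close to but not equal to the true sigmoid derivative's peak of 0.25:

import numpy as np

def sigmoid_approx_derivative(x):
    """Equation (4): 0.197 - 0.012x^2, intended for x in (-4, 4)."""
    return 0.197 - 0.012 * x ** 2

print(sigmoid_approx_derivative(np.array([-4.0, 0.0, 4.0])))  # approx [0.005, 0.197, 0.005]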

11.4.3. Sigmoid API

Note

You may see little or no content on this page documenting the API. If that is the case, please build the documentation locally (Docker Build), or view an "autodoc"-ed version of this documentation (see: Documentation Variations).

class fhez.nn.activation.sigmoid.Sigmoid

Sigmoid approximation.

backward(gradient: numpy.array)

Calculate gradient of sigmoid with respect to input x.

property cost

Get computational depth of this node.

forward(x)

Calculate sigmoid approximation while minimising depth.

sigmoid(x)

Calculate standard sigmoid activation.

update()

Update nothing, as sigmoid has no parameters.

updates()

Update nothing, as sigmoid has no parameters.
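A usage sketch based only on the methods listed above (assuming fhez is installed and the signatures are as documented; the input values and the all-ones upstream gradient are arbitrary illustrations):

import numpy as np
from fhez.nn.activation.sigmoid import Sigmoid

node = Sigmoid()
x = np.array([-0.5, 0.0, 0.5])

a = node.forward(x)            # sigmoid approximation of x, equation (3)

# backward expects the gradient flowing back from the next node; an
# upstream gradient of ones is assumed here purely for illustration.
dx = node.backward(np.ones_like(x))

node.updates()                 # no-op, as sigmoid has no parameters
print(a, dx, node.cost)        # cost reports this node's computational depth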