11.3. ReLU & Approximation

Warning

The approximation \(R_a(x)\) diverges towards infinity in \(y\) outside of a very small safe band of input \(x\) values. This will cause NaNs and extremely large numbers if you are not especially careful and do not keep all values passed into this activation function within the range \(-q\) to \(q\), its approximation range, which can also be learned with backpropagation. Think especially carefully about your initial weights, and whether or not they will push activations beyond this band into the danger zone. See: \(R_a(x)\)

To be able to use (fully homomorphically encrypted) ciphertexts with deep learning we need to ensure our activation functions are abelian-compatible operations, i.e. polynomials. ReLU (1) is not a polynomial, so we approximate it with the polynomial (3). Similarly, since we use an approximation for the forward activations, we use the derivative of the ReLU approximation (4) for the backward pass to calculate the local gradient, in the hope of descending towards the global optimum (gradient descent).
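As a purely illustrative sketch (no FHE library is involved, and the helper name is ours), the polynomial \(R_a(x)\) from (3) can be evaluated in Horner form using only additions and multiplications, which are exactly the abelian-compatible operations an FHE scheme offers over ciphertexts; ReLU's max comparison cannot be expressed this way.

```python
# Illustrative only: evaluate the ReLU approximation R_a(x) from (3) in
# Horner form, using nothing but additions and multiplications, i.e. the
# operations available on FHE ciphertexts. Plain floats stand in for
# ciphertexts here; no encryption actually happens.
import math


def relu_approx_horner(x, q=1.0):
    """R_a(x) = (4 / (3*pi*q)) * x**2 + x / 2 + q / (3*pi), in Horner form."""
    a = 4 / (3 * math.pi * q)  # coefficient of x**2
    b = 0.5                    # coefficient of x
    c = q / (3 * math.pi)      # constant term
    # ((a*x + b) * x) + c -> two multiplications, two additions
    return (a * x + b) * x + c


print(relu_approx_horner(0.5))   # ~0.46, close to ReLU(0.5) = 0.5
print(relu_approx_horner(-0.5))  # ~-0.04, close to ReLU(-0.5) = 0
```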

11.3.1. ReLU \(R(x)\)

11.3.1.1. \(R(x)\)

(1) ReLU

(1)\[R(x)=\max(0,x)\]

Graph of ReLU plotted on 2D axes

11.3.1.2. \(\frac{dR(x)}{dx}\)

(2) ReLU derivative

(2)\[\begin{split}\frac{dR(x)}{dx} = \begin{cases} 1, & \text{if}\ x>0 \\ 0, & \text{otherwise} \end{cases}\end{split}\]

Graph of the ReLU derivative plotted on 2D axes, showing a flat line at y=0 with a slight bump near x=0
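For reference, a minimal NumPy sketch of (1) and (2) as they would be computed in the clear (this is not the fhez implementation):

```python
import numpy as np


def relu(x):
    """ReLU R(x) = max(0, x), equation (1)."""
    return np.maximum(0, x)


def relu_derivative(x):
    """dR(x)/dx: 1 where x > 0, otherwise 0, equation (2)."""
    return (x > 0).astype(float)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]
```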

11.3.2. ReLU-Approximation \(R_a(x)\)

11.3.2.1. \(R_a(x)\)

(3) ReLU-approximation

(3)\[R(x) \approx R_a(x) = \frac{4}{3\pi q}x^2 + \frac{1}{2}x + \frac{q}{3\pi}, \quad \text{where } x \in (-q, q) \subset \mathbb{R}\]

where \(q\) is 1:

Graph of ReLU-approximation plotted on 2D axes, where the approximation range is (-1, 1)

where \(q\) is 2:

Graph of ReLU-approximation plotted on 2D axes, where the approximation range is (-2, 2)
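A minimal NumPy sketch of (3) (an illustrative helper, not the fhez implementation). It also demonstrates the warning at the top of this page: inside \((-q, q)\) the approximation tracks ReLU closely, while outside it diverges quadratically.

```python
import numpy as np


def relu_approx(x, q=1.0):
    """ReLU approximation R_a(x), equation (3). Only intended for -q < x < q."""
    return (4 / (3 * np.pi * q)) * x ** 2 + 0.5 * x + q / (3 * np.pi)


q = 1.0
inside = np.array([-0.9, -0.25, 0.25, 0.9])
outside = np.array([-10.0, 10.0, 100.0])

print(relu_approx(inside, q))   # roughly [-0.  0.01 0.26 0.9 ], close to ReLU
print(np.maximum(0, inside))    # [0.   0.   0.25 0.9 ]
print(relu_approx(outside, q))  # roughly [37.5 47.5 4294.2], already far from ReLU
print(np.maximum(0, outside))   # [  0.  10. 100.]
```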

11.3.2.2. \(\frac{dR_a(x)}{dx}\)

(4) ReLU-approximation derivative

(4)\[\frac{dR(x)}{dx} \approx \frac{dR_a(x)}{dx} = \frac{8}{3\pi q}x + \frac{1}{2}, \quad \text{where } x \in (-q, q) \subset \mathbb{R}\]

Graph of the ReLU-approximation derivative plotted on 2D axes, showing significant overlap with the normal ReLU derivative near x=0 where it steps, but diverging from it rapidly on both sides outside the approximation range
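A matching NumPy sketch of (4) (again an illustrative helper, not the fhez implementation). Within \((-q, q)\) it is a straight line passing through 0.5 at \(x=0\), standing in for the 0/1 step of (2); being linear, it keeps growing without bound beyond the band.

```python
import numpy as np


def relu_approx_derivative(x, q=1.0):
    """Derivative of the ReLU approximation, equation (4). Only intended for -q < x < q."""
    return (8 / (3 * np.pi * q)) * x + 0.5


q = 1.0
x = np.array([-0.9, -0.1, 0.1, 0.9])
print(relu_approx_derivative(x, q))  # a line through 0.5 at x=0, spanning roughly -0.26 to 1.26
print((x > 0).astype(float))         # the exact ReLU derivative (2), for comparison
```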

11.3.3. ReLU Approximate API

Note

You may see little or no API content on this page. If that is the case, please build the documentation locally (see: Docker Build), or view an “autodoc”-ed version of this documentation (see: Documentation Variations).

class fhez.nn.activation.relu.RELU(q=None)

Rectified Linear Unit (ReLU) approximation computational graph node.

backward(gradient)

Calculate backward pass for singular example.

property cost

Get the computational cost of traversing to this RELU node.

forward(x)

Calculate forward pass for singular example.

local_dfdq(x, q)

Calculate local derivative dfdq.

local_dfdx(x, q)

Calculate local derivative dfdx.

property q

Get the current ReLU approximation range.

update()

Update node state/weights for a single example.

updates()

Update node state/weights for multiple examples simultaneously.
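A hypothetical usage sketch of the node above, assuming the fhez package is importable as documented; the array shapes, the upstream gradient, and the training-loop wiring are illustrative assumptions rather than documented behaviour.

```python
# Hypothetical usage sketch; RELU, forward, backward, update and q are the
# members listed above, everything else (shapes, gradient values) is assumed.
import numpy as np
from fhez.nn.activation.relu import RELU

node = RELU(q=1)                          # fix the approximation range to (-1, 1)

x = np.array([-0.5, 0.1, 0.8])            # example inputs kept inside (-q, q)
activations = node.forward(x)             # forward pass through R_a(x)

gradient = np.ones_like(x)                # stand-in gradient from the next node
local_gradient = node.backward(gradient)  # backward pass through dR_a(x)/dx

node.update()                             # update node state/weights (e.g. a learned q)
print(node.q)                             # current approximation range
```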