11.3. ReLU & Approximation

Warning

The approximation \(R_a(x)\) diverges towards infinity in \(y\) outside of a very small safe band of input \(x\) values. This will cause NaNs and extremely large numbers if you are not especially careful and do not keep all values passed into this activation function within the range \(-q\) to \(q\), its approximation range, which can also be learned with backpropagation. Think especially carefully about your initial weights, and whether or not they will push activations beyond this band into the danger zone. See: \(R_a(x)\)

To be able to use (fully homomorphically encrypted) ciphertexts with deep learning we need to ensure our activation functions are abelian-compatible operations, i.e. polynomials. ReLU (1) is not a polynomial, so we approximate it with the polynomial (3). Similarly, since we use an approximation for the forward activations, we use the derivative of the ReLU approximation (4) for the backward pass to calculate the local gradient, in the hope of descending towards the global optimum (gradient descent).
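As a purely illustrative sketch (no FHE library is involved, and the helper name is ours), the polynomial \(R_a(x)\) from (3) can be evaluated in Horner form using only additions and multiplications, which are exactly the abelian-compatible operations an FHE scheme offers over ciphertexts; ReLU's max comparison cannot be expressed this way.

```python
# Illustrative only: evaluate the ReLU approximation R_a(x) from (3) in
# Horner form, using nothing but additions and multiplications, i.e. the
# operations available on FHE ciphertexts. Plain floats stand in for
# ciphertexts here; no encryption actually happens.
import math


def relu_approx_horner(x, q=1.0):
    """R_a(x) = (4 / (3*pi*q)) * x**2 + x / 2 + q / (3*pi), in Horner form."""
    a = 4 / (3 * math.pi * q)  # coefficient of x**2
    b = 0.5                    # coefficient of x
    c = q / (3 * math.pi)      # constant term
    # ((a*x + b) * x) + c -> two multiplications, two additions
    return (a * x + b) * x + c


print(relu_approx_horner(0.5))   # ~0.46, close to ReLU(0.5) = 0.5
print(relu_approx_horner(-0.5))  # ~-0.04, close to ReLU(-0.5) = 0
```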

11.3.1. ReLU \(R(x)\)

11.3.1.1. \(R(x)\)

(1) ReLU

(1)\[R(x)=\max(0,x)\]

Graph of ReLU plotted on 2D axes

11.3.1.2. \(\frac{dR(x)}{dx}\)

(2) ReLU derivative

(2)\[\begin{split}\frac{dR(x)}{dx} = \begin{cases} 1, & \text{if}\ x>0 \\ 0, & \text{otherwise} \end{cases}\end{split}\]

Graph of the ReLU derivative plotted on 2D axes, showing a flat line at y=0 with a slight bump near x=0
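For reference, a minimal NumPy sketch of (1) and (2) as they would be computed in the clear (this is not the fhez implementation):

```python
import numpy as np


def relu(x):
    """ReLU R(x) = max(0, x), equation (1)."""
    return np.maximum(0, x)


def relu_derivative(x):
    """dR(x)/dx: 1 where x > 0, otherwise 0, equation (2)."""
    return (x > 0).astype(float)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]
```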

11.3.2. ReLU-Approximation \(R_a(x)\)

11.3.2.1. \(R_a(x)\)

(3) ReLU-approximation

(3)\[R(x) \approx R_a(x) = \frac{4}{3\pi q}x^2 + \frac{1}{2}x + \frac{q}{3\pi}, \quad \text{where } x \in (-q, q) \subset \mathbb{R}\]

where \(q\) is 1:

Graph of ReLU-approximation plotted on 2D axes, where the approximation range is (-1, 1)

where \(q\) is 2:

Graph of ReLU-approximation plotted on 2D axes, where the approximation range is (-2, 2)
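A minimal NumPy sketch of (3) (an illustrative helper, not the fhez implementation). It also demonstrates the warning at the top of this page: inside \((-q, q)\) the approximation tracks ReLU closely, while outside it diverges quadratically.

```python
import numpy as np


def relu_approx(x, q=1.0):
    """ReLU approximation R_a(x), equation (3). Only intended for -q < x < q."""
    return (4 / (3 * np.pi * q)) * x ** 2 + 0.5 * x + q / (3 * np.pi)


q = 1.0
inside = np.array([-0.9, -0.25, 0.25, 0.9])
outside = np.array([-10.0, 10.0, 100.0])

print(relu_approx(inside, q))   # roughly [-0.  0.01 0.26 0.9 ], close to ReLU
print(np.maximum(0, inside))    # [0.   0.   0.25 0.9 ]
print(relu_approx(outside, q))  # roughly [37.5 47.5 4294.2], already far from ReLU
print(np.maximum(0, outside))   # [  0.  10. 100.]
```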

11.3.2.2. \(\frac{dR_a(x)}{dx}\)

(4) ReLU-approximation derivative

(4)\[\frac{dR(x)}{dx} \approx \frac{dR_a(x)}{dx} = \frac{8}{3\pi q}x + \frac{1}{2}, \quad \text{where } x \in (-q, q) \subset \mathbb{R}\]

Graph of the ReLU-approximation derivative plotted on 2D axes, showing significant overlap with the normal ReLU derivative near x=0 where it steps, but diverging from it rapidly on both sides outside the approximation range
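A matching NumPy sketch of (4) (again an illustrative helper, not the fhez implementation). Within \((-q, q)\) it is a straight line passing through 0.5 at \(x=0\), standing in for the 0/1 step of (2); being linear, it keeps growing without bound beyond the band.

```python
import numpy as np


def relu_approx_derivative(x, q=1.0):
    """Derivative of the ReLU approximation, equation (4). Only intended for -q < x < q."""
    return (8 / (3 * np.pi * q)) * x + 0.5


q = 1.0
x = np.array([-0.9, -0.1, 0.1, 0.9])
print(relu_approx_derivative(x, q))  # a line through 0.5 at x=0, spanning roughly -0.26 to 1.26
print((x > 0).astype(float))         # the exact ReLU derivative (2), for comparison
```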

11.3.3. ReLU Approximate API

Note

You may see little or no API content on this page. If that is the case, please build the documentation locally (see: Docker Build), or view an “autodoc”-ed version of this documentation (see: Documentation Variations).

class fhez.nn.activation.relu.RELU(q=None)

Rectified Linear Unit (ReLU) approximation computational graph node.

backward(gradient)

Calculate backward pass for singular example.

property cost

Get the computational cost of traversing to this RELU node.

forward(x)

Calculate forward pass for singular example.

local_dfdq(x, q)

Calculate local derivative dfdq.

local_dfdx(x, q)

Calculate local derivative dfdx.

property q

Get the current ReLU approximation range.

update()

Update node state/weights for a single example.

updates()

Update node state/weights for multiple examples simultaneously.
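A hypothetical usage sketch of the node above, assuming the fhez package is importable as documented; the array shapes, the upstream gradient, and the training-loop wiring are illustrative assumptions rather than documented behaviour.

```python
# Hypothetical usage sketch; RELU, forward, backward, update and q are the
# members listed above, everything else (shapes, gradient values) is assumed.
import numpy as np
from fhez.nn.activation.relu import RELU

node = RELU(q=1)                          # fix the approximation range to (-1, 1)

x = np.array([-0.5, 0.1, 0.8])            # example inputs kept inside (-q, q)
activations = node.forward(x)             # forward pass through R_a(x)

gradient = np.ones_like(x)                # stand-in gradient from the next node
local_gradient = node.backward(gradient)  # backward pass through dR_a(x)/dx

node.update()                             # update node state/weights (e.g. a learned q)
print(node.q)                             # current approximation range
```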