10.1. Fully Connected Dense Net (ANN)

Here ANN shall mean a fully connected / dense neuron. Usually these are depicted similarly to the following:

Artificial neural network single neuron

We, however, want to keep using a computational graph style. This style is not as neat as the traditional one, but we find it much more helpful for things like visually inspecting the operations that must be accounted for in FHE, since each individual operation comes at a cost, and it can sometimes be difficult to see how many steps are involved in the traditional style of depiction. The following is our computational graph variant of the previous neuron:

Single neuron represented as computational graph of inputs and outputs

We can then expand these computational graphs to show operations en-masse. This is even more helpful, as we can now see how the data comes in together, and how each multi-dimensional matrix accrues the same operations upon it. This is important because we do not encrypt individual values by themselves. Instead they are encoded into a polynomial, and that polynomial is then encrypted. Please keep in mind the Commuted-Sum.

10.1.1. ANN Equations

Thankfully there need not be any approximation in an ANN, ignoring the activation function. Thus our ANN can be largely unchanged compared to standard implementations, both being polynomials (excluding \(g\)).

10.1.1.1. ANN

Full ANN computational graph

(1) ANN: There is little unique about our ANN, with the exception of how the bias is applied.

Normal ANN equation (not compatible with our representations), where \(w_0\) is actually the bias:

(1)\[a = g(\sum_{i=1}^{T_x}(w_ix_i)+w_0)\]

Our ANN implementation (2) differs slightly from (1) to handle the Commuted-Sum problem. Note how the bias is divided by \(N\), which in normal scenarios is simply 1 since the input would be a single value; in scenarios where an input \(x\) is an un-summable cyphertext holding a multi-dimensional array, \(b/N\) serves to counteract the broadcasting of values, keeping activations in the golden range for our activation function:

(2)\[a^{(i)} = g(\sum_{t=0}^{T_x-1}(w^{<t>}x^{(i)<t>})+b/N)\]
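To make the shapes and the \(b/N\) term concrete, the following is a minimal plaintext NumPy sketch of equation (2). It ignores encryption entirely; the array shapes, the identity stand-in for \(g\), and the name ann_forward are illustrative assumptions, not the library's implementation.

    import numpy as np

    def ann_forward(x, w, b, g=lambda z: z):
        """Plaintext sketch of equation (2); not the FHE implementation.

        x: (T_x, N) array; T_x branches, each an N-element flattened
           multi-dimensional input that would normally be one cyphertext.
        w: (T_x,) array; one weight per branch.
        b: scalar bias.
        g: activation function; identity used here as a stand-in.
        """
        T_x, N = x.shape
        # Weight each branch element-wise; the sum over the N elements is
        # deferred (Commuted-Sum), so the result keeps N slots.
        products = w[:, None] * x                   # shape (T_x, N)
        # b/N broadcasts the bias over all N slots so that, once the slots
        # are eventually summed, the total bias contribution is exactly b.
        return g(products.sum(axis=0) + b / N)      # shape (N,)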

10.1.1.2. ANN Derivatives

The derivative of an ANN (\(f\)) with respect to the bias \(b\):

(3)\[\frac{df}{db} = 1 \frac{dg}{dx}\]

The derivative of an ANN (\(f\)) with respect to the weights \(w\):

(4)\[\frac{df}{dw^{<t>}} = x^{(i)<t>} \frac{dg}{dx}\]

The derivative of an ANN (\(f\)) with respect to the input \(x\):

(5)\[\frac{df}{dx^{(i)<t>}} = w^{<t>} \frac{dg}{dx}\]
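As a companion to the forward sketch above, the following is one possible plaintext reading of equations (3), (4) and (5), applying the chain rule to the gradient \(\frac{dg}{dx}\) arriving from the activation. The reductions over the \(N\) elements and the name ann_backward are assumptions for illustration, not the library's implementation.

    import numpy as np

    def ann_backward(gradient, x, w):
        """Plaintext sketch of equations (3)-(5); not the FHE implementation.

        gradient: dg/dx flowing back from the activation, shape (N,).
        x: cached forward input, shape (T_x, N).
        w: weights, shape (T_x,).
        """
        df_db = np.sum(1 * gradient)           # (3): 1 * dg/dx, reduced to a scalar
        df_dw = np.sum(x * gradient, axis=1)   # (4): x * dg/dx, one value per branch t
        df_dx = w[:, None] * gradient          # (5): w * dg/dx, per branch and element
        return df_db, df_dw, df_dx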

Note

where:

  • \(x\):

    • \(x^{(i)}\) ; the multi-dimensional input array used as the \(i\)’th training example / pass of the network, e.g. cnn.forward is one whole forward pass.

    • \(x_{n}^{(i)<t>}\) ; the \(n\)’th input value of the multi-dimensional input array \(x^{(i)}\), corresponding to the \(i\)’th training example of the network and branch/ time-step \(t\).

  • \(T_x\) and \(t\):

    • \(T_x\) ; The total number of branches per input array \(x\). No need for \(T_x^{(i)}\) as branches should be the same every time.

    • \(t\) ; The current (relative)/ \(t\)’th timestep/ branch.

  • \(N\) and \(n\):

    • \(N\); the total number of elements in any individual multi-dimensional input array \(x\)

    • \(n\); the \(n\)’th input element of any individual multi-dimensional input array \(x\), e.g. \(x_n\) is the \(n\)’th value in the multi-dimensional array \(x\).

  • \(g\) and \(a\)

    • \(g\); some activation function, e.g. \(\sigma_a\) (see: \(\sigma_a(x)\))

    • \(a\); the summed output / activation of this neural network (if this is the last network then \(a=\hat{y}\))

  • \(y\) and \(\hat{y}:\)

    • \(y\); the (normalized) ground-truth / observed outcome

    • \(\hat{y}\); the (normalized) prediction of \(y\)

  • \(w\), \(k\), and \(b\):

    • \(w\); a weight

    • \(b\); a bias

    • \(k\); a kernel that multiplies over some input data, for us this is the Kernel-Masquerade

Please also note, this is with respect to each network. One network’s output activation \(a\) might be another network’s input \(x\).

10.1.2. ANN API

Artificial Neural Network (ANN) as node abstraction.

class fhez.nn.layer.ann.ANN(weights: Optional[numpy.array] = None, bias: Optional[int] = None)

Dense artificial neural network as computational graph.

property b

Shorthand for bias.

backward(gradient)

Compute backward pass of neural network.

\[ \begin{align}\begin{aligned}\frac{df}{db} = 1 \frac{dg}{dx}\\\frac{df}{dw^{<t>}} = x^{(i)<t>} \frac{dg}{dx}\\\frac{df}{dx^{(i)<t>}} = w^{<t>} \frac{dg}{dx}\end{aligned}\end{align} \]
property bias

Get ANN sum of products bias.

property cost

Get the cost of this node.

forward(x)

Compute forward pass of neural network.

\[a^{(i)} = \sum_{t=0}^{T_x-1}(w^{<t>}x^{(i)<t>})+b\]
update()

Update weights and bias of the network stochastically.

updates()

Update weights and bias as one batch all together.

property w

Shorthand for weights.

property weights

Get the current weights.
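Below is a brief, hypothetical usage sketch of the interface documented above. The concrete values, shapes, and return types are assumptions for illustration; consult the library source for the definitive behaviour.

    import numpy as np
    from fhez.nn.layer.ann import ANN

    # Hypothetical toy values; shapes and dtypes here are assumptions.
    ann = ANN(weights=np.array([0.5, -0.2, 0.1]), bias=1)

    x = np.array([1.0, 2.0, 3.0])      # one input per weight/ branch
    a = ann.forward(x)                 # sum of w*x products plus bias

    grads = ann.backward(gradient=1)   # gradients w.r.t. b, w, and x
    ann.updates()                      # apply queued updates as one batch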