11.5. Softmax

11.5.1. Example Architecture

The figure below shows a generic classification network and how softmax is likely to be used within it.

Figure: Neural network diagram showing the anatomy of a generic classification network.

11.5.1.1. API

Softmax activation as node abstraction.

class fhez.nn.activation.softmax.Softmax

Softmax activation, normalising the inputs into a probability distribution that sums to 1.

backward(gradient: numpy.ndarray)

Calculate the derivative of the softmax with respect to each input.

\[\frac{\partial\,\textit{SMAX}(a)}{\partial a_i} = \begin{cases} \hat{p}(y_i) (1 - \hat{p}(y_i)), & \text{if}\ c=i \\ -\hat{p}(y_c)\,\hat{p}(y_i), & \text{otherwise} \end{cases}\]

where: \(c\) is the one-hot encoded index of the correct/true classification, and \(i\) is the index of the class currently being differentiated.
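
As a plain-NumPy reference for the piecewise formula above (a sketch only, not the library's internal backward implementation; the function name and the explicit c argument are illustrative):

import numpy as np

def softmax_derivative(a: np.ndarray, c: int) -> np.ndarray:
    """Derivative of the correct-class probability p(y_c) w.r.t. each input a_i."""
    exp_a = np.exp(a - a.max())      # shift by the max for numerical stability
    p = exp_a / exp_a.sum()          # softmax probabilities \hat{p}(y_i)
    grad = -p[c] * p                 # otherwise case: -\hat{p}(y_c) * \hat{p}(y_i)
    grad[c] = p[c] * (1 - p[c])      # c == i case: \hat{p}(y_i) * (1 - \hat{p}(y_i))
    return grad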

property cost

Get the computational cost of this activation.

forward(x: numpy.ndarray)

Calculate the softmax of some input \(a\).

\(\hat{p}(y_i) = \frac{e^{a_i}}{\sum_{j=0}^{C-1}e^{a_j}}\)

where: \(C\) is the number of classes, and \(i\) is the index of the class currently being processed.
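
For reference, a minimal standalone NumPy sketch of this forward computation (not necessarily how the library evaluates it, particularly under encryption; the max-subtraction is only a common numerical-stability trick):

import numpy as np

def softmax(a: np.ndarray) -> np.ndarray:
    """\hat{p}(y_i) = exp(a_i) / sum_j exp(a_j)."""
    exp_a = np.exp(a - a.max())   # subtracting the max leaves the result unchanged
    return exp_a / exp_a.sum()

softmax(np.array([2.0, 1.0, 0.1]))   # approx. [0.659, 0.242, 0.099], sums to 1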

update()

Update parameters; softmax has no parameters, so this is a no-op.

updates()

Update parameters using the average of accumulated gradients; softmax has no parameters, so this is a no-op.
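
Putting the documented interface together, a minimal usage sketch might look as follows; the example values are illustrative, and the exact shape and semantics of the gradient argument passed to backward are assumptions:

import numpy as np
from fhez.nn.activation.softmax import Softmax

node = Softmax()

logits = np.array([2.0, 1.0, 0.1])     # raw activations a from a previous node
probs = node.forward(logits)           # \hat{p}(y_i), normalised to sum to 1

upstream = np.array([1.0, 0.0, 0.0])   # assumed: gradient flowing back from the loss
grads = node.backward(upstream)

node.update()                          # no-op: softmax has no learnable parameters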