9.5. Softmax
9.5.1. Example Architecture
This figure shows a generic classification network and where the softmax is likely to be used.
9.5.1.1. API
Softmax activation as node abstraction.
- class fhez.nn.activation.softmax.Softmax
Softmax activation, normalising the inputs so that they sum to 1 and can be interpreted as probabilities.
- backward(gradient: numpy.ndarray)
Calculate the soft maximum derivative with respect to each input.
\[\begin{split}\frac{d\,\textit{SMAX}(a)}{da_i} = \begin{cases} \hat{p}(y_i) (1 - \hat{p}(y_i)), & \text{if}\ c = i \\ -\hat{p}(y_c) \cdot \hat{p}(y_i), & \text{otherwise} \end{cases}\end{split}\]where: \(c\) is the one-hot-encoded index of the correct/true classification, and \(i\) is the index of the current classification.
- property cost
Get computational cost of this activation.
- forward(x: numpy.ndarray)
Calculate the soft maximum of some input \(x\).
\(\hat{p}(y_i) = \frac{e^{a_i}}{\sum_{j=0}^{C-1}e^{a_j}}\)
where: \(C\) is the number of classes, and \(i\) is the index of the current class.
- update()
Update parameters; softmax has no trainable parameters, so this is a no-op.
- updates()
Update parameters using the average of accumulated gradients; softmax has no trainable parameters, so this is a no-op.
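The forward and backward formulas above can be sketched in plain NumPy. This is a minimal illustration only, independent of the fhez API; the function names `softmax` and `softmax_gradient` are ours, not part of the library:

```python
import numpy as np

def softmax(a: np.ndarray) -> np.ndarray:
    """Forward pass: p_hat(y_i) = e^{a_i} / sum_j e^{a_j}."""
    # Subtracting the max is a standard numerical-stability trick;
    # the shift cancels in the ratio, so the result is unchanged.
    e = np.exp(a - np.max(a))
    return e / e.sum()

def softmax_gradient(a: np.ndarray, c: int) -> np.ndarray:
    """Backward pass: d p_hat(y_c) / d a_i, piecewise as in the docs.

    c is the one-hot-encoded index of the correct/true class.
    """
    p = softmax(a)
    grad = -p[c] * p             # otherwise: -p_hat(y_c) * p_hat(y_i)
    grad[c] = p[c] * (1 - p[c])  # i == c:    p_hat(y_i) * (1 - p_hat(y_i))
    return grad
```

A quick sanity check is to compare `softmax_gradient` against central finite differences of `softmax(a)[c]`; the two agree to within the usual floating-point tolerance.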