12.1. Adam

Original algorithm:

[Image: Adam optimiser algorithm from the original paper.]

Our implementation API:

class fhez.nn.optimiser.adam.Adam(alpha: float = 0.001, beta_1: float = 0.9, beta_2: float = 0.999, epsilon: float = 1e-08)

Adaptive moment optimiser abstraction.
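A minimal construction sketch using the signature above; the values shown are simply the documented defaults made explicit:

   from fhez.nn.optimiser.adam import Adam

   optimiser = Adam(alpha=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)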


property alpha

Get learning rate hyperparameter.

Returns

alpha \(\alpha\), defaults to \(0.001\)

Return type

float

property beta_1

Get first order moment exponential decay rate.

Returns

beta_1 \(\beta_1\), defaults to \(0.9\)

Return type

float

property beta_2

Get second order moment exponential decay rate.

Returns

beta_2 \(\beta_2\), defaults to \(0.999\)

Return type

float

property cache

Cache of iteration-specific values.

This cache is a dictionary of keys (the parameter name) and values (the parameter-specific variables). For example, in this cache you can expect to find the previous iteration's moments and the number of iterations so far.
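As a rough illustration only (the inner key names here are assumptions, not a documented contract), an entry might resemble:

   # optimiser.cache, after a few updates of parameter "w":
   # {"w": {"m_t": 0.5, "v_t": 0.02, "t": 3}}  # moments + iteration count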

property epsilon

Get epsilon.

Returns

epsilon \(\epsilon\) (not \(\varepsilon\)), defaults to \(1 \times 10^{-8}\)

Return type

float

momentum(gradient: float, param_name: str, ord: int = 1)

Calculate the momentum of a single parameter category/name.

This function can calculate either 1st-order momentum or 2nd-order momentum (RMSprop), since the two computations are almost identical.

Where ord is 1 (i.e. first order):

  • current moment \(m_t = \beta_1 * m_{t-1} + (1-\beta_1) * g_t\)

  • decayed moment \(\hat{m_t} = \frac{m_t}{1 - \beta_1^t}\)

Where ord is 2 (i.e. second order/RMSprop):

  • current moment \(v_t = \beta_2 * v_{t-1} + (1-\beta_2) * g_t^2\)

  • decayed moment \(\hat{v_t} = \frac{v_t}{1 - \beta_2^t}\)

Steps taken (sketched in code below):

  • retrieve the previous momentum and iteration count from the cache dictionary, using the key (param_name)

  • calculate the current momentum from the previous momentum, per the equations above

  • save the current momentum into the cache dictionary under the same key

  • apply the momentum correction/decay

  • return the decayed momentum
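A standalone sketch of these steps for the first-order case, in plain Python; this illustrates the equations above and is not the library's implementation, and the cache layout is an assumption:

   cache = {"w": {"m_t": 0.0, "t": 0}}  # previous moment and iteration count

   def first_order_momentum(gradient, param_name, cache, beta_1=0.9):
       entry = cache[param_name]  # retrieve previous momentum and count
       entry["t"] += 1
       # current moment: m_t = beta_1 * m_{t-1} + (1 - beta_1) * g_t
       entry["m_t"] = beta_1 * entry["m_t"] + (1 - beta_1) * gradient
       # decayed/corrected moment: m_t / (1 - beta_1^t)
       return entry["m_t"] / (1 - beta_1 ** entry["t"])

   first_order_momentum(gradient=100, param_name="w", cache=cache)  # -> 100.0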

Parameters
  • gradient (float) – gradient at the current timestep, usually from a minibatch

  • param_name (str) – key used to look up this parameter's values in the cache dictionary

  • ord (int) – the order of momentum to calculate, defaults to 1

Returns

\(\hat{m_t}\) corrected/averaged momentum of order ord

Return type

float

Example

Adam().momentum(gradient=100, param_name="w", ord=1)

optimise(parms: dict, grads: dict)

Update given params based on gradients using Adam.

Keys in parms and grads are expected to take the forms x and dfdx respectively, where x may be replaced by any uniquely identifying string sequence; the keys of the two dictionaries should match up in this way.
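The proposed new values presumably follow the standard Adam update, combining both bias-corrected moments:

\(x_t = x_{t-1} - \alpha \frac{\hat{m_t}}{\sqrt{\hat{v_t}} + \epsilon}\)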

Parameters
  • parms (dict[str, float]) – Dictionary of keys (param name), values (param value)

  • grads (dict[str, float]) – Dictionary of keys (param name), values (param gradient)

Returns

Dictionary of keys (param name), values (proposed new value)

Return type

dict[str, float]

Example

Adam().optimise({"b": 1}, {"dfdb": 200})
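A slightly fuller usage sketch (the parameter names are illustrative). Since the moment estimates live in cache, the same Adam instance should be reused across iterations:

   from fhez.nn.optimiser.adam import Adam

   optimiser = Adam()
   parms = {"w": 0.5, "b": 1.0}
   grads = {"dfdw": 2.0, "dfdb": 200.0}
   parms = optimiser.optimise(parms=parms, grads=grads)
   # parms now maps "w" and "b" to their proposed new values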

rmsprop(gradient: float, param_name: str)

Get second order momentum.
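Given the description of momentum above, this is presumably the second-order (ord=2) case:

Adam().rmsprop(gradient=100, param_name="w")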

property schema

Get Marshmallow schema representation of this class.

Marshmallow schemas allow for easy and trustworthy serialisation and deserialisation of arbitrary objects, either to inbuilt types or to JSON formats. This is an inherited member of the abstract class Serialise.
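A hedged usage sketch; whether schema yields a Marshmallow schema class or an instance is not specified here, so the commented round-trip below is an assumption:

   from fhez.nn.optimiser.adam import Adam

   adam = Adam(alpha=0.002)
   schema = adam.schema  # Marshmallow schema representation
   # Typical Marshmallow round-trip, assuming schema is a class:
   # data = schema().dump(adam)      # serialise to inbuilt types
   # restored = schema().load(data)  # deserialise/validate back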

Note

Anything not listed here will inevitably be lost; ensure anything important is identified, and its type and structure expressly stated.