Example of a linear model adapted by gradient descent
(we could use least squares directly, but this algorithm can be extended to nonlinear models and requires no matrix inversion)
Fundamental gradient descent rule:
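The equation itself appears to be missing here; presumably it is the standard update for the weight vector w, an error function E, and learning rate μ:

```latex
w_{k+1} = w_k - \mu \, \nabla E(w_k)
```

For a linear model with squared error, ∇E(w) is the derivative of the sum of squared prediction errors with respect to w.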
Adaptation in Epochs
When the learning rate μ is chosen too small, the adaptation needs more epochs (if it converges on the data at all).
When the learning rate μ is chosen too large, the adaptation becomes unstable.
Example: μ = 1 (unstable).
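A minimal sketch of the idea in Python (function name and synthetic data are my own, not from the original): full-batch gradient descent on a linear model, run once with a small, stable learning rate and once with one that is too large.

```python
import numpy as np

def fit_linear_gd(X, y, mu, epochs):
    """Fit y ~ X @ w + b by gradient descent on the mean squared error."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        err = X @ w + b - y                # prediction error for all samples
        w -= mu * (2.0 / n) * (X.T @ err)  # gradient step for the weights
        b -= mu * (2.0 / n) * err.sum()    # gradient step for the bias
    return w, b

# Synthetic data from the line y = 3x + 1 (assumed example, no noise)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = 3.0 * X[:, 0] + 1.0

# Small enough learning rate: converges, but needs many epochs
w, b = fit_linear_gd(X, y, mu=0.1, epochs=2000)
print(w, b)  # should approach the true slope 3 and intercept 1

# Too large a learning rate: the parameters oscillate and blow up
w_big, b_big = fit_linear_gd(X, y, mu=5.0, epochs=50)
print(w_big, b_big)  # diverges
```

Unlike the closed-form least-squares solution, this loop never inverts a matrix, which is why the same update rule carries over to nonlinear models.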