Visualizing Gradient Descent and Its Types

Gradient descent is one of the most important optimization techniques. The basic idea behind gradient descent is to reduce the error by repeatedly updating the parameters in small steps, where the step size is controlled by the learning rate and the direction is given by the negative gradient. The main aim is to reach the global minimum and minimize the error as much as possible.
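A rough sketch of a single update step looks like the snippet below. The toy loss, learning rate, and step count are just assumptions for illustration; the key line is the one where the parameter moves against its gradient.

```python
# Toy example: minimize L(w) = (w - 3)^2 with plain gradient descent.
# The loss, learning rate, and number of steps are illustrative assumptions.
def gradient(w):
    return 2 * (w - 3)            # dL/dw for L(w) = (w - 3)^2

w, lr = 0.0, 0.1                  # initial weight and learning rate
for step in range(50):
    w = w - lr * gradient(w)      # step against the gradient to reduce the error
print(round(w, 4))                # ends up very close to the minimum at w = 3
```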

[Figure: gradient descent stepping down an error surface toward the global minimum. Picture taken from Wikipedia]

The above picture shows how the gradient descent technique gradually minimizes the error, stepping down the error surface until it reaches the global minimum. There are 3 types of gradient descent algorithms:

Batch Gradient Descent:

Observe the above figure carefully. The graph shows that it takes almost 100 epochs to reach the global minimum, i.e. to minimize the error. Because batch gradient descent updates the weights only once per epoch, after it has seen the entire dataset, it takes more epochs to reduce the error.

On the left side, you can observe the ground truth and the model. The ground truth represents the data points of the dataset, and the model updates its weights to form the best-fit line. The weights-and-iterations plot shows how the weights (a0, a1) get updated and how the loss approaches its minimum. Here the model forms the best-fit line very smoothly.

If the dataset is very large, batch gradient descent requires a lot of computation, because every single weight update has to process the entire dataset.
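To make this concrete, here is a minimal batch gradient descent sketch for fitting the line y = a0 + a1*x. The synthetic data, learning rate, and epoch count are assumptions for illustration; the important point is that the weights are updated only once per pass over the whole dataset.

```python
import numpy as np

# Minimal sketch of batch gradient descent for y = a0 + a1 * x.
# The synthetic data, learning rate, and epoch count are illustrative assumptions.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 0.2, size=200)   # ground truth: a0 = 2, a1 = 3

a0, a1, lr = 0.0, 0.0, 0.1
for epoch in range(200):
    error = (a0 + a1 * x) - y
    # Gradients of the mean squared error over the WHOLE dataset:
    grad_a0 = 2 * error.mean()
    grad_a1 = 2 * (error * x).mean()
    # Only one weight update per epoch.
    a0 -= lr * grad_a0
    a1 -= lr * grad_a1

print(a0, a1)   # should end up near the true values 2.0 and 3.0
```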

Mini-Batch Gradient Descent:

Observe the above figure carefully. The graph shows that it takes around 10 epochs to reach the global minimum, i.e. to minimize the error. Because mini-batch gradient descent updates the weights after each batch of data points, it takes fewer epochs to reduce the error.

On the left side, you can observe the ground truth and the model. The ground truth represents the data points of the dataset, and the model updates its weights to form the best-fit line. The weights-and-iterations plot shows how the weights (a0, a1) get updated and how the loss approaches its minimum.

Here the model's path is less smooth, because the weights are not updated after iterating through all the data points; instead they are updated after each batch of data.
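A sketch of the same fit with mini-batch updates might look like this; the batch size, learning rate, and epoch count are assumed values. The difference from the batch version is that the weights are updated once per mini-batch instead of once per epoch.

```python
import numpy as np

# Minimal sketch of mini-batch gradient descent for y = a0 + a1 * x.
# Batch size, learning rate, and epoch count are illustrative assumptions.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 0.2, size=200)

a0, a1, lr, batch_size = 0.0, 0.0, 0.1, 20
for epoch in range(20):
    order = rng.permutation(len(x))               # shuffle the data each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        error = (a0 + a1 * x[idx]) - y[idx]
        # One weight update per mini-batch, so many updates per epoch.
        a0 -= lr * 2 * error.mean()
        a1 -= lr * 2 * (error * x[idx]).mean()

print(a0, a1)   # close to 2.0 and 3.0 after far fewer epochs than the batch version
```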

Stochastic Gradient Descent:

Observe the above figure carefully. The graph shows that it reaches the global minimum in just 1 or 2 epochs, because stochastic gradient descent updates the weights after every single data point. Notice that the curve is not smooth; it fluctuates a lot, precisely because the weights change after every data point.

On the left side, you can observe the ground truth and the model. The ground truth represents the data points of the dataset, and the model updates its weights to form the best-fit line. The weights-and-iterations plot shows how the weights (a0, a1) get updated and how the loss approaches its minimum. The smoothness of the model while forming the best-fit line is lower compared to the other two techniques.

If the dataset is very large, stochastic gradient descent requires far less computation per update, because each update uses only a single data point.
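Finally, here is the stochastic version of the same sketch, again with assumed values for the learning rate and number of epochs; the weights are now updated after every single data point, so each epoch performs as many updates as there are points.

```python
import numpy as np

# Minimal sketch of stochastic gradient descent for y = a0 + a1 * x.
# Learning rate and epoch count are illustrative assumptions.
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
y = 2.0 + 3.0 * x + rng.normal(0, 0.2, size=200)

a0, a1, lr = 0.0, 0.0, 0.05
for epoch in range(2):
    for i in rng.permutation(len(x)):             # visit points in random order
        error = (a0 + a1 * x[i]) - y[i]
        # Update the weights immediately after every single data point.
        a0 -= lr * 2 * error
        a1 -= lr * 2 * error * x[i]

print(a0, a1)   # roughly 2.0 and 3.0 after just a couple of epochs, with a noisy path
```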
