Visualizing Gradient Descent and its types

Sjatin
Apr 16, 2021 · 3 min read

Gradient descent is one of the most important optimization techniques in machine learning. The basic idea behind gradient descent is to reduce the error by repeatedly updating the parameters in the direction opposite to the gradient of the loss, with the learning rate controlling the size of each step. So the main aim here is to reach the global minimum and minimize the error as much as possible.
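To make the update rule concrete, here is a minimal sketch in NumPy. The function name, toy objective, and hyperparameters are my own illustration (not from this article); the point is simply that the parameters get nudged by the learning rate times the negative gradient, over and over.

```python
import numpy as np

def gradient_descent(grad_fn, params, learning_rate=0.1, epochs=100):
    """Generic gradient descent: step against the gradient each epoch."""
    for _ in range(epochs):
        params = params - learning_rate * grad_fn(params)  # the update rule
    return params

# Toy example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x_min = gradient_descent(lambda x: 2 * (x - 3), params=np.array([0.0]))
print(x_min)  # converges toward 3.0, the minimum of the toy objective
```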

Picture taken from Wikipedia

The above picture shows how gradient descent minimizes the error by taking small steps in the direction of the negative gradient until it reaches the global minimum. There are 3 types of gradient descent algorithms:

Batch Gradient Descent:

It is a technique where the model goes through all the data points in every epoch and backpropagates once per epoch to update the weights. So here we consider the cumulative error from all the data points in a single update.

Observe the above figure carefully. The graph shows it takes almost 100 epochs to reach the global minimum, i.e. to minimize the error. As batch gradient descent updates the weights only after the completion of an epoch, it takes more epochs to reduce the error.

On the left side, you can observe the ground truth and the model. The ground truth represents the data points of the dataset, and the model updates its weights to find the best-fit line. The weights-and-iterations panel shows how the weights (a0, a1) get updated and how the loss moves toward the minimum. Here the model forms the best-fit line very smoothly.

If the dataset is huge, batch gradient descent requires a lot of computation power, because every single weight update has to touch every data point.
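Here is a rough sketch of batch gradient descent for the best-fit-line setting above, using the same weight names (a0, a1). The synthetic data, learning rate, and epoch count are illustrative choices of mine, not taken from the article's experiment.

```python
import numpy as np

def batch_gradient_descent(x, y, learning_rate=0.5, epochs=100):
    """Batch GD for the line y = a0 + a1 * x: one weight update per epoch,
    using the gradient of the mean squared error over ALL data points."""
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = (a0 + a1 * x) - y                 # error on the whole dataset
        grad_a0 = (2.0 / n) * error.sum()         # cumulative gradient w.r.t. a0
        grad_a1 = (2.0 / n) * (error * x).sum()   # cumulative gradient w.r.t. a1
        a0 -= learning_rate * grad_a0
        a1 -= learning_rate * grad_a1
    return a0, a1

x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(50)     # toy "ground truth" data
print(batch_gradient_descent(x, y))               # roughly (2.0, 3.0)
```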

Mini-Batch Gradient Descent:

It is a technique where the model updates the weights after iterating through a batch of data points. It iterates through all the data points batch-wise, in random order. So here the error is calculated after every batch of data points.

Observe the above figure carefully. The graph shows it takes around 10 epochs to reach the global minimum, i.e. to minimize the error. As mini-batch gradient descent updates the weights after every batch of data points, it takes fewer epochs to reduce the error.

On the left side, you can observe the ground truth and the model. The ground truth represents the data points of the dataset, and the model updates its weights to find the best-fit line. The weights-and-iterations panel shows how the weights (a0, a1) get updated and how the loss moves toward the minimum.

Here the path of the model is less smooth, because it updates the weights after each batch of data rather than only after seeing all the data points.
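A minimal sketch of the mini-batch version follows, with an illustrative batch size, learning rate, and epoch count of my own choosing (the toy data matches the batch example above).

```python
import numpy as np

def mini_batch_gradient_descent(x, y, batch_size=8, learning_rate=0.2, epochs=50):
    """Mini-batch GD: shuffle the data each epoch, then update the weights
    after every batch of `batch_size` points instead of once per epoch."""
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = np.random.permutation(n)          # visit batches in random order
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            error = (a0 + a1 * xb) - yb           # error on this batch only
            a0 -= learning_rate * (2.0 / len(xb)) * error.sum()
            a1 -= learning_rate * (2.0 / len(xb)) * (error * xb).sum()
    return a0, a1

x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(50)     # same toy data as above
print(mini_batch_gradient_descent(x, y))          # roughly (2.0, 3.0)
```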

Stochastic Gradient Descent:

It is a technique where the model updates the weights after each and every data point. It iterates through all the data points in random order. So here the error is calculated after each individual data point.

Observe the above figure carefully. The graph shows that it reaches the global minimum in just 1 or 2 epochs, since stochastic gradient descent updates the weights after every data point. The curve is not smooth, though; it fluctuates a lot, because each update is based on a single point.

On the left side, you can observe the ground truth and the model. The ground truth represents the data points of the dataset, and the model updates its weights to find the best-fit line. The weights-and-iterations panel shows how the weights (a0, a1) get updated and how the loss moves toward the minimum. The best-fit line forms less smoothly here than with the other two techniques.

Even if the dataset is huge, stochastic gradient descent requires less computation power per update, since each update touches only a single data point.
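For completeness, here is a minimal sketch of stochastic gradient descent in the same setting; again, the learning rate, epoch count, and toy data are illustrative assumptions rather than the article's exact experiment.

```python
import numpy as np

def stochastic_gradient_descent(x, y, learning_rate=0.05, epochs=20):
    """SGD: visit the data points in random order and update the weights
    after every single point, so the loss curve fluctuates between updates."""
    a0, a1 = 0.0, 0.0
    for _ in range(epochs):
        for i in np.random.permutation(len(x)):
            error = (a0 + a1 * x[i]) - y[i]       # error on one data point
            a0 -= learning_rate * 2.0 * error
            a1 -= learning_rate * 2.0 * error * x[i]
    return a0, a1

x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x + 0.1 * np.random.randn(50)     # same toy data as above
print(stochastic_gradient_descent(x, y))          # close to (2.0, 3.0), with
                                                  # some noise from per-point updates
```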
