
Batch vs Stochastic Gradient Descent

source link: https://towardsdatascience.com/batch-vs-stochastic-gradient-descent-a6c89d709b47?gi=bba97e6fdcd7


Learn the difference between Batch and Stochastic Gradient Descent, and choose the best variant for your model.

May 31 · 4 min read


Photo by Bailey Zindel on Unsplash

Before diving into Gradient Descent, we’ll look at how a Linear Regression model deals with the cost function. The main motive for reaching the global minimum is to minimize the cost function, which is given by:

J(θ₀, θ₁) = (1 / 2m) · Σᵢ₌₁…ₘ ( h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )²

Here, the hypothesis h_θ(x) = θ₀ + θ₁x is a linear equation, where θ₀ is the bias (a.k.a. the intercept) and θ₁ is the weight (slope) given to the feature x.
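As a quick illustration, here is a minimal NumPy sketch of this hypothesis and cost function. The toy data and the specific θ values are assumptions made only for this example, not part of the original article.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    # Linear hypothesis: h(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    # Mean squared error cost: J = (1/2m) * sum((h(x) - y)^2)
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return (1.0 / (2 * m)) * np.sum(errors ** 2)

# Toy data (assumed for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.2, 7.9])

print(cost(0.0, 2.0, x, y))  # cost with theta0 = 0, theta1 = 2
```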

Fig. 1: A point taking steps down the cost curve toward the global minimum.

The weight and intercept are randomly initialized, and the point then takes baby steps toward the minimum. An important parameter in Gradient Descent is the size of those steps, determined by the learning rate hyperparameter. If we set the learning rate too high, the point takes large steps and will probably overshoot the global minimum (and keep large errors). On the other hand, if the learning rate is too small, the point (shown in Fig. 1) will take a very long time to reach the global minimum. Therefore, an optimal learning rate should be chosen.
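To make the role of the learning rate concrete, below is a minimal sketch of batch gradient descent, which computes the gradients over the entire dataset at every update. The learning rate, iteration count, and toy data are assumptions chosen for illustration; they are not from the original article.

```python
import numpy as np

def batch_gradient_descent(x, y, learning_rate=0.05, n_iters=1000):
    # Randomly initialize the intercept (theta0) and weight (theta1)
    theta0, theta1 = np.random.randn(2)
    m = len(x)
    for _ in range(n_iters):
        # Gradients of J computed over the *entire* batch
        errors = (theta0 + theta1 * x) - y
        grad0 = (1.0 / m) * np.sum(errors)      # dJ/d(theta0)
        grad1 = (1.0 / m) * np.sum(errors * x)  # dJ/d(theta1)
        # The step size is controlled by the learning rate
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Toy data (assumed for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 6.2, 7.9])
print(batch_gradient_descent(x, y))  # approaches theta0 ≈ 0.1, theta1 ≈ 2
```

With this toy setup, raising the learning rate well above roughly 0.2 makes the updates diverge, while a much smaller value needs far more iterations to converge, mirroring the trade-off described above.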

