Hi! I'm a high school student currently writing a paper about neural networks and the backpropagation algorithm.
I've understood pretty much all the basics of neural networks and gradient descent (I'm still wrapping my head around backpropagation, though).
One of the tasks I've been given is to explain why stochastic gradient descent is a much more effective approach than other minimization methods.
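Here's roughly how I understand the difference between ordinary (full-batch) gradient descent and stochastic gradient descent, as a small Python sketch I wrote for myself. The toy data, learning rate, and batch size are all made-up examples, not from the book, so please tell me if I've got the idea wrong:

```python
# Toy comparison of full-batch gradient descent vs. stochastic (mini-batch)
# gradient descent on a simple least-squares problem. All names and numbers
# here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x + noise
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x + 0.1 * rng.standard_normal(1000)

def grad(w, xs, ys):
    """Gradient of the mean squared error 0.5*mean((w*x - y)^2) w.r.t. w."""
    return np.mean((w * xs - ys) * xs)

eta = 0.5          # learning rate
batch_size = 10

# Full-batch gradient descent: every update looks at all 1000 points.
w_batch = 0.0
for step in range(100):
    w_batch -= eta * grad(w_batch, x, y)

# Stochastic (mini-batch) gradient descent: each update only looks at 10
# randomly chosen points, so each step is ~100x cheaper to compute.
w_sgd = 0.0
for step in range(100):
    idx = rng.integers(0, len(x), size=batch_size)
    w_sgd -= eta * grad(w_sgd, x[idx], y[idx])

print("full-batch estimate:", w_batch)
print("SGD estimate:       ", w_sgd)
```

The point I'm planning to make in the paper is that each stochastic update is far cheaper to compute while still moving in roughly the right direction on average, so you get many more useful updates for the same amount of work. Is that the right way to frame it?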
I noticed that one of the e-books I've been reading (http://neuralnetworksanddeeplearning.com/chap1.html) mentions that you can prove that gradient descent is the optimal strategy for minimizing a function, and that this can be shown using the Cauchy-Schwarz inequality (search for "Cauchy-Schwarz" on the page to find the relevant passage).
I would love to include such a proof, but I'm having trouble figuring out where to start. I'd appreciate any help/tips :)
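Here's where I've got to so far, using the book's notation (this is just my own reading of it, so I may have details wrong). The cost C depends on some variables v, and for a small step \Delta v the book says

\Delta C \approx \nabla C \cdot \Delta v.

If the step size is fixed, \|\Delta v\| = \epsilon, then by Cauchy-Schwarz

|\nabla C \cdot \Delta v| \le \|\nabla C\| \, \|\Delta v\| = \epsilon \|\nabla C\|,

so the most negative \Delta C you can possibly get is -\epsilon \|\nabla C\|, and (I think) that bound is hit exactly when \Delta v = -\epsilon \, \nabla C / \|\nabla C\|, i.e. when you step in the direction of the negative gradient. Is that the right idea, and is proving the equality case all that's left to make it a complete proof?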