Hi! I'm a high school student currently writing a paper about neural networks and the backpropagation algorithm.
I've understood pretty much all the basics of neural networks and gradient descent (I'm still wrapping my head around backpropagation, though).
One of the tasks I've been given is to explain why stochastic gradient descent is a much more effective approach than other minimization methods.
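Here's roughly how I understand the difference between ordinary (full-batch) gradient descent and stochastic gradient descent, as a small Python sketch I wrote for myself. The toy data, learning rate, and batch size are all made-up examples, not from the book, so please tell me if I've got the idea wrong:

```python
# Toy comparison of full-batch gradient descent vs. stochastic (mini-batch)
# gradient descent on a simple least-squares problem. All names and numbers
# here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x + noise
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x + 0.1 * rng.standard_normal(1000)

def grad(w, xs, ys):
    """Gradient of the mean squared error 0.5*mean((w*x - y)^2) w.r.t. w."""
    return np.mean((w * xs - ys) * xs)

eta = 0.5          # learning rate
batch_size = 10

# Full-batch gradient descent: every update looks at all 1000 points.
w_batch = 0.0
for step in range(100):
    w_batch -= eta * grad(w_batch, x, y)

# Stochastic (mini-batch) gradient descent: each update only looks at 10
# randomly chosen points, so each step is ~100x cheaper to compute.
w_sgd = 0.0
for step in range(100):
    idx = rng.integers(0, len(x), size=batch_size)
    w_sgd -= eta * grad(w_sgd, x[idx], y[idx])

print("full-batch estimate:", w_batch)
print("SGD estimate:       ", w_sgd)
```

The point I'm planning to make in the paper is that each stochastic update is far cheaper to compute while still moving in roughly the right direction on average, so you get many more useful updates for the same amount of work. Is that the right way to frame it?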
I noticed that one of the e-books I've been reading (http://neuralnetworksanddeeplearning.com/chap1.html) mentions that you can prove that gradient descent is the optimal strategy for minimizing a function, and that this can be shown using the Cauchy-Schwarz inequality (search for "Cauchy-Schwarz" on the page to find the relevant passage).
I would love to include such a proof, but I'm having trouble figuring out where to start. I'd appreciate any help/tips :)
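Here's where I've got to so far, using the book's notation (this is just my own reading of it, so I may have details wrong). The cost C depends on some variables v, and for a small step \Delta v the book says

\Delta C \approx \nabla C \cdot \Delta v.

If the step size is fixed, \|\Delta v\| = \epsilon, then by Cauchy-Schwarz

|\nabla C \cdot \Delta v| \le \|\nabla C\| \, \|\Delta v\| = \epsilon \|\nabla C\|,

so the most negative \Delta C you can possibly get is -\epsilon \|\nabla C\|, and (I think) that bound is hit exactly when \Delta v = -\epsilon \, \nabla C / \|\nabla C\|, i.e. when you step in the direction of the negative gradient. Is that the right idea, and is proving the equality case all that's left to make it a complete proof?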