Objective Function Change Bounds in Gradient Descent
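Assume a sufficiently smooth objective function $f$ is Lipschitz continuous with constant $L$, meaning that for any $\mathbf{x}$ and $\mathbf{y}$ the objective satisfies $|f(\mathbf{x}) - f(\mathbf{y})| \leq L \|\mathbf{x} - \mathbf{y}\|$. Then the change in the objective value after a gradient descent update $\mathbf{x} \leftarrow \mathbf{x} - \eta \mathbf{g}$, where $\mathbf{g}$ is the gradient and $\eta$ the learning rate, is bounded by $|f(\mathbf{x}) - f(\mathbf{x} - \eta \mathbf{g})| \leq L \eta \|\mathbf{g}\|$. This follows directly from the Lipschitz condition, because the two points differ by exactly $\eta \mathbf{g}$. The bound shows that the largest possible change in the loss during a single step is controlled by the learning rate $\eta$, the gradient norm $\|\mathbf{g}\|$, and the Lipschitz constant $L$. A small upper bound cuts both ways: it limits how quickly the objective value can be reduced, but it also limits how much progress can go wrong or be undone in any single gradient step.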
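The bound can be checked numerically. The sketch below is illustrative only: it assumes a hypothetical objective f(x) = sqrt(1 + ||x||^2), chosen because its gradient norm is always below 1, so L = 1 is a valid Lipschitz constant. The names `f`, `grad_f`, and `check_bound` are not from the source. For a few learning rates, it compares the actual change in the objective after one gradient step with the bound L * eta * ||g||.

```python
import math
import random


def f(x):
    """Illustrative objective (an assumption): f(x) = sqrt(1 + ||x||^2)."""
    return math.sqrt(1.0 + sum(xi * xi for xi in x))


def grad_f(x):
    """Gradient of f: x / sqrt(1 + ||x||^2); its norm is always below 1."""
    denom = f(x)
    return [xi / denom for xi in x]


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def check_bound(x, eta, L=1.0):
    """Return (actual change, bound) for one step x <- x - eta * g."""
    g = grad_f(x)
    x_new = [xi - eta * gi for xi, gi in zip(x, g)]
    change = abs(f(x) - f(x_new))
    bound = L * eta * norm(g)
    return change, bound


random.seed(0)
results = []
for eta in (0.01, 0.1, 1.0):
    # Random 4-dimensional starting point for each learning rate.
    x = [random.uniform(-5, 5) for _ in range(4)]
    change, bound = check_bound(x, eta)
    results.append((eta, change, bound))
    print(f"eta={eta}: change={change:.6f}, bound={bound:.6f}")
```

In each case the observed change should not exceed the bound. Because the bound scales linearly with the learning rate, a larger step permits a larger change in either direction.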
Tags
D2L
Dive into Deep Learning @ D2L
Related
Gradient Descent Reference
Linear Regression and Gradient Descent
Numerical Approximation of Gradients
Gradient Checking
Gradient Descent Explained
Why Gradient descent might fail?
A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
Big Data to Good Data: Andrew Ng Urges ML Community To Be More Data-Centric and Less Model-Centric
MLOps: Data-centric and Model-centric approaches
Critical Points
First-order Optimization Algorithm
Method of Steepest Descent
Second-Order Gradient Methods
Gradient Descent Explanation
Gradient Descent Variants
Notes about gradient descent
Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
Vanishing/exploding gradient
BERT Training Process
Objective Function
Distributed Training
The Problem with Constant Initialization
One-Dimensional Gradient Descent
Multivariate Gradient Descent
Second-Order Optimization Algorithm
Average Objective Function in Deep Learning
Accelerated Gradient Methods
Batch Gradient Descent Update Formula