Learn Before
Multivariate Gradient Descent
When the objective function maps a -dimensional vector to a scalar, i.e., , its gradient becomes a vector of partial derivatives:
abla f(\mathbf{x}) = \left[\frac{\partial f(\mathbf{x})}{\partial x_1}, \frac{\partial f(\mathbf{x})}{\partial x_2}, \ldots, \frac{\partial f(\mathbf{x})}{\partial x_d} ight]^ op$$ Each component $$\partial f(\mathbf{x})/\partial x_i$$ captures the rate at which $$f$$ changes with respect to $$x_i$$ alone. Using the first-order multivariate Taylor expansion, $$f(\mathbf{x} + \boldsymbol{\epsilon}) = f(\mathbf{x}) + \boldsymbol{\epsilon}^ op abla f(\mathbf{x}) + \mathcal{O}(\|\boldsymbol{\epsilon}\|^2)$$ one can show that the steepest-descent direction (up to second-order terms) is the negative gradient $$- abla f(\mathbf{x})$$. Choosing a suitable learning rate $$\eta > 0$$ yields the multivariate gradient descent update rule: $$\mathbf{x} \leftarrow \mathbf{x} - \eta abla f(\mathbf{x})$$ This directly generalizes the scalar update $$x \leftarrow x - \eta f'(x)$$ to vector-valued parameters.0
1
Tags
D2L
Dive into Deep Learning @ D2L
Related
Gradient Descent Reference
Linear Regression and Gradient Descent
Numerical Approximation of Gradients
Gradient Checking
Gradient Descent Explained
Why Gradient descent might fail?
A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
Big Data to Good Data: Andrew Ng Urges ML Community To Be More Data-Centric and Less Model-Centric
MLOps: Data-centric and Model-centric approaches
Critical Points
First-order Optimization Algorithm
Method of Steepest Descent
Second-Order Gradient Methods
Gradient Descent Explanation
Gradient Descent Variants
Notes about gradient descent
Suppose you have built a neural network. You decide to initialize the weights and biases to be zero. Which of the following statements is true?
Vanishing/exploding gradient
BERT Training Process
Objective Function
Distributed Training
The Problem with Constant Initialization
Objective Function Change Bounds in Gradient Descent
One-Dimensional Gradient Descent
Multivariate Gradient Descent
Second-Order Optimization Algorithm
Average Objective Function in Deep Learning
Accelerated Gradient Methods
Batch Gradient Descent Update Formula