Gradient Descent vs. Normal Equation - Data Science Journey #5

Sevval Hatice ÖTER
3 min read · Jan 7, 2023

Gradient Descent and the Normal equation are both methods that can be used to find the optimal solution for a Linear Regression model.

Both methods are used to find the values of the model parameters (such as the coefficients of the features) that minimize the error between the predicted values and the actual values in the training set.

The learning rate is a hyperparameter that determines the size of the steps taken by the algorithm to adjust the model parameters during training.

Gradient Descent:

✅Need to choose learning rate.

The learning rate determines the size of the steps taken to update the model parameters in the direction of the negative gradient. A larger learning rate may lead to faster convergence, but it can also make the algorithm more prone to overshooting the optimal solution. A smaller learning rate may lead to slower convergence, but it can also help the algorithm converge more accurately on the optimal solution.
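This trade-off can be seen on a toy problem (an illustrative sketch, not from the article): minimizing the quadratic f(w) = (w − 3)², whose gradient is 2(w − 3) and whose minimum is at w = 3.

```python
# Illustrative sketch of how the learning rate changes gradient-descent
# behavior on f(w) = (w - 3)**2; the function and values are hypothetical.

def gradient_descent_1d(lr, steps=30, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)   # step against the gradient 2*(w - 3)
    return w

small = gradient_descent_1d(lr=0.1)   # converges toward the minimum at 3
large = gradient_descent_1d(lr=1.1)   # overshoots on every step and diverges
```

With lr=0.1 each step shrinks the distance to the minimum by a constant factor; with lr=1.1 each step overshoots by more than it corrects, so the iterate moves further away every time.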

✅Faster for datasets with a large number of features or for cases where the optimal solution requires many iterations.

Gradient descent is an iterative algorithm that starts with initial values for the model parameters and adjusts them iteratively to minimize the error. It does this by using the gradient of the error function to determine the direction in which the parameters should be adjusted.
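The loop described above can be sketched for linear regression with mean squared error (a minimal illustration; the function name, data, and hyperparameters are my own, not from the article):

```python
import numpy as np

# Minimal batch gradient descent for linear regression, minimizing MSE.
# All names and data here are illustrative.

def gradient_descent(X, y, lr=0.1, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)                          # initial parameter values
    for _ in range(iters):
        grad = (2 / m) * X.T @ (X @ theta - y)   # gradient of the MSE
        theta -= lr * grad                       # adjust parameters downhill
    return theta

# Tiny example: data generated from y = 1 + 2x, with a bias column prepended.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = gradient_descent(X, y)   # approaches [1.0, 2.0]
```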

✅Works well even when n is large.

“n” typically refers to the number of features in the dataset being used for training. The value of “n” can affect the performance of the Gradient Descent Algorithm, as a larger number of features makes the optimization problem more complex and may require more iterations to converge on the optimal solution. Reducing the number of features by selecting only the most relevant and informative ones can help improve the efficiency and effectiveness of the algorithm.

✅Time complexity is O(kmn), where k is the number of iterations, m the number of training examples, and n the number of features (each iteration touches all m × n entries of the data).

Normal Equation:

✅No need to choose learning rate.

The Normal Equation does not have a learning rate parameter, as it directly computes the optimal solution for the model parameters in a single step. This means that the normal equation does not have the ability to adjust the step size like Gradient Descent does.

✅Don’t need to iterate.

It is a direct method that finds the optimal solution in one step by solving a system of linear equations.
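For linear regression, that system is the normal equation XᵀXθ = Xᵀy, solved in one step (a minimal sketch with illustrative data, reusing the same toy dataset y = 1 + 2x):

```python
import numpy as np

# One-step normal-equation solve: theta = (X^T X)^{-1} X^T y.
# Data are illustrative: generated from y = 1 + 2x with a bias column.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Solve the linear system rather than forming the inverse explicitly,
# which is cheaper and numerically more stable.
theta = np.linalg.solve(X.T @ X, X.T @ y)   # recovers [1.0, 2.0]
```

No learning rate and no iteration count appear anywhere: the answer comes out of a single linear solve.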

✅Can be more computationally efficient for datasets with a small number of features.

✅Time complexity is O(n³) for a dataset with n features, dominated by solving the system involving XᵀX.

The choice of which method to use depends on the specific characteristics of the problem.

And the choice is yours 😉


Sevval Hatice ÖTER

YTÜ Mathematics | YET-GEN 2021–1 Trainee && 2021–3 Leader | https://linktr.ee/sevvalhaticeoter