A Gentle Introduction To Hessian Matrices

By mullaned2002

August 6, 2021

Last Updated on August 5, 2021

Hessian matrices belong to a class of mathematical structures that involve second order derivatives. They are often used in machine learning and data science algorithms for optimizing a function of interest.

In this tutorial, you will discover Hessian matrices, their corresponding discriminants, and their significance. All concepts are illustrated via an example.

After completing this tutorial, you will know:

What a Hessian matrix is
How to compute the discriminant from the Hessian matrix
What information the discriminant contains

Let’s get started.

A Gentle Introduction to Hessian Matrices. Photo by Beenish Fatima, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

Definition of a function’s Hessian matrix and the corresponding discriminant
Example of computing the Hessian matrix, and the discriminant
What the Hessian and discriminant tell us about the function of interest

Prerequisites

For this tutorial, we assume that you already know:

Derivatives of functions
Functions of several variables, partial derivatives and gradient vectors
Higher order derivatives

These concepts are covered in the tutorials listed near the end of this post.

What Is A Hessian Matrix?

The Hessian matrix is a matrix of second order partial derivatives. Suppose we have a function f of n variables, i.e.,

f: R^n → R

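The Hessian of f is the n × n matrix whose entry in row i and column j is the second order partial derivative of f with respect to x_i and x_j:

\[
H_f = \begin{bmatrix}
f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\
f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\
\vdots & \vdots & \ddots & \vdots \\
f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n}
\end{bmatrix}
\]

For a function of two variables, f(x, y), this reduces to:

\[
H_f = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}
\]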

We already know from our tutorial on gradient vectors that the gradient is a vector of first order partial derivatives. Similarly, the Hessian is a matrix of second order partial derivatives formed from all pairs of variables in the domain of f.
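To make the definition concrete, here is a minimal sketch that approximates every entry of the Hessian with central finite differences, using only the Python standard library. The function name `numerical_hessian`, the step size `h`, and the example function f(x, y) = x^2 y are illustrative assumptions, not part of the original tutorial.

```python
from typing import Callable, List, Sequence


def numerical_hessian(
    f: Callable[[Sequence[float]], float],
    point: Sequence[float],
    h: float = 1e-4,
) -> List[List[float]]:
    """Approximate the Hessian of f at `point` with central differences."""
    n = len(point)
    hessian = [[0.0] * n for _ in range(n)]

    def shifted(i: int, di: float, j: int, dj: float) -> float:
        # Evaluate f after moving coordinate i by di and coordinate j by dj.
        x = list(point)
        x[i] += di
        x[j] += dj
        return f(x)

    for i in range(n):
        for j in range(n):
            # Entry (i, j) approximates the second partial d^2 f / (dx_i dx_j).
            hessian[i][j] = (
                shifted(i, h, j, h)
                - shifted(i, h, j, -h)
                - shifted(i, -h, j, h)
                + shifted(i, -h, j, -h)
            ) / (4.0 * h * h)
    return hessian


# Hypothetical example: f(x, y) = x^2 * y has f_xx = 2y, f_xy = f_yx = 2x, f_yy = 0.
example_hessian = numerical_hessian(lambda p: p[0] ** 2 * p[1], [1.0, 2.0])
print(example_hessian)  # close to [[4, 2], [2, 0]], up to rounding error
```

The result is an n × n nested list, one row per variable, which mirrors the layout of the matrix in the definition. Finite differences are only an approximation; when the second partial derivatives are available in closed form, using them directly is more accurate.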

What Is The Discriminant?

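The determinant of the Hessian is also called the discriminant of f. For a two variable function f(x, y) whose mixed partial derivatives are equal (f_xy = f_yx, which holds when the second order partial derivatives are continuous), it is given by:

\[
D_f(x, y) = f_{xx} f_{yy} - (f_{xy})^2
\]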

Examples of Hessian Matrices And Discriminants

Suppose we have the following function:

g(x, y) = x^3 + 2y^2 + 3xy^2

Then the Hessian H_g and the discriminant D_g are given by:

\[
H_g(x, y) = \begin{bmatrix} 6x & 6y \\ 6y & 4 + 6x \end{bmatrix},
\qquad
D_g(x, y) = 6x(4 + 6x) - 36y^2 = 36x^2 + 24x - 36y^2
\]
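For readers who want to check the entries by hand, the Hessian and discriminant of g follow from differentiating g(x, y) = x^3 + 2y^2 + 3xy^2 twice. A worked derivation:

```latex
\begin{aligned}
g_x &= 3x^2 + 3y^2, & g_y &= 4y + 6xy, \\
g_{xx} &= 6x, & g_{yy} &= 4 + 6x, \\
g_{xy} &= g_{yx} = 6y, \\
D_g(x, y) &= g_{xx}\, g_{yy} - (g_{xy})^2 = 6x(4 + 6x) - 36y^2 = 36x^2 + 24x - 36y^2 .
\end{aligned}
```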

Let’s evaluate the discriminant at different points:

D_g(0, 0) = 0

D_g(1, 0) = 36 + 24 = 60

D_g(0, 1) = -36

D_g(-1, 0) = 12
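The four evaluations above can be reproduced with a short script. This is a sketch that hard-codes the second order partial derivatives of g (g_xx = 6x, g_xy = 6y, g_yy = 4 + 6x); the helper names `hessian_g` and `discriminant` are chosen here for illustration.

```python
def hessian_g(x, y):
    """Analytic Hessian of g(x, y) = x^3 + 2y^2 + 3xy^2 as nested lists."""
    g_xx = 6 * x
    g_xy = 6 * y  # equal to g_yx because the mixed partials agree
    g_yy = 4 + 6 * x
    return [[g_xx, g_xy], [g_xy, g_yy]]


def discriminant(h):
    """Determinant of a 2x2 matrix given as nested lists."""
    return h[0][0] * h[1][1] - h[0][1] * h[1][0]


for point in [(0, 0), (1, 0), (0, 1), (-1, 0)]:
    print(point, discriminant(hessian_g(*point)))
# (0, 0) 0
# (1, 0) 60
# (0, 1) -36
# (-1, 0) 12
```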

What Do The Hessian And Discriminant Signify?

The Hessian and the corresponding discriminant are used to classify the critical points of a function, that is, the points where the gradient is zero. Evaluating them helps in understanding the local shape of a function of several variables. Here are the rules for a critical point (a, b) of f, where f_x(a, b) = f_y(a, b) = 0, the second order partial derivatives are continuous near (a, b), and the discriminant is D(a, b):

The function f has a local minimum at (a, b) if f_xx(a, b) > 0 and the discriminant D(a, b) > 0
The function f has a local maximum at (a, b) if f_xx(a, b) < 0 and the discriminant D(a, b) > 0
The function f has a saddle point at (a, b) if D(a, b) < 0
We cannot draw any conclusions if D(a, b) = 0 and need more tests
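The rules can be written as a small decision function. The sketch below assumes the caller supplies the gradient and the second order partial derivatives at the point of interest, and it first checks that the gradient vanishes, since the test is only meaningful at a critical point. The name `classify_critical_point`, the tolerance `tol`, and the example functions in the comments are illustrative choices.

```python
def classify_critical_point(grad, f_xx, f_yy, f_xy, tol=1e-9):
    """Second derivative test for a function f(x, y) at a single point.

    grad is the pair (f_x, f_y) at the point; f_xx, f_yy and f_xy are the
    second order partial derivatives at the same point.
    """
    # The test only applies where both first partial derivatives are zero.
    if abs(grad[0]) > tol or abs(grad[1]) > tol:
        return "not a critical point"
    d = f_xx * f_yy - f_xy ** 2  # discriminant
    if d > tol:
        return "local minimum" if f_xx > 0 else "local maximum"
    if d < -tol:
        return "saddle point"
    return "inconclusive"


# f(x, y) = x^2 + y^2 at the origin: f_xx = 2, f_yy = 2, f_xy = 0
print(classify_critical_point((0, 0), 2, 2, 0))   # local minimum
# f(x, y) = x^2 - y^2 at the origin: f_xx = 2, f_yy = -2, f_xy = 0
print(classify_critical_point((0, 0), 2, -2, 0))  # saddle point
# f(x, y) = x^4 + y^4 at the origin: all second partials are 0
print(classify_critical_point((0, 0), 0, 0, 0))   # inconclusive
```

Comparing against a tolerance rather than exact zero is a design choice that makes the function usable with floating point inputs.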

Example: g(x, y)

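The rules above apply only at critical points, where both first order partial derivatives vanish. For the function g(x, y), g_x = 3x^2 + 3y^2 and g_y = 4y + 6xy, so (0, 0) is the only critical point. At the four points evaluated earlier:

We cannot draw any conclusions for the point (0, 0) from the test, as D_g(0, 0) = 0. Looking along the x-axis, g(x, 0) = x^3 takes both signs near 0, so (0, 0) is neither a local minimum nor a local maximum
g_xx(1, 0) = 6 > 0 and D_g(1, 0) = 60 > 0, so the Hessian is positive definite and g curves upward in every direction near (1, 0). The gradient there is (3, 0), not zero, so (1, 0) is not a local minimum
D_g(0, 1) = -36 < 0, so the Hessian is indefinite at (0, 1). The gradient there is (3, 4), not zero, so (0, 1) is not a saddle point
g_xx(-1, 0) = -6 < 0 and D_g(-1, 0) = 12 > 0, so the Hessian is negative definite and g curves downward in every direction near (-1, 0). The gradient there is (3, 0), not zero, so (-1, 0) is not a local maximum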
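Whether each of these points is a critical point of g can be checked directly from the gradient. The sketch below hard-codes the first and second order partial derivatives of g(x, y) = x^3 + 2y^2 + 3xy^2 and reports, for each point, the gradient and the kind of curvature the Hessian indicates; the helper names are illustrative.

```python
def gradient_g(x, y):
    """Gradient (g_x, g_y) of g(x, y) = x^3 + 2y^2 + 3xy^2."""
    return (3 * x**2 + 3 * y**2, 4 * y + 6 * x * y)


def curvature_g(x, y):
    """Describe the Hessian of g at (x, y) using g_xx and the discriminant."""
    g_xx, g_yy, g_xy = 6 * x, 4 + 6 * x, 6 * y
    d = g_xx * g_yy - g_xy**2
    if d > 0:
        return "positive definite" if g_xx > 0 else "negative definite"
    if d < 0:
        return "indefinite"
    return "degenerate"


for point in [(0, 0), (1, 0), (0, 1), (-1, 0)]:
    grad = gradient_g(*point)
    is_critical = grad == (0, 0)  # exact comparison is fine for integer inputs
    print(point, "gradient:", grad, "critical:", is_critical,
          "Hessian:", curvature_g(*point))
```

Only (0, 0) is reported as a critical point. The signs of g_xx and D_g at the other three points describe the local curvature of g there, not a minimum, maximum or saddle.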

The figure below shows a graph of the function g(x, y) and its corresponding contours.

Graph of g(x,y) and contours of g(x,y)

Why Is The Hessian Matrix Important In Machine Learning?

The Hessian matrix plays an important role in many machine learning algorithms, which involve optimizing a given function. While it may be expensive to compute, it holds key information about the curvature of the function being optimized. It can help identify the saddle points and the local extrema of a function. Second order optimization methods, such as Newton's method, use it directly. It also appears in the training and analysis of neural networks, where it is usually approximated, because a model with n parameters has a Hessian with n^2 entries.
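One well-known way the Hessian enters optimization is Newton's method, which updates a point by subtracting the inverse Hessian applied to the gradient. The sketch below shows a single update for a function of two variables using the closed-form inverse of a 2 × 2 matrix. The function `newton_step` and the quadratic q(x, y) = (x - 1)^2 + 2(y + 3)^2 are hypothetical examples, not taken from the original tutorial.

```python
def newton_step(point, grad, hessian):
    """One Newton update x_new = x - H^{-1} * grad for two variables."""
    (a, b), (c, d) = hessian
    det = a * d - b * c
    if det == 0:
        raise ValueError("Hessian is singular at this point")
    gx, gy = grad
    # H^{-1} * grad, using the closed-form inverse of a 2x2 matrix.
    step_x = (d * gx - b * gy) / det
    step_y = (-c * gx + a * gy) / det
    return (point[0] - step_x, point[1] - step_y)


# Hypothetical quadratic q(x, y) = (x - 1)^2 + 2(y + 3)^2, minimized at (1, -3).
# Its gradient is (2(x - 1), 4(y + 3)) and its Hessian is [[2, 0], [0, 4]].
start = (5.0, 5.0)
grad_at_start = (2 * (start[0] - 1), 4 * (start[1] + 3))
print(newton_step(start, grad_at_start, [[2.0, 0.0], [0.0, 4.0]]))  # (1.0, -3.0)
```

For a quadratic function a single step lands on the minimum, because the Hessian is constant. For general functions the update is repeated, and the cost of forming and inverting the Hessian is one reason approximations are common in practice.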

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Optimization
Eigenvalues of the Hessian matrix
Inverse of Hessian matrix and neural network training

If you explore any of these extensions, I’d love to know. Post your findings in the comments below.

Further Reading

Tutorials

Derivatives
Gradient descent for machine learning
What is gradient in machine learning
Partial derivatives and gradient vectors
Higher order derivatives
How to choose an optimization algorithm

Resources

Additional resources on Calculus Books for Machine Learning

Books

Thomas’ Calculus, 14th edition, 2017. (based on the original works of George B. Thomas, revised by Joel Hass, Christopher Heil, Maurice Weir)
Calculus, 3rd Edition, 2017. (Gilbert Strang)
Calculus, 8th edition, 2015. (James Stewart)

Summary

In this tutorial, you discovered what Hessian matrices are. Specifically, you learned:

What a Hessian matrix is
How the discriminant of a function is computed from the Hessian and what it tells us

Do you have any questions?

Ask your questions in the comments below and I will do my best to answer.

The post A Gentle Introduction To Hessian Matrices appeared first on Machine Learning Mastery.
