The chain rule of calculus

Neural networks are compositions of many functions, and the chain rule of calculus tells us how to take derivatives through such compositions.

Suppose you have the expression

z = f(g(x))

That means you have an input x; you apply a function g to x to get y = g(x); then you apply another function f to y and, eventually, you get z = f(y).

Let’s say you would like to differentiate this whole expression with respect to x. The chain rule tells you that the derivative is the product of the derivative of g and the derivative of f:

dz/dx = (dy/dx) · (dz/dy) = g′(x) · f′(g(x))
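As a quick sanity check, here is a small Python sketch; the particular choices g(x) = x² and f(y) = sin(y) are just illustrative, and the chain-rule derivative is compared against a finite difference:

```python
import math

def g(x):
    return x ** 2          # inner function: y = g(x), illustrative choice

def f(y):
    return math.sin(y)     # outer function: z = f(y), illustrative choice

def chain_rule_derivative(x):
    # dz/dx = g'(x) * f'(g(x))
    dg = 2 * x             # g'(x)
    df = math.cos(g(x))    # f'(y) evaluated at y = g(x)
    return dg * df

def numerical_derivative(x, h=1e-6):
    # central finite difference of the composition z = f(g(x))
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x = 1.3
print(chain_rule_derivative(x))   # analytic, via the chain rule
print(numerical_derivative(x))    # numeric, should agree closely
```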

The exact same expression applies to multivariate functions; we just have to be a little careful with the order, because scalar multiplication commutes while matrix multiplication does not.

The same chain rule holds, only now dz/dy is a vector (the gradient of z with respect to y) and dy/dx is a matrix (the Jacobian, with entries (dy/dx)ᵢⱼ = ∂yⱼ/∂xᵢ):

dz/dx = (dy/dx) · (dz/dy)
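As a small sketch of the vector case, assume an illustrative map y = tanh(Ax) followed by a scalar readout z = c·y (A and c are made up for the example), and check the chain-rule gradient against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))   # illustrative weights: y = tanh(A x), y in R^4
c = rng.normal(size=4)        # illustrative readout: z = c . y, a scalar

def forward(x):
    return c @ np.tanh(A @ x)

x = rng.normal(size=3)
y = np.tanh(A @ x)

# dz/dy is a vector: the gradient of z with respect to y
dz_dy = c
# dy/dx is a matrix with (dy/dx)[i, j] = d y_j / d x_i, here 3 x 4
dy_dx = (A * (1 - y ** 2)[:, None]).T

# chain rule: dz/dx = (dy/dx) (dz/dy)
dz_dx = dy_dx @ dz_dy

# central finite differences, one input coordinate at a time
eps = 1e-6
numeric = np.array([
    (forward(x + eps * e) - forward(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(dz_dx, numeric, atol=1e-6))  # True
```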

Let’s look at a simple neural network: an input x is multiplied by a weight matrix W^(1) to give z^(1), a nonlinearity σ produces a^(1) = σ(z^(1)), a second weight matrix W^(2) gives z^(2) = W^(2) a^(1), and a loss L is computed from z^(2).

We could use the chain rule to write out, for example, the derivative of the loss with respect to W^(2). That is equal to the derivative of z^(2) with respect to W^(2) times the derivative of the loss with respect to z^(2):

dL/dW^(2) = (dz^(2)/dW^(2)) · (dL/dz^(2))

In the same way, we can write out the derivative with respect to W^(1), now chaining through every intermediate quantity between W^(1) and the loss:

dL/dW^(1) = (dz^(1)/dW^(1)) · (da^(1)/dz^(1)) · (dz^(2)/da^(1)) · (dL/dz^(2))
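To make these two gradients concrete, here is a minimal NumPy sketch of the forward and backward pass for the network above; the tanh nonlinearity and squared-error loss are illustrative assumptions, not choices fixed by the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # input
t = rng.normal(size=2)           # target
W1 = rng.normal(size=(4, 3))     # first-layer weights W^(1)
W2 = rng.normal(size=(2, 4))     # second-layer weights W^(2)

# Forward pass (illustrative: sigma = tanh, squared-error loss)
z1 = W1 @ x
a1 = np.tanh(z1)
z2 = W2 @ a1
L = 0.5 * np.sum((z2 - t) ** 2)

# Backward pass: apply the chain rule from the loss backwards
dL_dz2 = z2 - t                    # dL/dz2
dL_dW2 = np.outer(dL_dz2, a1)      # dL/dW2 = (dz2/dW2)(dL/dz2), an outer product
dL_da1 = W2.T @ dL_dz2             # chain through z2 = W2 a1
dL_dz1 = (1 - a1 ** 2) * dL_da1    # chain through the tanh: da1/dz1 is diagonal
dL_dW1 = np.outer(dL_dz1, x)       # dL/dW1 = (dz1/dW1)(dL/dz1)

# Finite-difference check on one entry of W1
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
Lp = 0.5 * np.sum((W2 @ np.tanh(W1p @ x) - t) ** 2)
Lm = 0.5 * np.sum((W2 @ np.tanh(W1m @ x) - t) ** 2)
print(dL_dW1[0, 0], (Lp - Lm) / (2 * eps))  # should match closely
```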

References

Sergey Levine, CS 182: Lecture 5, UC Berkeley, 2021.

All images taken from Levine, CS 182, 2021.
