2.4: Chain Rule
We know how to differentiate sums, differences, products, and quotients of functions. But we have yet to discuss how to differentiate composite functions, such as \(\sin(4x + 3)\) or \(\sqrt{x^2 + 2}.\) In this section, upon learning the Chain Rule, we'll be able to differentiate many composite functions. The Chain Rule is arguably the most important rule in differential calculus, since many later units build on it. This section discusses the following topics:
- Chain Rule
- General Power Rule
- Differentiating \(b^x\)
- Using the Chain Rule Multiple Times
- Proof of the Chain Rule
Chain Rule
A composite function can be written in the form \begin{equation*} F(x) = f(g(x)) \cma \end{equation*} where \(f\) is the outer function and \(g\) is the inner function. Alternatively, we could write \(F(x) = (f \circ g)(x).\) Now consider \(F'(x),\) the rate of change of \(F\) with respect to \(x.\) It is logical to assume that \(F'(x)\) depends on both \(f'\) and \(g'.\) In fact, it turns out that \(F'(x)\) is the product of these derivatives, as stated by the Chain Rule: \begin{equation} F'(x) = f'(g(x)) \cdot g'(x) \pd \label{eq:chain-rule} \end{equation}
Chain Rule Expressed in Leibniz Notation Alternatively, we can express \(\eqref{eq:chain-rule}\) in Leibniz notation. For the composite function \(y = f(g(x)),\) we let \(u = g(x)\) to get \(y = f(u).\) We note that \(y\) changes with the variable \(u,\) which changes with \(x.\) Calculating the rate at which \(y\) changes with \(u\) doesn't need the Chain Rule. But we do need the rule to determine the rate of change of \(y\) with respect to \(x.\) In Leibniz notation, we write the Chain Rule as \begin{equation} \deriv{y}{x} = \deriv{y}{u} \deriv{u}{x} \pd \label{eq:chain-leib} \end{equation} We could alternatively write \[\deriv{y}{x} = \deriv{f}{g} \deriv{g}{x} \cma\] in which we view \(f\) and \(g\) as variables such that \(f\) depends on \(g\) and \(g\) depends on \(x.\) Both forms reveal the magic of Leibniz's notation—\(\eqrefer{eq:chain-leib}\) is easy to remember if you imagine canceling the differential \(\dd u.\) We see the following:
If \(y\) changes \(a\) times as fast as \(u,\) and \(u\) changes \(b\) times as fast as \(x,\) then \(y\) changes \(ab\) times as fast as \(x.\) For example, imagine that a bicycle travels \(6\) times faster than a walker, and that a car drives \(5\) times faster than the bicycle. The car therefore travels \(6 \times 5\) \(= 30\) times faster than the walker.
Applying the Chain Rule is similar to peeling an onion: we start peeling from the outer layer (initially ignoring the inner core) and gradually move inward. Similarly, when differentiating a composite function, we differentiate the outermost layer \((f)\) first and then move on to differentiate the inner layer \((g).\)
Method 1 In the composite function \(\sin(x^2 + 1),\) we identify the outer function to be \(f(x) = \sin x\) and the inner function to be \(g(x) = x^2 + 1.\) Note that \[f'(x) = \cos x \and g'(x) = 2x \pd\] To obtain \(f'(g(x)),\) in \(f'(x) = \cos x\) we replace \(x\) with \(g(x) = x^2 + 1 \col\) \[f'(g(x)) = \cos \parbr{g(x)} = \cos \par{x^2 + 1} \pd\] Thus, by \(\eqref{eq:chain-rule}\) (the Chain Rule), the derivative of \(f(g(x))\) is \[ \ba f'(g(x)) \cdot g'(x) &= \cos \par{x^2 + 1} \cdot 2x \nl &= \boxed{2x \cos \par{x^2 + 1}} \ea \]
Method 2 In \(y = \sin(x^2 + 1),\) we assign some variable to be the inner function—say, \(u = x^2 + 1.\) Then we have \(y = \sin u,\) from which we see \[\deriv{y}{u} = \cos u \pd\] Also, \[\deriv{u}{x} = \deriv{}{x} \par{x^2 + 1} = 2x \pd\] Hence, by \(\eqref{eq:chain-leib}\) the derivative of \(y\) with respect to \(x\) is given by \[\deriv{y}{x} = \deriv{y}{u} \deriv{u}{x} = (\cos u) (2x) \pd\] But since we chose \(u = x^2 + 1,\) substituting back gives \[ \ba \deriv{y}{x} &= \cos \par{x^2 + 1} \cdot 2x \nl &= \boxed{2x \cos \par{x^2 + 1}} \ea \]
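Both methods give \(2x \cos(x^2 + 1).\) As a sanity check, the following sketch compares that formula with a central-difference approximation of the derivative. The helper name `central_difference` and the step size `h` are illustrative assumptions rather than part of either method, and the two values agree only up to rounding and truncation error.

```python
import math


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


def y(x):
    """The composite function y = sin(x^2 + 1)."""
    return math.sin(x**2 + 1)


def dy_dx(x):
    """Derivative obtained from the Chain Rule: 2x cos(x^2 + 1)."""
    return 2 * x * math.cos(x**2 + 1)


if __name__ == "__main__":
    for x in (-1.0, 0.5, 2.0):
        # The last two columns should agree to several decimal places.
        print(x, dy_dx(x), central_difference(y, x))
```

A numerical check like this cannot prove the formula, but a mismatch would signal an error in applying the rule.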
Method 1 The function \(\sqrt{x^3 - 2}\) is composite; the outer function is \(f(x) = \sqrt x,\) and the inner function is \(g(x) = x^3 - 2.\) Observe that \(f'(x) = 1/(2 \sqrt x),\) so \[f'(g(x)) = \frac{1}{2 \sqrt{g(x)}} = \frac{1}{2 \sqrt{x^3 - 2}} \pd\] Also, \(g'(x) = 3x^2.\) Thus, the Chain Rule, as given by \(\eqrefer{eq:chain-rule},\) gives the derivative of \(\sqrt{x^3 - 2}\) to be \[ \ba f'(g(x)) \cdot g'(x) &= \frac{1}{2 \sqrt{x^3 - 2}} \cdot 3x^2 \nl &= \boxed{\frac{3x^2}{2 \sqrt{x^3 - 2}}} \ea \]
Method 2 In \(y = \sqrt{x^3 - 2},\) we choose some variable \(u\) to be the inner function. Thus, let \(u = x^3 - 2.\) Observe that \[\deriv{y}{u} = \deriv{}{u} \sqrt{u} = \frac{1}{2 \sqrt u} = \frac{1}{2 \sqrt{x^3 - 2}}\] and \[\deriv{u}{x} = \deriv{}{x} \par{x^3 - 2} = 3x^2 \pd\] Therefore, by \(\eqref{eq:chain-leib}\) \[ \ba \deriv{y}{x} &= \deriv{y}{u} \deriv{u}{x} \nl &= \frac{1}{2 \sqrt{x^3 - 2}} \cdot 3x^2 \nl &= \boxed{\frac{3x^2}{2 \sqrt{x^3 - 2}}} \ea \]
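Method 1 translates almost directly into code. The sketch below builds \(F'(x) = f'(g(x)) \cdot g'(x)\) from the three ingredients \(f',\) \(g,\) and \(g'\) used above. The function name `chain_rule` is a hypothetical helper chosen for illustration, and the code assumes \(x^3 - 2 \gt 0\) so that the square root is defined.

```python
import math


def chain_rule(f_prime, g, g_prime):
    """Return F'(x) = f'(g(x)) * g'(x) for the composite F(x) = f(g(x))."""
    def F_prime(x):
        return f_prime(g(x)) * g_prime(x)
    return F_prime


def f_prime(u):
    """Derivative of the outer function f(u) = sqrt(u)."""
    return 1 / (2 * math.sqrt(u))


def g(x):
    """Inner function g(x) = x^3 - 2."""
    return x**3 - 2


def g_prime(x):
    """Derivative of the inner function: g'(x) = 3x^2."""
    return 3 * x**2


F_prime = chain_rule(f_prime, g, g_prime)

if __name__ == "__main__":
    # At x = 2 the boxed formula gives 12 / (2 sqrt(6)), which equals sqrt(6).
    print(F_prime(2.0), math.sqrt(6))
```

Note that `f_prime` is evaluated at `g(x)`, not at `x`; this mirrors the step of replacing \(x\) with \(g(x)\) in \(f'(x).\)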
General Power Rule
We often use the Chain Rule in conjunction with other differentiation rules, since differentiating some functions requires a combination of rules. In Example 2 and Example 3, we used the Power Rule in conjunction with the Chain Rule. Now let us generalize this combination: Let \(g\) be a differentiable function, and consider the family of functions \[y = [g(x)]^n\] for some constant \(n.\) To differentiate \(y,\) assuming it is defined, we let \(u = g(x).\) By the Power Rule, we see \[\deriv{y}{u} = \deriv{}{u} \par{u^n} = n u^{n - 1} \pd\] Then by the Chain Rule, \[ \ba \deriv{y}{x} &= \deriv{y}{u} \deriv{u}{x} \nl &= n u^{n - 1} \cdot g'(x) \nl &= n[g(x)]^{n - 1} \cdot g'(x) \pd \ea \] This equation is called the General Power Rule. It is a special case of the Chain Rule in which the outer function is a power function. Because the General Power Rule is derived from the Chain Rule, it is not a new concept but rather a convenient formula to know.
In \(\eqref{eq:chain-power}\) if \(g(x) = x,\) then \(g'(x) = 1\) and so the formula becomes \[\deriv{}{x} x^n = nx^{n - 1} \cdot 1 = nx^{n - 1} \cma\] the raw Power Rule. In this special case of \(g,\) the Chain Rule agrees with the Power Rule we have previously established.
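The General Power Rule can be sketched as a small function that takes \(g,\) \(g',\) and \(n\) and returns the derivative of \([g(x)]^n.\) The name `general_power_rule` and the sample function \((2x + 1)^3\) are hypothetical choices for illustration; they do not come from the examples above.

```python
def general_power_rule(g, g_prime, n):
    """Return the derivative of [g(x)]^n, namely n [g(x)]^(n - 1) * g'(x)."""
    def derivative(x):
        return n * g(x) ** (n - 1) * g_prime(x)
    return derivative


# Special case g(x) = x: the formula reduces to the Power Rule n x^(n - 1).
power_rule_cubic = general_power_rule(lambda x: x, lambda x: 1, 3)

# Hypothetical example: y = (2x + 1)^3, so dy/dx = 3 (2x + 1)^2 * 2.
d_example = general_power_rule(lambda x: 2 * x + 1, lambda x: 2, 3)

if __name__ == "__main__":
    print(power_rule_cubic(2.0))  # 3 * 2^2 * 1
    print(d_example(1.0))         # 3 * 3^2 * 2
```

The factor `g_prime(x)` is the one most often forgotten by hand; with \(g(x) = x\) it equals \(1\) and silently disappears.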
Differentiating \(b^x\)
We know that the derivative of \(e^x\) is \(e^x,\) since \(e = 2.718 \dots\) is the only base whose exponential function equals its own derivative. But to differentiate an exponential function whose base isn't \(e,\) we force the base to be \(e\) (thus producing a composite function) and use the Chain Rule. Let \(b\) be any positive number, and consider the family of functions \(y = b^x,\) which we aim to differentiate. We use the Change of Base formula for exponents: \[y = b^x = \par{e^{\ln b}}^x = e^{x \ln b} \pd\] In this composite function, the outer function is \(f(x) = e^x\) and the inner function is \(g(x) = x \ln b.\) Since \(b\) is a constant, \(\ln b\) is also a constant and so \(g'(x) = \ln b.\) And of course, \(f'(x) = e^x,\) so \(f'(g(x)) = e^{x \ln b}.\) Therefore, by \(\eqref{eq:chain-rule}\) we see \[\deriv{}{x} \par{b^x} = e^{x \ln b} \ln b = b^x \ln b \pd\] In words, to differentiate an exponential function whose base isn't \(e,\) we copy the exponential function and multiply it by the natural logarithm of the base. If \(b = e,\) then \(\ln b = 1\) and so the derivative is simply \(e^x.\)
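The result \(b^x \ln b\) can be checked numerically for a few bases. In the sketch below, `exp_base_derivative` and `central_difference` are hypothetical helper names, the bases and the point \(x = 1.5\) are arbitrary, and the code assumes \(b \gt 0\) so that \(\ln b\) is defined.

```python
import math


def exp_base_derivative(b, x):
    """Derivative of b^x with respect to x, assuming b > 0: b^x ln b."""
    return b**x * math.log(b)


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    x = 1.5
    for b in (2.0, 10.0, math.e):
        approx = central_difference(lambda t: b**t, x)
        # The formula and the approximation should agree closely.
        print(b, exp_base_derivative(b, x), approx)
```

For `b = math.e` the factor `math.log(b)` is \(1,\) so the function returns \(e^x\) itself, matching the special case noted above.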
Using the Chain Rule Multiple Times
The function \(y = f(g(h(x)))\) has "three layers" of composition: the outermost layer \(f,\) the middle layer \(g,\) and the innermost layer \(h.\) To differentiate \(y,\) we use the Chain Rule twice, as follows: \begin{align} \deriv{y}{x} &= f'(g(h(x))) \cdot \deriv{}{x} [\orange{g(h(x))}] \nonumber \nl &= f'(g(h(x))) \parbr{\orange{g'(h(x)) \cdot h'(x)}} \nonumber \nl &= f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x) \pd \label{eq:chain-rule-3} \end{align} Or in Leibniz notation, we can write \[\deriv{y}{x} = \deriv{f}{g} \deriv{g}{h} \deriv{h}{x} \pd\] This form explains the name Chain Rule: \(\textderiv{y}{x}\) is given by the product of a chain of derivatives. In general, when a function has \(n\) layers of composition, we use the Chain Rule \(n - 1\) times.
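The "chain of derivatives" can be sketched as a loop that works from the innermost layer outward, multiplying one derivative per layer. The function name `derivative_of_composition` and the sample function \(y = \sin \sqrt{x^2 + 1}\) are hypothetical choices for illustration, not taken from the text.

```python
import math


def derivative_of_composition(layers, x):
    """Differentiate a composition given (function, derivative) pairs.

    `layers` is ordered from the innermost layer to the outermost layer.
    The result is the product of each layer's derivative, evaluated at
    the value that is passed into that layer.
    """
    value = x
    product = 1.0
    for func, func_prime in layers:
        product *= func_prime(value)  # derivative of this layer at its input
        value = func(value)           # output becomes the next layer's input
    return product


# Hypothetical example: y = sin(sqrt(x^2 + 1)), with layers h, g, f.
layers = [
    (lambda x: x**2 + 1, lambda x: 2 * x),          # h and h'
    (math.sqrt, lambda u: 1 / (2 * math.sqrt(u))),  # g and g'
    (math.sin, math.cos),                           # f and f'
]

if __name__ == "__main__":
    x = 1.0
    # By hand: cos(sqrt(x^2 + 1)) * 1 / (2 sqrt(x^2 + 1)) * 2x.
    by_hand = math.cos(math.sqrt(x**2 + 1)) * x / math.sqrt(x**2 + 1)
    print(derivative_of_composition(layers, x), by_hand)
```

With three layers the loop multiplies three derivatives, which corresponds to applying the Chain Rule twice.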
Proof of the Chain Rule
PROOF Let \(F(x) = f(g(x))\) such that \(g\) is differentiable at \(c\) and \(f\) is differentiable at \(g(c).\) Our goal is to show that \(F'(c) = f'(g(c)) \cdot g'(c).\) By the limit definition of a derivative at a point (see Section 2.1), we assert that \[ \ba F'(c) &= \lim_{x \to c} \frac{F(x) - F(c)}{x - c} \nl &= \lim_{x \to c} \frac{f(g(x)) - f(g(c))}{x - c} \pd \ea \] Our goal is to rewrite this limit to reveal that it's a product of two derivatives. By algebraic manipulation, we obtain \[ F'(c) = \lim_{x \to c} \parbr{\frac{f(g(x)) - f(g(c))}{g(x) - g(c)} \cdot \frac{g(x) - g(c)}{x - c}} \cma \] assuming that \(g(x) \ne g(c).\) Using the Product Law for Limits (from Section 1.2), this limit becomes \[F'(c) = \par{\lim_{x \to c} \frac{f(g(x)) - f(g(c))}{g(x) - g(c)}} \par{\lim_{x \to c} \frac{g(x) - g(c)}{x - c}} \cma\] provided each limit exists. The second limit is the definition of \(g'(c),\) so we have \[F'(c) = \par{\lim_{x \to c} \frac{f(g(x)) - f(g(c))}{g(x) - g(c)}} \cdot g'(c) \pd\] Now we show that the first limit represents the derivative \(f'(g(c)) \col\) Since \(g\) is differentiable at \(c,\) \(g\) is continuous at \(c\) and so \(g(x) \to g(c)\) as \(x \to c.\) If we let \(u = g(x),\) then we find \[ \ba F'(c) &= \par{\lim_{u \to g(c)} \frac{f(u) - f(g(c))}{u - g(c)}} \cdot g'(c) \nl &= f'(g(c)) \cdot g'(c) \pd \ea \] \[\qedproof\]
Chain Rule We use the Chain Rule to differentiate composite functions. If \(g\) is differentiable at \(x\) and \(f\) is differentiable at \(g(x),\) then the derivative of the composite function \(F(x) = f(g(x))\) is given by \begin{equation} F'(x) = f'(g(x)) \cdot g'(x) \pd \eqlabel{eq:chain-rule} \end{equation} In Leibniz notation, if \(y = f(g(x))\) and \(u = g(x),\) then we can write either \begin{flalign} && \deriv{y}{x} &= \deriv{y}{u} \deriv{u}{x} \eqlabel{eq:chain-leib} &\nl \laWord{or} && \deriv{y}{x} &= \deriv{f}{g} \deriv{g}{x} \nonumber \pd \end{flalign} In words, to obtain the derivative of a composite function, we differentiate the outer layer—leaving the inner function alone—and then multiply by the derivative of the inner function.
General Power Rule If \(g\) is a differentiable function and \(n\) is any constant, then \begin{equation} \deriv{}{x} [g(x)]^n = n \parbr{g(x)}^{n - 1} \cdot g'(x) \cma \eqlabel{eq:chain-power} \end{equation} where \([g(x)]^n\) is defined.
Differentiating \(b^x\) The function \(y = b^x\) can be rewritten as the composite function \(y = e^{x \ln b}.\) If \(b \gt 0,\) then the Chain Rule gives \begin{equation} \deriv{}{x} \par{b^x} = b^x \ln b \pd \eqlabel{eq:diff-b^x} \end{equation}
Using the Chain Rule Multiple Times We use the Chain Rule multiple times when a function is composed of more than two layers. In general, if a function is composed of \(n\) layers, then we apply the Chain Rule \(n - 1\) times. For example, the derivative of \(y = f(g(h(x)))\) (composed of three layers) is given by either \begin{flalign} && \deriv{y}{x} &= f'(g(h(x))) \cdot g'(h(x)) \cdot h'(x) \eqlabel{eq:chain-rule-3} &\nl \laWord{or} && \deriv{y}{x} &= \deriv{f}{g} \deriv{g}{h} \deriv{h}{x} \nonumber \pd \end{flalign} You shouldn't memorize \(\eqref{eq:chain-rule-3};\) instead, during your first application of the Chain Rule, we recommend that you view \(g(h(x))\) as a single, ordinary function composed within \(f.\) Be organized in writing out the steps of your differentiation process.