Comprehensive Introduction to Differentiation

Last updated: Aug 12, 2023

Motivating example for differentiation

Suppose we have the following function:

Our goal is to find the slope at a specific point $x=a$. Graphically, we are after the gradient of the following tangent line:

Let's now go back to basics - given two points, we know that the slope is computed by "rise over the run" as illustrated below:

Let's use this formula to determine the slope at $x=a$:

The slope equation requires that we use two points, so we've arbitrarily selected another point $x=b$ so that we can compute the slope. The slope for these two points is:

$$\text{slope}= \frac{\text{rise}}{\text{run}}= \frac{f(b)-f(a)}{b-a}$$
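The rise-over-run formula can be sketched in a few lines of code. Here is a minimal example, assuming for concreteness the parabola $f(x)=x^2$ that this guide differentiates later:

```python
def f(x):
    # Example function (the parabola differentiated later in this guide)
    return x ** 2

def secant_slope(f, a, b):
    # Slope of the line through (a, f(a)) and (b, f(b)): rise over run
    return (f(b) - f(a)) / (b - a)

# Rise = f(3) - f(2) = 9 - 4 = 5, run = 3 - 2 = 1
print(secant_slope(f, 2, 3))  # → 5.0
```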

Great, we've managed to compute the slope given these two points - but the caveat here is that we are after the slope at exactly point $x=a$. Let's compare the slope we've computed (purple) with the true slope (yellow):

Clearly, there's a difference between the true slope and the computed slope. This happens because when computing the slope as the rise over the run, we are implicitly assuming that the function is a straight line:

Since our function is curved, the computed slope (purple) cuts through our function. To make our computed slope more accurate, we can pick a point that is closer to $x=a$, say $x=c$ like so:

We can see that the computed slope is now closer to the true slope! This result should be intuitive - the slope at a single point should be approximately equal to the slope computed using that point and another close point. Therefore, let's choose a point that's even closer to $x=a$, say $x=d$ like so:

Great, the computed slope is now approximately equal to the true slope! However, we can do even better by choosing another point that is even closer, say $0.1$ units away from $x=a$. This will certainly get us closer to the true slope, but again, we can do even better by choosing an even closer point, say only $0.0001$ units away from $x=a$. In this way, we can keep reducing the gap between $x=a$ and the other point to obtain a slope that is nearly identical to the true slope.

Let's denote the distance between $x=a$ and the other point as $\Delta{x}$, which you can interpret as some small distance between the two points:

We can set an extremely small value for $\Delta{x}$ such as $\Delta{x}=0.0001$ so that our computed slope is roughly equal to the true slope. The main idea is to make $\Delta{x}$ as close to $0$ as possible because a smaller distance between the two points means that the approximation of the slope is more accurate.
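We can verify this shrinking-$\Delta{x}$ behavior numerically. The sketch below assumes $f(x)=x^2$ and the example point $a=1$ (where the true slope turns out to be $2$):

```python
def f(x):
    # Assumed example function: f(x) = x^2
    return x ** 2

a = 1.0  # assumed example point; the true slope here is 2
for dx in (1.0, 0.1, 0.001, 0.0001):
    # Rise over run between x = a and x = a + dx
    slope = (f(a + dx) - f(a)) / dx
    print(dx, slope)  # slope approaches 2 as dx shrinks
```

Running this shows the computed slope marching toward the true slope as $\Delta{x}$ shrinks.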

Let's compute the slope given two points $x=a$ and $x=a+\Delta{x}$ using the rise-over-run formula:

$$\begin{align*} \text{slope}&= \frac{f(a+\Delta{x})-f(a)}{(a+\Delta{x})-a}\\ &= \frac{f(a+\Delta{x})-f(a)}{\Delta{x}} \end{align*}$$

We know that we can't simply set $\Delta{x}=0$ because the two points would then coincide at $x=a$. Mathematically, $\Delta{x}=0$ would not work either, because the denominator $\Delta{x}$ would be zero. That said, we want $\Delta{x}$ to be infinitesimally small such that $\Delta{x}$ approaches $0$ but never actually equals $0$. This is where the idea of limits comes into play:

$$\begin{equation}\label{eq:dyAnlbZp9rOddzGi00P} \text{slope} = \lim_{\Delta{x}\to0}\frac{f(a+\Delta{x})-f(a)}{\Delta{x}} \end{equation}$$

We are taking the limit of the fraction as the distance $\Delta{x}$ approaches $0$. To emphasize once again, this does not mean $\Delta{x}=0$, but instead means that $\Delta{x}$ can be arbitrarily as close to $0$ as possible. For the formal definition of limits, please check out our guide on limits and continuity of functions!

Formula \eqref{eq:dyAnlbZp9rOddzGi00P} tells us that the slope is a function of $a$, which is the $x$-position of the curve where we want to compute the slope. We often replace the variable $a$ with $x$ because a specific point $a$ can just be any $x$ value. Therefore, \eqref{eq:dyAnlbZp9rOddzGi00P} becomes:

$$\begin{equation}\label{eq:qUuZo0yMJLfUTrMCNqf} \text{slope}_x = \lim_{\Delta{x}\to0}\frac{f(x+\Delta{x})-f(x)}{\Delta{x}} \end{equation}$$

Here, we've added a subscript $x$ to $\text{slope}$ to emphasize the fact that the slope is dependent on the $x$ value we choose.
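This formula translates directly into a numerical approximation: evaluate the difference quotient with a tiny $\Delta{x}$ instead of taking the limit. A minimal sketch (the step size $\Delta{x}=10^{-6}$ is an assumed choice):

```python
def derivative(f, x, dx=1e-6):
    # Approximates slope_x by the difference quotient with a small dx;
    # the true derivative is the limit as dx approaches 0
    return (f(x + dx) - f(x)) / dx

# Approximate slope of f(x) = x^2 at x = 3
print(derivative(lambda x: x ** 2, 3.0))  # ≈ 6
```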

The formula \eqref{eq:qUuZo0yMJLfUTrMCNqf} that we have derived is actually the formal definition of differentiation! The numerator $f(x+\Delta{x})-f(x)$ represents the "rise" or the distance between the height of the two points. We typically express this distance as $\Delta{y}$, and so \eqref{eq:qUuZo0yMJLfUTrMCNqf} becomes:

$$\begin{equation}\label{eq:wFbAAU41jy0GDxpBzV0} \text{slope}_x = \lim_{\Delta{x}\to0} \frac{\Delta{y}}{\Delta{x}} \end{equation}$$

Great, we have managed to derive the formula for differentiation!

Notation for differentiation

Instead of writing out the word "slope", mathematicians have developed special notation for the derivative. There are two main notations:

  • Lagrange's notation

  • Leibniz's notation

Lagrange's notation

Lagrange's notation is as follows:

$$\begin{equation}\label{eq:uxRn64SnEBu3F8Xid7w} \text{slope}_x = f'(x) = \lim_{\Delta{x}\to0} \frac{\Delta{y}}{\Delta{x}} \end{equation}$$

Here, $f'(x)$ is read as "$f$ prime of $x$".

Leibniz's notation

Leibniz's notation is as follows:

$$\begin{equation}\label{eq:zeKUG4LccAGVla2jnLn} \text{slope}_x =\frac{df(x)}{dx} = \frac{d}{dx}f(x) = \frac{dy}{dx} = \lim_{\Delta{x}\to0} \frac{\Delta{y}}{\Delta{x}} \end{equation}$$

Here, $dy/dx$ is read as "$dy$ by $dx$". When we compute $dy/dx$, we say that we are taking the derivative of $y=f(x)$ with respect to $x$.


Note that $dy/dx$ is not to be treated as a fraction, but rather as a single symbol denoting the derivative ($d/dx$ itself acts as an operator applied to $y$). Later on in your Calculus journey, we may sometimes treat $dy/dx$ as if it were a fraction, but this is only informal shorthand that happens to work.

Which notation should I stick with?

Both Lagrange's notation and Leibniz's notation are widely used in practice. In multivariable Calculus, Leibniz's notation is extremely useful because we can explicitly state with respect to which variable we are taking the derivative. For instance, consider a multivariable function $f(x,y)$ - we can take the derivative with respect to either $x$ or $y$, or even with respect to both:

$$\frac{d}{dx}f(x,y),\;\;\;\;\;\;\; \frac{d}{dy}f(x,y),\;\;\;\;\;\;\; \frac{d^2}{dx\,dy}f(x,y)$$

In these cases, Lagrange's notation is not expressive enough, so Leibniz's notation is required. When dealing with a single-variable function $f(x)$, you may want to pick Lagrange's notation instead because of its simplicity!
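A quick numeric sketch of what "with respect to" means, using the assumed example function $f(x,y)=x^2y$ (not from the text above): we hold one variable fixed and vary the other.

```python
def f(x, y):
    # Assumed example multivariable function: f(x, y) = x^2 * y
    return x ** 2 * y

def d_dx(f, x, y, dx=1e-6):
    # Derivative with respect to x: y is held constant
    return (f(x + dx, y) - f(x, y)) / dx

def d_dy(f, x, y, dy=1e-6):
    # Derivative with respect to y: x is held constant
    return (f(x, y + dy) - f(x, y)) / dy

print(d_dx(f, 2.0, 3.0))  # ≈ 2xy = 12
print(d_dy(f, 2.0, 3.0))  # ≈ x^2 = 4
```

The two results differ because each derivative treats a different variable as the one that changes - exactly the distinction Leibniz's notation makes explicit.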

Formal definition of differentiation

Now that we've derived the formula for derivatives and gone over their notation, we can finally state the formal definition of differentiation.


Derivative of a function

The derivative of $f(x)$ with respect to $x$ is defined as follows:

$$\frac{d}{dx}f(x) = f'(x) =\lim_{\Delta{x}\to0}\frac{f(x+\Delta{x})-f(x)}{\Delta{x}} =\lim_{\Delta{x}\to0}\frac{\Delta{y}}{\Delta{x}}$$

If the above limit exists for some point $x=a$, then $f(x)$ is said to be differentiable at this point. If the limit does not exist at $x=a$, then $f(x)$ is not differentiable, that is, the derivative does not exist.
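A classic example of a point where the limit fails to exist is $f(x)=|x|$ at $x=0$: the difference quotient approaches $-1$ from the left and $+1$ from the right, so no single limit exists. A numeric sketch:

```python
def quotient(f, a, dx):
    # Difference quotient from the definition of the derivative
    return (f(a + dx) - f(a)) / dx

# f(x) = |x| is not differentiable at x = 0: the two one-sided
# quotients disagree, so the limit does not exist there
print(quotient(abs, 0.0, 1e-6))   # right side → 1.0
print(quotient(abs, 0.0, -1e-6))  # left side → -1.0
```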

In the next section, we will compute the derivative of a parabola.


Computing the derivative of a parabola

Consider the function $f(x)=x^2$:

  • what is the derivative of $f$ with respect to $x$?

  • what is the slope of the tangent line at $x=3$ and $x=0$?

Solution. From equation \eqref{eq:qUuZo0yMJLfUTrMCNqf}, we know that:

$$\begin{equation}\label{eq:fziIynyGdzJqQBvLAWJ} \begin{aligned} f'(x)&= \lim_{\Delta{x}\to0}\frac{f(x+\Delta{x})-f(x)}{\Delta{x}}\\ \end{aligned} \end{equation}$$

Let's compute $f(x+\Delta{x})$ and $f(x)$ given our function $f(x)=x^2$:

$$\begin{equation}\label{eq:dg1S2ORdjlOwGXnPPy7} \begin{aligned}[b] f'(x)&= \lim_{\Delta{x}\to0} \frac{(x+\Delta{x})^2-(x)^2}{\Delta{x}}\\ &= \lim_{\Delta{x}\to0} \frac{x^2+2x\Delta{x}+(\Delta{x})^2-x^2}{\Delta{x}}\\ &= \lim_{\Delta{x}\to0} \frac{2x\Delta{x}+(\Delta{x})^2}{\Delta{x}}\\ \end{aligned} \end{equation}$$

Now, we know that $\Delta{x}\ne0$ - that is, $\Delta{x}$ is extremely close to zero but not actually zero. This means that we can safely factor $\Delta{x}$ out of the numerator and cancel it with the denominator like so:

$$\begin{equation}\label{eq:qlM76KTxDHv0dNM6LfM} \begin{aligned}[b] f'(x) &= \lim_{\Delta{x}\to0} \frac{2x\Delta{x}+(\Delta{x})^2}{\Delta{x}}\\ &= \lim_{\Delta{x}\to0} 2x+\Delta{x}\\ \end{aligned} \end{equation}$$

We know that $2x+\Delta{x}$ will approach $2x$ as $\Delta{x}$ tends to $0$, and hence:

$$\begin{equation}\label{eq:FjGyhq8ZZ2rBU6BmQQI} \begin{aligned}[b] f'(x) &=2x \end{aligned} \end{equation}$$

Here, we've used Lagrange's notation to represent the derivative of $f$ with respect to $x$, but we can also use Leibniz's notation:

$$\begin{equation}\label{eq:xrtQP0kQxzqyql25LzU} \frac{d}{dx}f(x)=2x \end{equation}$$
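We can sanity-check the first-principles result $f'(x)=2x$ by comparing it against a small-$\Delta{x}$ difference quotient:

```python
def f(x):
    # The parabola from this example: f(x) = x^2
    return x ** 2

def approx_derivative(f, x, dx=1e-6):
    # Difference quotient with a tiny dx, approximating the limit
    return (f(x + dx) - f(x)) / dx

for x in (0.0, 1.0, 3.0):
    # The numeric estimate should agree with the derived formula 2x
    print(x, approx_derivative(f, x), 2 * x)
```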

Now that we know the derivative, we can compute the slope of the function at any position $x$. For instance, the slope of the tangent line at $x=3$ is:

$$f'(3)=2(3)=6$$

We can visualize this as follows:

On the other hand, the slope of the tangent line at $x=0$ is:

$$f'(0)=2(0)=0$$

We can see that the tangent line at $x=0$ is flat!


Computing the derivative of functions using the formal definition of differentiation is called differentiation by first principles. As we shall explore later, there is a faster way to compute derivatives without referring to the formal definition (therefore avoiding limits altogether)!

Functions are locally linear

Recall that we started off by choosing two points ($a$ and $b_1$) with which to compute the slope:

The reason why the computed slope does not exactly match the function $f$ is that $f$ is not a straight line. We can easily imagine that the computed slope would exactly equal the slope at $(x_1,y_1)$ if the function $f$ were a straight line.

As we did before, let's choose the second point such that it is closer to the first point $a$, say $b_2=(x_2,y_2)$ like so:

Here, we show the zoomed-in diagram on the right to illustrate what's happening near the point $a$. We can see that the function $f$ looks more like a straight line compared to before. Now, to obtain a more accurate slope, we choose an even closer point, say $b_3=(x_3,y_3)$, and an even closer point $b_4=(x_4,y_4)$ like so:

We can see that as we progressively choose points closer and closer to $a$ and zoom in, the function $f$ looks increasingly linear! In fact, if we keep on doing this, the function will become virtually indistinguishable from a straight line. This means that our classic "rise over run" formula will become increasingly accurate since the curve will look more and more straight!

To recap, we chose a point extremely close to $a$ and then zoomed in on that region to observe that the function becomes approximately straight. Mathematically, we say that differentiable functions are locally linear. This is a very surprising result because any non-linear curve such as the sine curve is actually locally linear, provided that we pick a point and zoom in enough on that region 🤯!
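We can make the zooming-in argument quantitative. The sketch below picks the assumed point $a=1$ on the sine curve and measures how far the curve strays from its tangent line over a shrinking window, using the fact (stated here without derivation) that the slope of $\sin$ at $a$ is $\cos(a)$:

```python
import math

a = 1.0                  # assumed example point on the sine curve
slope = math.cos(a)      # true slope of sin at a (fact stated without derivation)

for h in (1.0, 0.1, 0.01):
    # Sample the window [a - h, a + h] and find the worst-case gap
    # between sin(x) and its tangent line at a
    xs = [a - h + 2 * h * i / 100 for i in range(101)]
    gap = max(abs(math.sin(x) - (math.sin(a) + slope * (x - a))) for x in xs)
    print(h, gap)  # gap shrinks much faster than the window width h
```

The gap shrinks roughly like $h^2$ while the window shrinks like $h$, which is precisely why the curve looks straighter and straighter as we zoom in.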

Published by Isshin Inada