Pandas DataFrame | interpolate method

Last updated: Jul 1, 2022
Tags: Python, Pandas

Pandas' DataFrame.interpolate(~) method fills NaN values with interpolated values.

Parameters

1. method | string | optional

The algorithm used for interpolation (by default, method="linear"):

  • "linear": ignore the index and treat the values as equally spaced.

  • "time": interpolate based on the time elapsed between the entries of a DatetimeIndex. See example below.

  • "index" or "values": use the actual numerical values of the index to perform interpolation. See example below.

  • "pad": fill NaN with the previous non-NaN value (i.e. forward-fill).

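In addition, you can use the following SciPy-based methods (SciPy must be installed). Most of them are passed on to scipy.interpolate.interp1d, while "spline" is passed on to scipy.interpolate.UnivariateSpline:

nearest, zero, slinear, quadratic, cubic, spline, barycentric, polynomial

Some of these methods require additional arguments, which you can pass via **kwargs. For instance, both "polynomial" and "spline" require an order: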

df.interpolate(method="polynomial", order=5)
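To make the difference between "linear" and "index" concrete, here is a small sketch on a Series whose index is unevenly spaced (the data is made up for illustration). "linear" ignores the index, whereas "index" uses the index values as the x-coordinates:

```python
import numpy as np
import pandas as pd

# Made-up data: the NaN sits at index 1, between index 0 and index 10
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# "linear" treats the three rows as equally spaced, so the NaN becomes the midpoint
linear = s.interpolate(method="linear")

# "index" places the NaN 1/10 of the way from index 0 to index 10
by_index = s.interpolate(method="index")

print(linear.tolist())    # [0.0, 5.0, 10.0]
print(by_index.tolist())  # [0.0, 1.0, 10.0]
```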

2. axis | int or string | optional

Whether to interpolate each row or column:

Axis              Description
0 or "index"      Interpolate each column
1 or "columns"    Interpolate each row

By default, axis=0.

3. limit | int | optional

The maximum number of consecutive NaNs to fill. Must be greater than 0. For instance, if limit=2 and there are 3 consecutive NaNs, then only two of them will be filled (the first two under the default limit_direction="forward"), and the third will be left as NaN. By default, there is no limit.
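A quick sketch of limit on a run of three NaNs (made-up data). The two filled entries receive the same values that full linear interpolation would give them; only the NaN beyond the limit is left as is:

```python
import numpy as np
import pandas as pd

# Three consecutive NaNs between 0 and 4
s = pd.Series([0.0, np.nan, np.nan, np.nan, 4.0])

# With limit=2 and the default limit_direction="forward",
# only the first two NaNs of the run are filled
filled = s.interpolate(limit=2)

print(filled.tolist())  # [0.0, 1.0, 2.0, nan, 4.0]
```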

4. inplace | boolean | optional

  • If True, then the method will directly modify the source DataFrame instead of creating a new DataFrame.

  • If False, then a new DataFrame will be created and returned.

By default, inplace=False.
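A minimal sketch of inplace=True on a small made-up DataFrame. Note that the method then returns None rather than a new DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0]})

# inplace=True modifies df directly and returns None
result = df.interpolate(inplace=True)

print(result)            # None
print(df["A"].tolist())  # [1.0, 2.0, 3.0]
```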

5. limit_direction | string | optional

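The direction in which consecutive NaNs are filled:

  • "forward": fill NaNs that come after a non-NaN value (for instance, with method="linear", NaNs at the end are filled with the last non-NaN value). NaNs at the start, which have no previous non-NaN value, are left as is.

  • "backward": fill NaNs that come before a non-NaN value (for instance, with method="linear", NaNs at the start are filled with the first non-NaN value). NaNs at the end, which have no next non-NaN value, are left as is.

  • "both": fill NaNs in both directions.

NaNs located between two non-NaN values are interpolated in every direction. If limit is specified, then at most limit consecutive NaNs are filled, counted from the non-NaN value in the chosen direction. By default, limit_direction="forward".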
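The sketch below (made-up data with only leading NaNs) contrasts the default forward direction with limit_direction="backward" combined with limit=1. Going backward, only the single NaN next to the non-NaN value is filled:

```python
import numpy as np
import pandas as pd

# Leading NaNs only: there is no previous non-NaN value to fill forward from
s = pd.Series([np.nan, np.nan, np.nan, 3.0])

forward = s.interpolate()  # default limit_direction="forward"
backward_1 = s.interpolate(limit=1, limit_direction="backward")

print(forward.tolist())     # [nan, nan, nan, 3.0]
print(backward_1.tolist())  # [nan, nan, 3.0, 3.0]
```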

6. limit_area | None or string | optional

The restriction imposed on filling:

  • None: no restriction.

  • "inside": only fill NaNs that are surrounded by non-NaN values (i.e. interpolation, where both the lower and upper bounds are defined).

  • "outside": only fill NaNs at the start or end of the data (i.e. extrapolation, where only one bound is defined).

By default, limit_area=None.
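Here is a short sketch of both options on a made-up Series with one interior NaN and one NaN at each end. For "outside", limit_direction="both" is passed so that the leading NaN is filled as well as the trailing one:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Only fill the NaN surrounded by non-NaN values
inside = s.interpolate(limit_area="inside")

# Only fill the NaNs at the two ends
outside = s.interpolate(limit_area="outside", limit_direction="both")

print(inside.tolist())   # [nan, 1.0, 2.0, 3.0, nan]
print(outside.tolist())  # [1.0, 1.0, nan, 3.0, 3.0]
```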

7. downcast | "infer" or None | optional

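If "infer", the resulting columns are downcast to a more specific dtype where possible (e.g. float64 to int64). By default, downcast=None, meaning that no downcasting is performed.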

8. **kwargs

Additional keyword arguments to pass on to the interpolation method (e.g. order for "polynomial" and "spline").

Return value

A DataFrame with NaNs filled with interpolated values. If inplace=True, then None is returned.

Examples

Basic usage

Consider the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({"A":[3,np.nan,5,6],"B":[1,5,np.nan,9],"C":[1,5,np.nan,np.nan]})
df
A B C
0 3.0 1.0 1.0
1 NaN 5.0 5.0
2 5.0 NaN NaN
3 6.0 9.0 NaN

To fill NaN using linear interpolation:

df.interpolate() # method="linear"
A B C
0 3.0 1.0 1.0
1 4.0 5.0 5.0
2 5.0 7.0 5.0
3 6.0 9.0 5.0

Notice how the two NaNs in column C were filled with the last non-NaN value (5.0) instead: linear interpolation cannot be performed without an upper bound, so under the default limit_direction="forward" these trailing NaNs are forward-filled.
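To illustrate what the default settings do, here is a minimal NumPy sketch of method="linear" with limit_direction="forward" and no limit. The helper linear_fill_forward is hypothetical (not part of pandas) and is only meant to reproduce the output above, not pandas' actual implementation:

```python
import numpy as np
import pandas as pd


def linear_fill_forward(values):
    """Hypothetical helper: mimic interpolate(method="linear") with
    limit_direction="forward" and no limit, for a 1-D sequence."""
    y = np.asarray(values, dtype=float)
    out = y.copy()
    pos = np.arange(len(y))  # "linear" treats rows as equally spaced
    valid = ~np.isnan(y)
    if not valid.any():
        return out  # nothing to interpolate from
    first_valid = pos[valid][0]
    # Leading NaNs (before the first non-NaN value) are left untouched
    to_fill = ~valid & (pos > first_valid)
    # np.interp returns the last non-NaN value for positions past the end,
    # which reproduces the forward-fill of trailing NaNs
    out[to_fill] = np.interp(pos[to_fill], pos[valid], y[valid])
    return out


df = pd.DataFrame({"A": [3, np.nan, 5, 6], "B": [1, 5, np.nan, 9], "C": [1, 5, np.nan, np.nan]})
manual = pd.DataFrame({col: linear_fill_forward(df[col]) for col in df.columns}, index=df.index)

print(np.allclose(manual.to_numpy(), df.interpolate().to_numpy(), equal_nan=True))  # True
```

Using np.interp keeps the sketch short, since it already clamps to the end values outside the known range.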

Interpolating row-wise

To interpolate row-wise, pass in axis=1 like so:

df.interpolate(axis=1)
A B C
0 3.0 1.0 1.0
1 NaN 5.0 5.0
2 5.0 5.0 5.0
3 6.0 9.0 9.0
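As a sanity check, row-wise interpolation on this DataFrame gives the same result as transposing, interpolating each column, and transposing back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, np.nan, 5, 6], "B": [1, 5, np.nan, 9], "C": [1, 5, np.nan, np.nan]})

# Interpolate each row directly...
row_wise = df.interpolate(axis=1)

# ...or transpose, interpolate each column, and transpose back
via_transpose = df.T.interpolate().T

print(row_wise.equals(via_transpose))  # True
```

Row 1 still contains a NaN in column A because, under the default limit_direction="forward", a leading NaN has no previous value to fill from.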

Interpolating using method=index

Consider the following DataFrame:

df = pd.DataFrame({"B":[5,np.nan,9]}, index=[5,10,30])
df
B
5 5.0
10 NaN
30 9.0

Performing simple linear interpolation yields:

df.interpolate() # method="linear"
B
5 5.0
10 7.0
30 9.0

Here, we get 7 as the interpolated value because the difference between the lower and upper bounds (9 - 5 = 4) is split into 2 equally spaced intervals, giving 5 + 4/2 = 7.

In contrast, interpolating using method="index" instead gives:

df.interpolate(method="index")
B
5 5.0
10 5.8
30 9.0

Here, the difference between the lower and upper bounds (4) is divided up not by the number of intervals, but by the difference of the index values (30 - 5 = 25). Index 10 lies 10 - 5 = 5 units above the lower index, so we end up with 5.8 because:

(4/25 * 5) + 5 = 5.8
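The same arithmetic can be checked in code. This sketch computes the value by hand, with numpy.interp (which interpolates piecewise-linearly over the given x-coordinates), and with pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [5, np.nan, 9]}, index=[5, 10, 30])

# lower bound + difference * (offset of index 10 / span of the index)
manual = 5 + (9 - 5) * (10 - 5) / (30 - 5)

# np.interp uses the index values as x-coordinates
via_numpy = np.interp(10, [5, 30], [5, 9])

via_pandas = df.interpolate(method="index").loc[10, "B"]

print(manual, via_numpy, via_pandas)
```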

Interpolating using method=time

Consider the following DataFrame with a DatetimeIndex:

index_date = pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-15", "2020-12-31"])
df = pd.DataFrame({"A":[1,np.nan,np.nan,31]}, index=index_date)
df
A
2020-12-01 1.0
2020-12-02 NaN
2020-12-15 NaN
2020-12-31 31.0

If we perform linear interpolation on df:

df.interpolate()
A
2020-12-01 1.0
2020-12-02 11.0
2020-12-15 21.0
2020-12-31 31.0

Here, the index is not taken into account: the lower bound is 1 and the upper bound is 31, and the difference of 30 is spread evenly over 3 intervals (10 per step).

To take the DatetimeIndex into account, pass in method="time":

df.interpolate(method="time")
A
2020-12-01 1.0
2020-12-02 2.0
2020-12-15 15.0
2020-12-31 31.0

Here, the bounds are still the same: the lower bound is 1 and the upper bound is 31. Instead of dividing the difference of 30 by the number of intervals, we divide it by the length of time between the bounds, which in this case is 30 days (from 2020-12-01 to 2020-12-31). This gives an increase of 1 per day, so for instance, 2020-12-15, which is 14 days after 2020-12-01, gets the interpolated value 1 + 14 = 15.
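A small sketch of the same idea: convert the DatetimeIndex into elapsed days and use those as the x-coordinates of numpy.interp. The day-based conversion is an illustration of the reasoning above, not how pandas implements method="time" internally:

```python
import numpy as np
import pandas as pd

index_date = pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-15", "2020-12-31"])
df = pd.DataFrame({"A": [1, np.nan, np.nan, 31]}, index=index_date)

# Elapsed days since the first date: [0, 1, 14, 30]
days = np.asarray((df.index - df.index[0]).days, dtype=float)
valid = df["A"].notna().to_numpy()

# Interpolate using elapsed time as the x-coordinate
manual = np.interp(days, days[valid], df["A"].to_numpy()[valid])

print(manual.tolist())  # [1.0, 2.0, 15.0, 31.0]
print(np.allclose(manual, df.interpolate(method="time")["A"].to_numpy()))  # True
```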

Specifying limit direction

Consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,np.nan,5], "B":[5,np.nan,9], "C":[5,np.nan,np.nan]})
df
A B C
0 NaN 5.0 5.0
1 NaN NaN NaN
2 5.0 9.0 NaN

By default, limit_direction="forward", which means that NaNs are filled going forward from a previous non-NaN value. The leading NaNs in column A therefore remain unfilled:

df.interpolate() # limit_direction="forward"
A B C
0 NaN 5.0 5.0
1 NaN 7.0 5.0
2 5.0 9.0 5.0

To use the next non-NaN value to fill NaN, pass in limit_direction="backward":

df.interpolate(limit_direction="backward")
A B C
0 5.0 5.0 5.0
1 5.0 7.0 NaN
2 5.0 9.0 NaN

Notice how with both "forward" and "backward", we may still end up with NaN values when there is no previous or next non-NaN value. We can prevent this by setting limit_direction="both", which ensures that if the previous non-NaN value is unavailable, the next non-NaN value is used, and vice versa:

df.interpolate(limit_direction="both")
A B C
0 5.0 5.0 5.0
1 5.0 7.0 5.0
2 5.0 9.0 5.0
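For this particular DataFrame (default method="linear", no limit), limit_direction="both" produces the same result as a forward interpolation followed by a back-fill of the leading NaNs that remain. This is shown only as an illustration for this example, not as a general equivalence:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, np.nan, 5], "B": [5, np.nan, 9], "C": [5, np.nan, np.nan]})

both = df.interpolate(limit_direction="both")

# Default forward interpolation, then back-fill the leading NaNs in column A
forward_then_bfill = df.interpolate().bfill()

print(both.equals(forward_then_bfill))  # True
```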

Downcasting the resulting DataFrame

By default, downcast=None, which means that no downcasting is performed, even if a column could be cast to a more specific type.

For example, consider the following DataFrame:

df = pd.DataFrame({"A":[np.nan,5], "B":[5,np.nan]})
df
A B
0 NaN 5.0
1 5.0 NaN

Performing interpolation yields:

df.interpolate() # downcast=None
A B
0 NaN 5.0
1 5.0 5.0

Checking the column types of the resulting DataFrame:

df.interpolate().dtypes
A float64
B float64
dtype: object

In this scenario, column B could use a more specific type, namely int, since all of its values are whole numbers. To perform this downcast, set downcast="infer":

df.interpolate(downcast="infer").dtypes
A float64
B int64
dtype: object
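As of pandas 2.1, the downcast keyword of interpolate(~) is deprecated, so an explicit cast may be preferable in newer code. Below is a rough sketch of the idea, assuming we only care about turning float columns into int64: a column is cast only when it has no NaN and contains whole numbers only. The helper downcast_whole_floats is hypothetical and only approximates what downcast="infer" does:

```python
import numpy as np
import pandas as pd


def downcast_whole_floats(df):
    """Hypothetical helper: cast float columns to int64 when they contain
    no NaN and only whole numbers (a rough stand-in for downcast="infer")."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if s.dtype.kind == "f" and s.notna().all() and (s % 1 == 0).all():
            out[col] = s.astype("int64")
    return out


df = pd.DataFrame({"A": [np.nan, 5], "B": [5, np.nan]})
result = downcast_whole_floats(df.interpolate())
print(result.dtypes)
```

Column A keeps float64 because it still contains a NaN, which int64 cannot represent.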
Published by Isshin Inada