
# Pandas DataFrame | interpolate method

Last updated: Mar 5, 2023

Tags: Python, Pandas

Pandas `DataFrame.interpolate(~)` method fills `NaN` using interpolated values.

# Parameters

1. `method` | `string` | `optional`

The algorithm used for interpolation:

• `"linear"`: simple linear interpolation that ignores the index and treats the values as equally spaced.

• `"time"`: interpolation that takes the time intervals of a `DatetimeIndex` into account. See example below.

• `"index"` or `"values"`: interpolation that uses the numerical values of the index. See example below.

• `"pad"`: fill `NaN` using the previous non-`NaN` value (forward fill).

By default, `method="linear"`.

In addition, you can use the following methods, which pandas delegates to SciPy (mostly `scipy.interpolate.interp1d`), so SciPy must be installed:

```
nearest, zero, slinear, quadratic, cubic, spline, barycentric, polynomial
```

Some of these methods (for example `"polynomial"` and `"spline"`) require an additional argument such as `order`, which you can pass using `**kwargs` like so:

```
df.interpolate(method="polynomial", order=5)
```

2. `axis` | `int` or `string` | `optional`

Whether to interpolate each row or column:

| Axis | Description |
|---|---|
| `0` or `"index"` | Interpolate each column |
| `1` or `"columns"` | Interpolate each row |

By default, `axis=0`.

3. `limit` | `int` | `optional`

The maximum number (inclusive) of consecutive `NaN` to fill. For instance, if `limit=2` and there are `3` consecutive `NaN`s, then the first two `NaN`s will be filled and the third will be left as is.

4. `inplace` | `boolean` | `optional`

• If `True`, then the method will directly modify the source DataFrame instead of creating a new DataFrame.

• If `False`, then a new DataFrame will be created and returned.

By default, `inplace=False`.

5. `limit_direction` | `string` | `optional`

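The direction in which consecutive `NaN` are filled:

• `"forward"`: fill moving forward from the previous non-`NaN` value. Leading `NaN`, which have no previous value, are left as is.

• `"backward"`: fill moving backward from the next non-`NaN` value. Trailing `NaN`, which have no next value, are left as is.

• `"both"`: fill in both directions, so that leading and trailing `NaN` are filled as well.

By default, `limit_direction="forward"`. When `limit` is specified, the direction also determines from which side the limited number of `NaN` are filled.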

6. `limit_area` | `None` or `string` | `optional`

The restriction imposed on filling:

• `None`: no restriction.

• `"inside"`: only perform interpolation (i.e. when lower and upper bounds of the interval are defined)

• `"outside"`: only perform extrapolation (i.e. when only one bound of the interval is defined)

By default, `limit_area=None`.

7. `downcast` | `"infer"` or `None` | `optional`

Whether or not to downcast the resulting dtypes. By default, `downcast=None`.

8. `**kwargs`

The keyword arguments to pass on to the underlying interpolation function, such as `order` for `method="polynomial"`.
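To make the `limit_area` options more concrete, here is a minimal sketch on a small made-up column with one leading, one interior and one trailing `NaN`. The data and the names `df_area`, `inside`, `outside` and `outside_both` are hypothetical and are not part of the examples in this guide.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one leading, one interior and one trailing NaN
df_area = pd.DataFrame({"A": [np.nan, 1.0, np.nan, 3.0, np.nan]})

# Only fill NaN that sit between two valid values
inside = df_area.interpolate(limit_area="inside")

# Only fill NaN outside the valid values; with the default forward
# direction this reaches the trailing NaN but not the leading one
outside = df_area.interpolate(limit_area="outside")

# Fill in both directions to reach the leading NaN as well
outside_both = df_area.interpolate(limit_area="outside", limit_direction="both")

print(inside["A"].tolist())        # [nan, 1.0, 2.0, 3.0, nan]
print(outside["A"].tolist())       # [nan, 1.0, nan, 3.0, 3.0]
print(outside_both["A"].tolist())  # [1.0, 1.0, nan, 3.0, 3.0]
```

Note that the values filled outside the valid range are copies of the nearest valid value rather than a linear extension of the trend.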

# Return value

A DataFrame with the `NaN` filled with interpolated values.
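The difference between the default behaviour and `inplace=True` can be sketched as follows. The DataFrame `df_src` and the other variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df_src = pd.DataFrame({"A": [1.0, np.nan, 3.0]})

# inplace=False (default): df_src is left untouched and a new DataFrame is returned
df_new = df_src.interpolate()
still_missing = bool(df_src["A"].isna().any())

# inplace=True: df_src itself is modified
df_src.interpolate(inplace=True)
now_filled = not bool(df_src["A"].isna().any())

print(still_missing, now_filled)  # True True
print(df_new["A"].tolist())       # [1.0, 2.0, 3.0]
print(df_src["A"].tolist())       # [1.0, 2.0, 3.0]
```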

# Examples

## Basic usage

Consider the following DataFrame:

```
import numpy as np
import pandas as pd

df = pd.DataFrame({"A":[3,np.nan,5,6],"B":[1,5,np.nan,9],"C":[1,5,np.nan,np.nan]})
df
     A    B    C
0  3.0  1.0  1.0
1  NaN  5.0  5.0
2  5.0  NaN  NaN
3  6.0  9.0  NaN
```

To fill `NaN` using linear interpolation:

```
df.interpolate()   # method="linear"
     A    B    C
0  3.0  1.0  1.0
1  4.0  5.0  5.0
2  5.0  7.0  5.0
3  6.0  9.0  5.0
```

Notice how the two trailing `NaN` in column `C` were filled with the last valid value (`5.0`) instead. Linear interpolation cannot be performed without an upper bound, so the default forward fill direction simply carries the last value forward.

## Interpolating row-wise

To interpolate row-wise, pass in `axis=1` like so:

```
df.interpolate(axis=1)
     A    B    C
0  3.0  1.0  1.0
1  NaN  5.0  5.0
2  5.0  5.0  5.0
3  6.0  9.0  9.0
```

## Interpolating using method=index

Consider the following DataFrame:

```
df = pd.DataFrame({"B":[5,np.nan,9]}, index=[5,10,30])
df
      B
5   5.0
10  NaN
30  9.0
```

Performing simple linear interpolation yields:

```
df.interpolate()   # method="linear"
      B
5   5.0
10  7.0
30  9.0
```

Here, we get `7` as the interpolated value because the difference between the lower and upper bound (`4`) is split into 2 equal intervals, regardless of the index values.

In contrast, interpolating using `method="index"` instead gives:

```
df.interpolate(method="index")
      B
5   5.0
10  5.8
30  9.0
```

Here, the difference between the lower and upper bound (`4`) is divided up not by the number of intervals there are, but by the difference of the index values (`30-5=25`). So, we end up with `5.8` because:

```
(4/25 * 5) + 5 = 5.8
```
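The arithmetic above can be double-checked with a short sketch that compares a manual calculation, NumPy's `np.interp` and `method="index"`. The variable names are made up for illustration, and using `np.interp` here is only a cross-check, not a claim about how pandas is implemented internally.

```python
import numpy as np
import pandas as pd

df_idx = pd.DataFrame({"B": [5, np.nan, 9]}, index=[5, 10, 30])

# Manual linear interpolation at index 10 between the points (5, 5.0) and (30, 9.0)
manual = 5 + (9 - 5) * (10 - 5) / (30 - 5)

# The same calculation using NumPy's 1-D linear interpolation
via_numpy = float(np.interp(10, [5, 30], [5, 9]))

# The value pandas fills in at index 10
via_pandas = float(df_idx.interpolate(method="index").loc[10, "B"])

# All three should agree at approximately 5.8
print(manual, via_numpy, via_pandas)
```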

## Interpolation using method=time

Consider the following DataFrame with a `DatetimeIndex`:

```
index_date = pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-15", "2020-12-31"])
df = pd.DataFrame({"A":[1,np.nan,np.nan,31]}, index=index_date)
df
               A
2020-12-01   1.0
2020-12-02   NaN
2020-12-15   NaN
2020-12-31  31.0
```

If we perform linear interpolation on `df`:

```
df.interpolate()
               A
2020-12-01   1.0
2020-12-02  11.0
2020-12-15  21.0
2020-12-31  31.0
```

Here, the index is not taken into account - the lower bound is `1` and upper bound is `31`, and the difference is evenly spaced out in 3 intervals.

To take into account the `DatetimeIndex`, pass in `method="time"`:

```
df.interpolate(method="time")
               A
2020-12-01   1.0
2020-12-02   2.0
2020-12-15  15.0
2020-12-31  31.0
```

Here, the bounds are still the same: the lower bound is `1` and the upper bound is `31`. Instead of dividing the difference (`30`) by the number of intervals, we divide it by the length of time, which in this case is 30 days, giving an increase of `1` per day. This is why, for instance, we see an interpolated value of `15` for December 15, which is 14 days after the first date.
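As a sanity check of this reasoning, the sketch below recomputes the values by hand from the number of elapsed days and compares them with the result of `method="time"`. The names `df_time`, `elapsed_days`, `manual` and `by_time` are made up for illustration.

```python
import numpy as np
import pandas as pd

index_date = pd.to_datetime(["2020-12-01", "2020-12-02", "2020-12-15", "2020-12-31"])
df_time = pd.DataFrame({"A": [1, np.nan, np.nan, 31]}, index=index_date)

# Days elapsed since the first date: 0, 1, 14, 30
elapsed_days = (df_time.index - df_time.index[0]).days.to_numpy()

# Spread the difference (31 - 1) proportionally to the elapsed time
manual = 1 + (31 - 1) * elapsed_days / elapsed_days[-1]

by_time = df_time.interpolate(method="time")["A"].to_numpy()

print(manual.tolist())   # [1.0, 2.0, 15.0, 31.0]
print(by_time.tolist())
```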

## Specifying limit direction

Consider the following DataFrame:

```
df = pd.DataFrame({"A":[np.nan,np.nan,5], "B":[5,np.nan,9], "C":[5,np.nan,np.nan]})
df
     A    B    C
0  NaN  5.0  5.0
1  NaN  NaN  NaN
2  5.0  9.0  NaN
```

By default, `limit_direction="forward"`, which means that we use the previous non-`NaN` value to fill `NaN`:

```
df.interpolate()   # limit_direction="forward"
     A    B    C
0  NaN  5.0  5.0
1  NaN  7.0  5.0
2  5.0  9.0  5.0
```

To use the next non-`NaN` value to fill `NaN`, pass in `limit_direction="backward"`:

```
df.interpolate(limit_direction="backward")
     A    B    C
0  5.0  5.0  5.0
1  5.0  7.0  NaN
2  5.0  9.0  NaN
```

Notice how for both `forward` and `backward`, we may still end up with `NaN` values when there is no previous or next non-`NaN` value. We can prevent this by setting `limit_direction="both"`, which ensures that if the previous non-`NaN` value is unavailable, then the next non-`NaN` value is used, and vice versa:

```
df.interpolate(limit_direction="both")
     A    B    C
0  5.0  5.0  5.0
1  5.0  7.0  5.0
2  5.0  9.0  5.0
```
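The examples above leave `limit` unset. As a minimal sketch of how `limit` caps the number of consecutive `NaN` filled in the default forward direction, consider a made-up column with three consecutive `NaN` (the name `df_lim` and the data are hypothetical):

```python
import numpy as np
import pandas as pd

df_lim = pd.DataFrame({"A": [1.0, np.nan, np.nan, np.nan, 5.0]})

# Without a limit, all three consecutive NaN are filled
no_limit = df_lim.interpolate()

# With limit=2, only the first two NaN (counting forward) are filled
limited = df_lim.interpolate(limit=2)

print(no_limit["A"].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(limited["A"].tolist())   # [1.0, 2.0, 3.0, nan, 5.0]
```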

## Downcasting the resulting DataFrame

By default, `downcast=None`, which means that no casting is performed even if a column's type could be cast to a more specific type.

For example, consider the following DataFrame:

```
df = pd.DataFrame({"A":[np.nan,5], "B":[5,np.nan]})
df
     A    B
0  NaN  5.0
1  5.0  NaN
```

Performing interpolation yields:

```
df.interpolate()   # downcast=None
     A    B
0  NaN  5.0
1  5.0  5.0
```

Checking the column types of the resulting DataFrame:

```
df.interpolate().dtypes
A    float64
B    float64
dtype: object
```

In this scenario, it is possible to use a more specific type, namely `int`, as the column type of `B`. To perform this downcast, set `downcast="infer"`:

```
df.interpolate(downcast="infer").dtypes
A    float64
B      int64
dtype: object
```
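Depending on your pandas version, the `downcast` argument may emit a deprecation warning or be unavailable. This is an assumption worth checking against the release notes of your installed version. A sketch that does not rely on `downcast` is to cast explicitly with `astype` after interpolating. The names `df_cast`, `filled` and `casted` are made up for illustration.

```python
import numpy as np
import pandas as pd

df_cast = pd.DataFrame({"A": [np.nan, 5], "B": [5, np.nan]})
filled = df_cast.interpolate()

# Column B has no NaN left and holds whole numbers, so it can be cast to int64.
# Column A still has a leading NaN and therefore has to remain float64.
casted = filled.astype({"B": "int64"})

print(casted.dtypes)
```

Unlike `downcast="infer"`, this approach requires you to name the columns to cast, and `astype` raises an error if a named column still contains `NaN`.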