# Filling missing values (NaNs) with the mean of the column in Pandas DataFrame

Last updated: Aug 11, 2023

Tags: Python, Pandas

## Example

Consider the following DataFrame:

import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [np.nan, 3, np.nan], "B": [4, np.nan, 5], "C": [np.nan, 7, 8]}, index=["a", "b", "c"])
df
A B C
a NaN 4.0 NaN
b 3.0 NaN 7.0
c NaN 5.0 8.0

## Solution

To fill the missing values with the mean of the column:

df.fillna(df.mean())
A B C
a 3.0 4.0 7.5
b 3.0 4.5 7.0
c 3.0 5.0 8.0

Note that fillna(~) returns a new DataFrame here, and the original df is left unchanged.
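
To see that the original is left alone, here is a minimal sketch that counts the missing values before and after the fill. The names `filled`, `n_missing_original` and `n_missing_filled` are introduced only for this illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 3, np.nan], "B": [4, np.nan, 5], "C": [np.nan, 7, 8]}, index=["a", "b", "c"])

# fillna(~) returns a new DataFrame; keep it under a separate (illustrative) name
filled = df.fillna(df.mean())

# Count the NaNs in the original and in the filled copy
n_missing_original = int(df.isna().sum().sum())
n_missing_filled = int(filled.isna().sum().sum())

print(n_missing_original)  # 4
print(n_missing_filled)    # 0
```

The original df still contains its four NaNs, while the returned DataFrame has none.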

## Explanation

Here, df.mean() returns a Series that holds the mean of each column, computed from the non-missing values only (NaNs are skipped by default):

df.mean()
A 3.0
B 4.5
C 7.5
dtype: float64

Conveniently, the index of this Series holds the column labels, so the Series acts as a mapping from each column to its filler value. When we pass it to fillna(~), the NaNs in each column are replaced by the value whose label matches that column.
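
One way to convince yourself that the matching is done by column label is to pass only part of the Series. The sketch below is an illustration rather than part of the original solution, and the names `means`, `partial` and `filled_ab` are made up for it. It keeps the fillers for columns A and B only, so column C should be left untouched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [np.nan, 3, np.nan], "B": [4, np.nan, 5], "C": [np.nan, 7, 8]}, index=["a", "b", "c"])

means = df.mean()            # Series indexed by the column labels A, B and C
partial = means[["A", "B"]]  # keep only the fillers for columns A and B

# Columns without an entry in the Series are not filled
filled_ab = df.fillna(partial)
print(filled_ab)
```

Since `partial` has no entry labelled C, the NaN in column C remains, while columns A and B are filled with 3.0 and 4.5 respectively.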

## Performing the fill in-place

The fillna(~) method can also perform the fill in-place, which means that the original DataFrame is modified directly and no new DataFrame is returned.

Set inplace=True like so:

df.fillna(df.mean(), inplace=True)
df
A B C
a 3.0 4.0 7.5
b 3.0 4.5 7.0
c 3.0 5.0 8.0
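
As an alternative to inplace=True, you can assign the returned DataFrame back to the same variable. The following sketch compares the two approaches. The helper `make_df` and the names `df_inplace` and `df_reassigned` are hypothetical and exist only for this comparison:

```python
import numpy as np
import pandas as pd

def make_df():
    # Hypothetical helper that rebuilds the example DataFrame from scratch
    return pd.DataFrame(
        {"A": [np.nan, 3, np.nan], "B": [4, np.nan, 5], "C": [np.nan, 7, 8]},
        index=["a", "b", "c"],
    )

# Approach 1: modify the DataFrame directly
df_inplace = make_df()
df_inplace.fillna(df_inplace.mean(), inplace=True)

# Approach 2: rebind the variable to the DataFrame returned by fillna(~)
df_reassigned = make_df()
df_reassigned = df_reassigned.fillna(df_reassigned.mean())

print(df_inplace.equals(df_reassigned))  # True
```

Both approaches end with the same filled values, so the choice between them is largely a matter of style.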
