Getting rows with missing values (NaNs) in Pandas DataFrame
Start your free 7-days trial now!
Example
Consider the following DataFrame:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Rows with at least one missing value
Solution
To get rows with missing values in df:
df[df.isna().any(axis=1)]
A B Ca NaN 4 NaNc NaN 6 8.0
Explanation
The isna() method returns a DataFrame of booleans where True indicates the presence of a missing value:
df.isna()
A B Ca True False Trueb False False Falsec True False False
We then use any(axis=1) to obtain a Series where True represents the presence of at least one True in each row:
df.isna().any(axis=1)
a Trueb Falsec Truedtype: bool
The parameter axis=1 is needed here since the default behaviour of any(~) is to scan through each column (as opposed to each row).
With this boolean mask, we can then extract rows that correspond to True using [] syntax:
df[df.isna().any(axis=1)]
A B Ca NaN 4 NaNc NaN 6 8.0
Rows with missing value for a certain column
We show the same df here for your reference:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Solution
To get rows with missing value in column C:
df[df["C"].isna()]
A B Ca NaN 4 NaN
Explanation
We first begin by extracting column C as a Series:
df["C"]
a NaNb 7.0c 8.0Name: C, dtype: float64
Next, we use the Series' isna() method to get a Series of booleans where True indicates the presence of NaN:
df["C"].isna()
a Trueb Falsec FalseName: C, dtype: bool
Finally, we pass in this boolean mask to extract the rows corresponding to True using [~] syntax:
df[df["C"].isna()]
A B Ca NaN 4 NaN
Rows with missing value for multiple columns
We show the same df here for your reference:
df = pd.DataFrame({"A":[np.nan,3,np.nan],"B":[4,5,6],"C":[np.nan,7,8]}, index=["a","b","c"])df
A B Ca NaN 4 NaNb 3.0 5 7.0c NaN 6 8.0
Solution
To get rows with missing value in columns A and C:
df[df[["A","C"]].isna().all(axis=1)]
A B Ca NaN 4 NaN
Explanation
We first fetch columns A and C as a DataFrame using [~] syntax:
df[["A","C"]]
A Ca NaN NaNb 3.0 7.0c NaN 8.0
We then use the isna() method to get a DataFrame of booleans where True indicates the presence of NaN:
df[["A","C"]].isna()
A Ca True Trueb False Falsec True False
Next, we use all(axis=1) get a Series of booleans where True indicates a row with all Trues:
df[["A","C"]].isna().all(axis=1)
a Trueb Falsec Falsedtype: bool
Finally, we use the [] syntax to extract the rows corresponding to True:
df[df[["A","C"]].isna().all(axis=1)]
A B Ca NaN 4 NaN
Rows with missing values for all columns
Consider the following DataFrame:
df = pd.DataFrame({"A":[np.nan,np.nan],"B":[np.nan,4]}, index=["a","b"])df
A Ba NaN NaNb NaN 4.0
Solution
To get rows with missing values for all columns:
df[df.isna().all(axis=1)]
A Ba NaN NaN
Explanation
The logic is exactly the same as the case for getting rows with at least one missing value, except that we use all(~) instead of any(~). The difference is as follows: