Converting column type to float in Pandas DataFrame
Solution
To convert a column's type to float in a Pandas DataFrame, either use the Series' astype() method or use Pandas' to_numeric() method.
We recommend using to_numeric() since this method is more flexible: it lets us choose what happens to values that cannot be parsed as numbers.
Example - converting data type of a single column
Consider the following DataFrame:
df
   A  B
0  3  5
1  4  6
Currently, the column types are as follows:
df.dtypes
A    object
B    object
dtype: object
Using astype method
To convert column A into type float, use the Series' astype() method:
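The following is a minimal sketch of that call. It assumes the DataFrame shown above stores its values as strings, which is what the dtypes output suggests; the construction of df is an assumption made for illustration.

```python
import pandas as pd

# Assumed setup: the values are stored as strings
df = pd.DataFrame({"A": ["3", "4"], "B": ["5", "6"]})

# astype returns a new Series, so assign it back to the column
df["A"] = df["A"].astype("float")
```

With "float" as the target, column A becomes float64 while column B is left untouched.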
Using to_numeric method
To convert column A into type float32, use Pandas' to_numeric(~) method with downcast="float":
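A short sketch of this conversion, again assuming the values start out as strings. The downcast="float" argument asks to_numeric(~) for the smallest float type it can use, which is float32 at minimum.

```python
import pandas as pd

# Assumed setup: the values are stored as strings
df = pd.DataFrame({"A": ["3", "4"], "B": ["5", "6"]})

# Parse the strings as numbers, then downcast to the smallest float type
df["A"] = pd.to_numeric(df["A"], downcast="float")
```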
df.dtypes
A    float32
B     object
dtype: object
Case when conversion is not possible
Consider the following DataFrame:
df
    A
0  3#
1   4
Here, the value "3#" cannot be converted into a numeric type. By default, the to_numeric(~) method raises an error in such cases:
ValueError: Unable to parse string "3#" at position 0
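The failure can be reproduced with a sketch like the one below. The DataFrame construction and the message variable are assumptions made for illustration; the exception is caught only so that the script keeps running.

```python
import pandas as pd

# Assumed setup: one value contains a stray "#" character
df = pd.DataFrame({"A": ["3#", "4"]})

message = None
try:
    # The default is errors="raise", so the bad value aborts the conversion
    pd.to_numeric(df["A"])
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```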
We can instead pass errors="coerce" so that values that cannot be converted are replaced with NaN:
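A minimal sketch of this, assuming the same single-column DataFrame as above:

```python
import pandas as pd

# Assumed setup: one value contains a stray "#" character
df = pd.DataFrame({"A": ["3#", "4"]})

# errors="coerce" replaces unparseable values with NaN instead of raising
df["A"] = pd.to_numeric(df["A"], errors="coerce")
```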
df.dtypes
A    float64
dtype: object
Note that NaN is itself a float value, so a numeric column containing NaN ends up with a float type (float64 here) rather than an int type.
Example - converting data type of multiple columns to float
To convert the data type of multiple columns to float, use the DataFrame's apply(~) method with to_numeric(~).
Case when conversion is possible
Consider the following DataFrame:
df
   A  B
0  3  5
1  4  6
Currently, the column types are as follows:
df.dtypes
A    object
B    object
dtype: object
To convert the type of all the columns, use the DataFrame's apply(~) method:
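As a sketch, assuming the values are stored as strings, the call could look like this:

```python
import pandas as pd

# Assumed setup: every value is a string that holds a whole number
df = pd.DataFrame({"A": ["3", "4"], "B": ["5", "6"]})

# apply(~) passes each column (a Series) to pd.to_numeric in turn
df = df.apply(pd.to_numeric)
```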
df.dtypes
A    int64
B    int64
dtype: object
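Here, apply(~) calls Pandas' to_numeric(~) method on each column of the DataFrame in turn. The to_numeric(~) method takes a single column (Series) as its argument and converts it to a numeric type (e.g. int or float). Since every value here is a whole number, the columns become int64; to obtain float columns instead, pass downcast="float" through apply(~) or call astype("float") on the result.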
Case when conversion is not possible
Consider the following DataFrame:
df
   A   B
0  3  5#
1  4   6
Here, column B cannot be converted into a numeric type since "5#" is not a valid number. Applying the to_numeric(~) method without additional arguments results in an error:
ValueError: Unable to parse string "5#" at position 0
Ignoring unsuccessful columns
Instead of raising an error, we can pass errors="ignore" to to_numeric() so that columns where the conversion is not possible are left unchanged (note that this option is deprecated as of pandas 2.2):
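In pandas releases that still accept it, the keyword argument in question is errors="ignore". Since that option was deprecated in pandas 2.2, the sketch below takes a different route and catches the exception explicitly. The helper name to_float_or_keep is hypothetical, and downcast="float" is assumed because the resulting type of column A is float32.

```python
import pandas as pd

# Assumed setup: column B contains a value that is not a valid number
df = pd.DataFrame({"A": ["3", "4"], "B": ["5#", "6"]})


def to_float_or_keep(col):
    """Hypothetical helper: convert a column to float, or return it unchanged."""
    try:
        return pd.to_numeric(col, downcast="float")
    except (ValueError, TypeError):
        # The conversion failed, so leave this column as it is
        return col


df = df.apply(to_float_or_keep)
```

Column A is converted, while column B keeps its original values because "5#" cannot be parsed.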
df.dtypes
A    float32
B     object
dtype: object
Replace with NaN for unsuccessful values
To replace values that cannot be converted into the specified data type with NaN, pass errors="coerce":
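A minimal sketch, assuming the same DataFrame as above. Keyword arguments given to apply(~) are forwarded to to_numeric(~), and downcast="float" is assumed because the resulting types are float32.

```python
import pandas as pd

# Assumed setup: column B contains a value that is not a valid number
df = pd.DataFrame({"A": ["3", "4"], "B": ["5#", "6"]})

# errors="coerce" turns "5#" into NaN; downcast="float" yields float32 columns
df = df.apply(pd.to_numeric, errors="coerce", downcast="float")
```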
df
     A    B
0  3.0  NaN
1  4.0  6.0
Here, the value "5#" could not be converted into a numeric type, so we end up with NaN instead. The converted data types are as follows:
df.dtypes
A    float32
B    float32
dtype: object