Converting column type to integer in Pandas DataFrame
Solution
To convert the column type to integer in a Pandas DataFrame, either:
use the Series' astype() method, or
use Pandas' to_numeric() method.
We recommend using to_numeric() since this method is more flexible: its errors parameter controls what happens to values that cannot be converted.
Example - converting data type of a single column
Consider the following DataFrame:
df
   A  B
0  3  5
1  4  6
Currently, the column types are as follows:
df.dtypes
A    object
B    object
dtype: object
Using astype method
To convert column A into type int, use the Series' astype() method:
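A minimal sketch of this call. The constructor line is an assumption, since the original does not show how the DataFrame was built; it simply recreates the string-valued columns shown above.

```python
import pandas as pd

# Assumed setup: both columns hold strings, as in the example above
df = pd.DataFrame({"A": ["3", "4"], "B": ["5", "6"]})

# astype(int) returns a new Series, so assign it back to the column
df["A"] = df["A"].astype(int)

print(df.dtypes)
```

Note that astype(int) raises an error if any value in the column cannot be parsed as an integer.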
Using to_numeric method
To convert column A into type int, use Pandas' to_numeric(~) method:
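A short sketch of the same conversion with to_numeric(~), again assuming the DataFrame was built from string values:

```python
import pandas as pd

# Assumed setup: both columns hold strings
df = pd.DataFrame({"A": ["3", "4"], "B": ["5", "6"]})

# to_numeric takes a Series and returns a converted copy
df["A"] = pd.to_numeric(df["A"])

print(df.dtypes)
```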
df.dtypes
A     int64
B    object
dtype: object
Case when conversion is not possible
Consider the following DataFrame:
df
    A
0  3#
1   4
Here, the value "3#" cannot be converted into a numeric type. By default, the to_numeric(~) method will throw an error in such cases:
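The failing call can be reproduced with the sketch below. The try/except wrapper is only there so the snippet runs to completion; the DataFrame constructor is an assumption.

```python
import pandas as pd

# Assumed setup: one value contains a stray "#" character
df = pd.DataFrame({"A": ["3#", "4"]})

try:
    # errors="raise" is the default, so a bad value aborts the conversion
    df["A"] = pd.to_numeric(df["A"])
    error_message = None
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```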
ValueError: Unable to parse string "3#" at position 0
We can instead map values that cannot be converted to NaN by passing errors="coerce":
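A minimal sketch of this coercion, assuming the same single-column DataFrame as above:

```python
import pandas as pd

# Assumed setup: one value contains a stray "#" character
df = pd.DataFrame({"A": ["3#", "4"]})

# errors="coerce" turns unparseable values into NaN instead of raising
df["A"] = pd.to_numeric(df["A"], errors="coerce")

print(df)
```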
df.dtypes
A    float64
dtype: object
Note that the default integer types (e.g. int64) cannot represent NaN, so Pandas stores a column containing NaN as float64.
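That restriction applies to the default NumPy-backed types. If an integer column is still required, one option that goes beyond the original text is pandas' nullable "Int64" extension dtype (note the capital I), which stores missing values as <NA>. A minimal sketch, assuming the column has already been coerced to float64:

```python
import pandas as pd

# A float64 column like the one produced by errors="coerce"
s = pd.Series([float("nan"), 4.0])

# "Int64" is the nullable integer dtype; NaN is stored as <NA>
s_int = s.astype("Int64")

print(s_int)
```

This only works when every non-missing value is a whole number; a value such as 4.5 would make the cast fail.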
Example - converting data type of multiple columns to integer
To convert the data type of multiple columns to integer, use the DataFrame's apply(~) method with to_numeric(~).
Case when conversion is possible
Consider the following DataFrame:
df
   A  B
0  3  5
1  4  6
Currently, the column types are as follows:
df.dtypes
A    object
B    object
dtype: object
To convert the type of all the columns, use the DataFrame's apply(~) method:
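A minimal sketch of the call, assuming the DataFrame was built from string values:

```python
import pandas as pd

# Assumed setup: both columns hold strings
df = pd.DataFrame({"A": ["3", "4"], "B": ["5", "6"]})

# apply calls pd.to_numeric once per column and reassembles the results
df = df.apply(pd.to_numeric)

print(df.dtypes)
```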
df.dtypes
A    int64
B    int64
dtype: object
Here, we are iteratively applying Pandas' to_numeric(~) method to each column of the DataFrame. The to_numeric(~) method takes as argument a single column (Series) and converts its type to numeric (e.g. int or float).
Case when conversion is not possible
Consider the following DataFrame:
df
   A   B
0  3  5#
1  4   6
Here, column B cannot be converted into a numeric type since "5#" is not a valid number. Applying the to_numeric(~) method without any extra arguments will result in an error:
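As a sketch, the error raised for one column propagates out of apply(~), so no column is converted. The try/except is only there to let the snippet finish, and the constructor is an assumption.

```python
import pandas as pd

# Assumed setup: column B contains one unparseable value
df = pd.DataFrame({"A": ["3", "4"], "B": ["5#", "6"]})

try:
    converted = df.apply(pd.to_numeric)
except ValueError as err:
    converted = None  # the whole apply call failed, not just column B
    print("ValueError:", err)
```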
ValueError: Unable to parse string "5#" at position 0
Ignoring unsuccessful columns
Instead of throwing an error, we can pass errors="ignore" to to_numeric(~) so that columns where the conversion is not possible are left unchanged:
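The traditional form of this call is df.apply(pd.to_numeric, errors="ignore"). That option was deprecated in pandas 2.2 in favour of catching the exception explicitly, so the sketch below uses a small wrapper instead. The helper name to_numeric_or_keep is hypothetical and not part of pandas.

```python
import pandas as pd

def to_numeric_or_keep(column):
    """Hypothetical helper: convert a column if possible, else return it unchanged."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column

# Assumed setup: column B contains one unparseable value
df = pd.DataFrame({"A": ["3", "4"], "B": ["5#", "6"]})

# Column A is converted; column B is returned as-is
df = df.apply(to_numeric_or_keep)

print(df.dtypes)
```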
df.dtypes
A     int64
B    object
dtype: object
Replace with NaN for unsuccessful values
To replace values that cannot be converted with NaN, pass errors="coerce":
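A minimal sketch, assuming the same DataFrame as above. Extra keyword arguments given to apply(~) are forwarded to to_numeric(~) for each column.

```python
import pandas as pd

# Assumed setup: column B contains one unparseable value
df = pd.DataFrame({"A": ["3", "4"], "B": ["5#", "6"]})

# errors="coerce" is passed through to pd.to_numeric for every column
df = df.apply(pd.to_numeric, errors="coerce")

print(df)
```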
df
   A    B
0  3  NaN
1  4  6.0
Here, the value "5#" could not be converted into a numeric type and therefore we end up with a NaN instead. The converted data types are as follows:
df.dtypes
A      int64
B    float64
dtype: object