Converting categorical type to int in Pandas DataFrame
Last updated: Aug 12, 2023
Tags: Python, Pandas
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({"group": pd.Series(["A","B","A"], dtype="category"), "name": pd.Series(["alex","bob","cathy"], dtype="string")})
df
   group   name
0      A   alex
1      B    bob
2      A  cathy
Here, column group is of type category.
Solution
To change the column type from category to int, use the pd.factorize(~) function and keep the first element of its result:
df["group"] = pd.factorize(df["group"])[0]
df
   group   name
0      0   alex
1      1    bob
2      0  cathy
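As a quick sanity check, the sketch below rebuilds the same DataFrame and confirms that the column's dtype changes from category to an integer type. The variable names before and after are illustrative and not part of the original recipe, and the exact integer width (typically int64) depends on the platform.

```python
import pandas as pd

# Rebuild the example DataFrame from the guide
df = pd.DataFrame({
    "group": pd.Series(["A", "B", "A"], dtype="category"),
    "name": pd.Series(["alex", "bob", "cathy"], dtype="string"),
})

before = df["group"].dtype                     # categorical dtype
df["group"] = pd.factorize(df["group"])[0]     # keep only the integer codes
after = df["group"].dtype                      # an integer dtype (width is platform-dependent)

print(before, "->", after)
```

Checking with pd.api.types.is_integer_dtype(~) rather than comparing against a specific dtype keeps the check independent of the platform's integer width.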
Explanation
Here, factorize(~) returns a tuple of size two: the first element is an array of integer codes, one per row, and the second element holds the unique values that those codes refer to:
pd.factorize(df["group"])
(array([0, 1, 0]), CategoricalIndex(['A', 'B'], categories=['A', 'B'], ordered=False, dtype='category'))
Notice how values of the same category get mapped to the same integer.
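To make the relationship between the two tuple elements concrete, here is a minimal sketch that unpacks the result and uses the unique values to translate the integers back into labels. The names codes, uniques, lookup and restored are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

group = pd.Series(["A", "B", "A"], dtype="category")

# Unpack the tuple: integer codes and the unique values they point to
codes, uniques = pd.factorize(group)

# Build an integer -> label lookup from the unique values
lookup = dict(enumerate(uniques))

# Translate each integer code back to its original label
restored = [lookup[code] for code in codes]

print(lookup)
print(restored)
```

Keeping uniques (or a lookup built from it) around is useful when the original labels are needed again after the column has been overwritten with integers.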
Published by Isshin Inada