# Pandas | factorize method

Last updated: Jul 1, 2022

Pandas `factorize(~)` method returns the following:

• an array of integer indices to map the input array to the unique values.

• all the unique values of the input array.

# Parameters

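1. `values` | `sequence`

A 1D sequence of values. From pandas 2.1 onwards, passing anything other than a `Series`, `Index`, extension array, or NumPy array is deprecated, so wrap plain lists in `np.array(~)` or `pd.Series(~)`.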

2. `sort` | `boolean` | `optional`

Whether or not to sort the resulting array of unique values. By default, `sort=False`.

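3. `na_sentinel` | `int` | `optional`

The value used to mark `NaN` in the array of integer indices. By default, `na_sentinel=-1`. This parameter was deprecated in pandas 1.5 and removed in pandas 2.0, where the boolean `use_na_sentinel` parameter replaces it.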

# Return Value

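The following two arrays are returned as a tuple `(codes, uniques)`:

• `codes`: a NumPy array of integer indices that maps the input array to the array of unique values.

• `uniques`: the unique values of the input array. This is a NumPy array for NumPy array input, an `Index` for `Series` or `Index` input, and a `Categorical` for `Categorical` input.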
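As a small sketch of how the type of the second return value follows the input type, the snippet below factorizes the same values once as a NumPy array and once as a `Series`. The variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

values = np.array(["B", "A", "A"], dtype=object)

# NumPy array in: the unique values come back as a NumPy array.
_, array_uniques = pd.factorize(values)
print(type(array_uniques).__name__)  # ndarray

# Series in: the unique values come back as a pandas Index.
_, series_uniques = pd.factorize(pd.Series(values))
print(isinstance(series_uniques, pd.Index))  # True
```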

# Examples

## Basic usage

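```
import numpy as np
import pandas as pd

# Plain-list input is deprecated since pandas 2.1, so pass a NumPy array
values = np.array(["B", "A", "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values)
print("codes:", codes)
print("uniques:", uniques)

codes: [0 1 1 2 0]
uniques: ['B' 'A' 'C']
```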

Note the following:

• the `codes` array maps the values in the input array to the `uniques` array.

• the unique values are ordered as they appear in the input array.

You can recreate the input array using `codes` and `uniques` like so:

```
uniques[codes]

array(['B', 'A', 'A', 'C', 'B'], dtype=object)
```
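A common use of this mapping is to encode a column of labels as integers. The following is a minimal sketch under assumed data: the DataFrame and the column names `fruit` and `fruit_code` are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"fruit": ["B", "A", "A", "C", "B"]})

# factorize accepts a Series; store the integer codes as a new column.
codes, uniques = pd.factorize(df["fruit"])
df["fruit_code"] = codes

print(df["fruit_code"].tolist())  # [0, 1, 1, 2, 0]
print(list(uniques))              # ['B', 'A', 'C']
```

Because the input is a `Series`, `uniques` here is a pandas `Index` rather than a NumPy array.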

## Specifying sort

By default, `sort=False`, which means that the returned array of unique values is not sorted.

To have the array of unique values sorted, set `sort=True` like so:

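```
import numpy as np
import pandas as pd

# Plain-list input is deprecated since pandas 2.1, so pass a NumPy array
values = np.array(["B", "A", "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values, sort=True)
print("codes:", codes)
print("uniques:", uniques)

codes: [1 0 0 2 1]
uniques: ['A' 'B' 'C']
```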

Notice how the `uniques` are sorted, and the `codes` array also reflects this.
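For input without missing values, the sorted result should line up with what NumPy's `np.unique(~)` returns when `return_inverse=True`. The comparison below is only a sketch to illustrate the relationship, not a statement about how pandas implements sorting internally.

```python
import numpy as np
import pandas as pd

values = np.array(["B", "A", "A", "C", "B"], dtype=object)

# pandas: sorted unique values plus integer codes.
codes, uniques = pd.factorize(values, sort=True)

# NumPy: unique values are always sorted; the inverse plays the role of codes.
np_uniques, np_inverse = np.unique(values, return_inverse=True)

print(np.array_equal(codes, np_inverse))    # True
print(np.array_equal(uniques, np_uniques))  # True
```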

## Specifying na_sentinel

By default, `NaN` values are marked as `-1` in the `codes` array:

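```
import numpy as np
import pandas as pd

# dtype=object keeps np.nan as a missing value instead of the string 'nan'
values = np.array(["B", np.nan, "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values)
print("codes:", codes)
print("uniques:", uniques)

codes: [ 0 -1  1  2  0]
uniques: ['B' 'A' 'C']
```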

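In pandas versions before 2.0, we can choose our own value by passing in `na_sentinel` like so:

```
import numpy as np
import pandas as pd

# Requires pandas < 2.0: na_sentinel was deprecated in 1.5 and removed in 2.0
values = np.array(["B", np.nan, "A", "C", "B"], dtype=object)
codes, uniques = pd.factorize(values, na_sentinel=50)
print("codes:", codes)
print("uniques:", uniques)

codes: [ 0 50  1  2  0]
uniques: ['B' 'A' 'C']
```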
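Since `na_sentinel` is not available in pandas 2.0 and later, one version-independent way to get a custom marker is to keep the default `-1` and remap it afterwards. This is a sketch of that workaround, not an official replacement for the parameter; the name `custom_codes` is illustrative.

```python
import numpy as np
import pandas as pd

values = np.array(["B", np.nan, "A", "C", "B"], dtype=object)

# Default behaviour: missing values are coded as -1.
codes, uniques = pd.factorize(values)

# Remap the default -1 to a custom marker (50, matching the example above).
custom_codes = np.where(codes == -1, 50, codes)

print(custom_codes.tolist())  # [0, 50, 1, 2, 0]
print(list(uniques))          # ['B', 'A', 'C']
```

The remapped codes can no longer be used directly as indices into `uniques`, because `50` is out of range; keep the original `codes` if you need to reconstruct the input.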