# Pandas | factorize method

Last updated: Jul 1, 2022

Pandas `factorize(~)` method returns the following:

• an array of integer indices to map the input array to the unique values.

• all the unique values of the input array.

# Parameters

1. `values` | `sequence`

A 1D sequence of values.

2. `sort` | `boolean` | `optional`

Whether or not to sort the resulting array of unique values. By default, `sort=False`.

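3. `na_sentinel` | `int` | `optional`

The value to mark `NaN` in the array of integer indices. By default, `na_sentinel=-1`. Note that this parameter was deprecated in pandas 1.5 in favor of `use_na_sentinel` and removed in pandas 2.0, so it is only available in earlier versions.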

# Return Value

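A tuple of the following two arrays is returned. Both are NumPy arrays when the input is a plain sequence such as a list; when the input is a pandas object such as a `Series`, the unique values are returned as an `Index` instead: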

• an array of integer indices that maps the input array to the array of unique values.

• an array containing the unique values of the input array.
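As a rough illustration of these return types, the sketch below factorizes the same values once as a plain list and once as a `Series`. The variable names are chosen for illustration only.

```python
import numpy as np
import pandas as pd

values = ["B", "A", "A", "C", "B"]

# Plain Python list: codes and uniques are both NumPy arrays
codes_list, uniques_list = pd.factorize(values)
print(type(codes_list).__name__, type(uniques_list).__name__)

# pandas Series: codes is still a NumPy array, uniques is an Index
codes_ser, uniques_ser = pd.factorize(pd.Series(values))
print(type(codes_ser).__name__, type(uniques_ser).__name__)
```

The codes are the same in both cases; only the container type of the unique values differs.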

# Examples

## Basic usage

```
import pandas as pd

codes, uniques = pd.factorize(["B", "A", "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
# codes: [0 1 1 2 0]
# uniques: ['B' 'A' 'C']
```

Note the following:

• the `codes` array maps the values in the input array to the `uniques` array.

• the unique values are ordered as they appear in the input array.

You can recreate the input array using `codes` and `uniques` like so:

```
uniques[codes]
# array(['B', 'A', 'A', 'C', 'B'], dtype=object)
```
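One place this mapping can be handy is encoding a text column as integers and decoding it again later. The following is a minimal sketch under that assumption; the `DataFrame` and its column names (`grade`, `grade_code`) are hypothetical.

```python
import pandas as pd

# Hypothetical data: the column names are for illustration only
df = pd.DataFrame({"grade": ["B", "A", "A", "C", "B"]})

# Encode the string labels as integer codes
codes, uniques = pd.factorize(df["grade"])
df["grade_code"] = codes

# Decode: uniques is an Index here, and take() maps codes back to labels
decoded = uniques.take(codes)

print(df)
print(list(decoded))
```

Because the input is a `Series`, `uniques` is an `Index`, which is why the sketch uses `take` rather than assuming a NumPy array.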

## Specifying sort

By default, `sort=False`, which means that the returned array of unique values is not sorted.

To have the array of unique values sorted, set `sort=True` like so:

```
codes, uniques = pd.factorize(["B", "A", "A", "C", "B"], sort=True)
print("codes:", codes)
print("uniques:", uniques)
# codes: [1 0 0 2 1]
# uniques: ['A' 'B' 'C']
```

Notice how the `uniques` are sorted, and the `codes` array also reflects this.
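To make the effect of `sort` concrete, the sketch below compares the two results side by side. It is only a sanity check with illustrative variable names: the set of unique values is the same either way, and both pairs of arrays still map back to the original input.

```python
import pandas as pd

values = ["B", "A", "A", "C", "B"]

codes_unsorted, uniques_unsorted = pd.factorize(values)
codes_sorted, uniques_sorted = pd.factorize(values, sort=True)

# Same unique values; only their order (and hence the codes) differs
same_values = sorted(uniques_unsorted) == list(uniques_sorted)

# Both pairs reconstruct the original input
roundtrip_unsorted = list(uniques_unsorted[codes_unsorted]) == values
roundtrip_sorted = list(uniques_sorted[codes_sorted]) == values

print(same_values, roundtrip_unsorted, roundtrip_sorted)
```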

## Specifying na_sentinel

By default, `NaN` values are marked as `-1` in the `codes` array:

```
import numpy as np

codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"])
print("codes:", codes)
print("uniques:", uniques)
# codes: [ 0 -1  1  2  0]
# uniques: ['B' 'A' 'C']
```

In pandas versions before 2.0, we can choose our own value by passing in `na_sentinel` like so:

```
# na_sentinel is only accepted by pandas versions before 2.0
codes, uniques = pd.factorize(["B", np.nan, "A", "C", "B"], na_sentinel=50)
print("codes:", codes)
print("uniques:", uniques)
# codes: [ 0 50  1  2  0]
# uniques: ['B' 'A' 'C']
```
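One caveat with the default sentinel of `-1`: indexing `uniques` directly with `codes` treats `-1` as "last element", so a missing value would silently come back as the last unique value. The sketch below shows one possible way to restore the missing values explicitly; the masking approach with `np.where` is an illustration, not something prescribed by pandas.

```python
import numpy as np
import pandas as pd

values = ["B", np.nan, "A", "C", "B"]
codes, uniques = pd.factorize(values)  # the missing value is coded as -1 by default

# Naive indexing: -1 picks the last unique value, so NaN comes back as 'C'
naive = uniques[codes]

# Mask the sentinel positions and put NaN back explicitly
missing = codes == -1
restored = np.where(missing, np.nan, uniques[codes])

print(naive)
print(restored)
```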
