# Getting the longest string in a column in Pandas DataFrame

Last updated: Jul 26, 2022

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":["a","abc","def"]})
df

     A
0    a
1  abc
2  def
```

# Solution

To get the longest string in column `A`:

```
import numpy as np

lengths = df["A"].str.len()
argmax = np.where(lengths == lengths.max())[0]
df.iloc[argmax]

     A
1  abc
2  def
```

Here, the returned value is a `DataFrame`. If you want a list of the longest strings instead:

```
df["A"].iloc[argmax].to_numpy().ravel().tolist()

['abc', 'def']
```


# Explanation

We start by computing the length of each string in column `A` using the Series' `str.len()` method:

```
lengths = df["A"].str.len()   # returns a Series
lengths

0    1
1    3
2    3
Name: A, dtype: int64
```

We then get a Series of booleans where `True` indicates the position of the longest strings:

```
lengths == lengths.max()

0    False
1     True
2     True
Name: A, dtype: bool
```

We then use NumPy's `where(~)` to get all the integer indexes of `True`:

```
argmax = np.where(lengths == lengths.max())[0]
argmax

array([1, 2])
```

The `[0]` is needed at the end because `where(~)` returns a tuple whose first element is the array of integer indexes.
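To make the tuple behaviour concrete, here is a minimal sketch that inspects what `np.where` returns for the same DataFrame. The variable names `result` and `positions` are chosen for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})
lengths = df["A"].str.len()

# With a single argument, np.where returns a tuple holding one array per dimension.
result = np.where(lengths == lengths.max())

# A Series is one-dimensional, so the tuple has a single element:
# the integer positions of the True entries.
positions = result[0]

print(type(result).__name__, len(result))   # tuple 1
print(positions.tolist())                   # [1, 2]
```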

Finally, we use the `iloc` property to extract the rows in `df` given these integer indexes:

```
df.iloc[argmax]

     A
1  abc
2  def
```
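The steps above can be packaged into a small helper. This is only a sketch: the function name `longest_rows` is hypothetical, and the custom index labels are an assumption added to show that `iloc` selects by position, so the approach does not rely on the default `0, 1, 2` index.

```python
import numpy as np
import pandas as pd

def longest_rows(df, column):
    """Return every row of df whose string in `column` has the maximum length."""
    lengths = df[column].str.len()
    positions = np.where(lengths == lengths.max())[0]   # integer positions of the ties
    return df.iloc[positions]

# Same data as above, but with non-default index labels (illustrative assumption).
df = pd.DataFrame({"A": ["a", "abc", "def"]}, index=["x", "y", "z"])

result = longest_rows(df, "A")
print(result["A"].tolist())    # ['abc', 'def']
print(result.index.tolist())   # ['y', 'z']
```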

# Why the argmax method does not work

You may be tempted to directly use the Series' `argmax(~)` method like so:

```
df.iloc[df["A"].str.len().argmax()]

A    abc
Name: 1, dtype: object
```

However, the problem with `argmax(~)` is that it only returns the integer index of the first occurrence of the maximum, as demonstrated in the output above.
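A short side-by-side sketch of the two approaches on the same data shows the difference. The names `first_only` and `all_ties` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "abc", "def"]})
lengths = df["A"].str.len()

# argmax reports only the first position at which the maximum occurs.
first_only = lengths.argmax()

# Comparing against the maximum keeps every position that ties for it.
all_ties = np.where(lengths == lengths.max())[0]

print(first_only)          # 1
print(all_ties.tolist())   # [1, 2]
```

When ties cannot occur, or only one of the longest strings is needed, `argmax` is sufficient; otherwise the comparison-based approach returns all of them.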

