# Getting shortest and longest strings in Pandas DataFrame

Aug 11, 2023
Consider the following Pandas DataFrame:

```
import pandas as pd

df = pd.DataFrame({'vals': ['aa', 'bbbb', 'ccc', 'dddd', 'ee']})
df.head()
#    vals
# 0    aa
# 1  bbbb
# 2   ccc
# 3  dddd
# 4    ee
```

# Getting the shortest strings in a Pandas column

To get the shortest strings in the `vals` column:

```
s_length = df['vals'].str.len()
bool_mask = (s_length == s_length.min())
df['vals'][bool_mask]
# 0    aa
# 4    ee
# Name: vals, dtype: object
```

Here, we first obtain a `Series` holding the length of each string using the `str.len()` method:

```
s_length = df['vals'].str.len()
s_length
# 0    2
# 1    4
# 2    3
# 3    4
# 4    2
# Name: vals, dtype: int64
```

We then compute the minimum string length using `s_length.min()`, and create a boolean mask where `True` marks every string whose length equals that minimum:

```
bool_mask = (s_length == s_length.min())
bool_mask
# 0     True
# 1    False
# 2    False
# 3    False
# 4     True
# Name: vals, dtype: bool
```
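A boolean mask is used here rather than a single-index lookup because several strings can tie for the shortest length. The following sketch contrasts it with `Series.idxmin()`, which returns only the label of the first minimum and therefore drops the tie at index `4`:

```python
import pandas as pd

df = pd.DataFrame({'vals': ['aa', 'bbbb', 'ccc', 'dddd', 'ee']})
s_length = df['vals'].str.len()

# idxmin() returns only the label of the FIRST minimum, so the tie at index 4 is lost
first_label = s_length.idxmin()
print(first_label)  # 0

# The boolean mask keeps every row that ties for the minimum length
bool_mask = (s_length == s_length.min())
print(df['vals'][bool_mask].tolist())  # ['aa', 'ee']
```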

Finally, we index the `vals` column with the boolean mask using square brackets (`[]`), which keeps only the values where the mask is `True`:

```
df['vals'][bool_mask]   # Returns a Series
# 0    aa
# 4    ee
# Name: vals, dtype: object
```
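The same selection can also be written with `.loc`, which takes the row mask and the column label in a single indexing step instead of two chained `[]` lookups. This is a sketch of an equivalent alternative, not a change to the approach above:

```python
import pandas as pd

df = pd.DataFrame({'vals': ['aa', 'bbbb', 'ccc', 'dddd', 'ee']})
s_length = df['vals'].str.len()
bool_mask = (s_length == s_length.min())

# Select the rows with the boolean mask and the column by label in one .loc call
shortest = df.loc[bool_mask, 'vals']
print(shortest.tolist())  # ['aa', 'ee']
```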

We could also fetch the rows that have the shortest `vals` value as a DataFrame like so:

```
df[bool_mask]   # Returns a DataFrame
#    vals
# 0    aa
# 4    ee
```
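The example DataFrame has no missing values. If a column did contain some, `str.len()` would return `NaN` for those entries. The sketch below uses a hypothetical Series with a `None` entry to suggest that such rows simply fall out of the mask: `min()` skips `NaN` by default, and `NaN` never compares equal to a number.

```python
import pandas as pd

# Hypothetical column with a missing value (None), not part of the original example
s = pd.Series(['aa', None, 'ccc'], name='vals')

# str.len() yields NaN for the missing entry
s_length = s.str.len()
print(s_length.isna().tolist())  # [False, True, False]

# min() skips NaN, and NaN == <number> is False, so the missing entry is excluded
bool_mask = (s_length == s_length.min())
print(s[bool_mask].tolist())  # ['aa']
```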

# Getting the longest strings in a Pandas column

The logic for getting the longest strings is very similar to getting the shortest strings:

```
s_length = df['vals'].str.len()
bool_mask = (s_length == s_length.max())
df['vals'][bool_mask]   # Returns a Series
# 1    bbbb
# 3    dddd
# Name: vals, dtype: object
```

Here, we use `max()` instead of `min()` to compute the length of the longest string.
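As an alternative to building the mask by hand, pandas' `Series.nlargest()` and `Series.nsmallest()` accept `keep='all'`, which retains every entry tied at the boundary. The sketch below applies them to the string lengths and then looks up the matching rows with `.loc`:

```python
import pandas as pd

df = pd.DataFrame({'vals': ['aa', 'bbbb', 'ccc', 'dddd', 'ee']})
s_length = df['vals'].str.len()

# nlargest(1, keep='all') keeps every entry tied with the largest length
longest_idx = s_length.nlargest(1, keep='all').index
print(df.loc[longest_idx, 'vals'].tolist())  # ['bbbb', 'dddd']

# nsmallest(1, keep='all') does the same for the shortest length
shortest_idx = s_length.nsmallest(1, keep='all').index
print(df.loc[shortest_idx, 'vals'].tolist())  # ['aa', 'ee']
```

With `keep='first'` (the default), only one row would be returned even when several strings share the extreme length.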

Again, to get the rows whose `vals` string value is longest as a DataFrame instead of a Series:

```
df[bool_mask]   # Returns a DataFrame
#    vals
# 1  bbbb
# 3  dddd
```
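Because the shortest and longest cases differ only in `min()` versus `max()`, both can be wrapped in one small function. The helper `rows_by_length` below is hypothetical (not a pandas API) and is just a sketch of the same boolean-mask approach:

```python
import pandas as pd


def rows_by_length(df: pd.DataFrame, col: str, longest: bool = False) -> pd.DataFrame:
    """Return all rows whose string in `col` has the minimum (or maximum) length.

    Hypothetical helper wrapping the boolean-mask approach; ties are all kept.
    """
    s_length = df[col].str.len()
    target = s_length.max() if longest else s_length.min()
    return df[s_length == target]


df = pd.DataFrame({'vals': ['aa', 'bbbb', 'ccc', 'dddd', 'ee']})
print(rows_by_length(df, 'vals'))                # rows 0 and 4
print(rows_by_length(df, 'vals', longest=True))  # rows 1 and 3
```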
