# Reading n random lines using read_csv in Pandas

Jul 1, 2022

# When file contains a header row

Consider the following `my_data.txt` file:

```
A,B,C
1,2,3
4,5,6
7,8,9
```

To read `n` random lines from this file using `read_csv(~)` in Pandas:

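```
import random
import pandas as pd

def get_num_lines(fname):
    # Count the lines in the file, including the header row
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

# Number of data rows (total lines minus the header row)
num_lines = get_num_lines("my_data.txt") - 1

# How many random rows do you want?
sample_size = 2
# Candidate line numbers are 1..num_lines; line 0 (the header) is never skipped
rows_to_skip = random.sample(range(1, num_lines + 1), num_lines - sample_size)

df = pd.read_csv("my_data.txt", skiprows=rows_to_skip)
df
```

```
   A  B  C
0  1  2  3
1  7  8  9
```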

Note the following:

- We first fetch the total number of lines in the file. Since the file has a header row, we subtract `1` from this count. In this case, `num_lines=3`.
- We then use the `random.sample(~)` method to randomly pick the row numbers to skip:
  - the first argument is the collection of values to select from. Since `num_lines=3`, the row numbers are drawn from the integers between `1` (inclusive) and `3` (inclusive). We used `range(1,_)` because the first line of the file (line `0`) holds the column labels, and so we don't want to skip this row. In this case, it turned out that `rows_to_skip=[2]`, which means that the second data row (`4,5,6`) is skipped.
  - the second argument is the number of random integers you want. Here this is `num_lines-sample_size`, which is `1`, the number of rows to leave out.
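The sampling step can be tried on its own, without touching a file. The following is a minimal sketch that assumes a file with one header line and 3 data rows; the helper name `pick_rows_to_skip` and its `seed` parameter are illustrative and are not part of Pandas or of the snippet above.

```python
import random


def pick_rows_to_skip(num_data_lines, sample_size, seed=None):
    """Return the line numbers to skip, never including the header (line 0)."""
    rng = random.Random(seed)  # seeded only so the demo is repeatable
    # Data rows sit on lines 1..num_data_lines (both inclusive)
    candidates = range(1, num_data_lines + 1)
    # Skip everything except sample_size rows
    return sorted(rng.sample(candidates, num_data_lines - sample_size))


rows_to_skip = pick_rows_to_skip(num_data_lines=3, sample_size=2, seed=0)
rows_kept = [i for i in range(1, 4) if i not in rows_to_skip]
print("skip:", rows_to_skip, "keep:", rows_kept)
```

Because the candidates start at `1`, line `0` can never appear in `rows_to_skip`, so the column labels always survive.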

# When file does not contain a header row

Consider the following `my_data.txt` file:

```
1,2,3
4,5,6
7,8,9
```

To read `n` random lines using `read_csv(~)`:

```
import random
import pandas as pd

def get_num_lines(fname):
    # Count the lines in the file
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

num_lines = get_num_lines("my_data.txt")

# How many random rows do you want?
sample_size = 2
# No header row here, so line 0 is a valid candidate to skip
rows_to_skip = random.sample(range(num_lines), num_lines - sample_size)

df = pd.read_csv("my_data.txt", skiprows=rows_to_skip, header=None)
df
```

```
   0  1  2
0  4  5  6
1  7  8  9
```

Note the following:

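- We first fetch the total number of lines in the file. In this case, `num_lines=3`.
- We then use the `random.sample(~)` method to randomly pick the row numbers to skip. Since there is no header row to protect, the candidate row numbers start at `0`. In this case, it turns out that `rows_to_skip=[0]`, which means that the first row (`1,2,3`) is skipped.
- We pass `header=None` so that the first row read is treated as data rather than as column labels.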
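Both cases differ only in where the candidate row numbers start and in the `header` argument, so they can be folded into one function. The following is a rough sketch under that assumption; `read_random_rows`, `has_header` and `seed` are hypothetical names introduced here, and the temporary files only stand in for `my_data.txt`.

```python
import os
import random
import tempfile

import pandas as pd


def get_num_lines(fname):
    # Count the lines in the file
    with open(fname) as f:
        return sum(1 for _ in f)


def read_random_rows(fname, sample_size, has_header=True, seed=None):
    """Hypothetical helper: read sample_size random data rows from a CSV file."""
    rng = random.Random(seed)  # seed is optional, for repeatable draws
    num_lines = get_num_lines(fname)
    first = 1 if has_header else 0  # never offer the header line for skipping
    candidates = range(first, num_lines)
    # Raises ValueError if sample_size exceeds the number of data rows
    rows_to_skip = rng.sample(candidates, len(candidates) - sample_size)
    return pd.read_csv(
        fname, skiprows=rows_to_skip, header=0 if has_header else None
    )


with tempfile.TemporaryDirectory() as tmp:
    with_header = os.path.join(tmp, "with_header.txt")
    without_header = os.path.join(tmp, "without_header.txt")
    with open(with_header, "w") as f:
        f.write("A,B,C\n1,2,3\n4,5,6\n7,8,9\n")
    with open(without_header, "w") as f:
        f.write("1,2,3\n4,5,6\n7,8,9\n")

    df_header = read_random_rows(with_header, sample_size=2)
    df_plain = read_random_rows(without_header, sample_size=2, has_header=False)

print(df_header)
print(df_plain)
```

Which rows come back varies from run to run unless a `seed` is passed, but each DataFrame should hold exactly two rows.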

