Last updated: Aug 12, 2023

Tags: Python, Pandas

# When file contains a header row

Consider the following `my_data.txt` file:

```
A,B,C
1,2,3
4,5,6
7,8,9
```

To read `n` random lines using `read_csv(~)` in Pandas:

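```
import random
import pandas as pd

def get_num_lines(fname):
    # Count the lines in the file without loading it into memory
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

# Number of data rows (total lines minus the header row)
num_lines = get_num_lines("my_data.txt") - 1
# How many random rows do you want?
sample_size = 2
# Candidate lines are 1..num_lines; line 0 is the header and is never skipped
rows_to_skip = random.sample(range(1, num_lines + 1), num_lines - sample_size)
df = pd.read_csv("my_data.txt", skiprows=rows_to_skip)
df
   A  B  C
0  1  2  3
1  7  8  9
```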

Note the following:

• we start by fetching the total number of lines in the file. Since the file has a header row, we subtract `1` from this count to get the number of data rows. In this case, `num_lines=3`.

• we then use the `random.sample(~)` method to randomly pick the row numbers to skip.

    • the first argument is the collection of values to sample from. In this case, since `num_lines=3`, the candidates are the integers between `1` (inclusive) and `3` (inclusive). We start the range at `1` (`range(1,_)`) because the first line of the file (line `0`) holds the column labels, and we don't want to skip it. In this case, it turned out that `rows_to_skip=[2]`, which means that the second data row (`4,5,6`) is skipped.

    • the second argument is the number of integers to draw, which is `num_lines-sample_size`, that is, the number of rows to discard.
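As a quick check of the points above, the sketch below uses only the standard library to draw the line numbers to skip and confirms that the header line (index `0`) can never be among them. The helper name `pick_rows_to_skip` and the `seed` parameter are assumptions made for illustration; they are not part of Pandas.

```python
import random

def pick_rows_to_skip(num_data_rows, sample_size, seed=None):
    """Return the file line numbers to skip so that sample_size data rows remain.

    Line 0 is assumed to be the header, so data rows occupy lines 1..num_data_rows.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    candidates = range(1, num_data_rows + 1)  # never includes the header line 0
    return sorted(rng.sample(candidates, num_data_rows - sample_size))

rows_to_skip = pick_rows_to_skip(num_data_rows=3, sample_size=2, seed=0)
print(rows_to_skip)  # a single line number drawn from 1, 2 or 3
```

The returned list can be passed directly as the `skiprows` argument of `read_csv(~)`.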

# When file does not contain a header row

Consider the following `my_data.txt` file:

```
1,2,3
4,5,6
7,8,9
```

To read `n` random lines using `read_csv(~)`:

```
import random
import pandas as pd

def get_num_lines(fname):
    # Count the lines in the file without loading it into memory
    with open(fname) as f:
        for i, _ in enumerate(f):
            pass
    return i + 1

num_lines = get_num_lines("my_data.txt")
# How many random rows do you want?
sample_size = 2
# No header row, so every line (0..num_lines-1) is a candidate for skipping
rows_to_skip = random.sample(range(num_lines), num_lines - sample_size)
df = pd.read_csv("my_data.txt", skiprows=rows_to_skip, header=None)
df
   0  1  2
0  4  5  6
1  7  8  9
```

Note the following:

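• we start by fetching the total number of lines in the file. In this case, `num_lines=3`.

• we then use the `random.sample(~)` method to randomly pick the row numbers to skip. Since there is no header row, line `0` is also a candidate. In this case, it turned out that `rows_to_skip=[0]`, which means that the first row (`1,2,3`) is skipped.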
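The header and headerless cases differ only in the first candidate line number and in the `header` argument, so they can be folded into one helper. The following is a minimal sketch under stated assumptions: the function name `read_random_rows` and its `has_header` and `seed` parameters are hypothetical, and the file is assumed to contain no blank lines. The demo writes the headerless example data to a temporary file so that the block runs on its own.

```python
import os
import random
import tempfile

import pandas as pd

def read_random_rows(path, sample_size, has_header=True, seed=None):
    """Read sample_size randomly chosen data rows from a CSV file."""
    with open(path) as f:
        num_lines = sum(1 for _ in f)  # total lines, header included
    first_data_line = 1 if has_header else 0  # keep line 0 if it is a header
    num_data_rows = num_lines - first_data_line
    rng = random.Random(seed)  # a seed makes the draw reproducible
    rows_to_skip = rng.sample(range(first_data_line, num_lines),
                              num_data_rows - sample_size)
    return pd.read_csv(path, skiprows=rows_to_skip,
                       header=0 if has_header else None)

# Demo on a temporary copy of the headerless example file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1,2,3\n4,5,6\n7,8,9\n")
df = read_random_rows(tmp.name, sample_size=2, has_header=False, seed=42)
os.remove(tmp.name)
print(df.shape)  # (2, 3)
```

Because only the skipped line numbers are drawn at random, the rows that remain always appear in their original file order.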

