
Reading n random lines using read_csv in Pandas

Last updated: Jul 1, 2022
Tags: Python, Pandas

When the file contains a header row

Consider the following my_data.txt file:

A,B,C
1,2,3
4,5,6
7,8,9

To read n random lines using read_csv(~) in Pandas:

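import random
import pandas as pd

def get_num_lines(fname):
    # Count lines by iterating over the file, so it is never fully loaded into memory
    with open(fname) as f:
        return sum(1 for _ in f)

# Number of data rows (total lines minus the header row)
num_lines = get_num_lines("my_data.txt") - 1

# How many random rows do you want?
sample_size = 2
# Line 0 is the header, so the data rows are lines 1 to num_lines (inclusive)
rows_to_skip = random.sample(range(1, num_lines + 1), num_lines - sample_size)

df = pd.read_csv("my_data.txt", skiprows=rows_to_skip)
df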
   A  B  C
0  1  2  3
1  7  8  9

Note the following:

  • We start by fetching the total number of lines in the file. Since the file has a header row, we subtract 1 from the count to get the number of data rows. In this case, num_lines=3.

  • We then use the random.sample(~) function to randomly pick the line numbers to skip.

    • The first argument is the sequence of values to select from. In this case, since num_lines=3, the integers from 1 (inclusive) to 3 (inclusive) are the candidates. The range starts at 1 because the first line of the file (line 0) holds the column labels, and we don't want to skip it. In this run, it turned out that rows_to_skip=[2], which means that the line at index 2 (the row 4,5,6) was skipped.

    • The second argument is the number of values to pick, that is, the number of rows to skip (num_lines - sample_size).
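To make the two arguments of random.sample(~) concrete, here is a minimal sketch that uses only the standard library. It assumes the candidate line numbers run from 1 through num_lines inclusive, as the note above describes. The seeded random.Random instance and the names rng, candidates and kept are illustrative additions, used only to make the run repeatable and the result easy to inspect.

```python
import random

num_lines = 3      # number of data rows (header excluded), as in the example
sample_size = 2    # number of rows to keep

rng = random.Random(42)  # seeded only so that this sketch is repeatable

# Line indices 1, 2, 3 are the data rows; index 0 (the header) is never a candidate
candidates = range(1, num_lines + 1)

# Pick (num_lines - sample_size) distinct line indices to skip
rows_to_skip = rng.sample(candidates, num_lines - sample_size)

# Whatever is not skipped is what read_csv would keep
kept = [i for i in candidates if i not in rows_to_skip]
print("skip:", rows_to_skip, "keep:", kept)
```

Because random.sample(~) picks without replacement, the skipped indices are always distinct, so exactly sample_size data rows remain.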

When the file does not contain a header row

Consider the following my_data.txt file:

1,2,3
4,5,6
7,8,9

To read n random lines using read_csv(~):

import random
import pandas as pd

def get_num_lines(fname):
    # Count lines by iterating over the file, so it is never fully loaded into memory
    with open(fname) as f:
        return sum(1 for _ in f)

# No header row, so every line is a data row
num_lines = get_num_lines("my_data.txt")

# How many random rows do you want?
sample_size = 2
rows_to_skip = random.sample(range(num_lines), num_lines - sample_size)

df = pd.read_csv("my_data.txt", skiprows=rows_to_skip, header=None)
df
   0  1  2
0  4  5  6
1  7  8  9

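Note the following:

  • We start by fetching the total number of lines in the file. In this case, num_lines=3.

  • We then use the random.sample(~) function to randomly pick the line numbers to skip. Since there is no header row, every line from 0 to num_lines-1 is a candidate. In this run, it turned out that rows_to_skip=[0], which means that the first line was skipped.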
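The two cases differ only in where the candidate line numbers start and in the header argument, so they can be folded into one helper. The sketch below is one possible way to do that, not part of the original recipe: read_random_rows, count_lines, and the has_header and seed parameters are hypothetical names, and the demo writes the headerless example to a temporary file so that it runs on its own.

```python
import os
import random
import tempfile

import pandas as pd


def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path) as f:
        return sum(1 for _ in f)


def read_random_rows(path, sample_size, has_header=True, seed=None):
    """Hypothetical helper: read `sample_size` randomly chosen data rows."""
    rng = random.Random(seed)
    total_lines = count_lines(path)
    first_data_line = 1 if has_header else 0  # line 0 is the header, if any
    num_rows = total_lines - first_data_line
    if sample_size > num_rows:
        raise ValueError("sample_size exceeds the number of data rows")
    # Skip all data rows except `sample_size` of them; the header is never a candidate
    rows_to_skip = rng.sample(range(first_data_line, total_lines),
                              num_rows - sample_size)
    return pd.read_csv(path, skiprows=rows_to_skip,
                       header=0 if has_header else None)


# Demo on a throwaway copy of the headerless example file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1,2,3\n4,5,6\n7,8,9\n")
    demo_path = tmp.name

sample_df = read_random_rows(demo_path, sample_size=2, has_header=False, seed=0)
os.remove(demo_path)
print(sample_df)
```

Passing a seed makes the selection repeatable, which helps when debugging. Leaving it as None gives a different sample on each call.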

Published by Isshin Inada