Pandas DataFrame | sample method

Last updated: Jul 1, 2022
Tags: Python, Pandas

Pandas DataFrame.sample(~) method returns a random sample of rows or columns of the specified size. Note that a new copy is returned, so modifying the returned DataFrame will not mutate the source DataFrame.
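As a quick check of the copy behaviour described above, the following minimal sketch modifies a sample and then inspects the source. The DataFrame and the variable names (sampled, source_values) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"]})

# Take a sample, then overwrite a column in the sample only.
sampled = df.sample(n=2, random_state=0)
sampled["A"] = "z"

# The source DataFrame still holds its original values.
source_values = df["A"].tolist()
print(source_values)  # ['a', 'b', 'c', 'd']
```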

Parameters

1. n | int | optional

The size of the random sample. By default, n=None; if neither n nor frac is specified, a single item is returned.

2. frac | float | optional

The relative size of the random sample. For instance, frac=0.6 means that the size of the random sample would be 60% of the total number of rows (or columns, if axis=1).

WARNING

Only specify either n or frac - not both.

3. replace | boolean | optional

Whether or not to allow the same row to be sampled more than once. By default, replace=False.

4. weights | string or array-like | optional

The weights assigned to the items. An item with a higher weight is more likely to be selected. A string is interpreted as the name of a column that holds the weights (when sampling rows). If the weights do not sum up to 1, then they are normalised so that the sum becomes 1. By default, weights=None, which means that equal weights are assigned.

5. random_state | int or numpy.random.RandomState | optional

The seed used to generate the random samples. This is used for reproducibility - if you'd like to get consistent results, then specify this parameter.

6. axis | int or string | optional

Whether to return rows or columns:

Axis              Description
0 or "index"      Rows will be returned.
1 or "columns"    Columns will be returned.

By default, axis=0.
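To illustrate how n and frac interact, here is a small sketch, assuming a toy DataFrame, that calls sample(~) with no size arguments and then with both. The names one_row and both_allowed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"]})

# With neither n nor frac, a single row is returned.
one_row = df.sample()
print(len(one_row))  # 1

# Passing both n and frac is rejected with a ValueError.
try:
    df.sample(n=2, frac=0.5)
    both_allowed = True
except ValueError as err:
    both_allowed = False
    print("ValueError:", err)
```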

Return Value

A new DataFrame containing rows or columns selected at random.

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A":["a","b","c","d"],"B":["e","f","g","h"],"C":["i","j","k","l"]})
df
   A  B  C
0  a  e  i
1  b  f  j
2  c  g  k
3  d  h  l

Basic usage

To get 2 rows at random:

df.sample(n=2)
   A  B  C
3  d  h  l
0  a  e  i

Specifying frac parameter

To make the sample size half of the total number of rows, set frac=0.5:

df.sample(frac=0.5)
   A  B  C
2  c  g  k
0  a  e  i

Here, the DataFrame has 4 rows and 50% of 4 is 2, so we end up with 2 rows.
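A related use of frac is shuffling: with frac=1, every row is returned once in random order. The sketch below, which assumes a toy DataFrame and illustrative variable names, checks both the half-size sample and the full shuffle.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"]})

# frac=0.5 on a 4-row DataFrame yields 2 rows.
half = df.sample(frac=0.5, random_state=0)
print(len(half))  # 2

# frac=1 returns every row exactly once, in random order.
shuffled = df.sample(frac=1, random_state=0)
print(sorted(shuffled.index) == list(df.index))  # True
```

To discard the original row labels after shuffling, chain reset_index(drop=True) onto the result.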

Specifying replace parameter

To allow the same row to be selected more than once, set replace=True. This means that the following outcome is now possible:

df.sample(n=2, replace=True)
   A  B  C
0  a  e  i
0  a  e  i
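One practical consequence of replace is that a sample larger than the DataFrame is only possible when replace=True. The following sketch, with illustrative names, shows the failure without replacement and the success with it.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"]})

# Asking for 6 rows from a 4-row DataFrame fails without replacement.
try:
    df.sample(n=6)
    oversample_failed = False
except ValueError as err:
    oversample_failed = True
    print("ValueError:", err)

# With replace=True the same request succeeds, and some rows must repeat.
oversample = df.sample(n=6, replace=True, random_state=0)
print(len(oversample))  # 6
```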

Specifying weights parameter

By default, all rows have an equal probability of getting selected. We can make certain rows more likely to be selected by setting the weights parameter, like so:

df.sample(n=1, weights=[0.7, 0.1, 0.1, 0.1])
   A  B  C
0  a  e  i

Here, row 0 will get selected 70% of the time, and other rows will each get selected 10% of the time. Note that the sum of the weights need not be 1; the method will automatically normalise the weights so that the sum becomes 1.
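The normalisation can be checked empirically. The sketch below is only an illustration: it assumes a hypothetical weight column named "w" with unnormalised weights 3, 1, 0, 0, passes the column name as a string, and draws many rows with replacement to estimate the selection frequencies.

```python
import pandas as pd

# "w" is a hypothetical weight column; the weights sum to 4, not 1.
df = pd.DataFrame({"A": ["a", "b", "c", "d"], "w": [3, 1, 0, 0]})

# Draw many rows with replacement, using column "w" as the weights.
draws = df.sample(n=10_000, replace=True, weights="w", random_state=0)

# Row 0 should be drawn about 3/4 of the time.
share_row0 = (draws.index == 0).mean()
print(round(share_row0, 2))

# Rows with zero weight are never drawn.
never_picked = set(df.index) - set(draws.index)
print(never_picked)  # {2, 3}
```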

Specifying random_state parameter

When you need to reproduce your results, set the random_state parameter, like so:

df.sample(n=2, random_state=42)
   A  B  C
1  b  f  j
3  d  h  l

Now, no matter how many times you run this method, the result will always be the same. You can give the number 42 to your friends, and they would also get the same result on their machines!
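As a minimal check of this reproducibility, the sketch below compares two calls that use the same integer seed, and a third that passes an equivalent numpy.random.RandomState object. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"]})

# Two calls with the same integer seed return identical samples.
first = df.sample(n=2, random_state=42)
second = df.sample(n=2, random_state=42)
print(first.equals(second))  # True

# A freshly seeded RandomState object behaves like the integer seed.
from_rng = df.sample(n=2, random_state=np.random.RandomState(42))
print(first.equals(from_rng))  # True
```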

Specifying axis parameter

By default, rows are selected at random:

df.sample(n=1)   # axis=0
   A  B  C
3  d  h  l

To get columns instead, set axis=1 like so:

df.sample(n=1, axis=1)
   B
0  e
1  f
2  g
3  h
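The same idea extends to sampling more than one column. In this sketch, which assumes the example DataFrame from above and illustrative variable names, two columns are sampled and all rows are kept; the string form axis="columns" behaves like axis=1.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"],
                   "B": ["e", "f", "g", "h"],
                   "C": ["i", "j", "k", "l"]})

# Sample 2 of the 3 columns; every row is retained.
two_cols = df.sample(n=2, axis=1, random_state=0)
print(two_cols.shape)  # (4, 2)

# axis="columns" is equivalent to axis=1.
same_cols = df.sample(n=2, axis="columns", random_state=0)
print(two_cols.equals(same_cols))  # True
```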
Published by Isshin Inada