The weights assigned to the items. An item with high weight is more likely to be selected. If the weights do not sum up to 1, then they are normalised so that the sum becomes 1. By default, weights=None, which means that equal weights are assigned.

5. random_state | int or numpy.random.RandomState | optional

The seed used to generate the random samples. This is used for reproducibility - if you'd like to get consistent results, then specify this parameter.

6. axis | int or string | optional

Whether to return rows or columns:

Axis	Description
`0` or `"index"`	Rows will be returned.
`1` or `"columns"`	Columns will be returned.

By default, axis=0.

Return Value

A new DataFrame containing rows or columns selected at random.

Examples

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": ["e", "f", "g", "h"], "C": ["i", "j", "k", "l"]})
df
   A  B  C
0  a  e  i
1  b  f  j
2  c  g  k
3  d  h  l

Basic usage

To get 2 rows at random:

df.sample(n=2)
   A  B  C
3  d  h  l
0  a  e  i

Specifying frac parameter

To make the sample size half of the total number of rows, set frac=0.5:

df.sample(frac=0.5)
   A  B  C
2  c  g  k
0  a  e  i

Here, the DataFrame has 4 rows and 50% of 4 is 2, which is why we ended up with 2 rows.
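
The relationship between frac and the number of returned rows can be checked with a short sketch. The variable names (half, shuffled, both_rejected) are illustrative only and do not come from the original page. The sketch also shows two related behaviours of pandas: frac=1 returns every row once in shuffled order, and passing n together with frac raises a ValueError.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"],
                   "B": ["e", "f", "g", "h"],
                   "C": ["i", "j", "k", "l"]})

# frac=0.5 on a 4-row DataFrame yields 2 rows.
half = df.sample(frac=0.5)
print(len(half))  # 2

# frac=1 returns every row exactly once, in a shuffled order.
shuffled = df.sample(frac=1)
print(sorted(shuffled.index) == list(df.index))  # True

# n and frac are alternatives; supplying both is rejected.
try:
    df.sample(n=2, frac=0.5)
    both_rejected = False
except ValueError:
    both_rejected = True
print(both_rejected)  # True
```

Which rows appear in half and shuffled will vary from run to run, since no random_state is set.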

Specifying replace parameter

To allow the same row to be selected more than once, set replace=True. This makes the following outcome possible:

df.sample(n=2, replace=True)
   A  B  C
0  a  e  i
0  a  e  i
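
One practical consequence of replace is the allowed sample size. The following is a minimal sketch, with illustrative variable names (oversample_failed, big) that are not part of the original page. It assumes the default replace=False and shows that asking for more rows than exist fails, whereas the same request succeeds once replace=True.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"],
                   "B": ["e", "f", "g", "h"],
                   "C": ["i", "j", "k", "l"]})

# Without replacement, the sample cannot be larger than the DataFrame.
try:
    df.sample(n=6)
    oversample_failed = False
except ValueError:
    oversample_failed = True
print(oversample_failed)  # True

# With replacement, 6 rows can be drawn from 4, so some rows must repeat.
big = df.sample(n=6, replace=True)
print(len(big))                  # 6
print(big.index.has_duplicates)  # True
```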

Specifying weights parameter

By default, all rows have an equal probability of getting selected. We can make certain rows more likely to be selected by setting the weights parameter, like so:

df.sample(n=1, weights=[0.7, 0.1, 0.1, 0.1])
   A  B  C
0  a  e  i

Here, row 0 will get selected 70% of the time, and each of the other rows will get selected 10% of the time. Note that the weights need not sum to 1; the method automatically normalises them so that the sum becomes 1.
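
The effect of the weights can be checked empirically. The sketch below is an illustration rather than part of the original page: it passes the unnormalised weights [7, 1, 1, 1], draws many rows with replacement, and measures how often row 0 appears. The names draws and share_row0, the sample size of 10,000 and the seed 0 are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"],
                   "B": ["e", "f", "g", "h"],
                   "C": ["i", "j", "k", "l"]})

# Weights summing to 10 are normalised to 0.7, 0.1, 0.1, 0.1.
draws = df.sample(n=10_000, replace=True, weights=[7, 1, 1, 1], random_state=0)

# Fraction of draws that picked row 0; this should be close to 0.7.
share_row0 = (draws.index == 0).mean()
print(round(share_row0, 2))
```

Because sampling is random, the observed share only approximates 0.7; larger samples tend to land closer to it.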

Specifying random_state parameter

When you need to reproduce your results, set the random_state parameter, like so:

df.sample(n=2, random_state=42)
   A  B  C
1  b  f  j
3  d  h  l

Now, no matter how many times you run this code, the result will always be the same. You can give the number 42 to your friends, and they should also get the same result on their machines!
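
Reproducibility can be verified by comparing two calls directly. This is a minimal sketch with illustrative names (first, second, rs_a, rs_b) and an arbitrary second seed of 7. It covers both accepted forms of random_state listed in the parameter description: an int and a numpy.random.RandomState instance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"],
                   "B": ["e", "f", "g", "h"],
                   "C": ["i", "j", "k", "l"]})

# The same integer seed gives the same sample on every call.
first = df.sample(n=2, random_state=42)
second = df.sample(n=2, random_state=42)
print(first.equals(second))  # True

# RandomState objects seeded identically behave the same way.
rs_a = df.sample(n=2, random_state=np.random.RandomState(7))
rs_b = df.sample(n=2, random_state=np.random.RandomState(7))
print(rs_a.equals(rs_b))  # True
```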

Specifying axis parameter

By default, rows are sampled at random:

df.sample(n=1)   # axis=0
   A  B  C
3  d  h  l

To get columns instead, set axis=1 like so:

df.sample(n=1, axis=1)
   B
0  e
1  f
2  g
3  h
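
The string forms of axis listed in the parameter table work the same way as the integers. The sketch below is illustrative: the names rows and cols and the seed 0 are arbitrary. It compares the shapes returned for axis="index" and axis="columns".

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "d"],
                   "B": ["e", "f", "g", "h"],
                   "C": ["i", "j", "k", "l"]})

# axis="index" is equivalent to axis=0: 2 of the 4 rows, all 3 columns.
rows = df.sample(n=2, axis="index", random_state=0)
print(rows.shape)  # (2, 3)

# axis="columns" is equivalent to axis=1: all 4 rows, 2 of the 3 columns.
cols = df.sample(n=2, axis="columns", random_state=0)
print(cols.shape)  # (4, 2)
```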

Published by Isshin Inada

Official Pandas Documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
