
# Randomly splitting DataFrame into multiple DataFrames of equal size in Pandas

Aug 10, 2023

Tags: Python, Pandas


Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[9,10,11,12]})
df
   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
```

# Solution

To randomly split `df` into two DataFrames of equal size:

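```
import numpy as np

df_shuffled = df.sample(frac=1)
df_splits = np.array_split(df_shuffled, 2)
# a separate loop variable keeps the original df from being overwritten
for df_split in df_splits:
    display(df_split)
   A  B   C
2  3  7  11
1  2  6  10
   A  B   C
0  1  5   9
3  4  8  12
```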

Note the following:

- we first use DataFrame's `sample(~)` method to randomly shuffle the rows. The `frac=1` means we want all rows returned.
- we then use NumPy's `array_split(~,2)` function to split the shuffled DataFrame into 2 equally sized sub-DataFrames. The return type is a list of DataFrames.
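The two steps above can be wrapped into a small helper. This is only a sketch: the name `random_split` and its `seed` parameter are illustrative assumptions, not part of Pandas or NumPy. It assumes you want a repeatable shuffle, so it passes `random_state` to `sample(~)`. It also splits the row positions with `array_split(~)` and selects them with `iloc`, instead of passing the DataFrame to NumPy directly.

```python
import numpy as np
import pandas as pd


def random_split(df, n_splits, seed=None):
    """Hypothetical helper: shuffle the rows, then cut them into n_splits pieces."""
    # frac=1 returns every row in random order; random_state makes it repeatable
    shuffled = df.sample(frac=1, random_state=seed)
    # Split the row positions (0, 1, 2, ...) rather than the DataFrame itself
    position_chunks = np.array_split(np.arange(len(shuffled)), n_splits)
    return [shuffled.iloc[chunk] for chunk in position_chunks]


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
splits = random_split(df, 2, seed=0)
for part in splits:
    print(part)
```

Calling the helper again with the same `seed` should return the same splits, which can be handy when the split feeds a train/test style workflow.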

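# Case when equally sized DataFrames are not possible

When the number of splits does not evenly divide the number of rows, the resulting DataFrames will not all be the same size. `array_split(~)` gives each of the leading splits one extra row, so the sizes differ by at most one: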

```
df_shuffled = df.sample(frac=1)
df_splits = np.array_split(df_shuffled, 3)
for df_split in df_splits:
    display(df_split)
   A  B   C
2  3  7  11
1  2  6  10
   A  B   C
3  4  8  12
   A  B  C
0  1  5  9
```
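To predict the split sizes without running the split, a short sketch can help. The helper name `expected_sizes` is hypothetical. It encodes the rule `array_split(~)` follows: the remainder rows are handed out one each to the first splits.

```python
import numpy as np


def expected_sizes(n_rows, n_splits):
    """Hypothetical helper: sizes produced when n_rows are split n_splits ways."""
    base, extra = divmod(n_rows, n_splits)
    # The first `extra` splits get one additional row each
    return [base + 1] * extra + [base] * (n_splits - extra)


# Compare against what NumPy actually does for 4 rows and 3 splits
actual_sizes = [len(chunk) for chunk in np.array_split(np.arange(4), 3)]
print(expected_sizes(4, 3), actual_sizes)
```

For the 4-row DataFrame above split 3 ways, both lists come out as `[2, 1, 1]`, matching the output shown earlier.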

Published by Isshin Inada
