
# Randomly splitting DataFrame into multiple DataFrames of equal size in Pandas

Last updated: Jul 1, 2022

Tags: Python, Pandas

Consider the following DataFrame:

```
import pandas as pd

df = pd.DataFrame({"A":[1,2,3,4],"B":[5,6,7,8],"C":[9,10,11,12]})
df
   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
```

# Solution

To randomly split `df` into two DataFrames of equal size:

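```
import numpy as np

df_shuffled = df.sample(frac=1)
df_splits = np.array_split(df_shuffled, 2)
# Use a separate loop variable so that the original df is not overwritten
for df_split in df_splits:
    display(df_split)
   A  B   C
2  3  7  11
1  2  6  10
   A  B   C
0  1  5   9
3  4  8  12
```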

Note the following:

• We first use the DataFrame's `sample(~)` method to randomly shuffle the rows. Setting `frac=1` means that all rows are returned, in random order.

• We then use NumPy's `array_split(~, 2)` function to split the shuffled DataFrame into 2 equally sized sub-DataFrames. The return type is a list of DataFrames.
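The two steps above can be wrapped in a small reusable helper. The following is a minimal sketch rather than a definitive implementation: the function name `random_split` and its `seed` parameter are hypothetical names chosen for illustration. As a design choice, it splits the row positions with `np.array_split` and then selects rows with `iloc`, instead of passing the DataFrame to NumPy directly, so that the result does not depend on how NumPy handles DataFrame objects.

```python
import numpy as np
import pandas as pd


def random_split(df, n_splits, seed=None):
    """Shuffle the rows of df, then cut them into n_splits consecutive chunks.

    Hypothetical helper for illustration; not part of Pandas or NumPy.
    """
    # sample(frac=1) returns every row in random order;
    # random_state makes the shuffle repeatable when a seed is given
    shuffled = df.sample(frac=1, random_state=seed)
    # Split the row positions 0..len-1 into n_splits nearly equal groups
    position_groups = np.array_split(np.arange(len(shuffled)), n_splits)
    # Select each group of positions to obtain the sub-DataFrames
    return [shuffled.iloc[positions] for positions in position_groups]


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
splits = random_split(df, 2, seed=42)
for part in splits:
    print(part)
```

Passing the same `seed` gives the same split on every run, which is useful when the split needs to be reproducible. Leaving `seed=None` gives a different shuffle each time.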

# Case when equally sized DataFrames are not possible

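When the number of splits does not evenly divide the number of rows, the resulting DataFrames will not all be of equal size. In this case, `array_split(~)` gives the earlier splits one extra row each until the remainder is used up, so splitting 4 rows into 3 yields sizes 2, 1 and 1: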

```
df_shuffled = df.sample(frac=1)
df_splits = np.array_split(df_shuffled, 3)
# Use a separate loop variable so that the original df is not overwritten
for df_split in df_splits:
    display(df_split)
   A  B   C
2  3  7  11
1  2  6  10
   A  B   C
3  4  8  12
   A  B   C
0  1  5   9
```
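To see in advance how many rows each split will receive, the sizes can be worked out with integer division. The sketch below is illustrative only: the variable names are assumptions, and the rows are split by position with `iloc` rather than by passing the DataFrame to `np.array_split` directly. It compares the sizes predicted by `divmod` with the sizes actually produced.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
n_splits = 3

# Predicted sizes: every split gets `base` rows, and the first `extra` splits get one more
base, extra = divmod(len(df), n_splits)
expected_sizes = [base + 1] * extra + [base] * (n_splits - extra)

# Actual sizes: shuffle, split the row positions, then select the rows
df_shuffled = df.sample(frac=1)
position_groups = np.array_split(np.arange(len(df_shuffled)), n_splits)
df_splits = [df_shuffled.iloc[positions] for positions in position_groups]
actual_sizes = [len(part) for part in df_splits]

print(expected_sizes)  # [2, 1, 1]
print(actual_sizes)    # [2, 1, 1]
```

Which rows land in which split changes from run to run, but the sizes do not, because they depend only on the number of rows and the number of splits.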