Randomly splitting DataFrame into multiple DataFrames of equal size in Pandas
Last updated: Aug 10, 2023
Tags: Python, Pandas
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})
df
   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12
Solution
To randomly split df into two DataFrames of equal size:
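import numpy as np
df_shuffled = df.sample(frac=1)  # shuffle all rows
df_splits = np.array_split(df_shuffled, 2)  # list of 2 DataFrames
for df_split in df_splits:  # separate name so the original df is not overwritten
    display(df_split)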
   A  B   C
2  3  7  11
1  2  6  10

   A  B   C
0  1  5   9
3  4  8  12
Note the following:
- we first use the DataFrame's sample(~) method to randomly shuffle the rows. Here, frac=1 means we want all rows returned. Because the shuffle is random, the rows in your output may differ from those shown above.
- we then use NumPy's array_split(~,2) function to split the shuffled DataFrame into 2 equally sized sub-DataFrames. The return type is a list of DataFrames.
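The two steps above can be sketched in a reproducible form. This is a minimal sketch rather than the article's exact code: the seed value passed to random_state is an arbitrary choice, the name df_parts is made up for illustration, and the split is done on row positions followed by iloc selection instead of passing the DataFrame to array_split directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]})

# random_state=42 is an arbitrary seed (an assumption for this sketch);
# fixing it makes the shuffle repeatable between runs
df_shuffled = df.sample(frac=1, random_state=42)

# Split the row positions into 2 chunks, then select each chunk with iloc
position_chunks = np.array_split(np.arange(len(df_shuffled)), 2)
df_parts = [df_shuffled.iloc[chunk] for chunk in position_chunks]

part_sizes = [len(part) for part in df_parts]

# Every original row should appear exactly once across the parts
recombined = pd.concat(df_parts).sort_index()

print(part_sizes)
print(recombined.equals(df))
```

Concatenating the parts and sorting by index is a quick way to check that no row was dropped or duplicated by the split.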
Case when equally sized DataFrames are not possible
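When the number of splits does not evenly divide the number of rows, the resulting DataFrames will not all be of equal size. NumPy's array_split(~) gives the extra rows to the first splits, so splitting 4 rows into 3 yields sizes of 2, 1 and 1: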
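df_shuffled = df.sample(frac=1)  # shuffle all rows
df_splits = np.array_split(df_shuffled, 3)  # 4 rows cannot be divided evenly by 3
for df_split in df_splits:  # separate name so the original df is not overwritten
    display(df_split)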
   A  B   C
2  3  7  11
1  2  6  10

   A  B   C
3  4  8  12

   A  B  C
0  1  5  9
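The sizes of the uneven splits can be worked out ahead of time. The sketch below assumes a hypothetical helper, split_sizes, which is not part of Pandas or NumPy. It computes the expected sizes with integer division and compares them against what NumPy's array_split produces for the same number of rows.

```python
import numpy as np


def split_sizes(n_rows, n_splits):
    """Hypothetical helper: expected size of each split for n_rows rows."""
    base, extra = divmod(n_rows, n_splits)
    # The first `extra` splits receive one additional row each
    return [base + 1 if i < extra else base for i in range(n_splits)]


expected_sizes = split_sizes(4, 3)

# Sizes that array_split actually produces for 4 row positions and 3 splits
numpy_sizes = [len(chunk) for chunk in np.array_split(np.arange(4), 3)]

print(expected_sizes)
print(numpy_sizes)
```

Checking the sizes up front is useful when a later step assumes every split has the same number of rows.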
Published by Isshin Inada