Using the apply method in parallel on a Pandas DataFrame

Last updated: Aug 10, 2023

Tags: Python, Pandas

Pandas' apply(~) method runs on a single core: the function is applied to one row (or column) at a time in a single thread. If your machine has multiple cores, you can speed up apply(~) by running it in parallel instead.

One way to run apply(~) in parallel is with Dask, an easy-to-use library that splits a DataFrame into smaller partitions, each of which is an ordinary Pandas DataFrame, and then performs Pandas operations on these partitions in parallel.
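
To make the partitioning idea concrete, here is a minimal sketch of the same split-apply-combine pattern written without Dask, using the standard library's concurrent.futures instead. It is only an illustration of the idea, not how Dask is implemented. The helper names apply_to_partition and parallel_apply are hypothetical, and foo is the same row function used in the Dask example of this guide.

```python
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor


def foo(row, a, x=10):
    # Same per-row function as in the Dask example
    return (row.sum() + a) * x


def apply_to_partition(part):
    # Each partition is an ordinary Pandas DataFrame, so plain apply works on it
    return part.apply(foo, axis=1, args=(10,), x=100)


def parallel_apply(df, n_partitions=5, executor_cls=ProcessPoolExecutor):
    # Hypothetical helper: split the rows into n_partitions contiguous chunks
    bounds = np.linspace(0, len(df), n_partitions + 1, dtype=int)
    parts = [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n_partitions)]
    # Apply the function to each chunk in its own worker
    with executor_cls(max_workers=n_partitions) as executor:
        results = list(executor.map(apply_to_partition, parts))
    # executor.map preserves input order, so concatenation restores the original row order
    return pd.concat(results)
```

With ProcessPoolExecutor, call parallel_apply(df) from inside an if __name__ == "__main__": block so that worker processes can be started safely on platforms that spawn them. Each partition is pickled and sent to a worker, so for cheap functions this transfer overhead can eat into the gain; Dask takes care of the partitioning, scheduling, and reassembly for you.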

Consider the following Pandas DataFrame with one million rows:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
df = pd.DataFrame({'A': rng.uniform(0, 5, 1000000)})
df.head()
```

```
          A
0  3.869780
1  2.194392
2  4.292990
3  3.486840
4  0.470887
```

To convert a Pandas DataFrame into a Dask DataFrame with 5 partitions:

```python
import dask.dataframe as dd

# Split the Pandas DataFrame into 5 partitions of roughly equal size
ddf = dd.from_pandas(df, npartitions=5)
ddf
```

```
                     A
npartitions=5
0              float64
200000             ...
...                ...
800000             ...
999999             ...
Dask Name: from_pandas, 5 tasks
```

To perform apply(~) in parallel:

```python
def foo(row, a, x=10):
    # Add a to the sum of the row's values, then scale by x
    return (row.sum() + a) * x

# axis=1 applies foo row-wise (Dask's apply only supports axis=1)
# args=(10,) passes a=10 positionally; x=100 is forwarded as a keyword argument
# meta=('B', 'float64') tells Dask the name and dtype of the resulting Series
dask_series = ddf.apply(foo, axis=1, args=(10,), x=100, meta=('B', 'float64'))
ddf['B'] = dask_series

# Run the computation and convert the Dask DataFrame back to a Pandas DataFrame
df_new = ddf.compute()
df_new.head()
```

```
          A            B
0  3.869780  1386.978024
1  2.194392  1219.439220
2  4.292990  1429.298960
3  3.486840  1348.684015
4  0.470887  1047.088674
```
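
Since foo only sums each row before adding a and scaling by x, the expected values of column B can also be computed with column-wise (vectorized) operations. The sketch below, built around a hypothetical helper expected_b, offers a quick sanity check for the parallel result, for example with np.allclose(df_new['B'], expected_b(df)).

```python
import numpy as np
import pandas as pd


def foo(row, a, x=10):
    return (row.sum() + a) * x


def expected_b(df, a=10, x=100):
    # Vectorized equivalent of df.apply(foo, axis=1, args=(a,), x=x):
    # sum across each row, then add a and multiply by x
    return (df.sum(axis=1) + a) * x


# Compare against Pandas' row-wise apply on a small sample
rng = np.random.default_rng(seed=42)
sample = pd.DataFrame({'A': rng.uniform(0, 5, 5)})
via_apply = sample.apply(foo, axis=1, args=(10,), x=100)
print(np.allclose(via_apply, expected_b(sample)))  # True
```

For simple arithmetic like this, the vectorized form typically runs much faster than a row-wise apply, parallel or not; apply(~) is most useful when the per-row logic cannot be expressed with vectorized operations.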

On the author's machine, Dask's apply(~) took 19 seconds to complete, while Pandas' apply(~) took 50 seconds: more than a 2x speedup!


Published by Isshin Inada
