PySpark DataFrame | collect method
PySpark DataFrame's collect() method returns all the records of the DataFrame as a list of Row objects.
Return value: a list of Row objects.
Consider the following PySpark DataFrame:
Getting all rows of the PySpark DataFrame as a list of Row objects
To get all the rows as a list of Row objects:
df.collect()
[Row(name='Alex', age=25), Row(name='Bob', age=30)]
Under the hood, the collect() method sends all the data scattered across the worker nodes to the driver node. This means that if the data is too large to fit in the driver's memory, the driver program will run out of memory and throw an error.