
PySpark DataFrame | orderBy method

Last updated: Aug 12, 2023
Tags: PySpark

PySpark DataFrame's orderBy(~) method returns a new DataFrame sorted by the specified columns. The original DataFrame is left unchanged.

Parameters

1. cols | string or list or Column | optional

A column or columns by which to sort.

2. ascending | boolean or list of boolean | optional

  • If True, then the sort will be in ascending order.

  • If False, then the sort will be in descending order.

  • If a list of booleans is passed, then each boolean sets the sort direction of the corresponding column in cols. For example, if [True,False] is passed and cols=["colA","colB"], then the DataFrame will first be sorted in ascending order of colA, and then in descending order of colB. Note that the second sort only matters for rows that share the same value in colA.

By default, ascending=True.

Return Value

A PySpark DataFrame (pyspark.sql.dataframe.DataFrame).

Examples

Consider the following PySpark DataFrame:

df = spark.createDataFrame([["Alex", 22, 200], ["Bob", 24, 300], ["Cathy", 22, 100]], ["name", "age", "salary"])
df.show()
+-----+---+------+
| name|age|salary|
+-----+---+------+
| Alex| 22|   200|
|  Bob| 24|   300|
|Cathy| 22|   100|
+-----+---+------+

Sorting PySpark DataFrame by single column in ascending order

To sort by age in ascending order:

df.orderBy("age").show()
+-----+---+------+
| name|age|salary|
+-----+---+------+
| Alex| 22|   200|
|Cathy| 22|   100|
|  Bob| 24|   300|
+-----+---+------+

Sorting PySpark DataFrame by multiple columns in ascending order

To sort by age, and then by salary (both in ascending order):

df.orderBy(["age","salary"]).show()
+-----+---+------+
| name|age|salary|
+-----+---+------+
|Cathy| 22|   100|
| Alex| 22|   200|
|  Bob| 24|   300|
+-----+---+------+

Sorting PySpark DataFrame in descending order

To sort in descending order, set ascending=False:

df.orderBy("age", ascending=False).show()
+-----+---+------+
| name|age|salary|
+-----+---+------+
|  Bob| 24|   300|
| Alex| 22|   200|
|Cathy| 22|   100|
+-----+---+------+
Published by Isshin Inada