df = spark.createDataFrame([["Alex", 20], ["Bob", 30]], ["name", "age"])
df.show()

+----+---+
|name|age|
+----+---+
|Alex| 20|
| Bob| 30|
+----+---+

Arranging columns in a specific order in PySpark

To place the age column first and the name column second, use select(~):

df.select("age", "name").show()

+---+----+
|age|name|
+---+----+
| 20|Alex|
| 30| Bob|
+---+----+
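
When only a few columns need to lead and the rest should keep their current order, the full column list can be built in plain Python first. The following is a minimal sketch, not part of the PySpark API: the helper name columns_with_first and the extra "city" column are hypothetical, and the resulting list is assumed to be passed to select(~).

```python
def columns_with_first(columns, first):
    """Return `columns` reordered so that the names in `first` lead, in the given order."""
    missing = [c for c in first if c not in columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    # Keep every remaining column in its original relative order.
    rest = [c for c in columns if c not in first]
    return list(first) + rest


# "city" is a hypothetical extra column used only for illustration.
order = columns_with_first(["name", "age", "city"], ["age"])
print(order)  # ['age', 'name', 'city']

# With a PySpark DataFrame df, the list would be unpacked into select:
# df.select(*columns_with_first(df.columns, ["age"])).show()
```

Computing the order separately keeps the check for unknown column names in one place, before Spark is involved.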

Note that toDF(~) is not suited to reordering: it assigns new labels to the existing columns by position and leaves the data in place. It also requires exactly one label per column, otherwise an error is thrown:

df.toDF("age").show()

IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (2): name, age
New column names (1): age
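
Since toDF(~) expects one label for every existing column, one way to avoid the mismatch above is to build the complete label list from the current columns. This is a rough sketch under assumptions: the helper full_label_list and the new label "years" are hypothetical and not part of PySpark.

```python
def full_label_list(columns, renames):
    """Return one label per existing column, applying `renames` where given."""
    unknown = set(renames) - set(columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Columns without an entry in `renames` keep their current label.
    return [renames.get(c, c) for c in columns]


# "years" is a hypothetical new label used only for illustration.
labels = full_label_list(["name", "age"], {"age": "years"})
print(labels)  # ['name', 'years']

# With a PySpark DataFrame df, the labels would be unpacked into toDF:
# df.toDF(*full_label_list(df.columns, {"age": "years"})).show()
```

Because the list is derived from the existing columns, its length always equals the number of columns.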

Arranging columns in alphabetical order in PySpark

To arrange the columns in alphabetical order:

df.select(*sorted(df.columns)).show()

+---+----+
|age|name|
+---+----+
| 20|Alex|
| 30| Bob|
+---+----+

Here:

sorted(~) returns the column labels as a list in alphabetical order.
the * unpacks that list into separate positional arguments.
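
The two points above can be tried in plain Python without a Spark session. The sketch below uses hypothetical mixed-case labels to show that sorted(~) places uppercase letters before lowercase ones by default, that passing key=str.lower gives a case-insensitive order, and what the * does to a list; the function received is a stand-in written for this example.

```python
# Hypothetical column labels with mixed case, used only for illustration.
columns = ["name", "Zip", "age"]

default_order = sorted(columns)                # uppercase sorts before lowercase
ignore_case = sorted(columns, key=str.lower)   # compare labels in lowercase form

print(default_order)  # ['Zip', 'age', 'name']
print(ignore_case)    # ['age', 'name', 'Zip']


def received(*cols):
    """Stand-in for a method taking positional arguments; returns what it was given."""
    return cols


# Without *, the whole list arrives as a single argument.
print(received(ignore_case))   # (['age', 'name', 'Zip'],)
# With *, each label arrives as its own positional argument.
print(received(*ignore_case))  # ('age', 'name', 'Zip')
```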

Published by Isshin Inada

Official PySpark Documentation

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.toDF.html