PySpark SQL Functions | to_date method
PySpark SQL Functions' to_date(~) method converts date strings into date type.
Parameters
1. col | Column
The date string column.
2. format | string | optional
The format of the date strings (e.g. "yyyy-MM-dd"). If omitted, Spark's default date parsing is used.
Return Value
A new PySpark Column of date type.
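As a quick illustration of the return value, calling to_date(~) on its own only builds a Column expression; nothing is computed until the expression is used in a DataFrame operation. The column name below is just a placeholder:

from pyspark.sql import functions as F

# A Column expression representing to_date(birthday, yyyy-MM-dd);
# it is only evaluated once used inside select(~), withColumn(~), etc.
date_col = F.to_date("birthday", "yyyy-MM-dd")
print(type(date_col))   # <class 'pyspark.sql.column.Column'>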
Examples
Consider the following PySpark DataFrame with some date strings:
+----+----------+
|name|  birthday|
+----+----------+
|Alex|1995-12-16|
| Bob|1998-05-06|
+----+----------+
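For reference, a DataFrame like this can be built as follows (a sketch; the variable name df is what the later snippets assume):

df = spark.createDataFrame([["Alex", "1995-12-16"], ["Bob", "1998-05-06"]], ["name", "birthday"])
df.show()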
Converting date strings to date type in PySpark
To convert date strings in the birthday column to actual date type, use to_date(~) and specify the pattern of the date string:
from pyspark.sql import functions as F

df_new = df.withColumn("birthday", F.to_date(F.col("birthday"), "yyyy-MM-dd"))
df_new.printSchema()
root
 |-- name: string (nullable = true)
 |-- birthday: date (nullable = true)
Here, the withColumn(~) method is used to update the birthday column using the new column returned by to_date(~).
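For ISO-style strings like these ("yyyy-MM-dd"), the format argument can likely be omitted as well, since to_date(~) falls back to Spark's default date parsing when no format is given; a minimal sketch:

# Equivalent here - ISO date strings are handled by the default parsing
df_new = df.withColumn("birthday", F.to_date(F.col("birthday")))
df_new.printSchema()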
As another example, here's a PySpark DataFrame with slightly more complicated date strings:
df = spark.createDataFrame([["Alex", "1995/12/16 16:20:20"], ["Bob", "1998/05/06 18:56:10"]], ["name", "birthday"])
df.show()
+----+-------------------+
|name|           birthday|
+----+-------------------+
|Alex|1995/12/16 16:20:20|
| Bob|1998/05/06 18:56:10|
+----+-------------------+
Here, our date strings also contain hours, minutes and seconds.
To convert the birthday column to date type:
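A sketch of this conversion, using the same withColumn(~) approach as before with a format string that matches these date strings:

df_new = df.withColumn("birthday", F.to_date(F.col("birthday"), "yyyy/MM/dd HH:mm:ss"))
df_new.show()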
+----+----------+
|name|  birthday|
+----+----------+
|Alex|1995-12-16|
| Bob|1998-05-06|
+----+----------+
Here, notice how the information about hours, minutes and seconds has been lost during the type conversion.
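If the time-of-day portion should be preserved, to_timestamp(~) can be used instead to obtain a timestamp column rather than a date column; a minimal sketch under the same format assumption:

# birthday becomes a timestamp column, keeping hours, minutes and seconds
df_ts = df.withColumn("birthday", F.to_timestamp(F.col("birthday"), "yyyy/MM/dd HH:mm:ss"))
df_ts.printSchema()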