PySpark DataFrame | describe method
Last updated: Jun 17, 2022
Tags: PySpark
PySpark DataFrame's describe(~) method returns a new PySpark DataFrame holding summary statistics (count, mean, standard deviation, minimum and maximum) of the specified columns.
Parameters
1. *cols | string | optional
The names of the columns to describe. By default, all numeric and string columns will be described.
Return Value
A PySpark DataFrame.
Examples
Consider the following PySpark DataFrame:
+----+---+
|name|age|
+----+---+
|Alex| 20|
| Bob| 25|
| Bob| 30|
+----+---+
Getting summary statistics of certain columns in PySpark DataFrame
To get the summary statistics of the name and age columns:
+-------+----+----+
|summary|name| age|
+-------+----+----+
|  count|   3|   3|
|   mean|null|25.0|
| stddev|null| 5.0|
|    min|Alex|  20|
|    max| Bob|  30|
+-------+----+----+
Getting summary statistics of all numeric and string columns in PySpark DataFrame
To get the summary statistics of all numeric and string columns:
+-------+----+----+
|summary|name| age|
+-------+----+----+
|  count|   3|   3|
|   mean|null|25.0|
| stddev|null| 5.0|
|    min|Alex|  20|
|    max| Bob|  30|
+-------+----+----+
Published by Isshin Inada
Official PySpark Documentation
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrame.describe.html