PySpark DataFrame | corr method
PySpark DataFrame's corr(~) method returns the correlation between two specified numeric columns as a float.
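Parameters
1. col1 | string
The first column.
2. col2 | string
The second column.
3. method | string | optional
The type of correlation to compute. Currently, the only supported type is the Pearson correlation coefficient ("pearson"), which is also the default.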
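For reference, the Pearson correlation coefficient of two columns x and y with n rows is given by the standard formula below. It is the covariance of the two columns divided by the product of their standard deviations, so it always lies between -1 and 1. The symbols x, y and n are introduced here only for illustration.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```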
Consider the following PySpark DataFrame:
Computing the correlation of two numeric PySpark columns
To compute the correlation between two numeric columns, pass their column names to corr(~):
Here, we see that the … and weight columns are positively correlated, with a Pearson correlation coefficient of around …
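To show what the coefficient measures, here is a plain-Python sketch of the Pearson formula that does not use Spark. The helper name pearson and the sample values are hypothetical. On the same two columns, it should agree with corr(~) up to floating-point rounding.

```python
import math


def pearson(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return num / math.sqrt(sxx * syy)


heights = [160, 170, 180]  # made-up values
weights = [60, 80, 70]     # made-up values
r = pearson(heights, weights)
print(r)  # 0.5
```

A value near 1 means the columns rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.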
The cov(~) method returns the sample covariance of two specified numeric columns as a float.