
PySpark DataFrame | toJSON method

Last updated: Jul 1, 2022
Tags: PySpark

PySpark DataFrame's toJSON(~) method converts the DataFrame into an RDD of strings, in which each row of the DataFrame is represented as a JSON string. Consult the examples below for clarification.

Parameters

1. use_unicode | boolean

Whether to return each JSON string as a Unicode string (str). If False, each JSON string is returned as UTF-8 encoded bytes instead. By default, use_unicode=True.

Return Value

A MapPartitionsRDD object.

Examples

Consider the following PySpark DataFrame:

df = spark.createDataFrame([["André", 20], ["Bob", 30], ["Cathy", 30]], ["name", "age"])
df.show()
+-----+---+
| name|age|
+-----+---+
|André| 20|
|  Bob| 30|
|Cathy| 30|
+-----+---+

Converting the first row of PySpark DataFrame into a dictionary

To convert the first row of a PySpark DataFrame into a string-encoded JSON:

df.toJSON().first()
'{"name":"André","age":20}'

To convert a string-encoded JSON into a native dict:

import json
json.loads(df.toJSON().first())
{'name': 'André', 'age': 20}

Converting PySpark DataFrame into a list of row objects (dictionaries)

To convert a PySpark DataFrame into a list of JSON strings:

df.toJSON().collect()
['{"name":"André","age":20}',
'{"name":"Bob","age":30}',
'{"name":"Cathy","age":30}']

To convert a PySpark DataFrame into a list of native dicts:

df.toJSON().map(lambda str_json: json.loads(str_json)).collect()
[{'name': 'André', 'age': 20},
{'name': 'Bob', 'age': 30},
{'name': 'Cathy', 'age': 30}]

Here:

  • we are using the RDD.map(~) method to apply a custom function on each element of the RDD.

  • our custom function converts each string-encoded JSON into a dict.
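The per-element conversion can be tried without a Spark session. The following is a minimal sketch in which json_rows is a hypothetical stand-in for the list returned by df.toJSON().collect() above; it shows that json.loads can be applied to each element directly, without wrapping it in a lambda.

```python
import json

# Hypothetical stand-in for df.toJSON().collect(): the JSON strings shown above.
json_rows = [
    '{"name":"André","age":20}',
    '{"name":"Bob","age":30}',
    '{"name":"Cathy","age":30}',
]

# Parse each JSON string into a native dict.
dict_rows = [json.loads(row) for row in json_rows]

for row in dict_rows:
    print(row["name"], row["age"])
```

On an actual RDD, passing the function itself, as in df.toJSON().map(json.loads).collect(), should behave the same as the lambda version, since RDD.map(~) accepts any callable.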

Disabling unicode when converting PySpark DataFrame rows into string JSON

By default, unicode is enabled:

df.toJSON().first() # use_unicode=True
'{"name":"André","age":20}'

To disable unicode, set use_unicode=False, in which case each row is returned as UTF-8 encoded bytes rather than a str:

df.toJSON(use_unicode=False).first()
b'{"name":"Andr\xc3\xa9","age":20}'
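To see how the bytes result relates to the default string result, here is a small sketch in plain Python. The variable raw is simply the bytes value shown above, copied by hand rather than fetched from Spark.

```python
import json

# The bytes value shown above (what use_unicode=False returned for the first row).
raw = b'{"name":"Andr\xc3\xa9","age":20}'

# Decoding as UTF-8 recovers the same string that use_unicode=True returns.
text = raw.decode("utf-8")
print(text)

# json.loads also accepts UTF-8 encoded bytes directly (Python 3.6+).
record = json.loads(raw)
print(record["name"])
```

The two-byte sequence \xc3\xa9 is the UTF-8 encoding of é, which is why the name looks different in the bytes output even though the underlying data is the same.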
Published by Isshin Inada