PySpark DataFrame | toJSON method

Last updated: Aug 12, 2023
Tags: PySpark

PySpark DataFrame's toJSON(~) method converts the DataFrame into an RDD of strings, in which each row of the DataFrame is represented as one JSON string. The examples below illustrate this.

Parameters

1. use_unicode | boolean

Whether to return each JSON document as a Python string (str). If set to False, each document is returned as UTF-8 encoded bytes instead. By default, use_unicode=True.

Return Value

An RDD whose elements are JSON strings (its printed representation shows a MapPartitionsRDD).

Examples

Consider the following PySpark DataFrame:

df = spark.createDataFrame([["André", 20], ["Bob", 30], ["Cathy", 30]], ["name", "age"])
df.show()
+-----+---+
| name|age|
+-----+---+
|André| 20|
|  Bob| 30|
|Cathy| 30|
+-----+---+

Converting the first row of PySpark DataFrame into a dictionary

To convert the first row of a PySpark DataFrame into a JSON string:

df.toJSON().first()
'{"name":"André","age":20}'

To parse that JSON string into a native dict:

import json
json.loads(df.toJSON().first())
{'name': 'André', 'age': 20}

Converting PySpark DataFrame into a list of dictionaries

To convert a PySpark DataFrame into a list of JSON strings:

df.toJSON().collect()
['{"name":"André","age":20}',
'{"name":"Bob","age":30}',
'{"name":"Cathy","age":30}']

To convert a PySpark DataFrame into a list of native dicts:

df.toJSON().map(lambda str_json: json.loads(str_json)).collect()
[{'name': 'André', 'age': 20},
{'name': 'Bob', 'age': 30},
{'name': 'Cathy', 'age': 30}]

Here:

  • we use the RDD.map(~) method to apply a custom function to each element of the RDD.

  • our custom function parses each JSON string into a dict.
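The per-element transformation can be tried without a Spark session. The following is a minimal sketch, assuming the list json_rows (a hypothetical name) holds the same strings that df.toJSON().collect() returned above. It also shows that json.loads can be passed directly in place of the lambda wrapper:

```python
import json

# Hypothetical stand-in for the strings returned by df.toJSON().collect() above.
json_rows = [
    '{"name":"André","age":20}',
    '{"name":"Bob","age":30}',
    '{"name":"Cathy","age":30}',
]

# Python's built-in map plays the role of RDD.map here;
# json.loads is passed directly, so no lambda wrapper is needed.
dict_rows = list(map(json.loads, json_rows))

print(dict_rows)
```

On an actual RDD, the equivalent call would be df.toJSON().map(json.loads).collect().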

Disabling unicode when converting PySpark DataFrame rows into string JSON

By default, unicode is enabled, so each row is returned as a Python string:

df.toJSON().first()   # use_unicode=True
'{"name":"André","age":20}'

To get UTF-8 encoded bytes instead, set use_unicode=False:

df.toJSON(use_unicode=False).first()
b'{"name":"Andr\xc3\xa9","age":20}'
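If the bytes form is needed elsewhere as text or as a dict, it can be converted back in plain Python. The following is a minimal sketch, assuming raw (a hypothetical name) holds the byte string shown in the output above:

```python
import json

# Byte string copied from the use_unicode=False output above.
raw = b'{"name":"Andr\xc3\xa9","age":20}'

# Decode the UTF-8 bytes back into a Python str.
text = raw.decode("utf-8")

# json.loads also accepts UTF-8 encoded bytes directly (Python 3.6+).
record = json.loads(raw)

print(text)
print(record)
```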
Published by Isshin Inada