Pandas DataFrame | to_json method
Start your free 7-days trial now!
Pandas DataFrame.to_json(~) method either converts a DataFrame to a JSON string, or outputs a JSON file.
Parameters
1. path_or_buf | string or file handle | optional
The path to where you want to save the JSON. By default, the method will return a JSON string without writing to a file.
2. orientlink | string
How you want convert the source DataFrame into a JSON.
For "split":
{ "index": [list of index], "columns": [list of labels], "data": [list of values]}
For "records":
[{column_label: values}, ... , {column_label: values}]
Here, we call each item a record.
For "index":
{ index: column_label: value ... index: ...}
For "columns" (default):
{ column_label: index: value ... column_label: ...}
For "values", we just get a list of values:
[value_one, ..., ]
For "table", we get a comprehensive representation of the source DataFrame:
{ "schema": {schema} "data": {data}}
By default, orient="columns".
3. date_formatlink | string | optional
Whether or not to convert the dates into epoch milliseconds or iso8601 format. The allowed value are as follows:
"epoch": the time passed since1970-01-01in milliseconds."iso": the global standard for date representation.
By default, if orient="table", then date_format="iso", otherwise "epoch".
4. double_precision | int | optional
The number of decimal places to store for floating numbers. By default, double_precision=10.
5. force_ascii | boolean | optional
Whether to use ASCII for encoding strings. By default, force_ascii=True.
6. date_unitlink | string | optional
The time unit to use. The allowed values are as follows:
"s": seconds"ms": milliseconds"us": microseconds"ns": nanoseconds
By default, date_unit="ms".
7. default_handler | callable | optional
The callback that fires if the JSON conversion is not successful. The callback takes in as argument the source DataFrame and returns a serialisable object (e.g. maps and lists). By default, default_handler=None.
8. lineslink | boolean | optional
If orient="records", then write out each {column_label:values} in a new line. Passing in True when for values other than "records" will result in an error. By default, lines=False.
9. compression | string | optional
The compression algorithm to use when outputting to a file. The available algorithms are:
"infer", "gzip", "bz2", "zip", "xz"
This is only relevant when we are outputting to a file, that is, the path_or_buf is specified. By default, compression="infer".
10. index | boolean | optional
Whether to include the index in the resulting JSON string. This is only relevant if orient is either "split" or "table". By default, index=True.
11. indentlink | int | optional
The number of whitespaces to indent each record.
Return Value
If path_or_buf is specified, then None is returned. Otherwise, a string in JSON format is returned.
Examples
Outputting to a file
Consider the following DataFrame:
df
A Ba 2 4b 3 5
Instead of obtaining a JSON string, we can output the JSON to a file by passing in path_or_buf like so:
df.to_json(path_or_buf="my_json")
This will create a new file called my_json in the same directory as your Python script:
{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}
Specifying orient
Consider the following DataFrame:
df
A Ba 2 4b 3 5
Default
By default, orient="columns":
df.to_json()
'{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'
split
df.to_json(orient="split")
'{"columns":["A","B"],"index":["a","b"],"data":[[2,4],[3,5]]}'
records
df.to_json(orient="records")
'[{"A":2,"B":4},{"A":3,"B":5}]'
index
df.to_json(orient="index")
'{"a":{"A":2,"B":4},"b":{"A":3,"B":5}}'
columns
df.to_json(orient="columns")
'{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'
values
df.to_json(orient="values")
'[[2,4],[3,5]]'
table
df.to_json(orient="table")
'{"schema":{"fields":[{"name":"index","type":"string"}, {"name":"A","type":"integer"}, {"name":"B","type":"integer"}], "primaryKey":["index"], "pandas_version":"0.20.0"} "data":[{"index":"a","A":2,"B":4}, {"index":"b","A":3,"B":5}]}'
Here, we've prettified the output for clarity, but the actual output is all in one line.
Specifying date_format
Consider the following DataFrame:
df
A B0 2020-12-25 2020-12-20
By default, date_format="epoch" (unless orient="table"):
df.to_json() # date_format="epoch"
'{"A":{"0":1608854400000},"B":{"0":1608422400000}}'
Here, the large numbers represent the time passed since 1970-01-01 in milliseconds.
On the other hand, the we can change the date formatting to iso8601 by passing in "iso" like so:
df.to_json(date_format="iso")
'{"A":{"0":"2020-12-25T00:00:00.000Z"},"B":{"0":"2020-12-20T00:00:00.000Z"}}'
Specifying date_unit
Consider the following DataFrame:
df
A B0 2020-12-25 2020-12-20
By default, when date_format="epoch", the units will be in milliseconds:
df.to_json() # date_format="epoch"
'{"A":{"0":1608854400000},"B":{"0":1608422400000}}'
Here, the large numbers represent the time passed since 1970-01-01 in milliseconds.
We can change the units to seconds like so:
df.to_json(date_unit="s") # date_format="epoch"
'{"A":{"0":1608854400},"B":{"0":1608422400}}'
Specifying default_handler
Consider the following DataFrame:
df
A Ba 3.000000+4.000000j 4
Here, our df has a complex number.
By default, when we try to convert df into a JSON string, we get the following:
df.to_json(orient="records")
'[{"A":{"imag":4.0},"B":4}]'
Notice how the complex number is represented erroneously as "imag", which happens because JSON does not know how to internally parse complex numbers. For situations like this when the conversion is improper, we can use the default_handler parameter to control what is returned:
def my_handler(my_df):
df.to_json(orient="records", default_handler=my_handler)
'[{"A":[{"A":3,"B":4}],"B":4}]'
The handler takes as argument the source DataFrame and returns a serialisable object like a map, Series, DataFrame and so on. Now, instead of the malformed JSON that we had before, we can return another JSON of our liking.
Specifying lines
Consider the following DataFrame:
df
A Ba 2 4b 3 5
When orient="records", we can make each item appear in a new line by passing in lines=True like so:
my_json = df.to_json(orient="records", lines=True)my_json
'{"A":2,"B":4}\n{"A":3,"B":5}'
When we print out our string my_json, then we see the \n take effect:
print(my_json)
{"A":2,"B":4}{"A":3,"B":5}
Specifying indent
We can add whitespace indents by passing in the indent parameter like so:
my_json = df.to_json(orient="records", indent=3)print(my_json)
[ { "A":2, "B":4 }, { "A":3, "B":5 }]
Here, the second line has 3 whitespaces, while the third has 6 whitespaces, and so on.