{
   "index": [list of index],
   "columns": [list of labels],
   "data": [list of values]
}

For "records":


        
        
            
                
                
                    [{column_label: values}, ... , {column_label: values}]

Here, we call each item a record.

For "index":


        
        
            
                
                
                    {
   index:
      column_label: value
      ...
   index:
      ...
}

For "columns" (default):


        
        
            
                
                
                    {
   column_label:
      index: value
      ...
   column_label:
      ...
}

For "values", we just get a list of values:


        
        
            
                
                
                    [value_one, ..., ]

For "table", we get a comprehensive representation of the source DataFrame:


        
        
            
                
                
                    {
   "schema": {schema}
   "data": {data}
}

By default, orient="columns".

3. date_formatlink | string | optional

Whether or not to convert the dates into epoch milliseconds or iso8601 format. The allowed value are as follows:

"epoch": the time passed since 1970-01-01 in milliseconds.
"iso": the global standard for date representation.

By default, if orient="table", then date_format="iso", otherwise "epoch".

4. double_precision | int | optional

The number of decimal places to store for floating numbers. By default, double_precision=10.

5. force_ascii | boolean | optional

Whether to use ASCII for encoding strings. By default, force_ascii=True.

6. date_unitlink | string | optional

The time unit to use. The allowed values are as follows:

"s": seconds
"ms": milliseconds
"us": microseconds
"ns": nanoseconds

By default, date_unit="ms".

7. default_handler | callable | optional

The callback that fires if the JSON conversion is not successful. The callback takes in as argument the source DataFrame and returns a serialisable object (e.g. maps and lists). By default, default_handler=None.

8. lineslink | boolean | optional

If orient="records", then write out each {column_label:values} in a new line. Passing in True when for values other than "records" will result in an error. By default, lines=False.

9. compression | string | optional

The compression algorithm to use when outputting to a file. The available algorithms are:


        
        
            
                
                
                    "infer", "gzip", "bz2", "zip", "xz"

This is only relevant when we are outputting to a file, that is, the path_or_buf is specified. By default, compression="infer".

10. index | boolean | optional

Whether to include the index in the resulting JSON string. This is only relevant if orient is either "split" or "table". By default, index=True.

11. indentlink | int | optional

The number of whitespaces to indent each record.

Return Value

If path_or_buf is specified, then None is returned. Otherwise, a string in JSON format is returned.

Examples

Outputting to a file

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
                
            
               A  B
a  2  4
b  3  5

Instead of obtaining a JSON string, we can output the JSON to a file by passing in path_or_buf like so:


        
        
            
                
                
                    df.to_json(path_or_buf="my_json")

This will create a new file called my_json in the same directory as your Python script:


        
        
            
                
                
                    {"A":{"a":2,"b":3},"B":{"a":4,"b":5}}

Specifying orient

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
                
            
               A  B
a  2  4
b  3  5

Default

By default, orient="columns":


        
        
            
                
                
                    df.to_json()
                
            
            '{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'

split


        
        
            
                
                
                    df.to_json(orient="split")
                
            
            '{"columns":["A","B"],"index":["a","b"],"data":[[2,4],[3,5]]}'

records


        
        
            
                
                
                    df.to_json(orient="records")
                
            
            '[{"A":2,"B":4},{"A":3,"B":5}]'

index


        
        
            
                
                
                    df.to_json(orient="index")
                
            
            '{"a":{"A":2,"B":4},"b":{"A":3,"B":5}}'

columns


        
        
            
                
                
                    df.to_json(orient="columns")
                
            
            '{"A":{"a":2,"b":3},"B":{"a":4,"b":5}}'

values


        
        
            
                
                
                    df.to_json(orient="values")
                
            
            '[[2,4],[3,5]]'

table


        
        
            
                
                
                    df.to_json(orient="table")
                
            
            '{"schema":{"fields":[{"name":"index","type":"string"},
                      {"name":"A","type":"integer"},
                      {"name":"B","type":"integer"}],
            "primaryKey":["index"],
            "pandas_version":"0.20.0"}
  "data":[{"index":"a","A":2,"B":4},
          {"index":"b","A":3,"B":5}]}'

Here, we've prettified the output for clarity, but the actual output is all in one line.

Specifying date_format

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"A":["2020-12-25"], "B":["2020-12-20"]}, dtype="datetime64[ns]")
df
                
            
               A           B
0  2020-12-25  2020-12-20

By default, date_format="epoch" (unless orient="table"):


        
        
            
                
                
                    df.to_json()    # date_format="epoch"
                
            
            '{"A":{"0":1608854400000},"B":{"0":1608422400000}}'

Here, the large numbers represent the time passed since 1970-01-01 in milliseconds.

On the other hand, the we can change the date formatting to iso8601 by passing in "iso" like so:


        
        
            
                
                
                    df.to_json(date_format="iso")
                
            
            '{"A":{"0":"2020-12-25T00:00:00.000Z"},"B":{"0":"2020-12-20T00:00:00.000Z"}}'

Specifying date_unit

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"A":["2020-12-25"], "B":["2020-12-20"]}, dtype="datetime64[ns]")
df
                
            
               A           B
0  2020-12-25  2020-12-20

By default, when date_format="epoch", the units will be in milliseconds:


        
        
            
                
                
                    df.to_json()    # date_format="epoch"
                
            
            '{"A":{"0":1608854400000},"B":{"0":1608422400000}}'

Here, the large numbers represent the time passed since 1970-01-01 in milliseconds.

We can change the units to seconds like so:


        
        
            
                
                
                    df.to_json(date_unit="s")    # date_format="epoch"
                
            
            '{"A":{"0":1608854400},"B":{"0":1608422400}}'

Specifying default_handler

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"A":[complex(3,4)], "B":[4]}, index=["a"])
df
                
            
               A                   B
a  3.000000+4.000000j  4

Here, our df has a complex number.

By default, when we try to convert df into a JSON string, we get the following:


        
        
            
                
                
                    df.to_json(orient="records")
                
            
            '[{"A":{"imag":4.0},"B":4}]'

Notice how the complex number is represented erroneously as "imag", which happens because JSON does not know how to internally parse complex numbers. For situations like this when the conversion is improper, we can use the default_handler parameter to control what is returned:


        
        
            
                
                
                    def my_handler(my_df):
    return pd.DataFrame({"A":[3], "B":[4]}, index=["a"])

df.to_json(orient="records", default_handler=my_handler)
                
            
            '[{"A":[{"A":3,"B":4}],"B":4}]'

The handler takes as argument the source DataFrame and returns a serialisable object like a map, Series, DataFrame and so on. Now, instead of the malformed JSON that we had before, we can return another JSON of our liking.

Specifying lines

Consider the following DataFrame:


        
        
            
                
                
                    df = pd.DataFrame({"A":[2,3], "B":[4,5]}, index=["a","b"])
df
                
            
               A  B
a  2  4
b  3  5

When orient="records", we can make each item appear in a new line by passing in lines=True like so:


        
        
            
                
                
                    my_json = df.to_json(orient="records", lines=True)
my_json
                
            
            '{"A":2,"B":4}\n{"A":3,"B":5}'

When we print out our string my_json, then we see the \n take effect:


        
        
            
                
                
                    print(my_json)
                
            
            {"A":2,"B":4}
{"A":3,"B":5}

Specifying indent

We can add whitespace indents by passing in the indent parameter like so:


        
        
            
                
                
                    my_json = df.to_json(orient="records", indent=3)
print(my_json)
                
            
            [
   {
      "A":2,
      "B":4
   },
   {
      "A":3,
      "B":5
   }
]

Here, the second line has 3 whitespaces, while the third has 6 whitespaces, and so on.

Published by Isshin Inada

Edited by 0 others

Did you find this page useful?

thumb_up

thumb_down

Comment

Citation

Ask a question or leave a feedback...

Official Pandas Documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

thumb_up

thumb_down

chat_bubble_outline

settings

Enjoy our search

Hit / to insta-search docs and recipes!