search
Search
Join our weekly DS/ML newsletter layers DS/ML Guides
menu
menu search toc more_vert
Robocat
Guest 0reps
Thanks for the thanks!
close
Comments
Log in or sign up
Cancel
Post
account_circle
Profile
exit_to_app
Sign out
help Ask a question
Share on Twitter
search
keyboard_voice
close
Searching Tips
Search for a recipe:
"Creating a table in MySQL"
Search for an API documentation: "@append"
Search for code: "!dataframe"
Apply a tag filter: "#python"
Useful Shortcuts
/ to open search panel
Esc to close search panel
to navigate between search results
d to clear all current filters
Enter to expand content preview
icon_star
Doc Search
icon_star
Code Search Beta
SORRY NOTHING FOUND!
mic
Start speaking...
Voice search is only supported in Safari and Chrome.
Navigate to
A
A
brightness_medium
share
arrow_backShare
Twitter
Facebook
0
thumb_down
0
chat_bubble_outline
0
auto_stories new
settings

Pandas DataFrame | to_csv method

Programming
chevron_right
Python
chevron_right
Pandas
chevron_right
Documentation
chevron_right
DataFrame
chevron_right
Type Conversion
schedule Jul 1, 2022
Last updated
local_offer PythonPandas
Tags

Pandas DataFrame.to_csv(~) method converts the source DataFrame into comma-separated value format.

Parameters

1. path_or_buf | string or file handle | optional

The path to write the csv. By default, the csv is returned as a string.

2. sep | string of length one | optional

The separator to use. By default, sep=",".

3. na_rep | string | optional

The value to replace NaN in the source DataFrame.

4. float_formatlink | string | optional

The format string for floats.

5. columnslink | sequence | optional

The label of the columns to include. All other columns will be excluded from the resulting csv. By default, all columns are included in the resulting csv.

6. headerlink | boolean or list of string | optional

Whether or not to include the column labels in the csv. If a list of strings is passed, then those strings will overwrite the existing column labels. By default, header=True.

7. index | boolean | optional

Whether or not to include row labels in the csv. By default, index=True.

8. index_labellink | string or sequence or False or None | optional

The column labels to use. By default, index_label=None.

9. mode | string | optional

The mode to open the file. By default, mode="w", which stands for write mode.

10. encoding | string | optional

The encoding to use when writing to a file. By default, encoding="utf-8".

11. compression | string or dict | optional

The compression algorithm to use. The allowed values are as follows:

"infer" "gzip" "bz2" "zip" "xz"

By default, compression="infer", which means that if a path is supplied for path_or_buf, then the compression algorithm will be inferred from the extension you appended. For instance, if path_or_buf is "my_data.zip", then the "zip" compression will be used. If an extension is not supplied, then no compression will take place.

12. quoting | optional

By default, quoting=csv.QUOTE_MINIMAL.

13. quotecharlink | string of length one | optional

When a value happens to contain a delimiter, then the value will erroneously be divided up. You can wrap your value using quotechar to prevent such unwanted splits from happening. By default, '"'.

14. line_terminatorlink | string | optional

The character used to indicate a line break. Defaults to os.linesep.

15. chunksize | int or None | optional

The number of rows to write at one time. If your DataFrame is large, using a large chunksize (e.g. 10000) may lead to better performance. By default, chunksize=None.

16. date_format | string | optional

The format string for datetime objects.

17. doublequotelink | boolean | optional

Whether or not to parse quotechar. By default, doublequote=True.

18. escapecharlink | string | optional

The character to escape the double quotation marks. By default, escapechar=None.

19. decimal | string | optional

The character denoting a decimal point. By default, decimal=".". This parameter exists since some European countries like France use a , to denote a decimal point instead.

Return Value

If path_or_buf is specified, then None is returned. Otherwise, a string is returned.

Examples

Writing csv to a file

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4], "B":[5,6]}, index=["a","b"])
df
   A  B
a  3  5
b  4  6

To save our df as a CSV file:

df.to_csv(path_or_buf="my_data")

The outputted file my_data, which is located in the same directory as the Python script, is as follows:

,A,B
a,3,5
b,4,6

Writing csv as a string

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4], "B":[5,6]}, index=["a","b"])
df
   A  B
a  3  5
b  4  6

To save our df as a csv string, don't specify path_or_buf:

df.to_csv()
',A,B\na,3,5\nb,4,6\n'

Note that printing this out will render the \n to take effect:

print(df.to_csv())
,A,B
a,3,5
b,4,6

Specifying float_format

Consider the following DataFrame that contains floating point numbers:

df = pd.DataFrame({"A":[3.00005,4],"B":[5,6]}, index=["a","b"])
df
   A        B
a  3.00005  5
b  4.00000  6

In order to format the floating point numbers (e.g. how many decimal places to include), we use the float_format parameter. The syntax follows that of Python's standard string formatter, which we cover here in detail.

As an example, to include up to 3 decimal places:

df.to_csv(float_format="%.3f")
',A,B\na,3.000,5\nb,4.000,6\n'

Specifying columns

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4], "B":[5,6]}, index=["a","b"])
df
   A  B
a  3  5
b  4  6

By default, all columns are included in the resulting csv. To include only specific columns, specify their column labels like so:

print(df.to_csv(columns=["A"]))
,A
a,3
b,4

Specifying header

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4],"B":[5,6]}, index=["a","b"])
df
   A  B
a  3  5
b  4  6

By default, header=True, which means that the header is include in the resulting csv:

print(df.to_csv())   # header=True
,A,B
a,3,5
b,4,6

To exclude the headers, set header=False like so:

print(df.to_csv(header=False))
a,3,5
b,4,6

We can also pass a list of new column labels like so:

print(df.to_csv(header=["C","D"]))
,C,D
a,3,5
b,4,6

Specifying index_label

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4],"B":[5,6]}, index=["a","b"])
df
   A  B
a  3  5
b  4  6

By default, index_label=None, which means that an empty index label will be included:

print(df.to_csv())
,A,B
a,3,5
b,4,6

Notice how we begin with a comma here - the index label is empty, but it is still included.

To remove the index names, set index_label=False like so:

print(df.to_csv(index_label=False))
A,B
a,3,5
b,4,6

Some statistical software like R may find this format easier to parse.

Specifying quotechar

Consider the following DataFrame:

df = pd.DataFrame({"A":["3,9",4], "B":[5,6]}, index=["a","b"])
df
   A    B
a  3,9  5
b  4    6

Here, notice how one of our values is "3,9", which unfortunately contains the default separator ,. If we were to turn this into a csv, we would end up with 3,9,5 in the first row, which is incorrect since it indicates that we have 3 values in this row instead of 2.

In order to indicate that 3,9 is one single value, the to_csv(~) method wraps a quotation mark (") around it by default:

df.to_csv(sep=",")
',A,B\na,"3,9",5\nb,4,6\n'

Notice how we have "3,9" now.

Instead of ", we can specify a single character as the wrapper by passing in quotechar like so:

df.to_csv(quotechar="@")
',A,B\na,@3,9@,5\nb,4,6\n'

Specifying line_terminator

Consider the following DataFrame:

df = pd.DataFrame({"A":[3,4], "B":[5,6]}, index=["a","b"])
df
   A  B
a  3  5
b  4  6

By default, each row is split using a new line character (\n):

df.to_csv()
',A,B\na,3,5\nb,4,6\n'

We can use a custom character to split the rows by passing in line_terminator:

df.to_csv(line_terminator="@")
',A,B@a,3,5@b,4,6@'

Specifying doublequote and escapechar

Consider the following DataFrame:

df = pd.DataFrame({"A":['3"9',4],"B":[5,6]}, index=["a","b"])
df
   A    B
a  3"9  5
b  4    6

Notice how df contains a value that has a single double quotation mark ".

By default, doublequote=True, which means that the quotechar is duplicated like so:

print(df.to_csv())
,A,B
a,"3""9",5
b,4,6

Here, note the following:

  • originally the output was "3"9", where the outer "" is there to indicate that the value is a string.

  • the problem here is that, the value actually contains " so this results in syntax error as "3"9" is not a valid string.

  • to overcome this problem, the method escapes the middle " using the quotechar, which is (confusingly) " by default.

  • as a result, we end up with this peculiar-looking "3""9".

We can turn off this behaviour where strings are wrapped in "" by setting doublequote=False, and then providing the escapechar parameter:

print(df.to_csv(doublequote=False, escapechar="@"))
,A,B
a,3@"9,5
b,4,6

Here, the escapechar is needed since "39 alone is invalid syntax - " denotes a start of a string, and there is no closing ". The role of escapechar is to indicate that " is not at all related to a string.

mail
Join our newsletter for updates on new DS/ML comprehensive guides (spam-free)
robocat
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
Ask a question or leave a feedback...