Pandas DataFrame | to_csv method
Start your free 7-days trial now!
Pandas DataFrame.to_csv(~) method converts the source DataFrame into comma-separated value format.
Parameters
1. path_or_buf | string or file handle | optional
The path to write the csv. By default, the csv is returned as a string.
2. sep | string of length one | optional
The separator to use. By default, sep=",".
3. na_rep | string | optional
The value to replace NaN in the source DataFrame.
4. float_formatlink | string | optional
The format string for floats.
5. columnslink | sequence | optional
The label of the columns to include. All other columns will be excluded from the resulting csv. By default, all columns are included in the resulting csv.
6. headerlink | boolean or list of string | optional
Whether or not to include the column labels in the csv. If a list of strings is passed, then those strings will overwrite the existing column labels. By default, header=True.
7. index | boolean | optional
Whether or not to include row labels in the csv. By default, index=True.
8. index_labellink | string or sequence or False or None | optional
The column labels to use. By default, index_label=None.
9. mode | string | optional
The mode to open the file. By default, mode="w", which stands for write mode.
10. encoding | string | optional
The encoding to use when writing to a file. By default, encoding="utf-8".
11. compression | string or dict | optional
The compression algorithm to use. The allowed values are as follows:
"infer" "gzip" "bz2" "zip" "xz"
By default, compression="infer", which means that if a path is supplied for path_or_buf, then the compression algorithm will be inferred from the extension you appended. For instance, if path_or_buf is "my_data.zip", then the "zip" compression will be used. If an extension is not supplied, then no compression will take place.
12. quoting | optional
By default, quoting=csv.QUOTE_MINIMAL.
13. quotecharlink | string of length one | optional
When a value happens to contain a delimiter, then the value will erroneously be divided up. You can wrap your value using quotechar to prevent such unwanted splits from happening. By default, '"'.
14. line_terminatorlink | string | optional
The character used to indicate a line break. Defaults to os.linesep.
15. chunksize | int or None | optional
The number of rows to write at one time. If your DataFrame is large, using a large chunksize (e.g. 10000) may lead to better performance. By default, chunksize=None.
16. date_format | string | optional
The format string for datetime objects.
17. doublequotelink | boolean | optional
Whether or not to parse quotechar. By default, doublequote=True.
18. escapecharlink | string | optional
The character to escape the double quotation marks. By default, escapechar=None.
19. decimal | string | optional
The character denoting a decimal point. By default, decimal=".". This parameter exists since some European countries like France use a , to denote a decimal point instead.
Return Value
If path_or_buf is specified, then None is returned. Otherwise, a string is returned.
Examples
Writing csv to a file
Consider the following DataFrame:
df
A Ba 3 5b 4 6
To save our df as a CSV file:
df.to_csv(path_or_buf="my_data")
The outputted file my_data, which is located in the same directory as the Python script, is as follows:
,A,Ba,3,5b,4,6
Writing csv as a string
Consider the following DataFrame:
df
A Ba 3 5b 4 6
To save our df as a csv string, don't specify path_or_buf:
df.to_csv()
',A,B\na,3,5\nb,4,6\n'
Note that printing this out will render the \n to take effect:
,A,Ba,3,5b,4,6
Specifying float_format
Consider the following DataFrame that contains floating point numbers:
df
A Ba 3.00005 5b 4.00000 6
In order to format the floating point numbers (e.g. how many decimal places to include), we use the float_format parameter. The syntax follows that of Python's standard string formatter, which we cover here in detail.
As an example, to include up to 3 decimal places:
df.to_csv(float_format="%.3f")
',A,B\na,3.000,5\nb,4.000,6\n'
Specifying columns
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, all columns are included in the resulting csv. To include only specific columns, specify their column labels like so:
,Aa,3b,4
Specifying header
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, header=True, which means that the header is include in the resulting csv:
,A,Ba,3,5b,4,6
To exclude the headers, set header=False like so:
a,3,5b,4,6
We can also pass a list of new column labels like so:
,C,Da,3,5b,4,6
Specifying index_label
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, index_label=None, which means that an empty index label will be included:
,A,Ba,3,5b,4,6
Notice how we begin with a comma here - the index label is empty, but it is still included.
To remove the index names, set index_label=False like so:
A,Ba,3,5b,4,6
Some statistical software like R may find this format easier to parse.
Specifying quotechar
Consider the following DataFrame:
df
A Ba 3,9 5b 4 6
Here, notice how one of our values is "3,9", which unfortunately contains the default separator ,. If we were to turn this into a csv, we would end up with 3,9,5 in the first row, which is incorrect since it indicates that we have 3 values in this row instead of 2.
In order to indicate that 3,9 is one single value, the to_csv(~) method wraps a quotation mark (") around it by default:
df.to_csv(sep=",")
',A,B\na,"3,9",5\nb,4,6\n'
Notice how we have "3,9" now.
Instead of ", we can specify a single character as the wrapper by passing in quotechar like so:
df.to_csv(quotechar="@")
',A,B\na,@3,9@,5\nb,4,6\n'
Specifying line_terminator
Consider the following DataFrame:
df
A Ba 3 5b 4 6
By default, each row is split using a new line character (\n):
df.to_csv()
',A,B\na,3,5\nb,4,6\n'
We can use a custom character to split the rows by passing in line_terminator:
df.to_csv(line_terminator="@")
',A,B@a,3,5@b,4,6@'
Specifying doublequote and escapechar
Consider the following DataFrame:
df
A Ba 3"9 5b 4 6
Notice how df contains a value that has a single double quotation mark ".
By default, doublequote=True, which means that the quotechar is duplicated like so:
,A,Ba,"3""9",5b,4,6
Here, note the following:
originally the output was
"3"9", where the outer""is there to indicate that the value is a string.the problem here is that, the value actually contains
"so this results in syntax error as"3"9"is not a valid string.to overcome this problem, the method escapes the middle
"using thequotechar, which is (confusingly)"by default.as a result, we end up with this peculiar-looking
"3""9".
We can turn off this behaviour where strings are wrapped in "" by setting doublequote=False, and then providing the escapechar parameter:
,A,Ba,3@"9,5b,4,6
Here, the escapechar is needed since "39 alone is invalid syntax - " denotes a start of a string, and there is no closing ". The role of escapechar is to indicate that " is not at all related to a string.