Resolving ParserError: Error tokenizing data in Pandas

Last updated: Jul 1, 2022
Tags: Python, Pandas

Common reasons for ParserError: Error tokenizing data when reading a file into a Pandas DataFrame include:

  • Using the wrong delimiter

  • The number of fields in certain rows not matching the header

To resolve the error, we can try the following:

  • Specifying the delimiter through the sep parameter of read_csv(~)

  • Fixing the original source file

  • Skipping bad rows

Examples

Specifying sep

By default, the read_csv(~) method assumes sep=",". Therefore, when reading a file that uses a different delimiter, make sure to specify the delimiter explicitly.

Consider the following slash-delimited file called test.txt:

col1/col2/col3
1/A/4
2/B/5
3/C,D,E/6

To initialize a DataFrame using default sep=",":

import pandas as pd
df = pd.read_csv('test.txt')
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 3

An error is raised because the first line of the file contains no commas, so read_csv(~) expects every line to contain just 1 field. Line 4, however, contains two commas and therefore 3 fields, which results in the ParserError.

To initialize the DataFrame by correctly specifying slash (/) as the delimiter:

df = pd.read_csv('test.txt', sep='/')
df
col1 col2 col3
0 1 A 4
1 2 B 5
2 3 C,D,E 6

We can now see that the DataFrame is initialized as expected, with each line contributing the 3 fields that were separated by slashes (/) in the original file.
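When the delimiter of a file is not known in advance, Python's built-in csv.Sniffer can often detect it before the file is handed to Pandas. A minimal sketch, using an inline string as a stand-in for test.txt:

```python
import csv
import io

import pandas as pd

# Inline stand-in for the slash-delimited test.txt
data = "col1/col2/col3\n1/A/4\n2/B/5\n3/C,D,E/6\n"

# Sniff the delimiter from the header line, restricted to a few candidates
dialect = csv.Sniffer().sniff(data.splitlines()[0], delimiters=",/;\t")

df = pd.read_csv(io.StringIO(data), sep=dialect.delimiter)
print(df.shape)   # (3, 3)
```

Sniffing only works reliably when the candidate delimiters are constrained, so it is a convenience rather than a guarantee; inspecting the first few lines of the file by eye remains the safest check.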

Fixing original source file

Consider the following comma-separated file called test.csv:

col1,col2,col3
1,A,4
2,B,5,
3,C,6

To initialize a DataFrame using the above file:

import pandas as pd
df = pd.read_csv('test.csv')
ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

Here, an error is raised for the 3rd line because 4 fields are observed instead of the expected 3, caused by the extra comma at the end of that line.

To resolve this error, we can correct the original file by removing the extra comma at the end of line 3:

col1,col2,col3
1,A,4
2,B,5
3,C,6

To now initialize the DataFrame again using the corrected file:

df = pd.read_csv('test.csv')
df
col1 col2 col3
0 1 A 4
1 2 B 5
2 3 C 6
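If editing the file by hand is impractical, the same cleanup can be done programmatically before the text reaches Pandas. A minimal sketch that strips trailing commas, using an inline string as a stand-in for test.csv (note that this approach would also remove legitimately empty trailing fields, so it only suits files where a trailing comma is always an error):

```python
import io

import pandas as pd

# Inline stand-in for test.csv (extra trailing comma on line 3)
raw = "col1,col2,col3\n1,A,4\n2,B,5,\n3,C,6\n"

# Strip trailing commas from every line before parsing; this would also
# drop a legitimately empty trailing field, so use with care.
cleaned = "\n".join(line.rstrip(",") for line in raw.splitlines())

df = pd.read_csv(io.StringIO(cleaned))
print(df.shape)   # (3, 3)
```

For a file on disk, the same transformation can be applied while reading the file line by line and the result passed to read_csv(~) via io.StringIO, avoiding any change to the original file.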

Skipping bad rows

Consider the following comma-separated file called test.csv:

col1,col2,col3
1,A,4
2,B,5,
3,C,6

To initialize a DataFrame using the above file:

import pandas as pd
df = pd.read_csv('test.csv')
ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

Here, an error is raised for the 3rd line because 4 fields are observed instead of the expected 3, caused by the extra comma at the end of that line.

To skip bad rows, pass on_bad_lines='skip' to read_csv(~) (available since Pandas 1.3; older versions use error_bad_lines=False instead):

df = pd.read_csv('test.csv', on_bad_lines='skip')
df
col1 col2 col3
0 1 A 4
1 3 C 6

Notice how the problematic third line in the original file has been skipped in the resulting DataFrame.

WARNING

This should be your last resort, as valuable information may be contained within the problematic lines, and skipping these rows means losing that information. Wherever possible, try to identify the root cause of the error and fix the underlying problem.
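If bad rows must be skipped but you do not want to lose them silently, Pandas 1.4+ also accepts a callable for on_bad_lines when engine='python': the callable receives each offending row as a list of fields, and returning None drops the row. A minimal sketch, using an inline string as a stand-in for test.csv:

```python
import io

import pandas as pd

# Inline stand-in for test.csv (extra trailing comma on line 3)
raw = "col1,col2,col3\n1,A,4\n2,B,5,\n3,C,6\n"

bad_rows = []  # collect the offending rows for later inspection

# With Pandas >= 1.4 and engine="python", on_bad_lines may be a callable.
# It receives each mis-parsed row as a list of strings; since list.append
# returns None, each bad row is recorded and then skipped.
df = pd.read_csv(
    io.StringIO(raw),
    engine="python",
    on_bad_lines=lambda row: bad_rows.append(row),
)

print(df.shape)   # (2, 3)
print(bad_rows)
```

This way the skipped rows can be logged or repaired afterwards instead of disappearing without a trace.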

Published by Arthur Yanagisawa