3 votes

When I import a CSV file in pandas, I get a DtypeWarning:

Columns (3) have mixed types. Specify dtype option on import or set low_memory=False.

  1. How do I find out the dtype of each cell? I suspect there is some issue with the data that is causing the warning, but the file has ~5 million rows, so it is hard to identify the culprit.
  2. Is it good practice to specify the dtype on import? And if that is done, will it not result in a "loss" of data?
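
For reference, a minimal sketch of the kind of file that triggers this warning (hypothetical data: the fourth column, positional index 3, holds integers at first and strings later, so the chunked parser infers different types for different chunks):

import io
import pandas as pd

# Hypothetical data: column 'd' switches from integers to strings
# partway through, which triggers the DtypeWarning under the default
# low_memory=True chunked parsing.
rows = ['a,b,c,d'] + ['1,2,3,4'] * 100000 + ['1,2,3,x'] * 100000
df = pd.read_csv(io.StringIO('\n'.join(rows)))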
dtype != type. The warning is telling you, most likely, that a column has some entries that look like integers and some that look like strings. For example, the SEDOL security identifier has some identifiers that look like integers, such as 200001 for Amazon, and others that are strings, such as B02HJf (made that up). I can specify how the entire column is read by passing a dictionary to the converters argument that tells read_csv how to convert each value: read_csv(..., converters={'ID': str}, ...) makes sure my ID column comes in as strings. This dictionary can take care of other columns too (see the sketch after these comments). – piRSquared
My last comment aside, it's best if you can provide a minimal example that reproduces the problem. – piRSquared
Pandas seems to throw an exception when it encounters values that can't be converted to the specified type, so it may be worth just specifying the expected type and seeing on which value it fails. – nyrocron
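
A sketch of the approach piRSquared describes, assuming a hypothetical file ids.csv whose ID column mixes integer-looking and string-looking identifiers:

import pandas as pd

# converters applies the given function to every cell of the named
# column, so each ID arrives as a string regardless of how it looks.
df = pd.read_csv('ids.csv', converters={'ID': str})

# Alternatively, the dtype argument the warning itself mentions
# declares the column's type up front.
df = pd.read_csv('ids.csv', dtype={'ID': str})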

1 Answer

5 votes

I agree with piRSquared. Just to add to his comments: I had a similar problem. My column was supposed to contain string values, but one cell came in as a float (a NaN).

There are some things you can do to help with your analysis. Suppose your dataframe is df. You can check each column's dtype with:

df.dtypes
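
The Columns (3) in the warning is a positional index, so you can map it to a column name with:

df.columns[3]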

For each column of type 'object', you can dig deeper by adding a column that records each cell's Python type name:

df['type'] = df['mycolumn'].apply(lambda x: type(x).__name__)
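
To get a quick count of how many cells of each type the column holds:

df['type'].value_counts()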

If your column is supposed to be string-valued, you can check which cells are not strings with:

df[df.type != 'str']
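
Once you have found the offending cells, one possible repair (a sketch, assuming the stray floats are NaNs that should become empty strings):

df['mycolumn'] = df['mycolumn'].fillna('').astype(str)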