When I import a csv file in pandas, I get a DtypeWarning:
Columns (3) have mixed types. Specify dtype option on import or set low_memory=False.
- How do I find out the dtype of each cell? I suspect some issue with the data is causing the warning, but the file has ~5 million rows, so it is hard to identify the culprit.
- Is it good practice to specify dtype on import? And if that is done, will it not result in "loss" of data?
dtype != type.
The warning is telling you, most likely, that a column has things that look like integers and things that look like strings. For example, the SEDOL security identifier has some identifiers that look like integers, such as `200001` for Amazon, and others that are strings, such as `B02HJf` (made that up). I can specify the `dtype` of the entire column by passing a dictionary to the `converters` argument that tells `read_csv` how to convert stuff: `read_csv(... converters={'ID': str}, ...)` makes sure my `ID` column comes in as strings. This dictionary can take care of other columns too. – piRSquared
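
A minimal sketch of both ideas, under stated assumptions: the `mixed` series and the `csv_text` sample are hypothetical stand-ins for the real 5-million-row file, and the `ID` and `price` column names are made up for illustration. The first half counts the Python type of each cell, which is one way to confirm that a column really holds mixed types. The second half reads the column with `converters={'ID': str}` as described above.

```python
import io

import pandas as pd

# Hypothetical column mixing Python ints and strings, standing in for
# what a mixed-type column can look like after import.
mixed = pd.Series([200001, "B02HJf", 200002], dtype=object)

# Count the Python type of each cell; more than one entry means the
# column holds mixed types.
type_counts = mixed.map(lambda value: type(value).__name__).value_counts()
print(type_counts)

# Hypothetical CSV text standing in for the real file.
csv_text = "ID,price\n200001,10.5\nB02HJf,11.0\n200002,9.75\n"

# Read every cell of 'ID' as a string, so integer-looking identifiers
# keep their original text.
df = pd.read_csv(io.StringIO(csv_text), converters={"ID": str})
id_types = df["ID"].map(lambda value: type(value).__name__).unique().tolist()
print(id_types)
```

In this sample the identifiers come through as the same text that appears in the file, so nothing is lost by reading the column as strings. Passing `dtype={'ID': str}` to `read_csv` is an alternative way to get a string column.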