python - Pandas: ValueError: cannot convert float NaN to integer

Question

I get ValueError: cannot convert float NaN to integer for the following:

import pandas

df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

"x" is definitely a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the error means.
When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
When I read the column as float, it loads fine. It then shows values such as -1.0, 0.0, etc., and there are still no NaNs.
I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
The file is not small (10+ M rows), so I cannot inspect it manually. When I extract a small part from the top of the file, there is no error, but it happens with the full file. So it is something in the file, but I cannot tell what.
Logically the CSV should not have missing values, but even if there is some garbage, I would be OK with skipping those rows, or at least identifying them. However, I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in the comments and answers, I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]

# y contained some other garbage, so the null check was not enough
df = df[df['y'].str.isnumeric()]

# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)

You need to figure out what you want to do with any NaNs, and then do it. — cs95
Thanks @jezrael, now df[df['x'].isnull()] did identify a row with "NaN" and I could remove it! Another similar field seems to contain some other garbage which is not int. Is there a generic way to find rows which are not convertible to a given datatype, so I can identify and discard them all? — JaakL
Use pd.to_numeric with errors='coerce' instead of astype(int), then fillna with whatever you want. — Bharath
In v0.24, pandas introduces Nullable Integer Types which support Integer columns with NaNs. See this answer for more information. — cs95
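
The nullable integer dtype mentioned in the last comment can be sketched as follows. This is only an illustration: the sample values are made up to stand in for column x, and it assumes pandas 0.24 or later.

```python
import pandas as pd

# Hypothetical sample standing in for the 'x' column after loading it as float.
df = pd.DataFrame({"x": [-1.0, 0.0, float("nan"), 2000.0]})

# "Int64" (capital I) is the nullable integer extension dtype;
# the missing value is kept as <NA> instead of raising ValueError.
df["x"] = df["x"].astype("Int64")

print(df["x"].dtype)
print(df["x"].isna().sum())
```

With this dtype the rows with missing values stay in the frame, so it fits when the NaNs should be kept rather than dropped.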

jezrael · Accepted Answer · 2017-11-16T15:42:48

To identify the NaN values, use boolean indexing:

print(df[df['x'].isnull()])

Then, to remove all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce')

To remove all rows with NaN in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to int:

df['x'] = df['x'].astype(int)
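
Put together, the steps above might look like the sketch below, which also reports the offending rows before dropping them. The inline CSV text is a made-up stand-in for zoom11.csv, and the names csv_text, raw, bad and clean are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for zoom11.csv: one non-numeric y and one missing x.
csv_text = "x,y\n-1,5\n0,abc\n,7\n2000,9\n"

# Read everything as strings so that loading itself cannot fail on bad values.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Anything that is missing or cannot be parsed as a number becomes NaN.
x_num = pd.to_numeric(raw["x"], errors="coerce")
y_num = pd.to_numeric(raw["y"], errors="coerce")

# Rows where either column failed to convert.
bad = x_num.isna() | y_num.isna()
print(raw[bad])  # inspect the original text of the offending rows

# Keep only the good rows and convert them to plain integers.
clean = raw[~bad].copy()
clean["x"] = x_num[~bad].astype(int)
clean["y"] = y_num[~bad].astype(int)
print(clean)
```

Unlike a str.isnumeric() filter, which is False for signed strings such as '-1', to_numeric accepts negative values, so valid rows like the first one are kept.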
