Python Pandas, fixing values in a column and resetting the dtypes?

Question

My code reads a CSV file into a pandas DataFrame. However, when the CSV file has 'null' values in some columns, I get the following warning: Columns (10,11,12) have mixed types. Specify dtype option on import or set low_memory=False. The null value is mostly found in the last row of the data.

I replace the null with np.nan as follows: df.replace('null', np.nan, inplace=True)

However, the dtype of those columns still remains object. Is there a way to automatically re-infer the dtypes, or a better way to clean up such data?

jezrael · Accepted Answer · 2018-01-24T08:23:01

You can use the na_values parameter of read_csv:

df = pd.read_csv(path, keep_default_na=False, na_values=["null"])

More information in pandas docs.
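
As a minimal sketch of the na_values approach, the snippet below reads a small in-memory CSV instead of a file. The CSV text and the column names a and b are made up for illustration; only the read_csv arguments come from the answer.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the asker's file;
# the literal string 'null' appears in the last row.
csv_text = "a,b\n1,2.5\n3,null\n"

# Treat only the string 'null' as missing while parsing.
df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=["null"],
)

# Column b is parsed as numeric with a NaN in the last row,
# so no replace step is needed afterwards.
print(df.dtypes)
print(df)
```

Because the marker is handled during parsing, the column never holds a mix of numbers and strings in the first place.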

If all columns are floats, it is possible to replace and cast:

df = df.replace('null', np.nan).astype(float)

You get object dtype after replace because the columns hold mixed values: floats (NaNs) together with strings (numbers saved as strings).
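
The two steps can be seen separately in a small sketch. The frame below is hypothetical and is built with object dtype to mimic a mixed-type column after loading. The last lines use pd.to_numeric with errors="coerce", which is not part of the answer above; it is shown only as an alternative for cases where not every column is numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: numbers stored as strings plus the literal 'null'.
df = pd.DataFrame(
    {"x": ["1.5", "2", "null"], "y": ["3", "null", "4.25"]},
    dtype=object,
)

# Step 1: replace alone swaps 'null' for NaN but keeps object dtype,
# because the remaining values are still strings.
replaced = df.replace("null", np.nan)
print(replaced.dtypes)

# Step 2: casting parses the numeric strings and yields float columns.
converted = replaced.astype(float)
print(converted.dtypes)

# Alternative (not from the answer): convert a single column and
# turn anything unparseable, including 'null', into NaN.
x_numeric = pd.to_numeric(df["x"], errors="coerce")
print(x_numeric.dtype)
```

Note that astype(float) raises if any column contains a string that is not a number, so it only fits the all-float case the answer describes.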
