0
votes

I want to use pandas to drop rows whose missing values are recorded as the keyword "Unknown". In this dataset, every value that would otherwise be NaN already appeared as "Unknown" by default when I loaded the .csv file.

Picture: Data head (121 values, 8 columns)

Info about the dataset itself is as follows:

<class 'pandas.core.frame.DataFrame'>
Index: 119 entries, ROMANIA to CZECH REPUBLIC
Data columns (total 7 columns):
authority               119 non-null object
date                    119 non-null object
fine                    119 non-null object
controller/processor    119 non-null object
quoted article          119 non-null object
type                    119 non-null object
infos                   119 non-null object
dtypes: object(7)
memory usage: 9.9+ KB

I have already tried gdpr_fines.isnull().sum(), gdpr_fines.dropna() and gdpr_fines = gdpr_fines.drop_duplicates() to clean the data, but without success.

The problem arose when I selected the 'fine' column (fines = gdpr_fines['fine']) and tried to convert it from string to float with float(fines). I get the following error:

TypeError: cannot convert the series to <class 'float'>

I'm not 100% sure whether the problem is that pandas does not recognize the fine amounts as numbers at all, or whether I am getting the error because some cells in the column contain "Unknown" instead of a number.

1
Do I get it right that you want to remove all rows from the dataset, where the fine-column has the value 'Unknown'? - Lukas Thaler
Welcome to StackOverflow. Please take the time to read this post on how to provide a great pandas example as well as how to provide a minimal, complete, and verifiable example and revise your question accordingly. These tips on how to ask a good question may also be useful. - jezrael

1 Answer

0
votes

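The 'Unknown' entries are ordinary strings, so pandas does not treat them as null and .dropna() leaves them in place. If you first replace every 'Unknown' value with np.nan, .dropna() will then remove those rows from your dataframe.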

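import numpy as np
# Turn the 'Unknown' placeholder strings into real missing values
gdpr_fines = gdpr_fines.replace('Unknown', np.nan)
# Drop every row that now contains at least one NaN
gdpr_fines = gdpr_fines.dropna()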
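
As for the TypeError: the built-in float() only accepts a single value, so calling it on a whole Series fails whether or not the column contains 'Unknown'. One possible way to handle the conversion and the placeholder together is pd.to_numeric with errors="coerce". The sketch below uses a small made-up dataframe (the rows, index labels and fine amounts are assumptions for illustration, not your actual data) and assumes the fines are stored as plain digit strings without currency symbols or thousands separators.

```python
import pandas as pd

# Hypothetical sample standing in for the real gdpr_fines data
gdpr_fines = pd.DataFrame(
    {
        "authority": ["Authority A", "Authority B", "Authority C"],
        "fine": ["1000", "Unknown", "2500.5"],
    },
    index=["ROMANIA", "SPAIN", "CZECH REPUBLIC"],
)

# errors="coerce" converts anything that cannot be parsed as a number
# (here the 'Unknown' strings) into NaN instead of raising an error
gdpr_fines["fine"] = pd.to_numeric(gdpr_fines["fine"], errors="coerce")

# Drop only the rows whose fine is missing, leaving other columns untouched
gdpr_fines = gdpr_fines.dropna(subset=["fine"])

print(gdpr_fines.dtypes)
print(gdpr_fines)
```

Using subset=["fine"] keeps rows that have 'Unknown' only in other columns, which may or may not be what you want. Drop the subset argument if any missing value should remove the row.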