0
votes

I cannot delete a column from a DataFrame that I load from a CSV file with pandas. I tried several approaches, including different axis values and del, but none of them work. Does anybody know why?

Here is the output of df.head():

age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"
0  58;"management";"married";"tertiary";"no";2143...
1 44;"technician";"single";"secondary";"no";29;"...
2 33;"entrepreneur";"married";"secondary";"no";2...
3 47;"blue-collar";"married";"unknown";"no";1506...
4 33;"unknown";"single";"unknown";"no";1;"no";"n...

Here is my code:

import pandas as pd
df = pd.read_csv('bank-full.csv')
print(df.head())
df = df.drop(['day', 'poutcome'], axis=1)

Here is the error:

Traceback (most recent call last):
  File "/home/administrator/PycharmProjects/BankMarketinData/main.py", line 21, in <module>
    main()
  File "/home/administrator/PycharmProjects/BankMarketinData/main.py", line 19, in main
    df = df.drop(['day', 'poutcome'], axis=1)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3697, in drop
    errors=errors)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 3111, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 3143, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4404, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['day' 'poutcome'] not found in axis"
3
I think you have quotes in your column headers. Try df.columns = df.columns.str.strip('\"') - Scott Boston
Please show df.head().to_dict(). Maybe you have whitespace? - Scott Boston
Posted answer below. Tested it on a sample dataframe - Edeki Okoh
My guess is that your column labels are quoted, e.g., "day" - Paul H

3 Answers

1
votes

This is a fairly simple problem. First of all, I would advise you to specify the delimiter explicitly whenever you read tabular data. Now to your problem: you are reading your dataframe like this:

import pandas as pd  
df = pd.read_csv('bank-full.csv')
df = df.drop(['day', 'poutcome'], axis=1)

Check the exact labels with print(df.columns.tolist()). If they contain double quotes, the columns are named "day" and "poutcome", not day and poutcome; the quotes are part of each column name. In that case you should write something like this to drop these columns:

df = df.drop(['"day"', '"poutcome"'], axis=1)

I hope this helps. If you have any further questions, let me know.

0
votes

You can drop them one by one, or use a loop to drop multiple columns. You do need to make sure that the names you pass are exactly the column names in the dataframe. From your question, it looks like your column names are wrapped in "". Also make sure to set the delimiter correctly when reading the file: read_csv defaults to ',', but this file uses ';'.
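To see what the separator changes, here is a minimal sketch that reads a small in-memory sample instead of bank-full.csv. The rows and the names sample, raw and parsed are made up for illustration; only the header layout mirrors the one in the question.

```python
import io

import pandas as pd

# Made-up sample that mimics the header layout shown in the question.
sample = (
    'age;"job";"day";"poutcome";"y"\n'
    '58;"management";5;"unknown";"no"\n'
    '44;"technician";5;"unknown";"no"\n'
)

# Default separator (','): the whole header line becomes one column label.
raw = pd.read_csv(io.StringIO(sample))
print(raw.columns.tolist())

# Semicolon separator: each field becomes its own column, and the
# surrounding double quotes are consumed by the parser.
parsed = pd.read_csv(io.StringIO(sample), sep=';')
print(parsed.columns.tolist())
```

Printing the labels this way shows whether 'day' and 'poutcome' exist as separate columns before you call drop.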

One by one

df = pd.read_csv('bank-full.csv', sep=';')
df = df.drop(['day'], axis=1)
df = df.drop(['poutcome'], axis=1)

Loop

df = pd.read_csv('bank-full.csv', sep=';')
Drop_list = ['day','poutcome']
for column in Drop_list: 
    df = df.drop([column], axis=1)
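As an alternative to the loop, drop also accepts the whole list in one call. The sketch below assumes a small hand-built frame as a stand-in for the real file; the names trimmed and safe and the label 'missing' are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from bank-full.csv.
df = pd.DataFrame({
    'age': [58, 44],
    'day': [5, 5],
    'poutcome': ['unknown', 'unknown'],
    'y': ['no', 'no'],
})

drop_list = ['day', 'poutcome']

# drop accepts a list of labels, so one call removes all of them.
trimmed = df.drop(columns=drop_list)

# errors='ignore' skips labels that are not present instead of raising KeyError.
safe = df.drop(columns=drop_list + ['missing'], errors='ignore')
```

Note that errors='ignore' would also hide a mismatch like the one in the question, so it is probably best left off while you are still debugging the column names.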

Test I used for question:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
df.head(5)

              A         B         C         D
    0  0.860680 -0.408577  0.727530 -0.119050
    1 -1.140042  0.241970 -1.509257 -0.303601
    2  0.811929  0.146228  2.102941  0.772328
    3 -0.590157  0.753719  0.220592 -0.563953
    4  0.031505 -0.521978  0.410718 -0.325865

Drop_list = ['A','B','C']
for column in Drop_list:
    df = df.drop([column], axis=1)
df.head(5)

          D
0 -0.119050
1 -0.303601
2  0.772328
3 -0.563953
4 -0.325865
0
votes
df = pd.read_csv('bank-full.csv', sep=';')
df.columns = [col.replace('"', '') for col in df.columns]
df.drop(columns=['day','poutcome'], inplace=True)

As you can see from the follow-up comments, the first issue is that you are using the wrong separator when reading your csv file. After that, you need to remove any quotation marks left in your column names so that you can drop those columns.
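The quote-stripping step can be tried in isolation. This is a minimal sketch, assuming a hand-built frame whose labels contain literal double quotes (the data is made up); it uses the str.strip approach suggested in the comments on the question rather than the list comprehension above.

```python
import pandas as pd

# Hypothetical frame whose labels contain literal double quotes.
df = pd.DataFrame({
    'age': [58, 44],
    '"day"': [5, 5],
    '"poutcome"': ['unknown', 'unknown'],
})

# Show the exact labels, quotes included.
print(df.columns.tolist())

# Strip leading and trailing double quotes from every label.
df.columns = df.columns.str.strip('"')

# The plain names now match, so the drop succeeds.
df = df.drop(columns=['day', 'poutcome'])
```

Unlike replace('"', ''), strip only removes quotes at the start and end of each label, which is usually what you want for quoted headers.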