python - Deleting multiple columns based on column names in Pandas

Question

I have some data and when I import it I get the following unneeded columns I'm looking for an easy way to delete all of these

   'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
   'Unnamed: 28', 'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31',
   'Unnamed: 32', 'Unnamed: 33', 'Unnamed: 34', 'Unnamed: 35',
   'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39',
   'Unnamed: 40', 'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43',
   'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47',
   'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 'Unnamed: 51',
   'Unnamed: 52', 'Unnamed: 53', 'Unnamed: 54', 'Unnamed: 55',
   'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59',
   'Unnamed: 60'

They are indexed by 0-indexing so I tried something like

    df.drop(df.columns[[22, 23, 24, 25, 
    26, 27, 28, 29, 30, 31, 32 ,55]], axis=1, inplace=True)

But this isn't very efficient. I tried writing some for loops but this struck me as bad Pandas behaviour. Hence i ask the question here.

I've seen some examples which are similar (Drop multiple columns pandas) but this doesn't answer my question.

What do you mean, efficient? Is it running too slow? If your problem is that you don't want to get the indices of all the columns that you want to delete, please note that you can just give df.drop a list of column names: df.drop(['Unnamed: 24', 'Unnamed: 25', ...], axis=1) — Carsten
Would it not be easier to just subset the columns of interest: i.e. df = df[cols_of_interest], otherwise you could slice the df by columns and get the columns df.drop(df.ix[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1) — EdChum
Might be worth noting that in most cases it's easier just to keep the columns you want then delete the ones that you don't: df = df['col_list'] — sparrow

EdChum EdChum · Accepted Answer · 2015-02-16T09:58:11

I don't know what you mean by inefficient but if you mean in terms of typing it could be easier to just select the cols of interest and assign back to the df:

df = df[cols_of_interest]

Where cols_of_interest is a list of the columns you care about.

Or you can slice the columns and pass this to drop:

df.drop(df.ix[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)

The call to head just selects 0 rows as we're only interested in the column names rather than data

update

Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:

In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df

Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []

In [4]:
~df.columns.str.contains('Unnamed:')

Out[4]:
array([ True, False, False,  True], dtype=bool)

In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]

Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []

python - Deleting multiple columns based on column names in Pandas

10 Answers