How to read from an array without a particular column in Python

Question

I have a numpy array of dtype=object whose elements are lists holding values of various types, so it is effectively a 2D array (an array of lists). I want to copy every row, but only certain columns, of this array into another array. The data comes from a CSV file with several fields (columns) and a large number of rows. Here is the code I used to load the data into the array:

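import numpy as np

# csv_file_object is presumably a csv.reader over the input file
data = np.zeros((401125,), dtype=object)
for i, row in enumerate(csv_file_object):
    data[i] = row  # each element holds one row as a list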

The data looks roughly like this:

column1  column2  column3  column4  column5 ....
1         none     2       'gona'    5.3
2         34       2       'gina'    5.5
3         none     2       'gana'    5.1
4         43       2       'gena'    5.0
5         none     2       'guna'    5.7
.....     ....   .....      .....    ....
.....     ....   .....      .....    ....
.....     ....   .....      .....    ....

There are unwanted fields in the middle that I want to remove. Suppose I don't want column3. How do I remove only that column from my array, or copy only the relevant columns to another array?

Are you looking to process the CSV input before it gets into the numpy array, or to remove columns from the array after it's been created? (Or just "whichever is easier" or "whichever is faster"?) — abarnert
@maheshakyha: Then I think root's answer is the easiest. If you can't/don't want to replace your reading with pandas.read_csv, then probably my numpy.delete is easiest, but I think you're better off with his answer. — abarnert
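For reference, the numpy.delete route mentioned in the comment above could look roughly like the sketch below. It assumes every row has the same number of fields, so the 1D array of lists can first be turned into a true 2D object array; the sample rows and the names table, without_col3 and kept are made up for illustration.

```python
import numpy as np

# Hypothetical sample rows standing in for the CSV contents.
rows = [
    [1, None, 2, "gona", 5.3],
    [2, 34, 2, "gina", 5.5],
    [3, None, 2, "gana", 5.1],
]

# Recreate the layout from the question: a 1-D object array of lists.
data = np.zeros((len(rows),), dtype=object)
for i, row in enumerate(rows):
    data[i] = row

# Convert to a real 2-D object array so columns can be addressed.
# This assumes all rows have the same length.
table = np.array(data.tolist(), dtype=object)

# Drop the column at index 2 (column3); np.delete returns a new array.
without_col3 = np.delete(table, 2, axis=1)

# Alternative: select the columns to keep by index.
kept = table[:, [0, 1, 3, 4]]

print(without_col3.shape)
```

np.delete leaves the original array untouched, so for a very large table the index-based selection may be preferable if you only need a few columns.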

root · Accepted Answer · 2013-01-28T09:17:28

Use pandas. For data of mixed types like yours, a pandas.DataFrame is probably a better fit than a numpy array anyway.

from io import StringIO  # on Python 2: from StringIO import StringIO
import pandas as pd

data = """column1  column2  column3  column4  column5
1         none     2       'gona'    5.3
2         34       2       'gina'    5.5
3         none     2       'gana'    5.1
4         43       2       'gena'    5.0
5         none     2       'guna'    5.7"""

# Parse the whitespace-separated text, then drop the unwanted column.
df = pd.read_csv(StringIO(data), sep=r"\s+").drop("column3", axis=1)
print(df)

out:

   column1 column2 column4  column5
0        1    none  'gona'      5.3
1        2      34  'gina'      5.5
2        3    none  'gana'      5.1
3        4      43  'gena'      5.0
4        5    none  'guna'      5.7
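If the unwanted column should never be loaded in the first place, read_csv's usecols parameter is another option. The following is a small sketch under the same assumptions as above (whitespace-separated text with a header row); the variable name text and the shortened sample are only for illustration.

```python
from io import StringIO

import pandas as pd

text = """column1  column2  column3  column4  column5
1         none     2       'gona'    5.3
2         34       2       'gina'    5.5
3         none     2       'gana'    5.1"""

# usecols accepts a callable that is evaluated against each column name;
# columns for which it returns False are not loaded at all.
df = pd.read_csv(
    StringIO(text),
    sep=r"\s+",
    usecols=lambda name: name != "column3",
)

print(df.columns.tolist())
```

This skips the separate drop() call, which can matter for a file with several hundred thousand rows.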

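If you need an array instead of a DataFrame, call to_records() on the DataFrame returned by drop() above (called df here):

df.to_records(index=False)
#output (from Python 2, hence the L suffix on the integers):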
rec.array([(1L, 'none', "'gona'", 5.3),
           (2L, '34', "'gina'", 5.5),
           (3L, 'none', "'gana'", 5.1),
           (4L, '43', "'gena'", 5.0),
           (5L, 'none', "'guna'", 5.7)], 
            dtype=[('column1', '<i8'), ('column2', '|O4'),
                   ('column4', '|O4'), ('column5', '<f8')])
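As a rough illustration of working with the result, the sketch below accesses the record array by field name and also shows df.to_numpy(dtype=object) as a possible alternative when a plain, positionally indexed 2D array is wanted. The small DataFrame is built by hand here purely so the example is self-contained.

```python
import pandas as pd

# Hand-built stand-in for the DataFrame with column3 already removed.
df = pd.DataFrame({
    "column1": [1, 2, 3],
    "column2": ["none", "34", "none"],
    "column4": ["gona", "gina", "gana"],
    "column5": [5.3, 5.5, 5.1],
})

# Record array: each column keeps its own dtype and is accessed by name.
records = df.to_records(index=False)
col5 = records["column5"]
first_row = records[0]

# Plain 2-D object array: a single dtype, indexed by position.
plain = df.to_numpy(dtype=object)

print(records.dtype.names)
print(plain.shape)
```

The record array keeps integers and floats in native dtypes, whereas the object array stores every value as a Python object, closer to the array in the question.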
