1
votes

I have 2 excel files: IDList.csv and Database.csv. IDList contains a list of 300 ID numbers that I want to filter out of the Database, which contains 2000 entries (leaving 1700 entries in the Database).

I tried writing a for loop (For each ID in the IDList, filter out that ID in Database.csv) but am having some troubles with the filter function. I am using Pyvot (http://packages.python.org/Pyvot/tutorial.html). I get a syntax error...Python/Pyvot doesn't like my syntax for xl.filter, but I can't figure out how to correct the syntax. This is what the documentation says:

xl.tools.filter(func, range) Filters rows or columns by applying func to the given range. func is called for each value in the range. If it returns False, the corresponding row / column is hidden. Otherwise, the row / column is made visible.

range must be a row or column vector. If it is a row vector, columns are hidden, and vice versa.

Note that, to unhide rows / columns, range must include hidden cells. For example, to unhide a range: xl.filter(lambda v: True, some_vector.including_hidden)

And here's my code:

import xl 

IDList = xl.Workbook("IDList.xls").get("A1:A200").get() 

for i in range(1,301):
     xl.filter(!=IDList[i-1],"A1:A2000")

How can I filter a column in Database.csv using criteria in IDList.csv? I am open to solutions in Python or an Excel VBA macro, although I prefer Python.

1
I'm almost sure there's a better way to deal with csv files in python than to use an excel api. - phipsgabler
I'm sure there is as well, but the .csv file is filtered with an excel macro and I need to preserve the filter. - user1137778
Looking at your code again, the first argument of filter looks strange to me. Is this kind of a partial application? And is that even valid? Or a spelling mistake? - phipsgabler
Actually, I think you need a lambda here. Try this: xl.filter(lambda cell: cell != IDList[i-1], "A1:A2000"). - phipsgabler
@phg: I tried that and it gives me an AttributeError: 'str' object has no attribute 'shape'. I'm assuming it has something to do w/ the xl.filter code. In any case, the csv solution below works because I can run the filtering macro after the csv filter. Thanks. - user1137778

1 Answers

4
votes
import csv

with open("IDList.csv","rb") as inf:
    incsv = csv.reader(inf)
    not_wanted = set(row[0] for row in incsv)

with open("Database.csv","rb") as inf, open("FilteredDatabase.csv","wb") as outf:
    incsv = csv.reader(inf)
    outcsv = csv.writer(outf)
    outcsv.writerows(row for row in incsv if row[0] not in not_wanted)