0 votes

I have a CsvWriter class that is intended to write a single, common output file...

import csv

class CsvWriter:
  def __init__(self, outputFileName):
    self.outputFile = outputFileName

  def m_harvestFromAllCsvFiles(self): # <-- NOTE THIS METHOD
    with open(self.outputFile, 'wb') as outputCsvFile:
      wr = csv.writer(outputCsvFile, delimiter=",", quoting=csv.QUOTE_ALL)

      # Load 3 CSV files into memory...
      readerA = CsvReader("fileA.csv")
      readerB = CsvReader("fileB.csv")
      readerC = CsvReader("fileC.csv")

      readerA.m_getValues(wr, "attributeA", "attributeZ")
      readerB.m_getValues(wr, "attributeF", "attributeG")
      readerC.m_getValues(wr, "attributeM", "attributeS")

I also have a CsvReader class that allows me to read and store the contents of a CSV file in memory (self.csvFileArray), for each instance (there can be more than one instance)...

import csv

class CsvReader:

  def __init__(self, inputFileName):
    self.csvFileArray = []
    self.csvHeader = []
    self.csvHeaderDictionary = {}
    with open(inputFileName, 'rU') as csvFile:
      for idx, row in enumerate(csv.reader(csvFile, delimiter=',')):
        if idx == 0:
          self.csvHeader = row
        self.csvFileArray.append(row)
    for idx, key in enumerate(self.csvHeader):
      self.csvHeaderDictionary[key] = idx
  ...
  def m_getValues(self, csvWriter, attributeList): # <-- NOTE THIS METHOD
    ...

In short, the goal is to open up different CSV files using the CsvReader, where each file that is read into memory may have different attributes, and then harvest specific attributes from each.

It is too slow to query each CsvReader from the CsvWriter class and then write the common output file from the CsvWriter. The steps would be...

1. CsvWriter asks the first CsvReader for its data
   1a. First CsvReader collects the data into a structure and returns it back to the CsvWriter
2. CsvWriter receives and writes the data to the common output file
3. CsvWriter asks the second CsvReader for its data
   3a. Second CsvReader collects the data into a structure and returns it back to the CsvWriter
4. CsvWriter receives and writes the data to the common output file
5. CsvWriter asks the third CsvReader for its data
   5a. Third CsvReader collects the data into a structure and returns it back to the CsvWriter
6. CsvWriter receives and writes the data to the common output file

It would be MUCH faster to just pass the writer object (w/ file handle) "self.wr" to the CsvReader and have it write the common file, directly...

1. CsvWriter tells CsvReader A to write data directly to common output file
2. CsvWriter tells CsvReader B to write data directly to common output file
3. CsvWriter tells CsvReader C to write data directly to common output file
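A minimal sketch of this faster pattern, written in Python 3 (the class names follow the question, but write_to and harvest are hypothetical stand-ins for the asker's methods):

```python
import csv

class CsvReader:
    def __init__(self, input_file_name):
        # Load the whole file into memory, as in the question.
        with open(input_file_name, newline='') as f:
            self.rows = list(csv.reader(f))

    def write_to(self, csv_writer):
        # Write this reader's rows straight to the shared writer;
        # nothing is copied back to the caller first.
        for row in self.rows:
            csv_writer.writerow(row)

class CsvWriter:
    def __init__(self, output_file_name):
        self.output_file_name = output_file_name

    def harvest(self, readers):
        # The writer object only exists inside this with block, so the
        # readers can never be handed a closed file.
        with open(self.output_file_name, 'w', newline='') as out:
            wr = csv.writer(out)
            for reader in readers:
                reader.write_to(wr)  # pass the writer object directly
```

Each reader writes through the shared csv.writer while the output file is open; no intermediate per-reader structure is built and copied back.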

MY QUESTION: Given that the writer object (the file handle) is instantiated in the CsvWriter instance object, is it safe in Python to pass the writer object to other instances/objects, where it was not created (e.g. the CsvReader instances)? [Refer to CsvReader method "m_getValues(csvWriter, attributeList)" that receives the writer as an argument.] If it's not safe, why not and what's the proper way to handle this problem?

What happened when you ran your code? I would expect that the file handle would be closed in CsvWriter.__init__ after the with block and so this wouldn't work. - user8651755
Sorry I was correcting it as you wrote your response. I moved the opening of the output file into the method that actually calls the reader for their values, which keeps the file writer object open during those calls. (Still trying to figure out why it doesn't work.) - Information Technology
By the way, I forgot to add that the code does work. My question is about whether or not it's safe and the right way to handle it. -- thx - Information Technology

2 Answers

2 votes

I'd try an approach like the following:

import csv

class Writer:

    def __init__(self, file_name):
        self.file_name = file_name
        self.readers = [
            Reader('test.csv', ('a', 'b', 'c')),
            Reader('test.csv', ('d', 'e')),
        ]

    def run(self):
        with open(self.file_name, 'w') as fp:
            writer = csv.writer(fp)
            for reader in self.readers:
                for row in reader:
                    writer.writerow(row)

class Reader:

    def __init__(self, file_name, fields):
        self.file_name = file_name
        self.fields = fields

    def __iter__(self):
        with open(self.file_name) as fp:
            reader = csv.DictReader(fp)
            for row in reader:
                yield self.convert_row(row)

    def convert_row(self, row):
        # Create subclasses of Reader that override this logic if necessary
        items = tuple(row[k] for k in self.fields)
        return items

w = Writer('test-out.csv')
w.run()

with open(w.file_name) as fp:
    print(fp.read(), end='')

My test.csv looks like this:

a,b,c,d,e
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5

And the output file looks like this:

1,2,3
1,2,3
1,2,3
1,2,3
1,2,3
4,5
4,5
4,5
4,5
4,5

1 vote

As long as the logic that uses the file handle returned by open(outpath, 'wb'), or the csv.writer(outfile) wrapping it, runs sequentially rather than in parallel (i.e. is single-threaded), there is no problem passing it to multiple objects. Just make sure the handle is closed after its last use and not prematurely: given a with open(outpath, 'wb') as outfile block, don't return an object out of the block that still holds a reference to outfile. In your code only one CsvReader instance is "using" the writer at a time, so it is safe.
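A minimal illustration of that safe pattern (RowSource is a made-up stand-in for the CsvReader instances, and io.StringIO stands in for the open output file):

```python
import csv
import io

class RowSource:
    """Stand-in for CsvReader: an object handed a csv.writer it did not create."""
    def __init__(self, rows):
        self.rows = rows

    def dump(self, csv_writer):
        # Uses the writer it was given; it neither opens nor closes anything.
        for row in self.rows:
            csv_writer.writerow(row)

buf = io.StringIO()  # stands in for the open output file handle
wr = csv.writer(buf)

# Safe: each source uses the shared writer one after another,
# while the underlying file object is still open.
for source in (RowSource([["a", 1]]), RowSource([["b", 2]])):
    source.dump(wr)
```

The key point is that ownership of the handle stays with the code that opened it; the other objects only borrow the writer for the duration of a call.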

However, IMHO you're making this much more complicated than it has to be; think of the single responsibility principle. Even the distinction between your "reader" and "writer" is blurry: the CsvWriter delegates the reading, and the CsvReader ends up doing the writing.
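For example, the whole job could collapse to two small functions with no classes at all (a sketch, not the asker's code; rows_from and merge are made-up names):

```python
import csv

def rows_from(path, fields):
    """Yield the requested columns, in order, from one CSV file."""
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            yield tuple(row[k] for k in fields)

def merge(out_path, sources):
    """sources: iterable of (path, fields) pairs to harvest from."""
    with open(out_path, 'w', newline='') as f:
        wr = csv.writer(f)
        for path, fields in sources:
            wr.writerows(rows_from(path, fields))
```

Because rows_from is a generator, rows stream through to the writer one at a time and the file-handle question disappears entirely: each function opens and closes its own file.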