I'm currently loading in my data with one single dataset class. Within the dataset, I split the train, test, and validation data separately. For example:
class Data():
    def __init__(self):
        self.load()

    def load(self):
        with open(file=file_name, mode='r') as f:
            self.data = f.readlines()
        self.train = self.data[:checkpoint]
        self.valid = self.data[checkpoint:halfway]
        self.test = self.data[halfway:]
Many of the details have been omitted for the sake of readability. Basically, I read in one big dataset and make the splits manually.
My question is how to override the __len__ method when the lengths of my train, valid, and test data all differ.
I want to keep the split data in a single class, but I also want to create a separate DataLoader for each split, so something like:
def __len__(self):
    return len(self.train)
wouldn't be appropriate for self.test and self.valid.
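To make the goal concrete, here is a minimal sketch of the kind of arrangement I have in mind. I'm not sure it's the intended pattern. SplitView is a hypothetical helper name, the in-memory lines list stands in for f.readlines(), and the checkpoint and halfway values are made up for illustration. Each split gets its own __len__ and __getitem__ while the data still lives in one class:

```python
class SplitView:
    """Hypothetical helper: exposes one split via __len__ and __getitem__."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        # Length of this split only, not of the whole dataset.
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


class Data:
    def __init__(self, lines, checkpoint, halfway):
        # `lines` stands in for f.readlines(); `checkpoint` and `halfway`
        # are the split indices from the snippet above (values assumed).
        self.data = lines
        self.train = SplitView(self.data[:checkpoint])
        self.valid = SplitView(self.data[checkpoint:halfway])
        self.test = SplitView(self.data[halfway:])


# Made-up data: ten lines split 6 / 2 / 2.
lines = [f"sample {i}\n" for i in range(10)]
data = Data(lines, checkpoint=6, halfway=8)
print(len(data.train), len(data.valid), len(data.test))
```

The idea would then be to pass data.train, data.valid, and data.test to three separate DataLoader instances, presumably with each view subclassing torch.utils.data.Dataset in real code.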
Perhaps I'm fundamentally misunderstanding how the DataLoader works, but how should I approach this? Thanks in advance.