I need to add a column of auto-increment IDs to my Dask dataframe. I know how to do it in Pandas, since I found a Pandas solution on SO, but I cannot figure out how to do it in Dask. My best attempt is below. It turns out that the autoincrement function only runs twice for my 100-line test file, and all of the IDs are 2.
    def autoincrement(self):
        print('*')
        self.report_line = self.report_line + 1
        return self.report_line

    self.df = self.df.map_partitions(
        lambda df: df.assign(raw_report_line=self.autoincrement())
    )
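The repeated ID is consistent with how `map_partitions` works: the lambda is called once per partition, not once per row, and `assign` broadcasts the single returned value to every row of that partition. If Dask also calls the function once on a small dummy frame to infer the output metadata, that would explain two calls and an ID of 2 for a single-partition file. A minimal pandas-only sketch of the broadcasting effect, where the `Counter` class, the `value` column, and the two hand-made partitions are hypothetical stand-ins:

```python
import pandas as pd


class Counter:
    """Hypothetical stand-in for the object that owns report_line."""

    def __init__(self):
        self.report_line = 0

    def autoincrement(self):
        self.report_line = self.report_line + 1
        return self.report_line


counter = Counter()

# Two hand-made frames standing in for Dask partitions (assumption).
partitions = [
    pd.DataFrame({"value": ["a", "b", "c"]}),
    pd.DataFrame({"value": ["d", "e"]}),
]

# autoincrement() is evaluated once per partition; assign() then
# broadcasts that single scalar to every row of the partition.
tagged = [
    part.assign(raw_report_line=counter.autoincrement())
    for part in partitions
]

combined = pd.concat(tagged, ignore_index=True)
ids = combined["raw_report_line"].tolist()
print(ids)
```

Every row in a partition shares one value, so a per-call counter cannot produce per-row IDs on its own.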
The Pandas way looks something like this:

    df.insert(0, 'New_ID', range(1, 1 + len(df)))
Alternatively, it would be great if I could fetch the row number of each CSV row and add that to a column, but at this stage that does not seem easily possible.