First, I just started with pandas and my task is that I have two csv files, I read them, compare and append which rows are they occupying in both files. I am using pandas DataFrame. Now I have to write it back to csv, append extra column with these positions. I am thinking about using dict or lists and append them at the end but maybe there is more efficient and elegant solution.
df1_index = 0
df1['position'] = 0
for row in df1['Names']:
ur = row.upper()
df1_index += 1
df2_index = 1
for line in df2['Students']:
ul = line.upper()
if ur == ul:
df1['position'].append(row + " in second file: " + str(df2_index) + " row.")
print(row + " is also present in the second file in " + str(df2_index) + " row.")
print(line + " is also present in the first file in " + str(df1_index) + " row.")
df2_index += 1
EDIT: the above approach has very bad efficiency, I have to generalize the question
So I have two csv files: csv1.csv:
- 2010-12-10, Steve, Jobs,
- 20 2010-12-12, Elon, Musk,
- 23 2011-01-02, Larry, Page, 25
csv2.csv:
- 2015-01-03, STeve JObs, 34
- 2015-02-13, Elon Musk, 27
- 2015-01-17, LArry Page, 18
- 2015-09-11, Steve Jobs, 25
The task is about finding similarities in these two files by name ofc and adding a row (in new column) with their index/indices from another file. For now I extract columns (merge 2 from csv1 and append as new column, both as DataFrames), uppercase them and I can print their positions in the opposite file (very primitive, nested loop and external int indices). Does pandas provide some tools to get what I want? Do you have any idea for possible solution? What should I learn more about? ;)
Desired output of ex. csv1:
- 2010-12-10, Steve, Jobs, 20, [1,4]
- 2010-12-12, Elon, Musk, 23,
- 2011-01-02, Larry, Page, 25,
csv2:
csv2.csv:
- 2015-01-03, STeve JObs, 34,1
- 2015-02-13, Elon Musk, 27,
- 2015-01-17, LArry Page, 18,
- 2015-09-11, Steve Jobs, 25,1
.merge()
to mark which rows are in both datasets? – Let's try