How can I add position of similar records to two separate csv files?

Question

First, I just started with pandas and my task is that I have two csv files, I read them, compare and append which rows are they occupying in both files. I am using pandas DataFrame. Now I have to write it back to csv, append extra column with these positions. I am thinking about using dict or lists and append them at the end but maybe there is more efficient and elegant solution.

df1_index = 0
df1['position'] = 0

for row in df1['Names']:
    ur = row.upper()
    df1_index += 1
    df2_index = 1
    for line in df2['Students']:
        ul = line.upper()
        if ur == ul:
            df1['position'].append(row + " in second file: " + str(df2_index) + " row.")
            print(row + " is also present in the second file in " + str(df2_index) + " row.")
            print(line + " is also present in the first file in " + str(df1_index) + " row.")
        df2_index += 1

EDIT: the above approach has very bad efficiency, I have to generalize the question

So I have two csv files: csv1.csv:

2010-12-10, Steve, Jobs,
20 2010-12-12, Elon, Musk,
23 2011-01-02, Larry, Page, 25

csv2.csv:

2015-01-03, STeve JObs, 34
2015-02-13, Elon Musk, 27
2015-01-17, LArry Page, 18
2015-09-11, Steve Jobs, 25

The task is about finding similarities in these two files by name ofc and adding a row (in new column) with their index/indices from another file. For now I extract columns (merge 2 from csv1 and append as new column, both as DataFrames), uppercase them and I can print their positions in the opposite file (very primitive, nested loop and external int indices). Does pandas provide some tools to get what I want? Do you have any idea for possible solution? What should I learn more about? ;)

Desired output of ex. csv1:

2010-12-10, Steve, Jobs, 20, [1,4]
2010-12-12, Elon, Musk, 23,
2011-01-02, Larry, Page, 25,

csv2:

csv2.csv:

2015-01-03, STeve JObs, 34,1
2015-02-13, Elon Musk, 27,
2015-01-17, LArry Page, 18,
2015-09-11, Steve Jobs, 25,1

How to add the position of ur and ul to opposite files and whether my idea with dictionary (to add multiple values to one index) is a good one or maybe there is some other, more efficient. — Alexxx
Why dont you use .merge() to mark which rows are in both datasets? — Let's try
I am not sure, I have to keep indexes of records in both original files so I could append them at the end — Alexxx

el_bobo el_bobo · Accepted Answer · 2020-08-12T08:41:07

If I understand you right you could:

# list of the values in the common column
list_common = df['common'].tolist()

# Get the index of the values matching in the other dataframes
matching_df1 = df.index[df1['Student'].isin(list_common)].tolist()
matching_df2 = df.index[df2['Student'].isin(list_common)].tolist()

The output list store the indexes of the value in their dataframes. I hope this helps, from there you could use the format that helps you suits you the most

How can I add position of similar records to two separate csv files?

1 Answers