0
votes

First, I just started with pandas and my task is that I have two csv files, I read them, compare and append which rows are they occupying in both files. I am using pandas DataFrame. Now I have to write it back to csv, append extra column with these positions. I am thinking about using dict or lists and append them at the end but maybe there is more efficient and elegant solution.

df1_index = 0
df1['position'] = 0

for row in df1['Names']:
    ur = row.upper()
    df1_index += 1
    df2_index = 1
    for line in df2['Students']:
        ul = line.upper()
        if ur == ul:
            df1['position'].append(row + " in second file: " + str(df2_index) + " row.")
            print(row + " is also present in the second file in " + str(df2_index) + " row.")
            print(line + " is also present in the first file in " + str(df1_index) + " row.")
        df2_index += 1

EDIT: the above approach has very bad efficiency, I have to generalize the question

So I have two csv files: csv1.csv:

  • 2010-12-10, Steve, Jobs,
  • 20 2010-12-12, Elon, Musk,
  • 23 2011-01-02, Larry, Page, 25

csv2.csv:

  • 2015-01-03, STeve JObs, 34
  • 2015-02-13, Elon Musk, 27
  • 2015-01-17, LArry Page, 18
  • 2015-09-11, Steve Jobs, 25

The task is about finding similarities in these two files by name ofc and adding a row (in new column) with their index/indices from another file. For now I extract columns (merge 2 from csv1 and append as new column, both as DataFrames), uppercase them and I can print their positions in the opposite file (very primitive, nested loop and external int indices). Does pandas provide some tools to get what I want? Do you have any idea for possible solution? What should I learn more about? ;)

Desired output of ex. csv1:

  • 2010-12-10, Steve, Jobs, 20, [1,4]
  • 2010-12-12, Elon, Musk, 23,
  • 2011-01-02, Larry, Page, 25,

csv2:

csv2.csv:

  • 2015-01-03, STeve JObs, 34,1
  • 2015-02-13, Elon Musk, 27,
  • 2015-01-17, LArry Page, 18,
  • 2015-09-11, Steve Jobs, 25,1
1
What is your exact question?dgg32
How to add the position of ur and ul to opposite files and whether my idea with dictionary (to add multiple values to one index) is a good one or maybe there is some other, more efficient.Alexxx
Why dont you use .merge() to mark which rows are in both datasets?Let's try
I am not sure, I have to keep indexes of records in both original files so I could append them at the endAlexxx

1 Answers

0
votes

If I understand you right you could:

# list of the values in the common column
list_common = df['common'].tolist()

# Get the index of the values matching in the other dataframes
matching_df1 = df.index[df1['Student'].isin(list_common)].tolist()
matching_df2 = df.index[df2['Student'].isin(list_common)].tolist()

The output list store the indexes of the value in their dataframes. I hope this helps, from there you could use the format that helps you suits you the most