0
votes

I have a csv file that contains trade data for some countries. The data has a format as follows:

rep   par  commodity  value
USA   GER    1         700
USA   GER    2         100
USA   GER    3         400
USA   GER    5         100
USA   GER    80        900
GER   USA    2         300
GER   USA    4         500
GER   USA    5         700
GER   USA    97        450
GER   UK     50        300
UK    USA    4         1100
UK    USA    80        200
UK    GER    50        200
UK    GER    39        650

I intend to make a new dictionary and by using the created dictionary, calculate the total value of common traded commodities between countries. For example, consider trade between USA-GER, I intend to check whether GER-USA is in the data and if it exists, values for the common commodities be summed and do the same for all countries. The dictionary should be like:

Dic_c1c2_producs=
{('USA','GER'): ('1','700'),('2','100'),('3','400'),('5','100'),('80','900'); 
('GER','USA'):('2','300'),('4','500'),('5','700'),('97','450') ;
('GER','UK'):('50','300'); 
('UK','USA'): ('4','80'),('80','200'); 
('UK','GER'): ('50','200'),('39','650')}

As you can see, USA-GER and GER-USA have commodities 2 and 5 in common and the value of these goods are (100+300)+(100+700). For the pairs USA-UK and UK-USA, we have common commodities: 0 so total trade will be 0 as well. For GER-UK and UK-GER, commodity 50 is in common and total trade is 300+200. At the end, I want to have something like:

Dic_c1c2_summation={('USA','GER'):1200;('GER','UK'):500; ('UK','USA'):0}

Any help would be appreciated.

In addition to my post, I have written following lines:

from collections import defaultdict
rfile = csv.reader(open("filepath",'r'))
rfile.next()
dic_c1c2_products = defaultdict(set) 
dic_c_products = {}
country = set()
for row in rfile :
      c1 = row[0]
      c2 = row[1]
      p = row[2]
      country.add(c1)
for i in country :
    dic_c_products[i] = set()
rfile = csv.reader(open("filepath"))
rfile.next()
for i in rfile:
    c1 = i[0]
    c2 = i[1]
    p = i[2]
    v=i[3]
    dic_c_products[c1].add((p,v))
    if not dic_c1c2_products.has_key((c1,c2)) :
        dic_c1c2_products[(c1,c2)] = set()
        dic_c1c2_products[(c1,c2)].add((p,v))

    else:
            dic_c1c2_products[(c1,c2)].add((p,v))
c_list = dic_c_products.keys()
dic_c1c2_productsummation = set()
for i in dic_c1c2_products.keys():
    if dic_c1c2_products.has_key((i[1],i[0])):
       for p1, v1 in dic_c1c2_products[(i[0],i[1])]:
           for p2, v2 in dic_c1c2_products[(i[1],i[0])]:
               if p1==p2:
                  summation=v1+v2
                  if i not in dic_c1c2_productsum.keys():
                       dic_c1c2_productsum[(i[0],i[1])]=(p1, summation)
                  else:
                       dic_c1c2_productsum[(i[0],i[1])].add((p1, summation))
    else:
          dic_c1c2_productsn[i] = " "
1
have you tried anything so far?user1269942
@user1269942 I have posted my code. It does not work thoughhomayoun

1 Answers

1
votes
# save your data in a file called data
import pandas as pd
data = pd.read_csv('data', delim_whitespace=True)
data['par_rep'] = data.apply(lambda x: '_'.join(sorted([x['par'], x['rep']])), axis=1)
result = data.groupby(('par_rep', 'commodity')).filter(lambda x: len(x) >= 2).groupby(('par_rep'))['value'].sum().to_dict()

at the end result is {'GER_UK': 500, 'GER_USA': 1200}