I have a two-column DataFrame and want to convert it to a Python dictionary, with the first column as the keys and the second as the values. Thank you in advance.
Dataframe:
   id  value
0   0   10.2
1   1    5.7
2   2    7.4
See the docs for to_dict. You can use it like this:
df.set_index('id').to_dict()
If you have only one value column, select it first so that the column name does not become an extra level in the dict (in this case you are actually calling Series.to_dict()):
df.set_index('id')['value'].to_dict()
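To make the difference between the two calls concrete, here is a minimal sketch that rebuilds the question's data by hand (the frame construction is my assumption about how the data looks) and prints both results:

```python
import pandas as pd

# Sample frame mirroring the data shown in the question (assumed construction).
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict(): the column name becomes the outer key.
nested = df.set_index("id").to_dict()

# Series.to_dict(): selecting the column first gives a flat id -> value mapping.
flat = df.set_index("id")["value"].to_dict()

print(nested)
print(flat)
```

The second form is usually what you want when the goal is a plain key-to-value lookup.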
The answers by joris in this thread and by punchagan in the duplicated thread are very elegant. However, they will not give correct results if the column used for the keys contains duplicate values.
For example:
>>> import pandas as pd
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}
If you have duplicated entries and do not want to lose them, you can use this ugly but working code:
>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
...
>>> mydict
{'a': [1, 2], 'b': [3]}
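The same idea can be written without positional indexing. The following is only a sketch of an alternative, not the code from this answer: it swaps in collections.defaultdict and iterates over the two columns with zip, converting them with tolist() so the stored values are plain Python objects.

```python
from collections import defaultdict

import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Collect every value seen for a key instead of overwriting the previous one.
grouped = defaultdict(list)
for key, val in zip(ptest["id"].tolist(), ptest["value"].tolist()):
    grouped[key].append(val)

# Convert back to a regular dict so missing keys raise KeyError again.
mydict = dict(grouped)
print(mydict)
```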
Simplest solution:
df.set_index('id').T.to_dict('records')
Example (note that the 'records' orient returns a list, here containing a single dict):
df = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')
If you have multiple value columns (val1, val2, val3, etc.) and want them as lists, use the code below:
df.set_index('id').T.to_dict('list')
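As a quick illustration of the 'list' orient, here is a small sketch with made-up data. The column names val1, val2 and val3 follow the wording above, the numbers are invented, and the ids are assumed to be unique (after the transpose they become column labels, and duplicate labels would not all survive in the dict).

```python
import pandas as pd

# Hypothetical frame with one key column and several value columns.
wide = pd.DataFrame(
    {"id": ["a", "b"], "val1": [1, 2], "val2": [10, 20], "val3": [100, 200]}
)

# After set_index + transpose, each id is a column; 'list' turns each
# column into a list of that id's values, in val1, val2, val3 order.
as_lists = wide.set_index("id").T.to_dict("list")
print(as_lists)
```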
Another (slightly shorter) solution for not losing duplicate entries:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...
>>> pdict
{'b': [3], 'a': [1, 2]}
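For what it's worth, the slicing loop can probably be replaced by a groupby. This is a sketch of a different technique rather than the code above: it groups the value column by id, turns each group into a list, and converts the resulting Series to a dict.

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# One list of values per distinct id; the Series index becomes the dict keys.
pdict = ptest.groupby("id")["value"].apply(list).to_dict()
print(pdict)
```

This avoids filtering the whole frame once per unique key, which may matter on larger frames.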
I found this question while trying to make a dictionary out of three columns of a pandas dataframe. In my case the dataframe has columns A, B and C (let's say A and B are the geographical coordinates of longitude and latitude and C the country region/state/etc, which is more or less the case).
I wanted a dictionary that maps each (A, B) pair (the key) to the value of C in the corresponding row (the value). Each (A, B) pair is guaranteed to be unique due to previous filtering, although the same value of C may appear for different pairs. So I did:
mydict = dict(zip(zip(df['A'],df['B']), df['C']))
Using pandas to_dict() also works:
mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']
(Neither A nor B was set as the index before executing the line that creates the dictionary.)
Both approaches are fast (less than one second on a dataframe with 85k rows, on a 5-year-old fast dual-core laptop).
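Here is a small self-contained sketch of the two approaches above on invented data (the coordinates and region names are placeholders, not my real dataset), checking that they produce the same tuple-keyed dictionary. It selects column C before calling to_dict(), which should be equivalent to indexing the nested result with ['C'].

```python
import pandas as pd

# Made-up coordinates (A, B) and regions (C); each (A, B) pair is unique.
df = pd.DataFrame(
    {
        "A": [10.0, 10.0, 20.5],
        "B": [50.0, 51.0, 50.0],
        "C": ["north", "north", "south"],
    }
)

# Approach 1: zip the two key columns into tuples, then zip with the values.
via_zip = dict(zip(zip(df["A"], df["B"]), df["C"]))

# Approach 2: a two-level index yields tuple keys when the Series is converted.
via_index = df.set_index(["A", "B"])["C"].to_dict()

print(via_zip == via_index)
print(via_zip[(10.0, 51.0)])
```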
The reasons I'm posting this: