0
votes

I have two columns A and B with corresponding keys and values if split by ':'. I am trying to create a dictionary in ColumnC to be able to later add more columns based on keys in ColumnC.

ColumnA ColumnB ColumnC abc def
abc:def:ghi 111:222:333 {'abc':111,'def':222,'ghi':333} 111 222
jkl:def 444:555 {'jkl':444,'def':555} 555
abc:stu:vwx:ach 666:777:888:999 {'abc':666,'stu':777,'vwx':888,'ach':999} 666

The code below works when the lists are not parts for panda series, but when I apply to panda series, it throws TypeError: unhashable type: 'list'.

ColumnA = 'abc:def:ghi'
ColumnB = '111:222:333'

ColumnA_list = ColumnA.split(':')
ColumnB_list = ColumnB.split(':')

ColumnC = dict(zip(ColumnA_list, ColumnB_list))
print(ColumnC)

However, the following code throws an error:

KEYS = df['ColumnA'].str.split(':')
VALUES = df['ColumnB'].str.split(':')

df['ColumnC'] = dict(zip(KEYS, VALUES))
print(df['ColumnC'])

TypeError: unhashable type: 'list'

I tried creating dictionary using other methods, but cannot get past this error. I also tried converting them to tuples and sets first, but did not work. Is this the best way to achieve what I am trying get from ColumnC. Is there an alternate method?

2

2 Answers

0
votes

Simplified a bit in the process as below:

Code:

import pandas as pd
import numpy as np

ColumnA = 'abc:def:ghi'
ColumnB = '111:222:333'
ColumnC = dict(zip(ColumnA.split(':'), ColumnB.split(':')))
print(ColumnA, ColumnB, ColumnC)

df = pd.DataFrame()
df['ColumnA'] = ColumnA.split(':')
df['ColumnB'] = ColumnB.split(':')
df['ColumnC'] = ColumnC.items()
print(df)

Output:

abc:def:ghi 111:222:333 {'abc': '111', 'def': '222', 'ghi': '333'}
  ColumnA ColumnB     ColumnC
0     abc     111  (abc, 111)
1     def     222  (def, 222)
2     ghi     333  (ghi, 333)
0
votes

You could try:

>>> df
           ColumnA          ColumnB
0      abc:def:ghi      111:222:333
1          jkl:def          444:555
2  abc:stu:vwx:ach  666:777:888:999
>>>
>>> df['ColumnC'] = df.apply(lambda row: dict(zip(row.ColumnA.split(':'), row.ColumnB.split(':'))), axis=1)
>>>
>>> df['ColumnC']
0    {'abc': '111', 'def': '222', 'ghi': '333'}              
1    {'jkl': '444', 'def': '555'}                            
2    {'abc': '666', 'stu': '777', 'vwx': '888', 'ach': '999'}
Name: ColumnC, dtype: object