
I have two data frames "base_level" and "raw_inventory" with the following columns:

"base_level" columns -> "a" , "b", "c" , "inventory_id"...

"raw_inventory" columns -> "1", "2", "3", "inventoryparentid",.....

When I call pd.merge directly, as shown below, everything works as expected.

level = pd.merge(base_level, raw_inventory, left_on='inventory_id', right_on='inventoryparentid', how='left')

print(level)

But when I wrap the merge in a function and call it as shown below:

def inv_level(child_inv, parent_inv, lefton, righton, how):
    level_inv = pd.merge(parent_inv, child_inv, left_on=lefton, right_on=righton, how=how)
    return level_inv


level = inv_level(base_level, raw_inventory, 'inventory_id', 'inventoryparentid', 'left')
print(level)

it throws the following error:


  File "C:\temp\env\3.8.6\lib\site-packages\pandas\core\reshape\merge.py", line 652, in __init__
    ) = self._get_merge_keys()
  File "C:\temp\env\3.8.6\lib\site-packages\pandas\core\reshape\merge.py", line 1005, in _get_merge_keys
    right_keys.append(right._get_label_or_level_values(rk))
  File "C:\temp\env\3.8.6\lib\site-packages\pandas\core\generic.py", line 1563, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'inventoryparentid'

I am not able to identify the reason. Any input on this issue is appreciated.

Edit:

I put together the following sample code to show what I am trying to do and make it easier to understand. I get the same error with it.

import numpy as np
import pandas as pd


def inv_level ( child_inv, parent_inv, lefton, righton, how ):
    level_inv = pd.merge(parent_inv, child_inv, left_on=lefton, right_on=righton, how=how)
    return level_inv


def main(event, context):
    np.random.seed(0)
    # transactions
    left = pd.DataFrame({'transaction_id': ['A', 'B', 'C', 'D'], 
                        'user_id': ['Peter', 'John', 'John', 'Anna'],
                        'value': np.random.randn(4),
                    })
    # users
    right = pd.DataFrame({'new_id': ['Paul', 'Mary', 'John', 'Anna'],
                        'favorite_color': ['blue', 'blue', 'red', np.nan],
                        })

    # test = inv_level(left, right, 'user_id', 'new_id', 'left')  # left.merge(right, on='user_id', how='left')
    # The call above throws an error.
    test = pd.merge(left, right, left_on='user_id', right_on='new_id', how='left') 

    print(test)

if __name__ == "__main__":
    main("", "")

Error:

  File "C:\temp\env\3.8.6\lib\site-packages\pandas\core\reshape\merge.py", line 1005, in _get_merge_keys
    right_keys.append(right._get_label_or_level_values(rk))
  File "C:\temp\env\3.8.6\lib\site-packages\pandas\core\generic.py", line 1563, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'new_id'

Here is the intended output:

  transaction_id user_id     value new_id favorite_color
0              A   Peter  1.764052    NaN            NaN
1              B    John  0.400157   John            red
2              C    John  0.978738   John            red
3              D    Anna  2.240893   Anna            NaN

Thanks,

Check the 'inventoryparentid' column in your dataframes. – LoukasPap

@L.Papadopoulos Thank you for your response. I did check the column and it exists. I added a small Python script with sample data, and it has the same error. Hopefully it helps in understanding the issue. – RajData

Can you please add the desired output for the test dataframes? – LoukasPap
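
The column check suggested in the comments can be sketched roughly as below. The helper name check_keys and the shortened sample frames are assumptions made for illustration only. The helper takes its arguments in the same order as inv_level in the question and reports whether each key column exists in the frame that pd.merge would actually receive on that side.

```python
import pandas as pd

# Shortened versions of the sample frames from the question (illustrative only).
left = pd.DataFrame({'transaction_id': ['A', 'B'], 'user_id': ['Peter', 'John']})
right = pd.DataFrame({'new_id': ['Paul', 'John'], 'favorite_color': ['blue', 'red']})


def check_keys(child_inv, parent_inv, lefton, righton):
    # Hypothetical helper. Same parameter order as inv_level in the question,
    # where pd.merge receives parent_inv as the left frame and child_inv as
    # the right frame.
    return {
        'lefton_in_left_frame': lefton in parent_inv.columns,
        'righton_in_right_frame': righton in child_inv.columns,
    }


result = check_keys(left, right, 'user_id', 'new_id')
print(result)
```

For these sample frames both checks come back False, which suggests the key columns exist but are being looked up in the wrong frame.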

1 Answer


Try this:

test = inv_level(left, right, left['user_id'], right['new_id'], 'left')
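
As a further note, the KeyError appears to come from the argument order inside inv_level: the function receives (child_inv, parent_inv) but calls pd.merge(parent_inv, child_inv, ...), so the frames are swapped while left_on and right_on are not. Passing Series for the keys, as above, sidesteps the column lookup, but the frames are still merged in swapped order, so the result may not match the intended output. Below is a minimal sketch of an alternative, assuming the intent is for the first frame to be the left side of the merge; the parameter names left_df and right_df are hypothetical.

```python
import numpy as np
import pandas as pd


def inv_level(left_df, right_df, lefton, righton, how):
    # Frames are passed to pd.merge in the same order as their key columns,
    # so lefton is looked up in left_df and righton in right_df.
    return pd.merge(left_df, right_df, left_on=lefton, right_on=righton, how=how)


np.random.seed(0)
# transactions (sample data from the question)
left = pd.DataFrame({
    'transaction_id': ['A', 'B', 'C', 'D'],
    'user_id': ['Peter', 'John', 'John', 'Anna'],
    'value': np.random.randn(4),
})
# users (sample data from the question)
right = pd.DataFrame({
    'new_id': ['Paul', 'Mary', 'John', 'Anna'],
    'favorite_color': ['blue', 'blue', 'red', np.nan],
})

test = inv_level(left, right, 'user_id', 'new_id', 'left')
print(test)
```

Keeping the question's original signature and swapping the two frames at the call site would be another option; either way, the frame order and the key order have to agree.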