I have two DataFrames as follows,
DataFrame A:
DEPT_ID DEPT_NAME
10 Finance
20 Marketing
DataFrame B:
EMP_ID DEPT_ID EMP_NAME EMP_SALARY
101 10 AAAA 1000
102 20 BBBB 2000
103 10 CCCC 1500
104 20 DDDD 3000
Expected Result: in Pyspark, I need to join Data Frame A & B on DEPT_ID and convert the non key columns in Dataframe B to_json string and store it in Json_Data column in Dataframe C
DataFrame C:
DEPT_ID DEPT_NAME Json_Data
10 Finance [{"_status": "normal","EmpDetails":{"EMP_ID":"101","EMP_NAME":"AAAA","EMP_SALARY":"1000"},{"EMP_ID":"103","EMP_NAME":"CCCC","EMP_SALARY":"1500" }]
20 Marketing [{"_status": "normal","EmpDetails":{"EMP_ID":"102","EMP_NAME":"BBBB","EMP_SALARY":"2000"},{"EMP_ID":"104","EMP_NAME":"DDDD","EMP_SALARY":"3000" }]