The code below maps values and column names from my reference df onto my actual dataset, finds exact matches, and, when an exact match is found, returns the OutputValue. However, I'm trying to add a rule: when PrimaryValue = DEFAULT, also return the OutputValue.
The approach I'm trying is to create a new dataframe with null values, since the code below found no match for those rows. The next step would then be to target the null values whose corresponding PrimaryValue = DEFAULT and replace each null with that row's OutputValue.
#imports needed by the snippet below
from itertools import chain
from pyspark.sql.functions import array, col, collect_set, concat_ws, create_map, lit

#create a map key/value based on columns from reference_df
map_key = concat_ws('\0', final_reference.PrimaryName, final_reference.PrimaryValue)
map_value = final_reference.OutputValue

#dataframe of concatenated mappings to get the corresponding OutputValues from the reference table
d = final_reference.agg(collect_set(array(concat_ws('\0', 'PrimaryName', 'PrimaryValue'), 'OutputValue')).alias('m')).first().m
#display(d)

#build a literal map expression from the collected (key, value) pairs
mappings = create_map([lit(i) for i in chain.from_iterable(d)])

#dataframe with the corresponding matched OutputValues
dataset = datasetM.select("*", *[mappings[concat_ws('\0', lit(c), col(c))].alias(c_name) for c, c_name in matched_List.items()])
display(dataset)
Comments:
- jxc: primaryLookupAttributeName_List does not exist in datasetMatchedPortfolio, which will yield an ERROR? So you want to add a default name to get past the ERROR?
- jgtrz: DEFAULT will have a regular value. When PrimaryLookupAttributeName is DEFAULT, I would like to replace those nulls (no match found) with the corresponding OutputItemNameByValue. I will update my question with more info!
- jxc: coalesce(mappings[concat_ws('\0', lit(c), col(c))], lit("DEFAULT")).alias(c_name). Make sure to import pyspark.sql.functions.coalesce.
- jgtrz: datasetPrimaryAttributes_False =