The code below maps the values and column names of my reference df against my actual dataset: when an exact match is found, it returns the OutputValue. However, I'm trying to add the rule that when PrimaryValue = DEFAULT, the OutputValue should also be returned.
My approach to tackle this is to first produce a dataframe with null values wherever the code below found no exact match. The next step would then be to target the null values whose corresponding PrimaryValue = DEFAULT and replace each null with that row's OutputValue.
#imports needed by the snippet below
from itertools import chain
from pyspark.sql.functions import array, col, collect_set, concat_ws, create_map, lit

#create a map key/value based on columns from reference_df
map_key = concat_ws('\0', final_reference.PrimaryName, final_reference.PrimaryValue)
map_value = final_reference.OutputValue

#dataframe of concatenated mappings to get the corresponding OutputValues from the reference table
d = final_reference.agg(collect_set(array(concat_ws('\0','PrimaryName','PrimaryValue'), 'OutputValue')).alias('m')).first().m
#display(d)

#build a column-expression map from the collected (key, value) pairs
mappings = create_map([lit(i) for i in chain.from_iterable(d)])

#dataframe with the corresponding matched OutputValues; unmatched rows come back null
dataset = datasetM.select("*", *[mappings[concat_ws('\0', lit(c), col(c))].alias(c_name) for c, c_name in matched_List.items()])
display(dataset)
Comments:
primaryLookupAttributeName_List does not exist in datasetMatchedPortfolio, which will yield an ERROR? So you want to add a default name to get past the ERROR? – jxc
DEFAULT, it will have a regular value. When PrimaryLookupAttributeName is DEFAULT, I would like to replace those nulls (no match found) with the corresponding OutputItemNameByValue. I will update my question with more info! – jgtrz
Try coalesce(mappings[concat_ws('\0', lit(c), col(c))], lit("DEFAULT")).alias(c_name). Make sure to import pyspark.sql.functions.coalesce. – jxc
datasetPrimaryAttributes_False = – jgtrz