I trained a Spark ML model, scored my holdout dataset with it, and now need to look up the prediction for specific entities.
How can I tell which prediction belongs to which entity? Is there a way to include the entity's primary key (e.g., Member_ID) in the prediction output?
More specifically, I scored the dataset with:
predictions = trained_model.transform(holdout_data)
This produces a DataFrame with the columns "features", "label", and "prediction" (label is the response variable).
How do I find out the corresponding Member_ID for each prediction?