
I trained a Spark ML model, scored my holdout dataset with it, and now need to look up the prediction for specific entities.

How can I figure out which prediction is for whom? Is there a way I can add the entity primary key (e.g. Member_ID) to my prediction output?

More specifically, to score the dataset I used: predictions = trained_model.transform(holdout_data)

It produces a DataFrame with the columns "features", "label", and "prediction" (label is the response variable).

How do I find out the corresponding Member_ID for each prediction?


1 Answer


Does holdout_data contain only the columns ["features", "label"]? If so, add the Member_ID column to it before scoring.

The .transform() method of a pyspark.ml model returns a new DataFrame containing every column of holdout_data plus the extra prediction column. So if Member_ID is in holdout_data to begin with, it is also in predictions, and the problem is solved.