How to map XGBoost predictions to the corresponding data rows?

Question

XGBoost generates a list of predictions for the test dataset. How can I map the generated predictions to the actual rows of the test file? Is it strictly safe to assume that the nth prediction corresponds to the nth data row? XGBoost uses multi-threading for its operations, so in that setting can the prediction results be trusted to map strictly to the test data rows? Ideally, I would love a way to annotate the predictions with some row identifier from the test data file.

I am using this example and working with XGBoost's DMatrix data format: https://github.com/dmlc/xgboost/tree/master/demo/binary_classification

Dunstan Dunstan · Accepted Answer · 2016-05-26T03:41:04

I'm not sure whether it's strictly safe, but in my experience that assumption holds. Most of the xgboost code snippets I have seen in Kaggle competitions, like this one, make the same assumption, and it works. In short, you can rest assured that it works in practice; however, I haven't dug into the documentation, so I can't say that it works all the time.
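If you want the predictions annotated with a row identifier, one option is to keep the identifiers in a separate list, in the same order as the rows you load into the DMatrix, and pair them up by position afterwards. Here is a minimal sketch of that idea. The helper `label_predictions` and the sample IDs and scores are hypothetical, made up for illustration. It relies on the same assumption as above, namely that the nth prediction belongs to the nth row, and only adds a length check as a basic sanity guard.

```python
def label_predictions(row_ids, predictions):
    """Pair each prediction with the identifier of the row at the same position.

    Assumes predictions come back in the same order as the input rows.
    """
    row_ids = list(row_ids)
    predictions = list(predictions)
    # A length mismatch means the IDs and the predictions cannot be
    # from the same set of rows, so fail loudly instead of mislabeling.
    if len(row_ids) != len(predictions):
        raise ValueError(
            f"{len(row_ids)} row ids but {len(predictions)} predictions"
        )
    return list(zip(row_ids, predictions))


# Hypothetical stand-in data. In practice, `predictions` would be the
# output of bst.predict(dtest), and `row_ids` would be read from the
# test file in the same order as the rows used to build dtest.
row_ids = ["row-001", "row-002", "row-003"]
predictions = [0.12, 0.87, 0.45]

labeled = label_predictions(row_ids, predictions)
for row_id, score in labeled:
    print(row_id, score)
```

The identifiers never go into the model, so they cannot affect training or prediction; they are only attached again at the end.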
