sklearn variance for Linear Regression prediction

Question

I am trying to fit a linear model using LinearRegression from scikit-learn. The predict function gives me a point estimate, but I need a distribution of possible values, ideally a Gaussian whose mean is the point value returned by predict. Is there a way to get such a distribution from any of the scikit-learn models? I checked the variance score, but could not figure out how to map it to a variance. Please help.

Keith Brodie · Accepted Answer · 2016-03-19T00:17:10

If the data you're fitting really comes from a linear-Gaussian process (a linear function corrupted by Gaussian noise) and the sample set you fit on is large enough, you can get the distribution of the predictions from the R^2 coefficient returned by the score() method of the linear regression object. R^2 is 1 - (variance of prediction error) / (variance of y), so the variance of the prediction error, i.e. the spread of the observed values around each predicted point, is:

var(y - pred) = (1 - R^2) * var(y)

Each prediction can then be treated as the mean of a Gaussian with this variance.
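A minimal sketch of this recipe, assuming synthetic data drawn from y = 2x + 1 plus Gaussian noise with standard deviation 0.5 (the data, the new input x_new = 4 and the 95% level are illustrative choices, not taken from the question). It computes R^2 with score(), converts it into a residual standard deviation, and wraps the output of predict() in a scipy.stats.norm distribution. It also computes sklearn's explained_variance_score, which on training data for a model with an intercept equals R^2. Note that this treats the fitted coefficients as exact and ignores their uncertainty, so a full OLS prediction interval would be slightly wider, especially far from the training data.

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score

# Illustrative assumption: synthetic linear-Gaussian data, y = 2x + 1 + N(0, 0.5^2)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(1000, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0.0, 0.5, size=1000)

model = LinearRegression().fit(X, y)

# R^2 on the training data, as returned by score()
r2 = model.score(X, y)

# Explained variance score; matches R^2 here because training residuals have zero mean
evs = explained_variance_score(y, model.predict(X))

# Residual (prediction-error) variance recovered from R^2: (1 - R^2) * var(y)
resid_var = (1.0 - r2) * np.var(y)
sigma = np.sqrt(resid_var)

# Gaussian predictive distribution for a new input, centred on predict()
x_new = np.array([[4.0]])
mean = model.predict(x_new)[0]
dist = norm(loc=mean, scale=sigma)
low, high = dist.interval(0.95)

print(f"mean={mean:.3f}, sigma={sigma:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

Since R^2 uses the same 1/n normalisation as np.var, (1 - R^2) * np.var(y) is exactly the mean squared residual; on training data with an intercept the residuals average to zero, so it is also their variance. On held-out data it is the mean squared error instead.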
