0
votes

I am trying to fit a linear regression with the slope fixed to 1 and the intercept fixed to 0, and then I want to remove points that lie far from this fit using statistics.

# set slope = 1 (intercept left free)
my_slope <- 1
mylm <- lm(as.numeric(y) ~ 1 + offset(x * my_slope), data = mydata)

# set intercept = 0 (slope left free)
my_intercept <- 0
mylm2 <- lm(I(y - my_intercept) ~ 0 + x, data = mydata)

I am not sure whether these two snippets are correct. If they are, is there a way to combine them more simply?
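For reference, one way to express both constraints in a single `lm()` call is to move the entire known linear predictor into an `offset()`, so nothing is left to estimate. This is a sketch assuming `mydata` has numeric columns `x` and `y`:

```r
# Both slope (1) and intercept (0) fixed: the whole predictor is an offset,
# so lm() estimates no coefficients and just returns the residuals.
my_slope <- 1
my_intercept <- 0
mylm <- lm(y ~ 0 + offset(my_intercept + my_slope * x), data = mydata)
resid(mylm)   # identical to y - x for slope 1, intercept 0
```

Since there are no free coefficients, the fit contributes nothing beyond the residuals, which is exactly what is needed for the outlier filtering step.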

1
It seems as if you want to define the complete linear equation yourself. Then you can simply remove the outliers using subset(data, abs(y - x) < threshold), or any other formula such as squared distance. No need for a linear regression. – ziggystar
The purpose of linear regression is to find the slope and the intercept! Sometimes, for good reasons, you may force the intercept (for example, if you know you are dealing with a linear phenomenon). But if you fix both the slope and the intercept, what use is your regression? You already have the equation of your straight line, and you can easily compute the errors from your dataset, assuming the data follow the scheme given by your equation. – MrSmithGoesToWashington
Thanks guys. @ziggystar, could you provide an example of applying squared distance, or whatever is statistically acceptable as a threshold? I tried Mahalanobis outliers, confidence ellipses, etc., but they are not what I actually want. I want to extract only the points close to the 1:1 fit. – user2928318
There is no concept of statistical acceptability for what you want to do. Maybe if you describe your motives, we can get to something. Otherwise, just do what you require. Actually, it does not make sense to use squared distance as the threshold. The formula in my first comment is all you need, if you want to consider each point independently. – ziggystar
@ziggystar, you are right. The formula you suggested is exactly what I want. Thanks. :) – user2928318

1 Answer

0
votes

As you already have your linear model fully specified, you don't need to do any regression. With slope 1 and intercept 0, your model is

y = x

You can then prune any points whose residual (distance to the predicted value) is greater than d using

data.pruned <- subset(data, abs(y - x) <= d)

Note that if you instead want a threshold dq on the squared residual, you simply set

d <- sqrt(dq)
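Put together, a minimal worked example (with synthetic data, assuming the intended 1:1 line y = x from the question) looks like this:

```r
set.seed(1)
x <- 1:100
y <- x + rnorm(100, sd = 2)      # points scattered around the 1:1 line
data <- data.frame(x, y)

d <- 2                           # keep points within 2 units of y = x
data.pruned <- subset(data, abs(y - x) <= d)

nrow(data.pruned)                # fewer rows: the far-off points are gone
max(abs(data.pruned$y - data.pruned$x))   # guaranteed <= d
```

Because each point is judged independently against the fixed line, this is just a row filter; no model fitting is involved.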