1
votes

I am newbie in R, but trying to use it with SQL Server R Services.

As input to R script I select few numeric columns and one text column with categorical data. Let it be "company_name".

Then I execute:

 declare @script_r nvarchar(max) 
    = '
dat <- InputDataSet;
model <- lm(first_commission_received_delay ~ ., data = dat);
print(coef(model));
predict(model, dat);
'
EXEC sp_execute_external_script
        @language =N'R'
        , @script = @script_r 
        , @input_data_1 = @script_txt

But I am getting error:

An external script error occurred: Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor company_name has new levels XXX Calls: source ... predict -> predict.lm -> model.frame -> model.frame.default

And in the list of coefficients which shown with function "print(coef(model))" there is no dummy variable "company_nameXXX", but exist all others. I understand, that dummy variable "company_nameXXX" was not created because of multicollinearity.

But how to use calculated model for prediction? Even for the same set of data... How to fix that error?

1

1 Answers

1
votes

I can fix that error using additional creation of dummy variables myself:

dat <- InputDataSet;

FactoredVariable = factor(dat$company_name) 
dumm = as.data.frame(model.matrix(~FactoredVariable)[,-1])
datWithDummies = cbind(dat[, -2], dumm) #join 2 tables except 2nd column (company_name) from dat #print(colnames(dat))

model <- lm(first_commission_received_delay ~ ., data = datWithDummies);
predicted.delay <- predict(model, datWithDummies);

print(cbind(predicted.delay, datWithDummies[, 1]))

But I still don't know, why my initial code returned error...