I am newbie in R, but trying to use it with SQL Server R Services.
As input to R script I select few numeric columns and one text column with categorical data. Let it be "company_name".
Then I execute:
declare @script_r nvarchar(max)
= '
dat <- InputDataSet;
model <- lm(first_commission_received_delay ~ ., data = dat);
print(coef(model));
predict(model, dat);
'
EXEC sp_execute_external_script
@language =N'R'
, @script = @script_r
, @input_data_1 = @script_txt
But I am getting error:
An external script error occurred: Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor company_name has new levels XXX Calls: source ... predict -> predict.lm -> model.frame -> model.frame.default
And in the list of coefficients which shown with function "print(coef(model))" there is no dummy variable "company_nameXXX", but exist all others. I understand, that dummy variable "company_nameXXX" was not created because of multicollinearity.
But how to use calculated model for prediction? Even for the same set of data... How to fix that error?