I am trying to find predictors of people selling their cars using logistic regression. My sample size is n = 922, and most of my variables are cardinal or ordinal. Since some of my variables have up to 7 categories (and therefore 6 dummy variables), I ran into separation. The literature recommends Firth's bias-reduced logistic regression for this situation.

After installing the package I used the following formula:

logistf(formula = attr(data, "formula"), data = sys.parent(), pl = TRUE, ...)

and entered (or tried to enter) my data:

mydataBrAll <- logistf(formula = attr(mydataBr$Verkauft, "formula"), data = mydataBr, pl = FALSE)
summary(mydataBrAll)

Here Verkauft is my dependent variable and mydataBr is my data.

What kind of term has to be entered in "formula"? And if this works, can I use the stepwise backward algorithm (or the pseudo R², etc.) the same way as I would with a regular logistic regression model?

'Backwards Selection'
backwards <- step(mydataBrAll, direction = "backward")

Some of you might consider this an easy problem, but I can't figure it out from the explanations online.

Any help is very much appreciated!

1 Answer

The formula should be a regular formula object used in most modeling functions in R (like lm(), glm(), etc.). You can get details on how to write a formula by typing help(formula) in an R command line.

Essentially a formula is an expression of the form y ~ model. Quoting from the details section of help(formula):

An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model.

In your example, assuming you had 3 predictors called x1, x2, x3 (which could be either numeric or factor variables), you could write:

mydataBrAll <- logistf(formula = Verkauft ~ x1 + x2 + x3, data = mydataBr, pl = FALSE)
summary(mydataBrAll)

Note that the formula could be more complicated and involve transformations of the predictor variables. Again, for more details see the documentation of formula.
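As a small illustration of such a formula, here is a sketch using only base R. The data frame d and its columns y, x1, x2, x3 are made-up stand-ins, not your actual variables. It shows that a formula is an ordinary object and that model.matrix() reveals which columns (including dummy variables for factors) the formula expands to:

```r
# Toy data; the column names are hypothetical stand-ins for real predictors.
d <- data.frame(
  y  = c(0, 1, 0, 1, 1, 0),
  x1 = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.9),
  x2 = factor(c("a", "b", "c", "a", "b", "c")),
  x3 = c(10, 20, 15, 40, 35, 12)
)

# A formula is an ordinary R object that can be stored and reused.
# log() transforms x1; I() protects arithmetic such as squaring.
f <- y ~ log(x1) + x2 + I(x3^2)

class(f)      # the class of the stored object
all.vars(f)   # the variables the formula refers to

# model.matrix() shows the design columns the formula expands to:
# a factor with 3 levels contributes 2 dummy columns.
X <- model.matrix(f, data = d)
colnames(X)
```

The same object f could then be passed as the formula argument of a modeling function instead of typing the formula inline.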

Just so you know, the attr(data, "formula") expression shown in the documentation of logistf() as part of the function signature is the default value of the formula argument: it refers to an attribute called formula that might exist on the object you pass in the data parameter. Since that is not the case in your example (you haven't added that attribute to your data object), you must explicitly define the formula when calling logistf().
For more information about attributes in objects, see help(attr).
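To see what that default does, here is a minimal base R sketch (the data frame d and its columns are invented for illustration). Looking up an attribute that was never set returns NULL, which is why the default cannot supply a formula in your call:

```r
# Hypothetical toy data, not the asker's data set.
d <- data.frame(y = c(0, 1, 1), x1 = c(2.5, 3.1, 4.0))

# No "formula" attribute has been set, so the lookup returns NULL.
no_formula <- attr(d, "formula")
is.null(no_formula)

# Setting the attribute by hand makes the same lookup succeed.
attr(d, "formula") <- y ~ x1
stored <- attr(d, "formula")
stored
```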

For your second question about backward selection: in the Details section of the documentation for logistf() you see the following sentence:

Furthermore, forward and backward functions perform convenient variable selection

I.e. the package provides forward() and backward() functions that can be used to perform forward and backward selection of predictors.
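A minimal sketch of how that might look, assuming the logistf package is installed and using the sex2 example data set that ships with it (the variable names case, age, oc, vic, vicl, vis and dia come from that data set, not from your data). Default settings are used throughout; check help(backward) in your installed version for the available arguments:

```r
library(logistf)

# Example data shipped with the logistf package.
data(sex2)

# Full Firth-type model with all candidate predictors.
fit_full <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)

# Backward elimination starting from the full model, with default settings.
fit_back <- backward(fit_full)

summary(fit_back)
```

For your data you would replace the formula with Verkauft ~ ... and the data argument with mydataBr.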

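The step() function might also work, but I have not verified it. According to its documentation, step() calls add1 and drop1 repeatedly and works for any model class for which those work, which in turn requires a valid extractAIC method. Whether that holds for logistf objects is something you would have to test, so backward() is the safer choice.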