Shiny R: Subset rows in data frame

Question

I am trying to run models by certain factor levels or groups of these levels by using a select input widget in Shiny.

When I subset by one factor level, I get the correct model results. But when I try to run a model that includes all factor levels or groups of the levels, I do not get the correct model estimates.

For example, the correct model estimates when all factor levels are included (i.e. the model is run over the entire data frame) are:

But when I run my app and select all the levels of my factor variable, which represents different geographical regions, I obtain different results:

My question is how can I specify my reactive sub-setting function to accommodate all factor levels or groups of the levels?

For reference, here is the code for the model over all factor levels and for the models by individual factor level:

library(mlogit)
data("Heating", package = "mlogit")
mlogit(depvar ~ ic + oc | 0, data= Heating, shape = "wide", choice = "depvar", varying = c(3:12))
mlogit(depvar ~ ic + oc | 0, data= Heating[Heating$region=="ncostl" , ], shape = "wide", choice = "depvar", varying = c(3:12))
mlogit(depvar ~ ic + oc | 0, data= Heating[Heating$region=="scostl" , ], shape = "wide", choice = "depvar", varying = c(3:12))
mlogit(depvar ~ ic + oc | 0, data= Heating[Heating$region=="mountn" , ], shape = "wide", choice = "depvar", varying = c(3:12))
mlogit(depvar ~ ic + oc | 0, data= Heating[Heating$region=="valley" , ], shape = "wide", choice = "depvar", varying = c(3:12))

Shiny code:

### PART 1 - Load Libraries and Data
library(shiny)           # For running the app
library(mlogit)

#### data
data("Heating", package = "mlogit")

#### PART 2 - Define User Interface for application
ui <- fluidPage(

  ## Application title
  titlePanel("Housing Preference"),

  ## Sidebar with user input elements
  sidebarLayout(
    sidebarPanel(
      p("Select the inputs"), # Header
      # Region
      selectInput('regiontype', 'Region', choices = c("northern coastal region"= "ncostl", 
                                               "southern coastal region" = "scostl", 
                                               "mountain region"  = "mountn",
                                               "central valley region"= "valley"), multiple=TRUE, selectize=TRUE)

    ),

    ## Show the model summary
    mainPanel(
      verbatimTextOutput("summary")
    )
  )
)

#### PART 3 - Define server logic required to run calculations and draw plots
server <- function(input, output) {

  output$summary <- renderPrint({

    df <- Heating

    ### Subset data
    df.subset <- reactive({ a <- subset(df, region == input$regiontype)
    return(a)})

    ### Model 
    estimates <- mlogit(depvar ~ ic + oc | 0, data= df.subset(), shape = "wide", choice = "depvar", varying = c(3:12))
    summary(estimates)


  })
}

### PART 4 - Run the application 
shinyApp(ui = ui, server = server)

Mark Mark · Accepted Answer · 2018-04-25T20:55:59

The problem is in your use of == in subset.

Let's take a look at your data:

table(Heating$region)

#> valley scostl mountn ncostl 
#>    177    361    102    260

There are 900 rows in total, and scostl and ncostl together account for 621 of them. However, when I subset with == against a vector of both levels, only 305 rows are returned.

nrow(subset(Heating, region == c("ncostl","scostl")))
#> [1] 305

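What happened? Why isn't it 621? Vector recycling is biting you here. Because Heating$region and c("ncostl","scostl") aren't the same length, the shorter one is repeated until they are the same length. So you're actually comparing row 1 to "ncostl", row 2 to "scostl", row 3 to "ncostl", and so on, and keeping only the rows that happen to match at their position. Since 900 is a multiple of 2, R doesn't even warn about the length mismatch.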

Instead, you want to use the %in% operator.

nrow(subset(Heating, region %in% c("ncostl","scostl")))
#> [1] 621

Now there's no vector recycling: each element of Heating$region is checked for membership in the vector you provide.
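
The difference between the two operators is easier to see on a small example. This is only a sketch on a made-up six-element vector (the values in region and wanted are invented for illustration, they are not taken from Heating):

```r
# Toy data: six made-up region labels, not the real Heating data
region <- c("ncostl", "ncostl", "scostl", "scostl", "valley", "ncostl")
wanted <- c("ncostl", "scostl")

# `==` recycles `wanted` to
# c("ncostl", "scostl", "ncostl", "scostl", "ncostl", "scostl")
# and compares the two vectors position by position
eq_match <- region == wanted

# `%in%` asks, for each element of `region`, whether it appears in `wanted`
in_match <- region %in% wanted

sum(eq_match)  # 2: only positions 1 and 4 line up with the recycled pattern
sum(in_match)  # 5: every "ncostl" or "scostl" element is kept
```

Because 6 is a multiple of 2, the == comparison runs without a warning here too, which is why this kind of bug is easy to miss.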

You get a vector because that is what a selectInput with multiple = TRUE returns in Shiny.
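
Applied to the app in the question, the server function might look roughly like the sketch below. Beyond swapping == for %in%, it makes two changes that are my own suggestions rather than part of the original code: the reactive is defined outside renderPrint, and req() is used so the model is not fitted while no region is selected. The mlogit call is copied unchanged from the question.

```r
library(shiny)
library(mlogit)
data("Heating", package = "mlogit")

server <- function(input, output) {

  # Rows whose region is any of the selected levels
  df.subset <- reactive({
    req(input$regiontype)  # assumption: wait until at least one region is chosen
    subset(Heating, region %in% input$regiontype)
  })

  output$summary <- renderPrint({
    # Same model call as in the question, fitted on the filtered rows
    estimates <- mlogit(depvar ~ ic + oc | 0, data = df.subset(),
                        shape = "wide", choice = "depvar", varying = c(3:12))
    summary(estimates)
  })
}
```

With all four regions selected, df.subset() should contain all 900 rows, so the estimates should agree with the model fitted on the full data frame.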
