best<-function(state,outcome){
  data <- read.csv("outcome-of-care-measures.csv")
  filter<-data.frame(cbind(data[, 2],   # hospital
                              data[, 7],   # state
                              data[, 11],  # heart attack
                              data[, 17],  # heart failure
                              data[, 23]), # pneumonia
                              stringsAsFactors = FALSE)
  chosenState<-state
  colnames(filter) <- c("Hospital", "State", "heart attack", "heart failure", "pneumonia")
  if(!chosenState %in% filter[["State"]]){
    stop('invalid state')
  } 

The code above is my initial version. It converts the state values to numbers, so the State column ends up holding values like 1, 2, 3... If I add colClasses = "character" to read.csv, the conversion stops and the columns come through as character values. Why is that? The final code is below:

best<-function(state,outcome){
  data <- read.csv("outcome-of-care-measures.csv",colClasses = "character")
  filter<-data.frame(cbind(data[, 2],   # hospital
                              data[, 7],   # state
                              data[, 11],  # heart attack
                              data[, 17],  # heart failure
                              data[, 23]), # pneumonia
                              stringsAsFactors = FALSE)
  chosenState<-state
  colnames(filter) <- c("Hospital", "State", "heart attack", "heart failure", "pneumonia")
  if(!chosenState %in% filter[["State"]]){
    stop('invalid state')
  } 
If you don't specify colClasses = "character", your states are read in as factors by read.csv (unless you are using R 4.0 or later, where this default changed). Then, when you cbind() the column vectors, you create a matrix, which keeps only the factors' integer codes and so makes everything numeric. There's no good reason to cbind() before passing to data.frame(); that step is the one triggering the conversion. Not sure where that practice comes from, but a lot of people seem to do it and it causes a lot of problems. - MrFlick
@MrFlick -- Here's another homework assignment: JHU R Programming, Assignment 3. - Len Greski
@LenGreski I usually assume that anytime I see hospital data. But do you know if that class is the source of the bad recommendation to use data.frame(cbind(...))? I've never tried to track down the course materials. Also, don't they have their own chat boards? I thought they used to discourage using SO for homework. - MrFlick
@MrFlick - No, Roger Peng at Johns Hopkins isn't training students to use data.frame(cbind(...)). I have all the course materials because I have served as a Community Mentor for the curriculum. The guidance from JHU regarding SO is a bit ambivalent. On one hand, the Coursera Honor Code states that students must submit their own work. On the other hand, the professors (Roger Peng, Brian Caffo, and Jeff Leek) are big on the "hacker mentality", which frustrates the students so much that I had to write a blog article to address the topic. - Len Greski
@MrFlick - Since you mentioned "tracking down the course materials": for an overview of what's taught in the class, you can download R Programming for Data Science for free from leanpub.org. The book includes URLs to the videos used in the course. Slides for all courses except the capstone are available in the Data Science Specialization repository. There are also additional leanpub books associated with some of the other courses. - Len Greski

1 Answer


The instructions for the Johns Hopkins University R Programming course Assignment 3 provide the code for reading the Hospital Outcome of Care data with the argument colClasses = "character".


A key part of the assignment is passing character-based arguments into the three functions that are required for the assignment. If the data is read without colClasses = "character" in a version of R before 4.0.0 (where stringsAsFactors defaulted to TRUE), character strings will be converted to factors, making the data very difficult to use with functions that require character-based arguments.
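The factor behaviour can be reproduced without the hospital file. The following is only a minimal sketch: the three-row CSV text and the names csv_text, as_factors, codes, as_chars, and kept are made up for illustration, and stringsAsFactors = TRUE is passed explicitly to mimic the pre-4.0 default.

```r
# Toy stand-in for the hospital file; these values are invented for illustration.
csv_text <- "Hospital,State,Rate
A,TX,14.3
B,AL,Not Available
C,MD,2.3"

# Mimic the pre-4.0 default explicitly: string columns become factors.
as_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# cbind() on factor vectors builds a matrix of the underlying integer codes,
# so the state abbreviations are lost at this step.
codes <- cbind(as_factors[, 2], as_factors[, 3])
state_found_in_codes <- "TX" %in% codes[, 1]   # FALSE: only codes remain

# With colClasses = "character" every column stays a character vector.
as_chars <- read.csv(text = csv_text, colClasses = "character")

# Subsetting the data frame directly avoids cbind() altogether.
kept <- as_chars[, c("State", "Rate")]
state_found_in_kept <- "TX" %in% kept$State    # TRUE
```

This is why a check such as the "invalid state" test in the question rejects every state when the columns have passed through cbind() as factors.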

The test cases for the first function, best(), are as follows:

best("TX","heart attack")

best("MD","heart attack")

best("MD","pneumonia")

best("BB","heart attack")

best("NY","pneumonia")

Once the data is loaded into a data frame within R, one can eliminate the columns that are unnecessary, and convert the outcome fields that are required for the assignment from character to numeric with as.numeric().

Why can't we leave the outcomes as character variables?

The assignment relies heavily on sorting the data based on outcome. If the outcomes aren't converted to numeric, values like 14.3 will sort ahead of 2.3, producing inaccurate results from the functions to be built.
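A small sketch of the sorting problem, using three made-up rates (the vector names here are illustrative only):

```r
# Outcome rates stored as character strings, as they are after
# reading with colClasses = "character".
rates_chr <- c("14.3", "2.3", "9.8")

# Character sort compares strings left to right, so "14.3" sorts before "2.3".
sorted_chr <- sort(rates_chr)

# Converting to numeric first gives the ordering the assignment needs.
sorted_num <- sort(as.numeric(rates_chr))

print(sorted_chr)
print(sorted_num)
```

With the character vector, the hospital with a rate of 14.3 would be ranked as "best", which is the inaccuracy described above.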

A reproducible answer

To make the answer fully reproducible we can download the hospital data, subset it to the required columns, and convert numeric columns to numeric.

if(!file.exists("./data/outcome-of-care-measures.zip")){
     if(!dir.exists("./data")) dir.create("./data")
     url <- "https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2FProgAssignment3-data.zip"
     download.file(url,destfile='./data/outcome-of-care-measures.zip',mode="wb")
     unzip(zipfile = "./data/outcome-of-care-measures.zip",exdir="./data")    
}

# read everything as character, treat "Not Available" as NA,
# and keep only the necessary columns
theData <- read.csv("./data/outcome-of-care-measures.csv", colClasses = "character",
                    na.strings = "Not Available")[, c(2, 7, 11, 17, 23)]
colnames(theData) <- c("hospital", "state", "heart attack", "heart failure", "pneumonia")
# convert the three outcome columns from character to numeric
theData[3:5] <- lapply(theData[3:5], as.numeric)

head(theData) 

...and the output:

> head(theData)
                          hospital state heart attack heart failure pneumonia
1 SOUTHEAST ALABAMA MEDICAL CENTER    AL         14.3          11.4      10.9
2    MARSHALL MEDICAL CENTER SOUTH    AL         18.5          15.2      13.9
3   ELIZA COFFEE MEMORIAL HOSPITAL    AL         18.1          11.3      13.4
4         MIZELL MEMORIAL HOSPITAL    AL           NA          13.6      14.9
5      CRENSHAW COMMUNITY HOSPITAL    AL           NA          13.8      15.8
6    MARSHALL MEDICAL CENTER NORTH    AL           NA          12.5       8.7
>