1
votes

I have a large db of courses with the grade and student id. It looks something like this (with many more variables that were removed to simplify):

studentID    course       grade
--------------------------------
1            chemistry    86
2            chemistry    85
2            math         72
3            english      52
3            math         90
...

I need to convert that large file into a file where each student has their row with the grades for all the different courses. Something more like this:

studentID    chemistry    math    english
----------------------------------------
1            86           NA      NA
2            85           72      NA
3            NA           90      52

Here is the code to create my sampler db:

course.db <- data.table(
              studentID=c("1", "2", "2", "3", "3"),
              course=c("chemistry", "chemistry", "math", "english", "math"),
              grade=c(86, 85, 72, 52, 90)
           )

What I usually do is I create a student file db with the usual info (GPA, school, etc) like this:

student.files <- course.db[, .(
    average=mean(grade, na.rm=T) #more vars are created here
), by="studentID"]

Then I create another table with the grade I need:

math.grades <- course.db[course=="math", .(
    math=grade
), by="studentID"]

Then I merge the whole thing. That works when there are only a few courses to get the grade from. But I need to compile grades from at least a dozen course. So my question is: how to I conditionally assing the grade depending on the value of the "grade" column? What I am looking for:

#careful: not working code
student.files <- course.db[, .(
    average = mean(grade, na.rm=T) #more vars are created here,
    math = ThenAMiracleOccurs("math", grade),
    english = ThenAMiracleOccurs("english", grade),
    chemistry = ThenAMiracleOccurs("chemistry", grade),
), by="studentID"]
1

1 Answers

3
votes

Thank you Hubert. I had not noticed there was a data.table version (my db is quite large so I need to remain in the data.table world whenever possible). Here's a working solution using dcast for data.table:

dcast.data.table(course.db, studentID~course, value.var="grade" )

Noted from comment: a simple dcast will also work and use the data.table method as long as the table is already a data.table object:

dcast(course.db, studentID~course, value.var="grade" )

Result:

   studentID chemistry english math
1:         1        86      NA   NA
2:         2        85      NA   72
3:         3        NA      52   90