I build and maintain a separate running list of ratings of every player after every event. That way you can refer to it for calculation in the next event.
First, loading all data , parameters and packages.
library(tidyr)
library(dplyr)
crosstab <- read.table(header=T,
text=" a1 a2 a3 a4 a5 h1 h2 h3 h4 h5 a.evt.score h.evt.score
3311 4003 2737 3784 4177 2632 726 633 438 5444 0 1
1696 371 4471 2119 274 1947 5745 3622 438 5444 1 0
1696 371 4471 1199 2230 1947 5745 3622 5034 4166 1 0
3191 4471 2737 274 2230 3598 633 5034 5444 3485 1 0
3191 3685 3486 3784 4177 2632 726 633 438 5444 0 1
127 713 1609 5444 4166 3311 371 4471 1199 2230 1 0
127 713 1609 2345 3485 1696 4003 2737 1199 2230 1 0
127 713 1609 2345 3485 1696 4003 2737 1199 2230 1 0
1947 5745 3622 438 5444 3311 371 4471 3784 4177 1 0
2632 726 633 5444 4166 3191 3685 3486 274 2230 0 1
2632 726 633 438 5444 3191 3685 3486 3784 4177 0 1
5745 3598 5198 4166 3485 1696 4003 2737 274 2230 0 1
2632 726 633 2345 5034 3311 371 4471 3784 4177 1 0
127 3859 726 438 5444 1696 4003 2737 2119 274 1 0
2632 713 633 5034 4166 3191 3685 3486 3784 4177 1 0")
#parameters
k <- 30
seed.rating <- 2200 # default used if a player is not found on ratings table
Next, two local helper functions to do the expectation calculations.
# calculate expected win against an opponent
calcExpect <- function(rating, opp.rating) {
return(1/(1+10^((opp.rating-rating)/400)))
}
# calculate average expectation of a player against all opponents in current event
compileExpect <- function(id) {
rowno <- which(roster$playerid==id)
opp <- roster %>% filter(ah!=roster$ah[rowno])
all.expected <- sapply(opp$rating,
function(x) calcExpect(roster$rating[rowno], x))
return(mean(all.expected))
}
Then setup the list that is updated after every event (i.e. ratings list, and optionally result after every event). Here we start with an empty ratings list, but if you have an existing rating list, you can easily start with that data frame as the first element in the list.
# start with a blank rating list; can always start with the latest ELO table
ratings <- list(data.frame(playerid=integer(0), rating=numeric(0)))
# optional for logging result for every round, for error checking
rosters <- NULL
Now the main meat: loop through the entire events data i.e. crosstab and processing each event, creating one entry in ratings (and optionally rosters) after every event.
You will notice that after I built the roster, I don't have different code lines to calculate ratings or expectation for players on "a" or "h" teams. This should make this code easier to adapt for events where there are more than 2 teams (e.g. a league).
for (i in seq_len(nrow(crosstab))) {
# get latest ratings
elo <- as.data.frame(tail(ratings, 1))
# take one row of data corresponding to an event
event <- crosstab[i, ]
# spread the row into a player roster
roster <- event %>% gather(key=no, value=playerid, a1:h5) %>%
mutate(ah = substr(no, 1, 1), # away or home team
score = ifelse(ah=="a", a.evt.score, h.evt.score)) %>% #win or lose
select(playerid, ah, score) %>%
left_join(elo) # get current rating
# unrated players assigned base rating
roster$rating[is.na(roster$rating)] <- seed.rating
# calculate expected and new ratings of event participants
roster$expected <- sapply(roster$playerid, compileExpect)
roster$new.rating <- with(roster, rating + k*(score-expected))
# calculate new overall ratings
new.ratings <- roster %>% select(playerid, new.rating) %>%
rename(rating=new.rating) %>%
rbind(elo) %>%
filter(!duplicated(playerid)) # remove old ratings of player
#update ratings
ratings <- c(ratings, list(new.ratings))
# Optional for error checking: update log of result every round
rosters <- c(rosters, list(roster))
}
The output would be a list ratings with 16 elements, and rosters with 15 elements. Element x in ratings is the ratings before event number x, while element x in rosters is the outcome after event number x.
Let's take for example event 2 (i.e. second row in your table).
> rosters[[2]]
playerid ah score rating expected new.rating
1 1696 a 1 2200 0.4913707 2215.259
2 371 a 1 2200 0.4913707 2215.259
3 4471 a 1 2200 0.4913707 2215.259
4 2119 a 1 2200 0.4913707 2215.259
5 274 a 1 2200 0.4913707 2215.259
6 1947 h 0 2200 0.5000000 2185.000
7 5745 h 0 2200 0.5000000 2185.000
8 3622 h 0 2200 0.5000000 2185.000
9 438 h 0 2215 0.5215733 2199.353
10 5444 h 0 2215 0.5215733 2199.353
Initial inspection seems that everything is in order: 8 players who didn't play earlier have starting rating of 2200, two players who were on the winning team earlier have rating > 2200. Expectation for new players in team "h" is 0.5, because they have same ratings as all players in team "a" (who are all new).
Ratings after event 2 would be the ratings before event 3 (which include players from both event 1 and event 2):
> ratings[[3]]
playerid rating
1 438 2199.353
2 1947 2185.000
3 2632 2215.000
4 2119 2215.259
5 3622 2185.000
6 3311 2185.000
7 4003 2185.000
8 726 2215.000
9 5444 2215.000
10 1696 2215.259
11 371 2215.259
12 274 2215.259
13 3784 2185.000
14 4471 2215.259
15 4177 2185.000
16 5745 2185.000
17 633 2215.000
18 2737 2185.000
At the end of it all, there are 33 rated players in ratings[[16]], which should match the total number of unique player numbers in your table.
EDIT: I missed out that the desired output is vector of player rating history (thanks @Tensibai for pointing that out). To do that, I create a helper function to pull out any players history by his id.
getPlayerHistory <- function(id) {
# pull all ratings of the player
temp <- lapply(ratings, function(x) x$rating[x$playerid==id])
# coerce into vector with same length as the list, forcing parts with no values into NA
vec <- do.call(c, lapply(temp, function(x) {length(x) <- 1; return(x)}))
return(vec)
}
You can call up directly e.g.
getPlayerHistory("5034")
[1] NA NA NA 2185.395 2171.403 2171.403 2171.403 2171.403 2171.403
[10] 2171.403 2171.403 2171.403 2171.403 2186.862 2186.862 2202.293
Note that there are 16 values in this vector, because they are ratings before an event. So the first NA is because there was no starting rating, the next two NAs are because player "5034" played for the first time in event 3, so the first rating available is before event 4. When the player didn't play in an event, his rating stays the same.
You can use the helper function to pull the entire rating history into a list.
idList <- tail(ratings, 1)[[1]]$playerid # get the latest ratings list
ratList <- lapply(idList, getPlayerHistory)
names(ratList) <- idList
Then you can get the same by calling the list.
> ratList[["5034"]]
[1] NA NA NA 2185.395 2171.403 2171.403 2171.403 2171.403 2171.403
[10] 2171.403 2171.403 2171.403 2171.403 2186.862 2186.862 2202.293
a.evt.scoreandh.evt.scoreas1. How do I interpret that? - RickyR_oldfor all players? Or do we just assume everyone start at 0 rating (in which case you'll see everyone in the team having the same ratings after round 1)? It would help if you provide a sample initialR_oldvector giving vectors for all unique ids in the table. - Ricky