### The recency effect in Elo ratings

Posted by Elliot Noma on February 13, 2015

One of the defining characteristics of Elo ratings is their emphasis on recent performance.

To illustrate this, I simulated two teams that each started the season with a rating of 1400. Over seven games against each other, one team wins five contests and loses two. I then generated the 21 possible orderings of these wins and losses for the winning team and computed its terminal rating for each ordering using a K-factor of 32 (for an explanation of K, see my previous post). The histogram below shows that in all cases the winning team’s rating improved from 1400, but the size of the improvement depended on the sequence of wins and losses.
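The simulation can be sketched roughly as follows. This is a minimal reconstruction, not the original code: it assumes the two teams play each other in all seven games, that both ratings are updated after every game, and that the expected score follows the usual logistic Elo curve on a 400-point scale. The function names (`expected_score`, `final_rating`, `orderings`) are my own.

```python
from itertools import combinations


def expected_score(rating, opponent):
    """Logistic Elo expected score on the usual 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def final_rating(results, k=32.0, start=1400.0):
    """Play out a series between two equally rated teams.

    `results` is a string of 'W'/'L' from the first team's point of view.
    Returns the first team's rating after the last game.
    """
    a = b = start
    for r in results:
        score = 1.0 if r == "W" else 0.0
        delta = k * (score - expected_score(a, b))
        a += delta  # whatever one team gains ...
        b -= delta  # ... the other team loses
    return a


def orderings(games=7, losses=2):
    """All W/L strings of length `games` containing exactly `losses` losses."""
    out = []
    for lost in combinations(range(games), losses):
        out.append("".join("L" if g in lost else "W" for g in range(games)))
    return out


# Terminal rating of the winning team for each of the 21 orderings.
ratings = {seq: final_rating(seq) for seq in orderings()}
for seq, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{r:7.1f}  {seq}")
```

Under these assumptions the extremes agree with the table further down: about 1447.8 for two early losses followed by five wins, and about 1424.9 for five wins followed by two losses.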

The next histogram converts these ratings (combined with the rating of the opponent, who wins the other two games) into the probability that the winning team wins the next game.
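As a small illustration of that conversion, the sketch below assumes the standard logistic Elo formula and uses the fact that, when both teams start at 1400 and only play each other, their ratings always sum to 2800. The helper name `win_probability` is hypothetical, and 1447.8 is the highest terminal rating from the table further down.

```python
def win_probability(rating, opponent):
    """Probability that `rating` beats `opponent` under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


# Assumption: both teams started at 1400 and only played each other,
# so the opponent's rating is whatever is left of the 2800 total.
team = 1447.8
opponent = 2800.0 - team

p = win_probability(team, opponent)
print(f"{p:.3f}")
```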

All probabilities are above 50%, but even the highest predicted probability is less than the actual winning percentage of 5/7 = 0.714. To reach this level of superiority, the stronger team would need a rating of 1479.588 versus 1320.412 for the weaker team, a gap of about 159 points. More than seven games are needed for the ratings to approach the actual performance. Alternatively, K values above 32 will move the ratings up more quickly for a 5-win, 2-loss record, as shown in the graph below:
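The 1479.588 and 1320.412 figures follow from inverting the win-probability formula. A short sketch, assuming the standard logistic Elo curve and a gap split symmetrically around the 1400 starting rating (`rating_gap` is a name I made up for this example):

```python
import math


def rating_gap(p):
    """Rating difference at which the stronger team wins with probability p."""
    return 400.0 * math.log10(p / (1.0 - p))


gap = rating_gap(5 / 7)   # rating gap implied by a 5-2 record
mean = 1400.0             # both teams started here, so the mean stays at 1400
strong = mean + gap / 2
weak = mean - gap / 2

print(f"gap={gap:.3f}  strong={strong:.3f}  weak={weak:.3f}")
```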

For K=32, the 21 sequences of 5 wins and 2 losses below show how recent wins elevate the rating and win probability, while a winning streak followed by two losses produces the lowest rating and win probability.

Rating | Win prob | Game 1 | Game 2 | Game 3 | Game 4 | Game 5 | Game 6 | Game 7 |
---|---|---|---|---|---|---|---|---|
1447.8 | 0.634 | L | L | W | W | W | W | W |
1445.8 | 0.629 | L | W | L | W | W | W | W |
1443.9 | 0.624 | W | L | L | W | W | W | W |
1443.6 | 0.623 | L | W | W | L | W | W | W |
1441.7 | 0.618 | W | L | W | L | W | W | W |
1441.1 | 0.616 | L | W | W | W | L | W | W |
1439.7 | 0.612 | W | W | L | L | W | W | W |
1439.3 | 0.611 | W | L | W | W | L | W | W |
1438.5 | 0.609 | L | W | W | W | W | L | W |
1437.3 | 0.606 | W | W | L | W | L | W | W |
1436.7 | 0.604 | W | L | W | W | W | L | W |
1435.8 | 0.601 | L | W | W | W | W | W | L |
1435.2 | 0.600 | W | W | W | L | L | W | W |
1434.7 | 0.599 | W | W | L | W | W | L | W |
1433.9 | 0.596 | W | L | W | W | W | W | L |
1432.5 | 0.593 | W | W | W | L | W | L | W |
1431.9 | 0.591 | W | W | L | W | W | W | L |
1430.2 | 0.586 | W | W | W | W | L | L | W |
1429.7 | 0.585 | W | W | W | L | W | W | L |
1427.4 | 0.578 | W | W | W | W | L | W | L |
1424.9 | 0.571 | W | W | W | W | W | L | L |

The order of wins and losses changes the final rating, and different values of K produce different rank orderings of the sequences. In the seven-game series in which one team wins five games, the highest rating is always achieved by losing the first two games and winning the last five, and the lowest rating, for any K value, occurs when the team loses the last two games. The intermediate ratings, however, can change rank from one value of K to another, as shown in the following chart. As K increases from 1 to 101, seven switches in rating order occur. The first is between K=43 and K=44, when the sequences with wins in games 1, 2, 3, 6, 7 and in games 2, 3, 4, 5, 6 swap positions.
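One way to look for these switches is to rank the 21 sequences at each integer K and report where the ranking differs from the previous K. The sketch below is not the original analysis: it assumes the same head-to-head setup as before (both teams start at 1400, both ratings updated after every game, logistic expected score), and the names `final_rating`, `ranking`, and `rank_changes` are hypothetical.

```python
from itertools import combinations


def final_rating(results, k, start=1400.0):
    """Winning team's rating after a head-to-head series given as 'W'/'L' chars."""
    a = b = start
    for r in results:
        expected = 1.0 / (1.0 + 10 ** ((b - a) / 400.0))
        delta = k * ((1.0 if r == "W" else 0.0) - expected)
        a += delta
        b -= delta
    return a


def ranking(k, games=7, losses=2):
    """Sequences ordered from highest to lowest terminal rating for this K."""
    seqs = []
    for lost in combinations(range(games), losses):
        seqs.append("".join("L" if g in lost else "W" for g in range(games)))
    return sorted(seqs, key=lambda s: final_rating(s, k), reverse=True)


def rank_changes(k_values):
    """List (previous K, K, positions that differ) wherever the ranking changes."""
    changes = []
    prev_k, prev = None, None
    for k in k_values:
        order = ranking(k)
        if prev is not None and order != prev:
            moved = [(x, y) for x, y in zip(prev, order) if x != y]
            changes.append((prev_k, k, moved))
        prev_k, prev = k, order
    return changes


for lo, hi, moved in rank_changes(range(1, 102)):
    print(f"K={lo} -> K={hi}: {moved}")
```

Each printed line shows, position by position, which sequence was replaced by which. A single step in K could in principle contain more than one swap, so the number of lines is not necessarily the number of pairwise switches.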
