Analyses of the ratings - Spotting the issues

Like everyone has said before me, the total amount of rating gained by one team should be opposite of the total amount of rating lost by the other team. Else, rating will quickly inflate or deflate, like what we are seeing now. Furthermore, I agree that everyone in a team should win/lose the same number of points. I think @Haladon demonstrated very well what is wrong with the current logic of the rating system.

Theoretically, this could all be solved in the following way: Both teams should be assigned some number representing the skill of their team. Then the change in rating for each team is calculated by putting these numbers in the usual Elo formula that is used for 1v1. This is how it was done on HD and Voobly team ratings.

This leaves us with the following remaining question, though: what is a good way to assign a number to a team representing the total teams skill?

Currently, the logic would be to take the team Elo rating of the highest rated player in the team. I think we can all agree that this is not ideal. Surely a team of a 1600 Elo player and a 400 Elo player is not stronger than a team of two 1500 Elo players? In HD, the average Elo rating of the team was taken. However, I don’t think this is ideal either, for higher rated players tend to have a higher influence on the game. Let me give an example. Suppose a 2500 Elo player can 1v2 two 1500 Elo players. Then a team of a 2500 Elo player and a 400 Elo player would be able to beat a team of two 1500 Elo players, too. Yet, the team of two 1500 Elo players would have a higher average rating. Clearly, the average is not a good measure to take.

But what is? To find out, let us take another look at the concept of 1v2’ing. In the following analysis, the following question will be important: Suppose player A plays a 1v2 versus players B and C. B and C have the same rating. How much higher does the rating of player A have to be in order to have a 50% probability of beating a team of B and C? I don’t know the answer to this question. Let us call this rating difference ∆ and worry later about the exact value of ∆; so, if the rating of player A is ∆ higher than B and C, then A has a 50% chance of beating B and C. This means that if someones rating is ∆ higher than yours, that person can be said to be ‘twice as strong’ as you. But then someone whos rating is in turn ∆ higher than that person is twice as strong as that person. Thus, when someones rating is 2 ∆ higher than yours, that person is 2 x 2 = 4 times as strong as you; and when someones rating is 3 ∆ higher than yours, that person is 8 times stronger than you. What this means is that under this way of thinking, ‘strength’ or ‘skill’ scales exponentially with Elo rating, and equivalently, Elo rating is just the logarithm of skill. In fact, we can precisely say that

‘Skill’ = 2^(Elo rating / ∆)

since then if someones Elo rating is ∆ higher then yours, then indeed his skill is twice your skill, which fits our definition of ∆.

How does this fit into Team Elo calculation? Well, I suggest to take the average of the ‘skill’, not of the Elo, to calculate the strength of a team. Basically, I am suggesting we should take the average of the Elo, but on an exponential scale instead of a linear scale. This comes down to the following. Suppose four players are on a team. Their ratings are x_1, x_2, x_3 and x_4. Then I suggest we calculate their ‘Elo’ as a team as follows (where the log has a base of 2):

‘Team Elo’ = ∆ log(AVERAGE [ 2^(x_1 / ∆), 2^(x_2 / ∆), 2^(x_3 / ∆), 2^(x_4 / ∆) ] )

We would only need to find a reasonable choice for ∆. Probably just ask a top player at what Elo difference he estimates his chances to be 50% to beat a team of 2 players. A complicating factor could be that the value of ∆ may differ among different skill levels. Still, I think this rating would be an improvement.

As an example, suppose arbitrarily that ∆ = 600. Suppose two teams are facing each other. Team 1 has three 1500 Elo players and team 2 has one 1700 Elo plater and two 1400 Elo players. Then the ‘team elo’ of team 1 would equal 1500 (obviously) and of team 2 would equal:

600 log(1/3 [ 2^(1700 / 600) + 2^(1400 / 600) + 2^(1400 / 600)] )
= 600 log(1/3 [2.0885 + 1.834 + 1.834] )
= 600 log(5.7355) = 600 x 2.5199 = 1512

so the team with the 1700 Elo player is considered to be slightly stronger than the other team, but not by much.

%%%%%%%%%%%%%%%%%%

Unrelated to this I saw there was a discussion about using a different rating for different map types. I think that would be helpful. I know my own skill differs quite a bit on different maps, and this can lead to one-sided games. If I don’t ban any map and don’t select a preferred map, I can predict my rating will go down, as two thirds of my games will be against Arabia specialists on Arabia and one third of my games will be against Arena specialists on Arena. Similarly, if I ban both Arabia and Arena and select a non-standard preferred map like Four Likes, I can predict my rating will go up.

I think it is doable to use different ratings for different map types. In fact, I think it should be doable to give every player a rating for every individual map in the game. If you win on a map, not only would your rating on that map go up, but on similar maps too. As an example, a win on Islands would increase your Islands rating, and also your Team Islands rating, but only a little bit your Arena rating, since Team Islands is much more similar to Islands than Arena is. In order to determine how ‘similar’ different maps are, Microsoft would need a basic machine learning algorithm and feed it the results of all games played. The algorithm would learn that if it evaluates Islands and Team Islands as decently similar maps, it gets better in estimating the outcome of games (it will e.g. correctly predict someone who is good at Islands winning on Team Islands too).

With sufficient technical knowledge and skill, it should be doable to make a specific rating for every specific map in this way.

3 Likes