Analyses of the ratings - Spotting the issues

SwaggyOP · August 19, 2020, 5:00pm

It is speculated that they use the maximum of the ratings instead of the average to determine the ELO delta. That would probably be a five-line code change, yet it has been there for 6 months despite being a massive issue.

SouBotsito · August 19, 2020, 9:31pm

To make things better, my friends who stack badly in team games and are at 2900 (they were 1850 on Voobly) just did the most clownish thing: they played a few DM games and, given the point distribution, they are now sitting at the top of the DM ladder with 2400 without even being good at DM. That is how bad the current system is and how easy it is to abuse.

Also, there are at least 50 guys in the 1v1 top 100 who are not even 2k level. They just created a nick, started playing team games, got a really high team game ELO and then played 10 1v1 games. Even after losing 9 of them they are still at 2.1k and sitting near the top, and yet there are people saying a TG ladder with 1k inflation harms no one.

breeminator · August 20, 2020, 1:47am

I doubt this is a straightforward bug; I suspect it is an intentional attempt to reflect how the strongest players dominate a team game.

Consider that Hera can win a 1v3 against 3 players with ~1.2k 1v1 ELO. Ignoring the difference between TG and 1v1 ELOs, think about how a 3v3 game would work if you had one team that has Hera plus 2 players who have 0 ELO and do nothing all game every game, and the other team has 3 x 1.2k players. Hera’s team has an average ELO around 800, yet is a stronger team than the team with 3 x 1.2k players, because Hera would single-handedly beat all 3 players. I suspect using the highest ELO is a deliberate attempt to reflect this, and the reason it hasn’t been changed may be that they are well aware of the problems it would cause if the average were used instead. That’s not to say that the current system is satisfactory, just that what to change it to may not be obvious.

FinalBucket3743 · August 20, 2020, 10:57am

Another attempt to explain this as not being a bug, but maybe a feature: premade teams can talk to each other, so if one team has a much higher elo player telling his low elo friends in TeamSpeak/Discord what to do, they will play way better than without that high elo friend coaching them. Therefore using the average elo of these premades would put a random team facing that premade team at a disadvantage. (Still, this argument is not really strong, because we do not know whether a team with very different elos is a premade, but with premades it is at least possible to reach big elo differences.)

I also believe that premades should only play against other premades, and not against randoms, because communication is way easier for premades, and with the chat censoring it gets even worse… but that’s another story that does not belong here.

breeminator · August 20, 2020, 2:04pm

Thinking about it some more, suppose we have m teams of n players.

Define Prior Team Strength to be a function of the Prior ELOs in each team:
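$\mathrm{PrTS}_i = f(\mathrm{PrELO}_{i1}, \dots, \mathrm{PrELO}_{in}), \quad i = 1, \dots, m$

Then, once the result is known, we need a second function, g, to relate the Posterior ELOs to the Prior ELOs, the result, and the team strengths:

$\mathrm{PoELO}_{ij} = g(\mathrm{PrELO}_{ij}, \text{game result}, \mathrm{PrTS}_1, \mathrm{PrTS}_2, \dots, \mathrm{PrTS}_m)$

With the requirement that

$\sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{PrELO}_{ij} = \sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{PoELO}_{ij}$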

It would be possible to satisfy this requirement with either the max or the average for the first function (or other functions can work, too, e.g. a function that gives more weight to higher rated players while not going so far as to only use the highest rating).

The second function is the one that will cause ELO drift if it isn’t defined to satisfy the requirement to hold the ELO sums constant. While it’s more than a few seconds of work, it shouldn’t be catastrophically hard to define suitable functions. Has anyone reverse engineered a full definition of the second function as currently implemented?

SwaggyOP · August 20, 2020, 7:46pm

So you’re right that writing a team Elo system from scratch is non-trivial; however, I think you’re overestimating the devs. A few months ago there was a problem where they were sometimes comparing the average rating of team A with the sum of the ratings of team B (which led to games like team A with 3x 800 players vs team B with 3x 2400 players). I’m guessing what happens here is a similar dumb bug.

Besides, team Elo used to work in HD and Voobly, and in thousands of other multiplayer games, so they probably just took the code from somewhere else (HD?).

My guess is that this is what they currently do:

Take the highest individual ratings of both teams: Ra and Rb
For each player i, with rating Ri, compute the single-player Elo delta for a match between a player of rating Ri and a player of rating Rb (or Ra if i is in team B)
Apply the delta to player i

This is obviously flawed and would explain the inflation. Taking the average instead of the max would by no means be perfect, but it would be far better and would at least reduce the inflation significantly.
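Here is a minimal Python sketch of that guessed scheme, assuming the standard 1v1 Elo expected score on the 400-point logistic scale and a K-factor of 32 (both are assumptions; the game's real constants are not known). The function name `max_scheme_update` is hypothetical. It plays one 3v3 and prints how much total rating the lobby gains for each possible result.

```python
def expected_score(r_self: float, r_opp: float) -> float:
    """Standard 1v1 Elo expected score (logistic, 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400))


def max_scheme_update(team_a, team_b, a_won, k=32.0):
    """Hypothetical reconstruction of the guessed scheme: each player is
    updated as if they had played a 1v1 against the opposing team's
    highest-rated player."""
    ra, rb = max(team_a), max(team_b)
    score_a = 1.0 if a_won else 0.0
    new_a = [r + k * (score_a - expected_score(r, rb)) for r in team_a]
    new_b = [r + k * ((1.0 - score_a) - expected_score(r, ra)) for r in team_b]
    return new_a, new_b


team_a = [1800, 1800, 1800]
team_b = [2000, 1700, 1700]
for a_won in (True, False):
    new_a, new_b = max_scheme_update(team_a, team_b, a_won)
    drift = sum(new_a) + sum(new_b) - sum(team_a) - sum(team_b)
    print(f"team A wins: {a_won}, total rating change: {drift:+.1f}")
```

Both results add roughly 25.6 points to the lobby. For equal team sizes the total change works out to K times (number of players per team minus the sum of all expected scores), which does not depend on who won and is positive whenever someone is rated below their team's top player, so under this guess every such game inflates the ladder.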

Regarding the Hera / 1v3 discussion: completely agree, that’s a flaw of the system, but this problem would always exist no matter which formula we choose, because classical Elo is designed for 1v1 and is not additive. Hera having 2k Elo does not mean he should beat two players who are 1k. It just means that he will beat a single 1k player almost every time (an expected score of about 99.7% under the standard formula). To get around this you would need to redesign Elo for N vs N games, which I’m pretty sure nobody does, because as long as the Elos do not differ too much between players, it should be a fair approximation to just average the Elos and apply the 1v1 formulas (ideally with a correction to avoid inflation).
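To put a number on the "not additive" point, this sketch evaluates the standard 1v1 Elo expected score (400-point logistic scale, assumed here as the usual convention) for a 2000-rated player against one 1000-rated player. The formula only models a single opponent; it has no term for facing two players at once.

```python
def expected_score(r_self: float, r_opp: float) -> float:
    """Standard 1v1 Elo expected score on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400))


# A 2000-rated player against a single 1000-rated player.
p = expected_score(2000, 1000)
print(f"{p:.4f}")  # prints 0.9968
```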

breeminator · August 21, 2020, 11:54pm

Did a bit of algebra on this. I know this isn’t exactly how the updating works, but I think it’s sufficient to show the issues.

Scenario - 2v2
team 1 players a, b, winners, a higher rated than b
team 2 players c, d, losers, c higher rated than d

Scheme 1: update each player’s ELO using the delta between their ELO and the other team’s average ELO:

a new elo = a + k * (0.5 * (c + d) - a)
b new elo = b + k * (0.5 * (c + d) - b)
c new elo = c - k * (c - 0.5 * (a + b))
d new elo = d - k * (d - 0.5 * (a + b))

sum of new ELOs =
a + k * (0.5 * (c + d) - a) + b + k * (0.5 * (c + d) - b) + c - k * (c - 0.5 * (a + b)) + d - k * (d - 0.5 * (a + b))
= a + b + c + d = unchanged from before

Scheme 2: update each player’s ELO using the delta between their ELO and the other team’s highest ELO:

a new elo = a + k * (c - a)
b new elo = b + k * (c - b)
c new elo = c - k * (c - a)
d new elo = d - k * (d - a)

sum of new ELOs =
a + b + c + d + kc - kb + ka - kd = changed from before
Further, the change can be written as
k(a-b) + k(c-d), and since a is higher rated than b and c is higher rated than d, this will always be positive.
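The same algebra can be checked numerically. This sketch uses the simplified update rules for schemes 1 and 2 exactly as written above (not the game's actual formula); the ratings and k = 0.1 are arbitrary example values, with a and b on the winning team.

```python
def scheme1(a, b, c, d, k):
    """Each player moves relative to the opposing team's average (a, b won)."""
    avg_winners, avg_losers = 0.5 * (a + b), 0.5 * (c + d)
    return (a + k * (avg_losers - a), b + k * (avg_losers - b),
            c - k * (c - avg_winners), d - k * (d - avg_winners))


def scheme2(a, b, c, d, k):
    """Each player is compared with the opposing team's highest rating."""
    return (a + k * (c - a), b + k * (c - b),
            c - k * (c - a), d - k * (d - a))


a, b, c, d, k = 1600.0, 1200.0, 1500.0, 1100.0, 0.1
before = a + b + c + d
print(sum(scheme1(a, b, c, d, k)) - before)  # zero, up to float rounding
print(sum(scheme2(a, b, c, d, k)) - before)  # matches the next line
print(k * (a - b) + k * (c - d))             # k(a-b) + k(c-d)
```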

Using the highest rating can work if combined with a different scheme, e.g.

Scheme 3: update each player’s ELO using the delta between their team’s highest ELO and the other team’s highest ELO:

a new elo = a + k * (c - a)
b new elo = b + k * (c - a)
c new elo = c - k * (c - a)
d new elo = d - k * (c - a)

This obviously leaves the sum of the ELOs unchanged. A similar scheme to this is used in Forza Horizon 4, for example, where all team members gain or lose the same rating as each other.

So it isn’t just the fact it uses the highest rating that is the issue, as using the highest rating can be made to work with an appropriate method of updating the ELOs. But with the current method of updating the ELOs, using the average would work, whereas using the highest rating causes inflation. Each of these two schemes that avoids inflation has its own pros and cons.

SwaggyOP · August 22, 2020, 3:51pm

Nice analysis. But the problem with your scheme 3 is that by only taking one player’s rating you don’t weight the team’s strength correctly. For example, take team A = (1800, 1800, 1800) vs team B = (2000, 1700, 1700). The teams should be fairly balanced here, yet your system penalizes team B a lot more than team A.
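To put numbers on this example, here is a sketch comparing the two team-strength measures, assuming the standard 400-point logistic expected score turns a strength gap into a win probability (an assumption; how the game does this is not known).

```python
def expected_score(r_self: float, r_opp: float) -> float:
    """Standard Elo expected score on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((r_opp - r_self) / 400))


def average(ratings):
    return sum(ratings) / len(ratings)


team_a = [1800, 1800, 1800]
team_b = [2000, 1700, 1700]

for name, strength in (("max", max), ("average", average)):
    p_a = expected_score(strength(team_a), strength(team_b))
    print(f"{name}: P(team A wins) = {p_a:.2f}")
```

With the max, team B is treated as a clear favourite (team A gets about 0.24); with the average, both teams come out at 1800 and the game is a coin flip (0.50).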

I still think just taking the average should be close to optimal (that is, scheme 1, but applying the same delta to every player as you suggested in scheme 3). The main thing missing from the resulting system is that we cannot use it to predict the probability of winning, which is the main advantage of the Elo system in 1v1. But then, finding the perfect metric is more of a game-sense problem than a mathematical one. Take teams A and B from my example: which should be more likely to win? I genuinely don’t know, but I think the team with the 2000 player has a slightly better chance. So ideally we want to figure out how individual players’ skills combine in this game and find a weighted-average formula that reflects it.

breeminator · August 22, 2020, 4:56pm

It was only intended to show that using the highest ELO as the team strength doesn’t in itself cause ELO inflation. Scheme 3 will avoid ELO inflation with ANY measure of team strength.

Scheme 3b: update each player’s ELO using the delta between their team’s strength (call this A) and the other team’s strength (call this B):

a new elo = a + k * (B - A)
b new elo = b + k * (B - A)
c new elo = c - k * (B - A)
d new elo = d - k * (B - A)

This avoids changing the sum of the ELOs no matter how the team strength is derived from the individual ELOs.
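A sketch of scheme 3b with the team-strength measure passed in as a parameter. The function name `scheme3b`, the example ratings and k = 0.1 are hypothetical. It applies the post's simplified rule as written and checks that the total is unchanged for both the max and the average; note that this relies on the two teams having the same number of players.

```python
from typing import Callable, Sequence

Strength = Callable[[Sequence[float]], float]


def scheme3b(winners, losers, strength: Strength, k: float):
    """Simplified rule from the post: every winner gets +k*(B - A) and every
    loser gets -k*(B - A), where A and B are the winning and losing teams'
    strengths. Conserves the total only when both teams are the same size."""
    A, B = strength(winners), strength(losers)
    delta = k * (B - A)
    return [r + delta for r in winners], [r - delta for r in losers]


def average(ratings: Sequence[float]) -> float:
    return sum(ratings) / len(ratings)


winners, losers = [1600.0, 1200.0], [1500.0, 1100.0]
for name, strength in (("max", max), ("average", average)):
    new_w, new_l = scheme3b(winners, losers, strength, k=0.1)
    change = sum(new_w) + sum(new_l) - sum(winners) - sum(losers)
    print(f"{name}: total change = {change:+.6f}")
```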

TougherTrack508 · August 22, 2020, 5:02pm

To prevent the kind of ELO drift which is being discussed here, you just need to decide on a ProbabilityOfWinning, and apply it systematically.

In 1v1s, the ELO change formula is something like new_elo = old_elo + 32*(HasWon? - ProbabilityOfWinning), where HasWon? is 1 if you win and 0 if you lose.

The default ProbabilityOfWinning formula, with eloDifference = your rating minus your opponent's, is
ProbabilityOfWinning = 1 / (1 + 10**(-eloDifference / 400))

One way to weight the higher elos more would be to use an exponential average, e.g. with
teamElo = 400*log((exp(elo1/400) + exp(elo2/400) + exp(elo3/400))/3)
eloDifference = teamElo1 - teamElo2

As long as the ProbabilityOfWinning is symmetric (i.e. team 1’s ProbabilityOfWinning = 1 - team 2’s ProbabilityOfWinning), every player on a team gets the same change, and both teams have the same size and K-factor, the sum of the Elos won’t change.
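A minimal sketch of this proposal. The sketch places each /400 inside exp(), uses the natural log and exp as in the formula above, and writes the win probability so that it increases with a team's own rating advantage. K = 32 comes from the post; the function names are hypothetical. Every member of a team receives the same change and the other team receives the opposite change.

```python
import math
from typing import Sequence


def team_elo(ratings: Sequence[float]) -> float:
    """Exponential average: 400*ln(mean(exp(elo/400))). Higher ratings pull
    the result up more than a plain average would, but it never exceeds the max."""
    return 400 * math.log(sum(math.exp(r / 400) for r in ratings) / len(ratings))


def win_probability(elo_difference: float) -> float:
    """Probability of winning for the side with this rating advantage."""
    return 1.0 / (1.0 + 10 ** (-elo_difference / 400))


def update(team1, team2, team1_won, k=32.0):
    """Apply one shared change to team 1 and the opposite change to team 2."""
    p1 = win_probability(team_elo(team1) - team_elo(team2))
    delta = k * ((1.0 if team1_won else 0.0) - p1)
    return [r + delta for r in team1], [r - delta for r in team2]


team_a = [1800.0, 1800.0, 1800.0]
team_b = [2000.0, 1700.0, 1700.0]
print(f"team A: {team_elo(team_a):.1f}, team B: {team_elo(team_b):.1f}")
new_a, new_b = update(team_a, team_b, team1_won=True)
print(f"total change: {sum(new_a) + sum(new_b) - sum(team_a) - sum(team_b):+.6f}")
```

For the 1800/1800/1800 vs 2000/1700/1700 example discussed earlier, team B's exponential average comes out at roughly 1827: above the plain average of 1800 but well below the 2000 the max would give, so team B ends up a slight favourite.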

breeminator · August 22, 2020, 6:25pm

Yes, so that changes the ELOs of all members of the same team by the same amount by applying a single ELO difference to all players. At the moment, it’s applying a different ELO difference to each player. It could continue to do this if it uses the average of each team’s ELO, as shown by my scheme 1 example. Or it can change to making all members of a team have the same ELO change, with the opposite change for the other team, and then it can use a different weighting such as your example weighting. Conceptually, the latter makes more sense because it doesn’t make sense for each player on the same team to be regarded as having a different probability of winning to their teammates.

SwaggyOP · August 22, 2020, 6:45pm

We’re circling around. I think we both understand that taking the max is unsuitable anyway and is not something a developer should use in rating calculations…

WoodsierCorn696 · September 9, 2020, 9:34pm

So, will TG ratings be fixed in the next patch? I think the answer will be no. Still no reply from the devs.

SouBotsito · September 10, 2020, 1:33am

They are waiting for 4k players easy

Geojak92 · September 10, 2020, 11:21am

Maybe an elo system that isn’t purely based on skill, where the more you play the higher you get, is what the devs want? It might encourage some kinds of people to play even more, which might be good for their statistics (same reason we have events, I guess).

Not that I agree with it

Kinquest · September 10, 2020, 11:31am

I think it’s an unintended side-effect of how the team ratings work at this moment. It is not what an ELO rating system is supposed to do. Playing more games should not automatically improve your rating.

It is not uncommon these days to see mediocre players with a 1000-1200 1v1 rating and a 2000-2400 team rating, simply because they play a lot of team games. A 50% win rate will automatically inflate your rating in the TG ladder after enough games played.

I don’t consider myself an expert, but my team rating is 2400, which would be ridiculous if it was a 1v1 rating.

WoodsierCorn696 · September 25, 2020, 7:11pm

Still no response from the devs.

Nheltarion · September 26, 2020, 10:36am

They don’t owe you one, man. The devs can’t reply to every single thread on the forum just because some users want them to.

WoodsierCorn696 · September 27, 2020, 1:45pm

Something is broken for months, but devs can just ignore that? That seems like your reasoning.

SouBotsito · September 28, 2020, 6:40pm

Maybe they are waiting for 4k TG ratings before doing a reset.