Analyses of the ratings - Spotting the issues

I’m confused, because a while ago in this same thread it looks like you suggested exactly what the devs have done, stating that it would stop Elo inflation.

But you’re not happy with their solution, which was also your idea.

A lot of people first thought this could be the solution.
You will probably also find a post of mine saying that.

Only stupid people stick to their wrong opinions when they are corrected by others.

(Of course, you should only admit it if you were actually wrong, not if you were right.)

YES, I am glad someone else is looking into it! Someone who seems eager to invest more time than me. 😄

I guess an 800 and a 1200 should play against two 1035s.

I don’t think the relationship is linear at all. I do think that at higher ratings, just taking the average is more accurate than at lower ratings.

I would support the devs delegating the investigation of this subject to @coolios9876, and I offer my assistance with the conceptual and analytical side. I’m not a statistician, but I have experience interpreting and processing data (I can’t use the usual tools here, but I can explain what they do and probably even write code).

I think this topic is crucial for the future of AOE2:DE, and solving the problem could be of high value for other multiplayer games as well. The fact is, the Elo system isn’t designed for multiplayer. But it should still be viable if done right, and of course that needs careful analysis: which values are fixed parameters, and which are variables that need to be adjusted to the characteristics of multiplayer games.

Just wanted to add some notes from other games which I thought were interesting:

From Rocket League

Party Skill and Matchmaking use a Weighted Average that blends the average skill of all players. The Weighted Average then skews that average to the highest-ranked player.

League of Legends uses a team Elo system where each player is regarded as having the team score.

The system was modified for team use, and basically, the concept is that a player gets a team Elo based on whoever is on the team, and if the player wins, it is assumed that everyone on the team was “better” than the guess

It knows pre-made teams are an advantage, so it gives pre-made teams tougher opponents than if each player had queued alone


Ok, they don’t understand what happens, but they somehow found a workaround that is probably “close enough” not to cause further issues.

But that only works if people (like in LoL) are only matched with people of almost equal skill level.

deleted - reposted below with hopefully a clearer description

Could you please be a bit more specific about what you actually did? Either describe the results in plain words, or specify what each variable name and method name stands for.
This isn’t comprehensible to me.

(And I’m usually kinda good at understanding this stuff…)

Sure, let me go back over it. 1 sec.

@Mercy9545 ,

I spent this morning playing about with this. The results so far are surprising me, though needless to say I can’t rule out coding / method errors on my end.

I approached this by fitting a logistic regression model to the data, which aims to predict the match result based on the difference in mean team Elo (elo_diff_avg) and the difference in the standard deviation of Elo between teams (elo_diff_sd).

That is:

  • elo_diff_avg = mean(team 1 Elos) - mean(team 2 Elos)
  • elo_diff_sd = sd(team 1 Elos) - sd(team 2 Elos)
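For anyone who wants to reproduce the two inputs, here is a minimal Python sketch of how they could be computed for a single match. It assumes the sample standard deviation (n - 1 denominator), which is what R’s `sd()` returns, and it needs at least two players per team. The function name `team_features` is hypothetical.

```python
from statistics import mean, stdev

def team_features(team1, team2):
    """Return (elo_diff_avg, elo_diff_sd) for one match.

    stdev() is the sample standard deviation (n - 1 denominator),
    the same quantity R's sd() computes. Teams need >= 2 players.
    """
    elo_diff_avg = mean(team1) - mean(team2)
    elo_diff_sd = stdev(team1) - stdev(team2)
    return elo_diff_avg, elo_diff_sd

# Same mean, different spread: the average difference is 0, the sd difference is 200.
print(team_features([1200, 1000, 800], [1000, 1000, 1000]))
```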

The full formula of the model I am fitting is:
eta = mu + Beta_1 * elo_diff_avg + Beta_2 * elo_diff_sd
p(team 1 winning) = (1 + exp(-eta))^-1

The initial fit of this model produces the following results (term here represents the corresponding Beta coefficient for that variable):

  term           estimate   std.error   statistic   p.value
  (Intercept)    0.00423    0.0101       0.417      6.76e-1
  elo_diff_avg   0.00613    0.0000996   61.6        0
  elo_diff_sd    0.000608   0.0000649    9.36       8.08e-21

The key bit here is that the p-value for elo_diff_sd is tiny, i.e. the term is highly significant, meaning that differences in the dispersion of team Elo are strongly predictive of the result (i.e. all else being equal, a team of 1200, 1000, 800 is more likely to beat a team of 1000, 1000, 1000). Ideally, the dispersion of team Elo (elo_diff_sd) should have no predictive power (that is, a p-value as close to 1 as possible).
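To get a feel for the size of the effect, the reported estimates can be plugged straight into the model formula. This is only a sketch using the three coefficients from the table; the constant and function names are hypothetical. For the example above (equal means, a sample-sd difference of 200), the fitted model gives the dispersed team roughly a 53% chance of winning, versus roughly 50% for two identical teams.

```python
import math

# Coefficients as reported in the initial fit (team Elo = plain mean).
INTERCEPT = 0.00423
BETA_AVG = 0.00613
BETA_SD = 0.000608

def p_team1_wins(elo_diff_avg, elo_diff_sd):
    """Logistic model: p(team 1 winning) = 1 / (1 + exp(-eta))."""
    eta = INTERCEPT + BETA_AVG * elo_diff_avg + BETA_SD * elo_diff_sd
    return 1.0 / (1.0 + math.exp(-eta))

# 1200/1000/800 vs 1000/1000/1000: same mean, sd difference of 200.
print(round(p_team1_wins(0, 200), 3))
# Two identical teams, for comparison.
print(round(p_team1_wins(0, 0), 3))
```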

In theory we can achieve this by adjusting how we calculate the team average Elo so that it produces more balanced teams. In particular, the estimate for elo_diff_sd is positive, which implies that teams with higher dispersion are more likely to beat teams with lower dispersion. This tells us that we should weight the team average so that the “team_elo” is closer to that of the highest-rated player in the team.

Two alternative formulas that achieve this are one that was proposed by @Mercy9545 and one by myself which are respectively:

rating_mercy = function(rating) k * log(mean(2 ^ (rating / k)), base = 2)

rating_mine = function(rating) log(mean(k^(rating / 400)), base = k) * 400

I then repeated the above model-fitting process using these formulas for calculating team Elo; that is, I replaced elo_diff_avg with the following for Mercy9545’s formula and mine respectively. (I have a feeling these formulas are identical, but am too lazy atm to do the maths.)

elo_diff_avg = rating_mercy(team 1 Elos) - rating_mercy(team 2 Elos)

elo_diff_avg = rating_mine(team 1 Elos) - rating_mine(team 2 Elos)

I then used an optimiser to find the value of k that maximised the p-value of elo_diff_sd (i.e. it tries to make elo_diff_sd as insignificant as possible). That is, it keeps trying different values of k, refitting the above model each time, until it finds the value of k that maximises the p-value. This resulted in a value of 1116 for Mercy’s formula and a value of 1.282 for my formula.
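The thread doesn’t say which software or optimiser was used, so the following is only a sketch of the procedure under stated assumptions: numpy is available, the logistic regression is fitted by Newton-Raphson with two-sided Wald p-values (the same kind of z-test R’s summary of a logistic `glm` reports), and the optimiser is swapped for a plain grid search over candidate k values. All function names are hypothetical.

```python
import math
import numpy as np

def team_rating(ratings, k):
    """Mercy's team Elo: k * log2(mean(2^(r / k)))."""
    r = np.asarray(ratings, dtype=float)
    return k * math.log2(np.mean(2.0 ** (r / k)))

def fit_logistic(X, y, iters=30):
    """Logistic regression by Newton-Raphson; returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def sd_p_value(k, team1s, team2s, team1_won):
    """Two-sided Wald p-value of the elo_diff_sd coefficient for a given k."""
    diff_avg = [team_rating(a, k) - team_rating(b, k) for a, b in zip(team1s, team2s)]
    diff_sd = [np.std(a, ddof=1) - np.std(b, ddof=1) for a, b in zip(team1s, team2s)]
    X = np.column_stack([np.ones(len(diff_avg)), diff_avg, diff_sd])
    beta, se = fit_logistic(X, np.asarray(team1_won, dtype=float))
    z = beta[2] / se[2]
    return math.erfc(abs(z) / math.sqrt(2.0))

def best_k(team1s, team2s, team1_won, k_grid):
    """Grid search over k: keep the value where elo_diff_sd is least significant."""
    return max(k_grid, key=lambda k: sd_p_value(k, team1s, team2s, team1_won))
```

Usage would look like `best_k(team1s, team2s, team1_won, k_grid)`, where each team is a list of player Elos and `team1_won` holds 1 for a team 1 win and 0 otherwise. A proper bounded optimiser could replace the grid, but maximising a p-value is noisy, so a coarse grid is an easy way to see how flat the curve is.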

Applying these functions to some hypothetical teams results in the following updated team Elo scores:

 team Elos           avg_elo_mean   avg_elo_mine (k = 1.282)   avg_elo_mercy (k = 1116)
1000, 1000, 1000         1000              1000                       1000
1200, 1000,  800         1000              1008                       1008
2200, 2000, 1800         2000              2008                       2008
1400, 1000,  600         1000              1033                       1033
3000,  500,  400         1300              1785                       1785
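A minimal Python translation of both formulas reproduces the table above with k = 1116 and k = 1.282. It reads the exponent in `rating_mine` as `k^(rating / 400)`, which is the reading that matches the table. The Python function names simply mirror the R ones.

```python
import math
from statistics import mean

def rating_mercy(ratings, k=1116):
    """k * log2(mean(2^(r / k))): a soft maximum that leans toward the top player."""
    return k * math.log2(mean(2 ** (r / k) for r in ratings))

def rating_mine(ratings, k=1.282):
    """400 * log_k(mean(k^(r / 400))): the same operation on the 400-point Elo scale."""
    return 400 * math.log(mean(k ** (r / 400) for r in ratings), k)

teams = [
    [1000, 1000, 1000],
    [1200, 1000, 800],
    [2200, 2000, 1800],
    [1400, 1000, 600],
    [3000, 500, 400],
]
for team in teams:
    print(team, round(mean(team)), round(rating_mine(team)), round(rating_mercy(team)))
```

Because both formulas are shift-equivariant (adding a constant to every player adds the same constant to the team value), the 2200/2000/1800 row is just the 1200/1000/800 row moved up by 1000.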

I also repeated this whole process for the < 2200 Elo cut of the data and the >= 2200 cut, which gave consistent results (i.e. the k values don’t change substantially if we focus on the upper or lower Elos).

Needless to say, this kind of use of statistics is as exploratory/informal as it gets, and there are tons of biases and issues that I’ve just swept under the rug, but it gives an interesting starting point!


It looks like using the mean in and of itself is not that bad an estimate, though a mild logarithmic formula appears to be beneficial for bumping up the team Elo of obvious smurfs / extremely lopsided teams.


@casusincorrabil , I hope that’s better; if it isn’t please let me know and I will try again

Ah ok… Well, the “mine” and “mercy” calculations yield the same result…
It’s the same operation just with different log bases.
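The equivalence takes two lines to write out. Here c stands for the parameter of the “mine” formula (read as `c^(rating / 400)`) and k for Mercy’s parameter; the letter c is only introduced to keep the two apart.

```latex
\text{rating\_mine}(r) = 400\,\log_{c}\Big(\tfrac{1}{n}\sum_{i=1}^{n} c^{\,r_i/400}\Big),
\qquad
\text{rating\_mercy}(r) = k\,\log_{2}\Big(\tfrac{1}{n}\sum_{i=1}^{n} 2^{\,r_i/k}\Big)

c^{\,r_i/400} = 2^{\,r_i \log_2 c/400} = 2^{\,r_i/k}
\quad\text{and}\quad
400\,\log_c x = \frac{400}{\log_2 c}\,\log_2 x = k\,\log_2 x
\qquad\text{whenever}\quad k = \frac{400}{\log_2 c} = \frac{400 \ln 2}{\ln c}

c = 1.282 \;\Rightarrow\; k = \frac{400 \ln 2}{\ln 1.282} \approx 1116
```

So the two fitted values (1.282 and 1116) describe the same curve, which is why every row of the table agrees.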

Is the TG ladder already settled after the change? If yes, then what is the general feeling? Does it feel like an improvement?

I haven’t been able to play in the last weeks, so I don’t have any experience with the new system. But from a theoretical point of view it feels bad.

I have had a very bad experience in the last 2 weeks. If I win against players rated higher than me, but with just one low-rated player on their team, I get +12 points; but if I lose to players equal to me, with one lower-rated player, I end up losing 22 points. So even though I win more often, it isn’t reflected in my Elo, because I keep going lower. Lots of players are experiencing the same, especially solo players. None of the matches have felt balanced, because it randomly gives you a 2.1k player and an 1100 player against a full stack team of 1800s, and I still lose 20+ points from those matches.

So, in conclusion, as expected, the experience as a solo player has gotten worse. Part of the issue wasn’t really the inflation but the matchmaking priority creating imbalanced matches. If I keep losing 20 points every match, I will end up being reported as a smurf, because my low TG Elo doesn’t reflect my actual level.
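The asymmetry described above is what a standard Elo update produces whenever the system rates your team as the favourite. The sketch below assumes the ladder applies the usual Elo expected score to team ratings and uses a K-factor of 32; neither the ladder’s actual formula nor its K value is given in this thread, so both are illustrative assumptions, and the function names are hypothetical.

```python
def expected_score(team_elo, opp_elo):
    """Standard Elo expected score on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opp_elo - team_elo) / 400))

def rating_change(team_elo, opp_elo, won, k_factor=32):
    """Points gained (positive) or lost (negative) by each player on the team.

    k_factor=32 is an assumption for illustration, not the ladder's confirmed value.
    """
    score = 1.0 if won else 0.0
    return k_factor * (score - expected_score(team_elo, opp_elo))

# Hypothetical: the system rates your team about 90 points above the opponents.
print(round(rating_change(1590, 1500, won=True)), round(rating_change(1590, 1500, won=False)))
```

With a gap of about 90 points this gives roughly +12 for a win and -20 for a loss, close to the numbers reported above, even if the players themselves feel evenly matched.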


I have the same issue. I’m not sure how the Elo is calculated now. I lost to a TG team rated 1600, 1500, 1400 when my rating is just around 1400, and I lost as many points as I would losing a game against an equally rated team. Very weird.


Premades with big Elo gaps vs. solo players who all have about the same rating are probably the worst games for the new system. Based on the responses, that is indeed what happened. So the system is still bad and still in need of a fix.