Analyses of the ratings - Spotting the issues

UprightAtol98 · September 28, 2020, 8:58pm

I am hoping they will fix it with the big October patch.
The last patches were more or less hotfixes…

The question is, how?

Will they adjust the ranks (like everyone loses 400 Elo or 25% of their rank or something similar), or will they start over with everyone at 1000 Elo again?
Or perhaps they will change the system but not the current Elo, meaning that in time the TG matchmaking will be more balanced but the current inflation will stay forever…

FinalBucket3743 · October 1, 2020, 10:11am

The broken TG Elo is just so annoying for TGs with random people. I have 1600 TG Elo with roughly 200 games (yeah, not great, but that’s not the point here…). In the lobby you can see the statistics of your teammates. If they have like 50-100 wins, they will be way better than me (they reached 1600 Elo with only half of the games I needed), and if they have like 600 games, it’s just pointless to count on any help from them whatsoever. So if I have 2 or even 3 of these 600-games guys in my lobby, I always feel like I should alt+F4 out of this, and I almost always regret after the game that I did not alt+F4…

Please fix the TG Elo. I am sure this would make the game way more fun.

sidg62 · October 1, 2020, 2:30pm

Personally, I don’t think Elo and win percentage are an accurate estimate of a player’s skill level in team games. I’ve seen and been in games where extremely competent players with high win rates and Elo just GG at the first instance of any kind of harassment, and also games where people with a suboptimal win% provide excellent support and play on until they’ve either been completely beaten down or won the game. It really depends on the mentality of the player, not on Elo: whether they’re willing to fight on or give up instantly.

WoodsierCorn696 · October 10, 2020, 10:07pm

Please don’t look at the distribution of DM TG Elo.

1000 Elo in DM TG sits at the same percentile as 630 Elo in RM 1v1. Instead of starting in the mid range and moving up quickly, you just start down at the bottom. That is how inflated that ladder is, and it shows the problem quite well.

For RM TG, 1000 Elo corresponds to around 750 1v1 Elo. That is not as terrible as for DM, but it is still terrible design…

Elo is meant to represent someone’s skill. But in this game, things are messed up. That is the whole point of this thread.

In a good matching system, everyone will have around a 50% win rate, so win rate indeed doesn’t say much.

I also don’t really see what your point about mentality has to do with this thread. It is not really about quitting early or fighting back, so it is not clear to me why you bring these points up. Could you clarify them?

WoodsierCorn696 · October 18, 2020, 10:15am

I really hope the devs will fix this issue in the next patch.

For those who don’t know: we have a new #1 DM 1v1 player by a very big margin! The #2 on the ladder is about 250 points behind!

Someone played 10 DM 1v1 games and won them all. His rating is above 2.4k, while he never played against anyone with an Elo above 1.3k. Does he really deserve the #1 spot? He didn’t face any top DM players, but he still gets the #1 spot on the ladder by a big margin.

This just shows how bad the current system is. I really hope the devs will fix this quickly. I even think the devs need to reconsider using the Elo of other rankings as the starting point when you join a new ladder. It might be better if everyone just started at 1000 Elo on a new ladder…

WoodsierCorn696 · November 2, 2020, 8:45pm

I really hope that the big November patch will also fix the issues mentioned in this thread about Elo… TG Elo is already useless, and 1v1 ratings are starting to get contaminated as well.

WoodsierCorn696 · November 18, 2020, 12:16pm

We can now also add unbalanced quick game matches to the Q&A for the devs. See for example:

Empire wars no fair rank?

Quick game is meant to be unranked, so it probably uses unranked Elo. This Elo is the worst of all; it isn’t reliable at all. This was already my biggest fear when they announced this new game mode: it will only lead to unbalanced games.

Based on some games on aoe2.net, I am not even sure the system uses any Elo at all. I found some games that are really terribly matched. I looked at some 1v1 games and found this matchup:

Example:
Player 1:
1v1 RM: 1500 elo
TG RM: 2200 elo
Unranked: 2300 elo

Player 2:
1v1 RM: 600 elo
TG RM: 1200 elo
Unranked: 1100 elo

I have no idea why those two players are playing a game together.

I see many matchups of this kind in the currently ongoing matches on aoe2.net. For all of these games, aoe2.net shows no current rating for the match.

So my biggest fear seems to be coming true: the new game mode is only meant to get you into a new game quickly. It isn’t meant to get you into a balanced game quickly. If you want balanced games, you still need to play in the ranked queue. And if you don’t want those settings, you still have to use the lobby to find equally skilled players.

Since this is also an issue with Elo, I am adding it to this thread as well. I hope that the devs will give us some answers to all open questions at some point in the near future. I also hope they will fix the ratings.

Since the latest patch was a big one, I hoped they would also fix the ratings, so that TGs would become playable again. But I haven’t seen anything related to this in the patch notes. So I hope the ratings will get some love in the next patch.

WoodsierCorn696 · December 1, 2020, 10:59am

New part for discussion: I think the issues with TG Elo are clear. The question is: do we still really want the Elo system?

Elo is a pretty old system. There are now newer and better rating systems, like TrueSkill (already used by Microsoft in Xbox Live) or Glicko-2 (someone described this system as superior to Elo).
Elo is meant for 1v1, not for TG, so it will always be much less accurate when used for TGs. Other systems are better suited for TGs.
Do we need different ratings for each map? More specific ratings will result in more balanced matchups, provided you have played enough games on that ladder. But if you split up the ratings too much, they become less accurate. I am wondering how users think about this one. I can imagine different ratings for open and closed maps, for example.
In team games, being premade is a big benefit over being solo in the queue. How do we deal with this situation? Some games go as far as having a specific rating for every different team. I am not sure we want this, but at least it is one way to deal with premades vs solo.

Temudschinn · December 1, 2020, 11:11am

The less drastic approach would be the one LoL (among others) uses: you have a solo team game rating, a premade rating and a full team rating. So if you play alone and win, your solo rating increases; if you play with a friend, your premade rating is affected; and if you want, you can open a team and have a rating for that team.

WoodsierCorn696 · December 1, 2020, 1:35pm

I just dropped some possible solutions, knowing they aren’t THE solution. I just hoped people would react to these, so we can have a discussion about what a new rating system should look like. Having a rating for each premade team is, in my opinion, also a step too far. Your idea sounds pretty reasonable to me.

Haladon · December 1, 2020, 3:24pm

I would like to bring into focus another glaring problem in the current TG rating system. At the end of a game, the rating you gain is based on your rating and the max rating in the opponents’ team.
Suppose we have a TG with (2000, 1800, 1800) vs (2000, 1800, 1800). This is a balanced game. In the winning team, the 2000-rated player gains 7 rating and the 1800-rated players gain 13 rating each. In the losing team, the 2000-rated player gets -7 rating and the two 1800-rated players get -3 rating each. (These are the values I have observed with a 200 rating difference.)
To summarize:
(2000, 1800, 1800) -> (2007, 1813, 1813) on win
(2000, 1800, 1800) -> (1993, 1797, 1797) on loss.

The whole point of considering the max instead of the average is that the game result is dominated by these higher-rated players. But still, the lower-rated players in the game are rewarded with more Elo rating for a win. In most other games that have proper TG MMR systems, the whole team is awarded the same rating change.

In fact, if the above two teams played 2 games against each other with the expected 1 win and 1 loss, the two high-rated players would stay at the same place and the 4 low-rated players would gain 10 rating each.

What is this? For those 4 players: “Congrats on playing a couple of games with better players. Hope you learnt something from them. Here is your free 10 Elo btw.”
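
For anyone who wants to play with the numbers, here is a rough Python sketch of the behaviour described above: every player is compared to the highest-rated opponent using the standard Elo expected-score formula. The K-factor `k=16` and the function names are my own assumptions, not the game's actual values, so the individual deltas will only roughly resemble the observed 7 / 13 / 3. What matters is the sign of the net total.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def max_based_deltas(team, opponents, won, k=16.0):
    """Rating change per player when each player is compared to the
    highest-rated opponent. `k` is an assumed K-factor, not the game's."""
    strongest = max(opponents)
    score = 1.0 if won else 0.0
    return [k * (score - expected_score(r, strongest)) for r in team]


team_a = [2000, 1800, 1800]
team_b = [2000, 1800, 1800]

win = max_based_deltas(team_a, team_b, won=True)
loss = max_based_deltas(team_b, team_a, won=False)

# Points created out of nothing by one game between two equal teams.
net_points = sum(win) + sum(loss)
print([round(d, 1) for d in win], [round(d, 1) for d in loss], round(net_points, 1))
```

The 2000-rated players cancel out exactly, while each 1800-rated player gains more on a win than he loses on a loss, so every such game adds points to the ladder.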

Kinquest · December 1, 2020, 3:38pm

That’s a good summary of the cause of the rating inflation. In an Elo rating system, the points gained by the winning party should be the same as the number of points lost by the losing party. Otherwise the average rating will shift over time.

In a team, that means the sum of the rating points gained by the winning team members should be the same as the sum of the points lost by the losing team. In that way, the average rating of all players stays the same with each new game.

Mercy9545 · December 2, 2020, 8:03pm

Like everyone has said before me, the total amount of rating gained by one team should be opposite of the total amount of rating lost by the other team. Else, rating will quickly inflate or deflate, like what we are seeing now. Furthermore, I agree that everyone in a team should win/lose the same number of points. I think @Haladon demonstrated very well what is wrong with the current logic of the rating system.

Theoretically, this could all be solved in the following way: Both teams should be assigned some number representing the skill of their team. Then the change in rating for each team is calculated by putting these numbers in the usual Elo formula that is used for 1v1. This is how it was done on HD and Voobly team ratings.
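
A minimal Python sketch of that idea, under my own assumptions: `team_rating` is any function that turns a list of player ratings into one number (a plain average here, just as a placeholder), `k=16` is an assumed K-factor, and both teams have the same number of players. It is not the HD or Voobly code, only an illustration of the principle.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def mean(ratings):
    """Placeholder team rating: the plain average of the player ratings."""
    return sum(ratings) / len(ratings)


def team_update(winners, losers, team_rating=mean, k=16.0):
    """One shared rating change per team, computed like a 1v1 between
    the two team ratings. `k` is an assumed K-factor."""
    change = k * (1.0 - expected_score(team_rating(winners), team_rating(losers)))
    new_winners = [r + change for r in winners]
    new_losers = [r - change for r in losers]
    return new_winners, new_losers


old_total = sum([2000, 1800, 1800]) * 2
new_winners, new_losers = team_update([2000, 1800, 1800], [2000, 1800, 1800])
print(new_winners, new_losers)
```

Because every winner gains exactly what every loser drops, the total number of points on the ladder stays the same as long as the teams are the same size.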

This leaves us with the following remaining question, though: what is a good way to assign a number to a team representing the team’s total skill?

Currently, the logic would be to take the team Elo rating of the highest rated player in the team. I think we can all agree that this is not ideal. Surely a team of a 1600 Elo player and a 400 Elo player is not stronger than a team of two 1500 Elo players? In HD, the average Elo rating of the team was taken. However, I don’t think this is ideal either, for higher rated players tend to have a higher influence on the game. Let me give an example. Suppose a 2500 Elo player can 1v2 two 1500 Elo players. Then a team of a 2500 Elo player and a 400 Elo player would be able to beat a team of two 1500 Elo players, too. Yet, the team of two 1500 Elo players would have a higher average rating. Clearly, the average is not a good measure to take.

But what is? To find out, let us take another look at the concept of 1v2’ing. In the following analysis, the following question will be important: Suppose player A plays a 1v2 versus players B and C. B and C have the same rating. How much higher does the rating of player A have to be in order to have a 50% probability of beating a team of B and C? I don’t know the answer to this question. Let us call this rating difference ∆ and worry later about the exact value of ∆; so, if the rating of player A is ∆ higher than B and C, then A has a 50% chance of beating B and C. This means that if someone’s rating is ∆ higher than yours, that person can be said to be ‘twice as strong’ as you. But then someone whose rating is in turn ∆ higher than that person is twice as strong as that person. Thus, when someone’s rating is 2 ∆ higher than yours, that person is 2 x 2 = 4 times as strong as you; and when someone’s rating is 3 ∆ higher than yours, that person is 8 times as strong as you. What this means is that under this way of thinking, ‘strength’ or ‘skill’ scales exponentially with Elo rating, and equivalently, Elo rating is just the logarithm of skill (scaled by ∆). In fact, we can precisely say that

‘Skill’ = 2^(Elo rating / ∆)

since then if someone’s Elo rating is ∆ higher than yours, then indeed his skill is twice your skill, which fits our definition of ∆.

How does this fit into Team Elo calculation? Well, I suggest to take the average of the ‘skill’, not of the Elo, to calculate the strength of a team. Basically, I am suggesting we should take the average of the Elo, but on an exponential scale instead of a linear scale. This comes down to the following. Suppose four players are on a team. Their ratings are x_1, x_2, x_3 and x_4. Then I suggest we calculate their ‘Elo’ as a team as follows (where the log has a base of 2):

‘Team Elo’ = ∆ log(AVERAGE [ 2^(x_1 / ∆), 2^(x_2 / ∆), 2^(x_3 / ∆), 2^(x_4 / ∆) ] )

We would only need to find a reasonable choice for ∆. Probably just ask a top player at what Elo difference he estimates his chances to be 50% to beat a team of 2 players. A complicating factor could be that the value of ∆ may differ among different skill levels. Still, I think this rating would be an improvement.

As an example, suppose arbitrarily that ∆ = 600. Suppose two teams are facing each other. Team 1 has three 1500 Elo players and team 2 has one 1700 Elo player and two 1400 Elo players. Then the ‘team elo’ of team 1 would equal 1500 (obviously) and of team 2 would equal:

600 log(1/3 [ 2^(1700 / 600) + 2^(1400 / 600) + 2^(1400 / 600)] )
= 600 log(1/3 [7.1272 + 5.0397 + 5.0397] )
= 600 log(5.7355) = 600 x 2.5199 = 1512

so the team with the 1700 Elo player is considered to be slightly stronger than the other team, but not by much.
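
The worked example above can be checked with a few lines of Python. This is only a sketch of the proposed formula; the function name `team_elo` is made up for illustration, and ∆ = 600 is the same arbitrary choice as in the example, not a measured value.

```python
import math


def team_elo(ratings, delta):
    """Average the players on the exponential 'skill' scale 2^(rating / delta),
    then convert the average back to an Elo-like number with a base-2 log."""
    skills = [2 ** (r / delta) for r in ratings]
    return delta * math.log2(sum(skills) / len(skills))


team_1 = team_elo([1500, 1500, 1500], delta=600)
team_2 = team_elo([1700, 1400, 1400], delta=600)
print(round(team_1), round(team_2))  # 1500 1512

# The earlier thought experiment: one very strong player plus a weak one
# versus two average players (still with the arbitrary delta of 600).
carry_team = team_elo([2500, 400], delta=600)
even_team = team_elo([1500, 1500], delta=600)
```

With this formula the 2500 + 400 team is rated above the two 1500 players, which the plain average would get the other way around.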

%%%%%%%%%%%%%%%%%%

Unrelated to this, I saw there was a discussion about using a different rating for different map types. I think that would be helpful. I know my own skill differs quite a bit on different maps, and this can lead to one-sided games. If I don’t ban any map and don’t select a preferred map, I can predict my rating will go down, as two thirds of my games will be against Arabia specialists on Arabia and one third of my games will be against Arena specialists on Arena. Similarly, if I ban both Arabia and Arena and select a non-standard preferred map like Four Lakes, I can predict my rating will go up.

I think it is doable to use different ratings for different map types. In fact, I think it should be doable to give every player a rating for every individual map in the game. If you win on a map, not only would your rating on that map go up, but on similar maps too. As an example, a win on Islands would increase your Islands rating, and also your Team Islands rating, but only a little bit your Arena rating, since Team Islands is much more similar to Islands than Arena is. In order to determine how ‘similar’ different maps are, Microsoft would need a basic machine learning algorithm and feed it the results of all games played. The algorithm would learn that if it evaluates Islands and Team Islands as decently similar maps, it gets better in estimating the outcome of games (it will e.g. correctly predict someone who is good at Islands winning on Team Islands too).

With sufficient technical knowledge and skill, it should be doable to make a specific rating for every specific map in this way.
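
To make the idea a bit more concrete, here is a toy Python sketch of a per-map update where a result on one map also moves the ratings on similar maps. The similarity weights in `SIMILARITY` are numbers I made up by hand for illustration (in the proposal they would be learned from game results), and `k=16` is an assumed K-factor.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


# Hypothetical similarity weights between the played map and other maps.
# Hand-picked for illustration only; not learned from any real data.
SIMILARITY = {
    "Islands": {"Islands": 1.0, "Team Islands": 0.8, "Arena": 0.1},
}


def update_map_ratings(ratings, played_map, opponent_rating, won, k=16.0):
    """Return new per-map ratings after one game on `played_map`.
    Each map moves by the normal Elo change scaled by its similarity weight."""
    score = 1.0 if won else 0.0
    base = k * (score - expected_score(ratings[played_map], opponent_rating))
    updated = dict(ratings)
    for map_name, weight in SIMILARITY[played_map].items():
        updated[map_name] = ratings[map_name] + weight * base
    return updated


before = {"Islands": 1500.0, "Team Islands": 1500.0, "Arena": 1500.0}
after = update_map_ratings(before, "Islands", opponent_rating=1500.0, won=True)
print(after)
```

A win on Islands raises the Islands rating the most, the Team Islands rating almost as much, and the Arena rating only slightly.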

WoodsierCorn696 · December 3, 2020, 9:04pm

@Mercy9545

The easy fix is indeed to just compare the Elo of every player to the team average. That way we stop the inflation.

Your next question is a good one: is the average really a good measure? Aren’t the better players more likely to carry the game, so that their rating is worth more? You came up with a good solution. To me, this solution already feels like a different Elo system, since it is a kind of log Elo. It seems like a good alternative for the devs to look into. This also means that matchmaking would need to be based on this new team Elo. I am not sure we really need that change, since the difference between the current average team Elo and your suggested team Elo won’t be very big if everyone is around the same rating (which happens most of the time).

I am not sure we really want Elo for every single map. It might be more accurate, but it also gets a lot more difficult for people to understand the ratings. Every open map will have a rating, so you have a rating for Arabia, Cenotes, Valley, … but also a different rating for just ‘Open maps’. And when you play Arabia, your ratings for Cenotes, Valley, … change too. That seems too complex to me. Why not use some specific groups:

Open maps
Closed maps
Hybrid maps
Water maps

I don’t think you really need more than these. In the end you also have ratings for 1v1 and TG, and for RM and DM. This means that you already end up with 16 ratings. If you also want an overall rating for each ladder, you end up with 20 different ratings.

In the end, I think being more specific in rating than just those 4 won’t matter much for the quality of a match. From a mathematical point of view, I understand that more stats are better and more specific. But it is also a cost-benefit analysis, and I don’t think it is worth the investment.

From a mathematical point of view, I do understand why you want more stats, as more = better in machine learning.

SouBotsito · December 3, 2020, 11:27pm

800 points to go

Go Go 4k!!!

Mercy9545 · December 5, 2020, 7:37pm

I am glad it seems like a good alternative to you. Yes, I would propose the system would be used for matchmaking too, and not only just for awarding points.

Indeed, my proposal is quite similar to the idea of taking the average of the Elo ratings. I am proposing we use the average too, just on an exponential scale.

I am puzzled though why the fact that my suggestion is quite similar to the average Elo approach makes you like it less as a solution we should look into. When everyone in the team is about the same rating, none of the problems mentioned in this thread arise anyway. But under the circumstances where the approaches are different - the cases where not everyone has the same rating - I would argue my approach is best.

I am speaking from some experience, by the way. Sometimes I queue up for a 2v2 with my buddy, and he used to be like 500 Elo below me. We were getting matched with players whose Elo was close to our average Elo, and I think this gave us an advantage. When points were awarded, the game did not look at average Elo; instead, the points gained or lost were entirely dependent on the highest-rated player in the other team. Neither has made much sense to me. The same system should have been used for matchmaking and for awarding points, and as I said, I think taking the average on an exponential scale is best.

WoodsierCorn696:

I am not sure we really want Elo for every single map. It might be more accurate, but it also gets a lot more difficult for people to understand the ratings. Every open map will have a rating, so you have a rating for Arabia, Cenotes, Valley, … but also a different rating for just ‘Open maps’. And when you play Arabia, your ratings for Cenotes, Valley, … change too. That seems too complex to me. Why not use some specific groups:

Open maps

Closed maps

Hybrid maps

Water maps

I don’t think you really need more than these. In the end you also have ratings for 1v1 and TG, and for RM and DM. This means that you already end up with 16 ratings. If you also want an overall rating for each ladder, you end up with 20 different ratings.

In the end, I think being more specific in rating than just those 4 won’t matter much for the quality of a match. From a mathematical point of view, I understand that more stats are better and more specific. But it is also a cost-benefit analysis, and I don’t think it is worth the investment.

From a mathematical point of view, I do understand why you want more stats, as more = better in machine learning.

I don’t think it is a problem if people don’t understand the ratings. Already almost no one understands how Elo works, though most people have an intuitive grasp of it. Under my suggestion, the situation would be the same. People can intuitively grasp that when they win on a map, their rating on that map increases, as well as their rating on other maps that are similar.

The reason why I think it is bad to use any category to lump maps together is that it will never be good. For example, Mediterranean and Four Lakes are both hybrid maps, yet they play completely differently, and I know I am much better at Four Lakes than at Mediterranean, since Four Lakes suits my playstyle more. Plus, you would give humans extra work to categorize every map… Why not let an algorithm do it when it can do the same thing much more precisely, efficiently and without bias?

I do think the quality of the matches would be improved if you were matched based on your rating on the maps you want to play. That being said, I don’t think this should be as much of a priority as some other things, such as fixing teamgame matchmaking.

WoodsierCorn696 · December 10, 2020, 9:30am

I had two main points:

If we go to your system, do we really need an adjustment in the matching algorithm so that it uses ‘your average’, or can we just stick to the current matchmaking? I don’t think it will change much. Most games will still be the best match. So I don’t think we really need to change the matching algorithm. Changing only the way in which Elo is calculated already seems fine to me. So this isn’t a reason not to go for your Elo calculation, only a reason not to change the matchmaking rules.
From a mathematical point of view, I fully understand why we want this and why it is better. I also understand how such a system would work. I just think it seems like overkill. You can already fix lots of issues in the matchmaking with simpler solutions, so this option might be too complex. Do you really need a rating for every single map? Or can we settle somewhere in the middle? I think that is already good enough. Currently you have just 1 rating for all maps, and most games already seem pretty balanced to me. It is not as if most games are imbalanced in the current setup. Note that I am speaking of 1v1s. It is different for TGs, but that is mainly because of all the issues with TG rating. I also think TG rating will always be less accurate, because you can get carried by an ally, so ratings are inflated for that reason and things like that.

Overall I am positive about your suggestion, and I would love it if the devs had a look at your solution.

SouBotsito · December 10, 2020, 9:34am

Why make things complicated? Simply give half the points they are giving now, do a reset to clean things up, and, to punish stackers, give a party fewer points per win after their second game together.

And obviously the Elo distribution should be based on the team overall, not on the highest player in the team. It worked like that on Voobly: 2.2k was the max players reached there, and the pros were actually quite active on that ladder, but here it is a joke.

redKnightks · December 11, 2020, 3:57pm

I agree with @Mercy9545 that the elo system should be such that total number of points gained by one team should be equal to the total points lost by the other team. Also the elo change for the entire team should be same. This is what most other good multiplayer games do.

Now, how to calculate the skill of a team from the ratings of individual players is a different matter. Which exact formula to use can best be derived by looking at real match data. All such formulas can be compared to see which one predicts the win% best. This is how they did it in Dota2: they used a simple average for the first few years of the game, found out that the highest-rated player has more impact on the team performance than other players, and so moved on to introduce some sort of weighting system. The exponential weighting suggested by @Mercy9545 is one such formula.
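
As a rough sketch of how such a comparison could look in Python: score each candidate team-rating formula by how well its predicted win probability matches the actual results, for example with the Brier score (mean squared error of the prediction, lower is better). Everything here is an assumption on my side: the function names, ∆ = 600 for the exponential average, and above all the three `toy_matches`, which are invented only to show the call shape and say nothing about which formula is really best.

```python
import math


def expected_score(rating, opponent_rating):
    """Standard Elo expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def mean_rating(ratings):
    return sum(ratings) / len(ratings)


def max_rating(ratings):
    return max(ratings)


def exp_mean_rating(ratings, delta=600.0):
    """Average on the 2^(rating / delta) scale; delta is an arbitrary guess."""
    skills = [2 ** (r / delta) for r in ratings]
    return delta * math.log2(sum(skills) / len(skills))


def brier_score(matches, team_rating):
    """Mean squared error between predicted win chance and the real result.
    `matches` is a list of (team_a, team_b, a_won) tuples."""
    total = 0.0
    for team_a, team_b, a_won in matches:
        predicted = expected_score(team_rating(team_a), team_rating(team_b))
        total += (predicted - (1.0 if a_won else 0.0)) ** 2
    return total / len(matches)


# Invented records, NOT real match data. Replace with real game results.
toy_matches = [
    ([1600, 400], [1500, 1500], False),
    ([2500, 400], [1500, 1500], True),
    ([1500, 1500, 1500], [1700, 1400, 1400], False),
]

candidates = {"mean": mean_rating, "max": max_rating, "exp_mean": exp_mean_rating}
scores = {name: brier_score(toy_matches, fn) for name, fn in candidates.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
```

With a large set of real games, the formula with the lowest score would be the one that predicts outcomes best, and ∆ itself could be tuned the same way.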

Then there is the issue of measuring impact of partying up rather than playing with unknown people on your team. Starcraft II goes the route of having a separate rating for each party. Most other games have a separate rating for solo-queuing and for queuing with a party. If we don’t want separate ratings we need to have some way of balancing the effect of queuing as a party. This could be as simple as having the formula which calculates the party skill from individual ratings slightly bump the ratings if in a party.

One other issue is that we currently have the same ratings for 2v2, 3v3, and 4v4s. The gameplay in each format is slightly different from the others. But even if we assume that players play more or less similarly in each format, there is the issue that a difference of, say, 100 points may not mean the same in 2v2 as in 4v4. Hence our formula should take care of this factor as well.

WoodsierCorn696 · December 13, 2020, 4:01pm

@GMEvangelos Still waiting for a response from the devs on this subject. This thread is already linked in about 40 other threads. It isn’t linked in all reports, so there are even more reports about this subject. So when will this issue get priority from the devs? This thread was made in May, so already 7 months ago. Still no fix. Still no response from the devs. This thread also wasn’t the first about this issue, so the issue was reported more than 7 months ago. When will this be fixed?