I don’t think there is an issue here that can be easily fixed, at least to players’ satisfaction.
There’s another underappreciated effect of how ELO should work for team games. Even if players only join team game queues with other players of exactly the same skill, TG ELOs need to span a much larger range than 1v1 ELOs: two people who are only 100 ELO apart in 1v1 should be much further apart in TG ELO. Intuitively, that’s because four players on a team who are each 100 1v1 ELO higher than their opponents are a lot less likely to lose the game than a single player who is 100 1v1 ELO higher than his opponent. Let me try to explain the math of why, for example, someone with 1400 ELO in 1v1 should have a much higher ELO in 4v4:
The ELO system assumes that each player’s performance varies around their rating with a fixed standard deviation (measured in ELO points), and that is a reasonably good model for a lot of skill-based games and sports, AoE2 included. From the Wikipedia page: “Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent’s is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.” What this means is that a player who is 100 ELO above another player in 1v1 should win 64% of their games against them; for example, a 1100 1v1 ELO player should win 64% of 1v1s against someone who is 1000 ELO. Intuitively, I think of a game as a sum of individual decisions with somewhat random outcomes; I’ll call this the “performance” of the players. A player with a lower ELO can win against a player with a higher ELO by happening to “perform” better in that game, and the ELO system measures how likely that is to happen.
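To make the quoted percentages concrete, here is a minimal sketch of the standard ELO expected-score curve (the logistic form with a 400-point scale). The function name `expected_score` is just mine for illustration, not anything from the game.

```python
def expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side, given the rating gap in ELO.

    Uses the standard logistic ELO curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))


if __name__ == "__main__":
    for gap in (0, 100, 200):
        print(f"gap {gap:>3}: {expected_score(gap):.0%}")
```

Running it should reproduce the 50% / 64% / 76% figures from the quote for gaps of 0, 100, and 200.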
Similarly, the system attempts to do the same for team games. Let’s assume a system where we only have 4v4s. Four 1100-TG-ELO players should win 64% of games against four 1000-TG-ELO players, and four 1200-TG-ELO players should win 76%. However, if you take four players who are each 1100 1v1 ELO and match them against four players who are each 1000 1v1 ELO, the probability of the lower-skilled team winning drops well below 36%, to something much closer to 24%. Roughly, this is because the outcome depends on the average “performance” of each team, and an average of four players is much less noisy than a single player, so it is much less likely that the stronger team’s average ends up below the weaker team’s. In particular, you can think of TG outcomes as being governed by the standard error of the “averaged” performance of 4 players instead of the standard deviation of the performance of 1 player. The standard error of a team’s average performance goes as Sqrt(1/N), where N is the number of players on each team, so the same per-player skill gap is worth Sqrt(N) times as many standard errors: a 100 ELO gap in 1v1 behaves like a 100 * Sqrt(4) = 200 ELO gap in 4v4, and the higher-skilled players need to settle at an even higher TG ELO to compensate. tl;dr: players who are 100 ELO apart in 1v1s should be 200 ELO apart in 4v4s, and this is just part of how the ELO system works. This assumes that players only queue together with others of their own skill level, that their 1v1 ELOs are accurate, and that they ONLY queue for 4v4s.
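Here is a rough sketch of that averaging argument in code. It rests on assumptions of mine that go beyond the post: performances are normally distributed, teammates’ performances are independent, and the 1v1 noise is calibrated so that a 100-point gap gives the 64% quoted earlier. The name `team_win_probability` is hypothetical.

```python
import math
from statistics import NormalDist


def team_win_probability(gap_1v1: float, team_size: int) -> float:
    """Win chance of the stronger team when every player on it is
    `gap_1v1` ELO above the opponents (assumed averaging model)."""
    std_normal = NormalDist()
    # Calibrate 1v1 noise: P(win) = Phi(gap / scale) must equal 0.64 at gap = 100.
    scale_1v1 = 100 / std_normal.inv_cdf(0.64)
    # Averaging N independent performances shrinks the noise by Sqrt(N).
    scale_team = scale_1v1 / math.sqrt(team_size)
    return std_normal.cdf(gap_1v1 / scale_team)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(f"{n}v{n}: {team_win_probability(100, n):.1%}")
```

Under these assumptions the 4v4 case comes out at roughly 76%, i.e. the weaker team wins only about 24% of the time, which is what a 200-point gap would give in 1v1.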
The same argument can be applied for 3v3s, in which case every 100 ELO in 1v1s should be 173 ELO in 3v3s (because sqrt(3) = 1.73…), and for 2v2s, in which case 100 ELO difference in 1v1 should be 141 ELO difference in 2v2s. This means, if we center elos on 1000 ELO, then theoretically, a 1400 ELO 1v1 player should have a TG elo of 1566 ELO in 2v2s, 1693 ELO in 3v3s, and 1800 ELO in 4v4s. This is why it’s normal for TG elos to be way more spread out than 1v1 elos: if you are 200 ELO below someone in 1v1s, you should expect to be ~400 TG ELO below him if you both play mostly 4v4s. Similarly, someone who has 2000 ELO TG and only plays 2v2s should be more skilled than someone who has 2000 ELO TG and only plays 4v4s, assuming people never mix queues.
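The conversion above is easy to sketch in code. This assumes the idealized setup from this post (ratings centered on 1000, players only ever queueing for one team size), and `team_game_rating` is just an illustrative name.

```python
import math


def team_game_rating(rating_1v1: float, team_size: int, center: float = 1000.0) -> float:
    """Theoretical TG ELO for a 1v1 ELO under the Sqrt(N) scaling argument.

    Assumes both ladders are centered on `center` (1000 here, as in the post).
    """
    return center + (rating_1v1 - center) * math.sqrt(team_size)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"1400 in 1v1 -> {team_game_rating(1400, n):.0f} in {n}v{n}")
```

For a 1400 1v1 player this gives 1566, 1693, and 1800 for 2v2, 3v3, and 4v4, matching the numbers above.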
However, if the game tracked 2v2s, 3v3s, and 4v4s separately, it would be way too many stats to keep track of (a player would need to play 40 games just to initialize all 4 rankings), so the developers decided simply to combine all TG ELOs into one statistic. Furthermore, players intentionally queue with friends of varying skill, and other players get randomly combined with each other, so you can expect a lot of unpredictability in how TG ELOs change. There’s also the problem that over time, players who are lower ELO are more likely to either abandon the game or just play less often, while players who have just joined the game start off at 1000 ELO. That last factor contributes to TG ELOs drifting higher (the “inflation”), and the first few factors exacerbate this drift. But I still roughly believe the overall math: for example, looking at the plot from the GitHub link, since the median TG ELO is around 1400 and the median 1v1 ELO is around 1000, you can expect a ~1500 ELO 1v1 player (500 ELO above the median) to be ~2400 ELO in 4v4s (500 * Sqrt(4) = 1000 ELO above the median), and both ratings do indeed map to the same percentile (95%).
This is why TG ELOs can vary so much relative to a player’s actual performance, and I always take them with a grain of salt. I don’t think the ELO system needs to be revamped: I can’t imagine a system that avoids all the systemic issues I mentioned in the last paragraph while still being useful to players, and I think it’s intended to be just a rough estimate of a player’s TG skill. I just expect that if I face an opponent who’s 200 ELO higher or lower than me in a 4v4 TG, he’d feel about 100 ELO higher or lower than me in 1v1s, and that has felt accurate on average.