Why the game is less balanced than the stats would lead you to believe

breeminator · October 29, 2021, 11:36am

I’m not sure you’ve grasped the effect yet. You can’t remove it by only using data from certain maps, as the player’s ELO is still affected by the games they play on other maps.

One way to remove it would be to maintain your own ELO for all players, which only uses the results from the maps you want, and account for the effect of the ELO differences.

coolios9876 · October 29, 2021, 11:42am

Oh sorry, I misunderstood what you were referring to.

If I’m understanding you right, you mean that a player could play games on Arena with any civ, lose, then come and play a game on Arabia using their fav civ and win because their Elo is lower, thus boosting the observed win-rate for their fav civ in the “open” map group…

I need to think more about this as I can’t think of any easy way of accounting for it. I’m wondering if it would be possible to have some sort of player-map affinity / random effect in the model to try and control for this…

EDIT: I am thinking that for each player with > 10 games we could set the “affinity” to be their win-rate for that map class (for players with <10 just set to a default 0.5). I.e. if I won 70% of my arabia games and 30% of my arena games, my affinity would be 0.7 and 0.3. In the model for each match we can then include a term for the difference in player affinity to hopefully mop up some of this bias (it wouldn’t be perfect, mind you, but it should help).
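A rough sketch of that affinity idea, assuming the match data can be flattened into `(player, map_class, won)` tuples (that layout and the function names are made up for illustration, not taken from the actual pipeline):

```python
from collections import defaultdict

MIN_GAMES = 10          # threshold from the post; exactly 10 games falls back to the default here
DEFAULT_AFFINITY = 0.5  # used when a player has too few games on a map class


def map_affinity(games):
    """games: iterable of (player, map_class, won) -> {(player, map_class): affinity}."""
    played = defaultdict(int)
    wins = defaultdict(int)
    for player, map_class, won in games:
        played[(player, map_class)] += 1
        wins[(player, map_class)] += int(won)
    return {
        key: (wins[key] / n if n > MIN_GAMES else DEFAULT_AFFINITY)
        for key, n in played.items()
    }


def affinity_diff(affinity, p1, p2, map_class):
    """Per-match covariate: player 1's affinity minus player 2's on this map class."""
    a1 = affinity.get((p1, map_class), DEFAULT_AFFINITY)
    a2 = affinity.get((p2, map_class), DEFAULT_AFFINITY)
    return a1 - a2
```

Note the affinity is built from the same results the model is trying to predict, so in practice it would probably need to be computed on a separate slice of matches.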

Buonaventura · October 29, 2021, 11:52am

You’re right, that’s brilliant math.

So basically, all Frankish players can be less skilled than people at equal ELO with other civs.

That means that the only way to measure balance is to build a detailed 2-D map with the win rates between all pairs of civilizations, and also to measure APM or other metrics of skill (idle TC time in early game, camera jumps, etc.)

gnarfk · October 29, 2021, 11:52am

I once had the same kind of argument about team games, regarding “solo queuing” and “premade teams”.
The only issue is for people who play sometimes solo and sometimes in premade.

JokerPenguin593 · October 29, 2021, 12:52pm

Would this be like a propensity score matching approach? Sounds cool.

With propensity score matching you fit regression models to predict the exposure (i.e: civ picked) and then add that predicted score in a model.

Since we have more than 30 civs, I would try to make a gross prediction based on only one aspect, like “meta and non meta”.

The issue is to define that aspect…

Another option would be to make one propensity score per civ and add all of them to the regression model, but this may result in collinearity issues with lots of overlap. However, this could lead to interesting analysis, like the winrate of Franks when played by people who pick non-Frank cavalry civs.

Edit. Wait. The simplest approach would be to adjust the “true civ winrate models” by only the propensity score of that civ per model! No more assumptions are needed with regard to the other civs in those models.

Buonaventura · October 29, 2021, 1:40pm

I think the issue is way deeper than that. Basically, the problem is that players of strong civs (like Franks) have far greater ELO rankings than they would receive with other civs.

The only way to measure the shift between ELO and skill is to track metrics like APM, or reaction times (idle TC in early game, time housed, etc). In this way, you could check if “lazier” players using OP civs are being ranked higher or not. Because at the end, the ELO system guarantees that everybody has a win rate of 50% approximately.

JokerPenguin593 · October 29, 2021, 1:54pm

Not a big deal: the “propensity score” should take into account all things that may “determine” your civ pick:

Step 1:
You compile all available data per player

Step 2:
Then, for each player, you calculate the probability of picking each civ (so 39 models per player) according to the things that may “determine” that civ pickrate:

Elo (Some civs are picked more often in higher or lower elos)
Map (some civs are picked more in certain maps)
Previous player pickrate with that civ (players tend to pick the civs they are more used to)
Previous player winrate with that civ (idem)
You can also add the APM or other interesting data like time housed, Idle TC time, but I don’t think they are easily available in the data source that @coolios9876 uses. The more information for the statistical models, the better.

Step 3:
Calculate the winrate per civ, adjusting by that civ-propensity-score per player.

This way, the regression model will calculate the adjusted winrate per civ once we have taken into account the things that determine each civ’s pick rate (ELO, map, players that only play random civ or a small pool of civs, etc).

Then, @coolios9876 could also calculate the averaged winrate against all civs as usual.
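As a very rough illustration of Step 2, here is a sketch that swaps the per-civ regression models for plain pick frequencies within (Elo bucket, map class) strata. That is a deliberate simplification: it ignores the per-player pickrate and winrate history, and the `(elo, map_class, civ)` tuple layout and the 200-point bucket width are assumptions for the example only.

```python
from collections import Counter, defaultdict


def elo_bucket(elo, width=200):
    """Lower edge of the Elo bucket, e.g. 1050 -> 1000 for width 200."""
    return int(elo // width) * width


def civ_propensity(picks, width=200):
    """picks: iterable of (elo, map_class, civ).

    Returns {(bucket, map_class): {civ: estimated pick probability}},
    estimated as the share of picks within each stratum.
    """
    counts = defaultdict(Counter)
    for elo, map_class, civ in picks:
        counts[(elo_bucket(elo, width), map_class)][civ] += 1
    return {
        stratum: {civ: n / sum(c.values()) for civ, n in c.items()}
        for stratum, c in counts.items()
    }
```

The looked-up probability for the civ actually picked would then go into the winrate model for that civ as the extra covariate described in Step 3.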

coolios9876 · October 29, 2021, 2:26pm

I need to think about this more. I’m not too sure how propensity scores would work here though (incidentally we use these a lot at work though mostly in combination with IPT and SMR weighting).

At least I’m not sure that what you (@JokerPenguin593) proposed above would address the issue identified by @breeminator. As an example, say we have 2 players who both purely pick franks, but player A has a 50/50 wr on both arabia and arena whilst player B has an 80% wr on arabia and a 20% wr on arena. Player A will bias the frank win rates to 50% on both open and closed maps, whilst player B will positively bias open map wr and negatively bias closed maps (because their arena losses lower their elo and give them easier matches on arabia). Regardless, in both cases these players would have a Frank propensity of 1 because they are both pure frank pickers.

I think the “true-model” would have independent Elos for each map/player but we can’t actually observe that. I guess my idea was to use each players observed map-win-rate by including that as a “propensity-like” score in the model to act as an Elo-map interaction offset. My only fear with this is that we are essentially making a covariate based upon the result (i.e. bleeding information from the outcome into the predictors) which is typically a no-no. Would need to use cross validation to assess whether it helps the model or not.

WoeIsToWho · October 29, 2021, 2:32pm

Yeah, we’ve been there and done that. Suffice to say, I disagree with your argument that the stats are worse than they look.

High pick rates for certain civs and low pick rates for other civs spike winrates in ways usually unassociated with the actual balance of the civ, forgetting civ matchups and map options in their entirety. Familiarity in a game this complex is a very important thing, and a streamlined strategy is a big boon.

Play rate is one of a ton of factors that fill into the overall winrate of civs, and it’s incredibly hard to account for all these metrics at once, practically impossible to isolate single causalities, which is why I continue to say Winrate is a terrible metric for being the primary justification for pushing buffs/nerfs. These charts give the benefits of the most superficial insights to those with no experience, and little else.

breeminator · October 29, 2021, 2:39pm

I edited my post, but you might not have spotted it. I think one way to handle it is to only use the starting ELOs from the game, and maintain your own ELOs for each map/player as you process the game results. If the ELO difference in each game is also accounted for, it shouldn’t be a huge problem that the game is matching people differently to how we would ideally like.

Buonaventura · October 29, 2021, 2:41pm

While the statistical model you mention seems interesting, you cannot test the notion that players with lower skill are competing on equal terms against players with higher skill (better APM, less time housed, etc), without actually measuring skill data.

No matter how advanced the statistical model, you cannot determine if low-skill players are over-achieving without tracking skill metrics. It’s like determining the velocity of an object looking only at the distance, without clocking the time spent travelling.

There are fundamental degrees-of-freedom that are not being tracked. How can you say that players with lower X are achieving Y, if nothing in the system measures X at all?

JokerPenguin593 · October 29, 2021, 2:43pm

Even with civ-picker players with a propensity score of 1, the final model would give us more accurate winrates adjusted by those civ-pickers and, in the case of players that are not pure civ-pickers, by other variables.
Maybe it won’t be the perfect prediction but it will be closer to reality. And, if you use this at work, it shouldn’t be hard for you to give it a try (maybe one weekend of coding?)

True. Instead of using win-rate, you can limit yourself to map-based pickrate. The pickrate would also somehow adjust for the 50% winrate issue for the most picked civs.

Or machine learning… but I cannot help you with that now because I have not thought about it.

coolios9876 · October 29, 2021, 2:46pm

Sorry yes I had missed your edit, this is defo an interesting idea. Do you mean something like this:

Filter data for specific map category i.e. “open”
Set starting Elo for each player to be the Elo of their first match in the filtered dataset
For each player process all of the matches in the filtered dataset manually adjusting their Elo as if these were the only games they were playing
Set the Elo for each player to be the final calculated Elo
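The steps above could be sketched like this. It assumes a standard Elo update (400-point scale, fixed K of 32), which may not match what the game's ladder actually uses, and the dict keys `p1`, `p2`, `elo1`, `elo2`, `p1_won` are made up for the example.

```python
def expected_score(elo_a, elo_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


def replay_map_elo(matches, k=32):
    """Replay already-filtered matches (one map category, chronological order).

    Each match is a dict with keys p1, p2, elo1, elo2, p1_won.
    The in-game Elo is only used the first time a player appears;
    after that their Elo is driven purely by these matches.
    Returns (final Elo per player, list of (elo1, elo2) as of each match).
    """
    elo = {}
    at_match = []
    for m in matches:
        e1 = elo.setdefault(m["p1"], m["elo1"])
        e2 = elo.setdefault(m["p2"], m["elo2"])
        at_match.append((e1, e2))
        score = 1.0 if m["p1_won"] else 0.0
        delta = k * (score - expected_score(e1, e2))
        elo[m["p1"]] = e1 + delta
        elo[m["p2"]] = e2 - delta
    return elo, at_match
```

Returning both the final Elo and the Elo as of each match means either option can be fed into the model afterwards.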

When fitting the model would you then use the final Elo (naively assuming Elo is constant) or the Elo as it was at the given match ? My gut feeling is the former would lead to more accurate results as you get a better estimate for the players true “map-elo” and less bias from historic other-map matches (at the cost of assuming constant Elo)

JokerPenguin593 · October 29, 2021, 2:50pm

Of course, I acknowledge this issue. But sometimes statistics are limited and we should discuss them prudently.

Even in the absence of that skillfulness data, the new models would improve our discussions for sure.

Still, APM may be hard to use. Some civs inherently need more APM than others (archer or cav archer ones), and APM also depends on the civ you are fighting against (i.e. enemy onagers increase my APM). Franks is a civ that neither requires APM (your tanky knights can resist anything) nor increases enemy APM a lot outside of monks in castle age. Compared to APM, housed time or idle TC time seems more objective regardless of the civ played (although huns and persians may have an issue with this).

Although if we want to compare different APM across players that play one particular civ, APM may be a good measure to adjust for.

breeminator · October 29, 2021, 2:54pm

Yes, that is exactly it. Whether it’s better to use the final ELO, or ELO at the time of each game is a tricky question in its own right. I would probably go for the ELO at the time of each game, as my feeling is the extent to which a player’s skill can genuinely change over time is more important than the lag in their ELO catching up to skill changes.

coolios9876 · October 29, 2021, 2:59pm

I guess I could just fit both models and see which one minimises the log-loss or something. I tend to just produce stats over quite short time periods (to not blend the effects of different patches together), so my gut is saying that the time periods are short enough that changes in player Elo over time should be minimal. But yer, will be interesting to see!
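For reference, the log-loss comparison could be as simple as the sketch below. It uses the plain Elo expected score as the prediction rather than the full civ model, which is a simplification, and the function names are just for illustration.

```python
import math


def expected_score(elo_a, elo_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))


def log_loss(predictions, outcomes, eps=1e-12):
    """Mean binary log-loss; outcomes are 1 if player 1 won, else 0."""
    total = 0.0
    for p, y in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(outcomes)
```

Build one list of predictions from the final Elos and one from the Elos as of each match, score both against the same outcomes, and keep whichever gives the lower value, ideally on matches held out from fitting.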

coolios9876 · October 29, 2021, 3:08pm

Sorry, just to make sure I am understanding you: are you proposing the use of propensity scores to down-weight single civ pickers instead of just wholesale removing them?

Assuming I’ve understood you properly, I guess I’m just struggling to work out how to calculate and apply the weights. I think you mean that you weight based upon each player’s civ-propensity, i.e. if I use Franks 90% of the time then my game using franks should have a 1/0.9 weight (obvs need to apply scaling). I think my issue with this is that it means if I then have a game with Malay, who I use only 1% of the time, it will be massively upweighted even though my match with them isn’t representative of their actual WR because of how Frank biased I am.

Ideally I imagine you want to down-weight all matches from single civ pickers and up-weight all games from random-pickers. That is to say that I think you need to weight the player as a whole rather than their individual matches. You also have the issue of how to merge the weights as matches have more than 1 player. Options include taking the mean or the max but at this point I’m struggling to really know what to do / how to interpret it .

I was originally toying around with the idea of just assigning the weight for the player to be 1 / % of games played by their most used civ. But even this isn’t perfect as there’s a large difference between someone who does 50% civ A + 50% random and 50% civ A + 50% civ B. But yer, in the end I just went with removal as it was easier to mentally reason about.
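A minimal sketch of that player-level weight, plus the mean/max options for merging two players' weights into one match weight. Scaling is left out and the function names are hypothetical.

```python
from collections import Counter


def player_weight(civ_picks):
    """1 / share of games played with the player's most used civ."""
    counts = Counter(civ_picks)
    top_share = max(counts.values()) / len(civ_picks)
    return 1.0 / top_share


def match_weight(w1, w2, how="mean"):
    """Combine two player weights into a single match-level weight."""
    return max(w1, w2) if how == "max" else (w1 + w2) / 2
```

It also shows the limitation mentioned above: a player who goes 50% civ A + 50% civ B gets the same weight of 2 as one who goes 50% civ A and spreads the rest evenly over many civs.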

JokerPenguin593 · October 29, 2021, 8:24pm

Yes, but the propensity score may also manage other problems like elo, map-based civ-picking, etc.

How do you predict winrates? Do you use multivariate regression models? If so, couldn’t it be enough to add the propensity score as a covariate in the model so it is adjusted by it?
And in your example, you use pickrate instead of propensity score. Pick rate can be explained by a lot of things that may confound the winrate as well.

What I want is to simulate a situation in which players pick the civ at random, not only weighting civ pickers down.

I don’t fully understand this. I guess your data is match-based, and I was thinking of player-based data. I would like the propensity score to be player-based, so you have to calculate it previously and merge that database with your match-database. Thus, each match would have 2 propensity scores, one per player and civ.

You should make a loop to analyze each civ adjusting for civ propensity score.

AllergicTable49 · October 29, 2021, 8:40pm

Thank you for this awesome point of view, you’re totally right.

Anyways, another reason why civ pickers are killing the game.
And why developers should wake up and nerf the top 10 civs already.

coolios9876 · October 29, 2021, 8:53pm

I think I understand. The issue is I don’t think you can weight an individual’s matches to create a pseudo-random picker, as the observations aren’t independent. Like normally with PS’s you would have a full spectrum of independent observations that might be biased to have only occurred at certain covariate values, so you can use weighting to up-weight uncommon occurrences and down-weight common occurrences in order to get a pseudo-random distribution.

I think it might be easier to consider this at a WR per player by Civ level. Imagine we have 2 civs A, B with win rates 55% and 45% respectively. If the player played 95% of their games with A and 5% of their games with civ B, they would likely have observed win rates of something like 51% with civ A and 42% with civ B (I’m making the numbers up but they should demonstrate the point). I’m not 100% certain but atm I just don’t think it’s possible to weight this to create a pseudo-random civ picker because the civ B observations are impacted by the fact the other observations were civ A.

Yer, the true “observations” are on the match level as we are predicting the match result based on the civs involved in the match. This means any sort of weighting has to be done at the match level. So yer, if we create weights on the player level we need to somehow combine them so that we can apply them at the match level.

I do indeed. I think the issue here though is that the “desired” propensity score would be non-linear: a true random-picker would have a play-rate of 1/39, but here we would want to penalize both ends of the distribution (those who don’t pick enough and those who pick too much), whereas normally in PS you target 1 end of the distribution, i.e. in a trial we would upweight those in the control arm who are most like the treatment arm (i.e. who have a PS as close to 1 as possible). There probs is a way to transform it to accommodate this but I don’t know it off by heart.