A lengthy, breathless explanation (with examples) as to why proposing balance changes based off winrate is a terrible idea

Your example kinda fits the old coustillier (you would see pros managing to hit and run them with CA despite cousts being able to one-shot said CA), but those kinds of units don’t even get nerfed because of winrate (Burgundians’ winrate was around 50% because their bad eco compensated for the coustillier OPness). They get nerfed because most of the playerbase voiced their complaints and frustration, so it got hotfixed outright.

I wouldn’t trust even stats above the 1200 group because big mistakes can still happen (I swear I’ve seen 1400s try and beat mass genitour with xbows)

As for the low sample size of tournament stats, I would see it more as an argument to give up on stats even more and just directly ask the players why they choose the maps, or find other reasons (like how players training with Turks and Malay on Bypass led to almost all matches on this map during HC4 being a duel between those civs).
And ofc using common sense never hurt. I.e. when Hera ranks slingers and kamayuks alongside ele archers in a tier list, don’t believe him 11

Just because mistakes can/do happen doesn’t mean the data is useless! It just means that the data is noisy (i.e. has more random variation). A general fact of statistics is that you can reduce the effect of noise in your data by simply collecting more data (assuming the noise is unbiased). Simply put, the noisier your data is, the more of it you need to extract a meaningful signal. That is to say, I strongly reject the idea that the data is useless just because people make mistakes.
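To put rough numbers on the "more data beats noise" point, here is a minimal sketch. It assumes each game is an independent coin flip with a fixed true win rate, which is a simplification and not how real ladder games behave; the function name is mine.

```python
import math

def winrate_standard_error(true_winrate: float, games: int) -> float:
    """Standard error of an observed win rate over `games` independent games.

    Assumption: every game is an independent Bernoulli trial with the same
    true win rate (unbiased noise only).
    """
    if games <= 0:
        raise ValueError("games must be positive")
    return math.sqrt(true_winrate * (1.0 - true_winrate) / games)

# How wide is the rough 95% band around a true 50% win rate?
for games in (100, 1_000, 10_000, 100_000):
    se = winrate_standard_error(0.5, games)
    print(f"{games:>7} games: +/- {1.96 * se * 100:.2f} percentage points")
```

The band shrinks with the square root of the number of games, so four times the games halves the noise. It does nothing about bias, which is the "assuming the noise is unbiased" caveat.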

Also if you lift the ELO cutoff any higher you are effectively ignoring >70% of the player base. At a certain point, if >70% of people are all making the same mistake, you have to wonder if it’s the players or the game…

I also strongly disagree with the idea of just “ignoring” the stats / WRs. They provide a ton of information. The issue is that they only provide top level summary assessments; care needs to be taken to look into where and why balance issues exist and whether they even need to be addressed.

No single metric / method should be used for balance as they all have their pros / cons; but likewise that doesn’t mean we should just blanket disregard a metric just because it isn’t perfect. WRs (when constructed properly) can be amazing for identifying problem areas, but they need to be used in combination with expert review, scenario assessments and intended design/purpose to inform actual balance changes.


“often” and “frequent” are synonyms. Anyways, 4 out of 35 drafts is not often. Ya, they happened to show up in consecutive drafts, but their overall preference is quite low. At least a dozen other civs got drafted twice as much as Vietnamese.
Like I said, a ton of civs like Japanese, Lithuanians, Italians, Celts, Malay didn’t get picked in this Empire Wars setting. But if you look at the draft % of these civs in other tournaments like Hidden cup, TOC, Kotd-3, it’s quite good in at least one of these. It’s OK for some civs to be preferred for some tournament formats but not others. However, Vietnamese are one of those civs that are almost never preferred in most tournament formats. Sure, one or two players might pick it, but you won’t see them in most drafts in most 1v1 tournaments. That’s the problem.
People just don’t talk about it because no one ever picks it. People just call it balanced or good, and no matter what format of tournament you bring, they almost never pick it. And this is quite the opposite for civs like Franks, Britons, Vikings, Mayans. They just have a ridiculously high draft rate in all tournaments and very high play rates in ranked 1v1 at high level.


Your conclusion sounds about right but is there anyone who does this work rigorously? Like we can’t even know if FE themselves do it since they don’t communicate about it this much, and it surely doesn’t happen in 99% of balance posts.

I am not sure about this. My personal impression is that different units/civs translate differently at lower levels more due to lack of knowledge than due to lack of skill.

For example, like a lot of mid elo players, I used to struggle a lot when I played archers versus players going knights. The take I used to get from this was “xbows are not good at my level, knights are OP, I’m never going to be able to use xbows versus knights because I don’t have APM, help devs” or stuff like that. And guess what, stats apparently confirm this, since cav civs do much better at my elo than archer civs.

Reality was that the issue was not about micro but about decision making/general knowledge of the game. After watching opponents that beat me going archers while I was playing cav, and getting good advice from coaching sessions and from some higher level players on reddit, I figured out that my issues were:

  • I did not use monks and pikemen correctly, I added them in the wrong order and generally I did it too little/too late.
  • I did not use the proper time windows to pressure my opponent, especially late feudal/early castle age/early imp.
  • I did not set up my base correctly and I was super exposed to raids/mobility.
  • I did not macro correctly behind my archer play.

After applying those changes, which have nothing to do with skill/micro/APM, I ended up having an OK archers versus cavalry match-up, even though I’m the same unskilled player as before (I gained maybe 50-100 ELO because I always go random and I brought my archer play on par with my knight play), and despite Arabia changes that favour cav. It turned out that the famous take “cavalry is OP if players have low APM”, which stats seem to confirm, may be complete bullcrap.

My personal take from this is that some things are good/bad at lower levels because people don’t study this game enough and don’t get enough knowledge (in fact, when I play cav vs archers I often see my opponents make the same mistakes I used to make and I crush them easily if they’re on my skill level).

So I think it’s most of the time pointless to make changes based on mid/lower ELOs, people on those levels will just underuse those changes just like they underuse what currently is in the game, so it’s in my opinion very unlikely to get the expected results.

It’s also extremely risky to base these changes on stats, since they’re so aggregated that it’s extremely easy to get crappy takes out of them, misunderstanding the underlying dynamics that generate those stats and ending up making wrong changes.


The thing about win rate stats from ranked is that they’re heavily biased towards Arabia. Just two or three maps constitute 90% of the ranked games at higher elo. You’re right that balance shouldn’t be based on tournament picks alone, but it should be a mix of tournament stats, win rates and play rates on multiple maps. If one or a few pros pick or favor some civs, that’s fine. It’s their bias. But if almost all the pros pick some civs in almost all tournaments, or never pick a civ in almost any tournament, that says something.

This is just true for some rarely picked civs. Like Teutons have 100% win rate in RBW 5, doesn’t mean they need a nerf. But in general, let’s say a top level tournament has 20 matches overall and a civ A gets picked or banned in 15 matches. And then let’s say in a year there are 10 tournaments with varying settings and the civ continues to be picked more than 70% of the time in all these tournaments. Now let’s say this has been happening for 10 years. What does that say about civ A?
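The rarely-picked-civ case can be made concrete with a confidence interval. This is a minimal sketch using the Wilson score interval; the 3-0 record below is a made-up stand-in, since the actual number of games behind the 100% figure isn’t given here.

```python
import math

def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p_hat = wins / games
    denom = 1.0 + z * z / games
    centre = (p_hat + z * z / (2 * games)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / games + z * z / (4 * games * games)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Hypothetical records, not real tournament data.
for wins, games in ((3, 3), (30, 30), (500, 1000)):
    low, high = wilson_interval(wins, games)
    print(f"{wins}/{games}: plausible win rate between {low:.1%} and {high:.1%}")
```

A perfect record over a handful of games is still compatible with a civ that loses more than half the time, which is why a single small tournament says little, while ten years of consistent picks is a different kind of evidence.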

But no civ is currently the same that it was 10 years ago

Yeah, no question that there’s a lot of civs that take the plethora of drafting picks. That’s inarguable. Compare the Vietnamese to those civs, and you will certainly believe the Vietnamese to be a bad civ. Fact of the matter is, we know those civs get drafted in every setting, in every tournament, and picked consistently in the top level, the average game, and in low ELO. One could easily argue that these civs taking up the lion’s share of the drafting picks may indicate them to be blatantly OP, and if you were to argue that, I think it’s reasonable to believe the Vietnamese are a middle of the road civ.

However, both of these trends could also indicate something a lot less important and something we need not concern ourselves with: popularity. Maybe these civs are picked so readily and so often because of the great volume of practice players have with them, due to their common use? Certainly it could be a mix of both, no doubts there, but that concern tells you what I’d need answered before you and I would be able to come to an agreement on a civ like the Vietnamese, which is the opposite case.

I’ve defo seen a few posts where people have suggested changes based upon the stats + scenario data and done the maths on how their proposals will impact or fix the suggested issue. That being said, it’s defo few and far between.

I honestly think the biggest problem is the lack of “design / intent” that we get from FE. A great example of this is the Vietnamese; is their role anti-archer (as implied by their bonuses / UU’s) or is it as a generalist civ (as implied by their very open tech tree)?

If the former, then they are probably balanced OK, as the WR data suggests they beat / have an advantage over all other archer civs (with the exception of Britons, Mayans & Chinese). However, if their role is the latter (the generalist), then I would argue their WRs indicate they are not performing that well in that role and that they need to be looked at.

It’s just very hard to make suggestions or to know what to even look at when the intent of the civ from the developer is unknown. I find the lack of communication from the devs very frustrating. In most other games I’ve played you would often get weekly blogs explaining various changes / developments behind the scenes, which help curate the conversations that we have on the forums.

Fact of the matter is, developers can surmise and plan around a civilization’s expected strategy, but the meta will always be made by the players using it. I promise you the developers didn’t expect the Incas to turn into trush supreme seven years down the line.

All game stats data is drawn from a joint distribution of P(civ stat, player behavior, situations,…). That is, the data we see is generated as if drawing from such a distribution. Except what we care about is the existence of a joint distribution over player behavior and situations such that P(civ stat = ‘good’ | player behavior, situation) is relatively high. This is more or less necessary and sufficient to make a civ balanced.
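Written out as a sketch (the symbols are shorthand I’m adding, not standard notation for this: G is the event “civ stat = good”, b a player behavior, s a situation), the aggregate stat we observe is an average over whatever behaviors and situations players actually produce, while the balance question asks whether some distribution Q over behaviors and situations makes that average high:

```latex
% Observed aggregate stat: weighted by what players actually do
P_{\text{obs}}(G) = \sum_{b,\,s} P(G \mid b, s)\, P_{\text{players}}(b, s)

% Balance question: does a good-enough way of playing the civ exist?
\exists\, Q \;:\; \sum_{b,\,s} P(G \mid b, s)\, Q(b, s) \ \text{is relatively high}
```

The first line is just the law of total probability. The two quantities only agree when the behaviors players actually choose look like Q.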

Once you’ve found that behavior, situation distribution you can check if it needs tweaking by analyzing the barrier to entry so to speak. Really high skill floors aren’t great for example.

Unfortunately players who try only 2-3 strategies on a civ and then give up on that civ if they don’t “feel good” are everywhere. And these 2-3 strategies tend to be correlated. This makes the joint distribution of player behavior and situation heavily biased to the point you’d be lucky that enough samples of the desired data are even generated in the first place.

This is very much a garbage in garbage out problem.

Biased in what way? The only issue I see is that it means we have uneven coverage of the probability space, but then this becomes an estimand question of how much we weight all the possible strategies vs those that are actually used by players in practice. I accept that the data we have isn’t detailed or sufficient enough to evaluate all specific individual plays, but I argue it is more than good enough to evaluate the marginal WR distribution of various civ v civ match-ups, which gives us a guide to what areas need further investigation.
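For what “marginal WR of civ v civ match-ups” means in practice, here is a rough sketch that tallies win rates per match-up from a list of results. The record format (winner civ, loser civ) and the placeholder civ names are assumptions for illustration, not any real data source.

```python
from collections import defaultdict

def matchup_winrates(matches):
    """Return {(civ, opponent): (wins, games, winrate)}.

    `matches` is an iterable of (winner_civ, loser_civ) pairs. This is a
    hypothetical record format chosen for the sketch.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in matches:
        if winner == loser:
            continue  # mirror matches say nothing about the match-up
        wins[(winner, loser)] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1
    return {pair: (wins[pair], n, wins[pair] / n) for pair, n in games.items()}

# Made-up results with placeholder civ names.
sample = [("CivA", "CivB"), ("CivA", "CivB"), ("CivB", "CivA"), ("CivC", "CivA")]
for (civ, opponent), (w, n, rate) in sorted(matchup_winrates(sample).items()):
    print(f"{civ} vs {opponent}: {w}/{n} = {rate:.0%}")
```

These numbers average over every strategy, map and skill level in the sample, so they flag where to look rather than what to change.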

Monks as the core of an army is practically never done despite its very high potential skill ceiling and strength, namely because people don’t want to have to micro mass monks. That’s a huge blot in the overall meta of the game that is basically missing from evaluations, as one really easy example. There are tons of things, with tons of civs, that are basically the monk thing but on a smaller scale within the meta of each civilization, and that breaks down even further into differences in matchups that would look different should those missing bits be implemented into the overall evaluation.

biggest mistake i see when people talk about civ balance is that they don’t understand the settings the game actually uses

people are seeing the map before they pick the civ. this is true on ladder. this is true in tournaments.

so there are 1-dimensional civs (like franks, cumans, poles) which have these so-called “weaknesses” (eg. few bonuses for water maps, no bonuses for dark age), but that never matters because the game is so dumbed down that you have perfect information about the map when you are making the decision. the only way civs can be balanced under the strengths/weaknesses model is if the maps are actually random instead of telling you everything you need to know just from the name and then letting you choose where you spawn

it doesn’t mean franks/cumans/poles are the best civs in the game. but they will never be balanced with the current ladder settings where you can only pick them when there’s a 0% chance that it’s a water map and when you can guarantee a 100% chance that you spawn in the back and don’t lose anything by having a slow dark age

That certainly makes some of the most winning civilizations better, but on the opposite side of the same coin, it also makes the less well-rounded civs better as well. I think in a world where you didn’t know what map was coming and it was relatively even distribution, Vikings would very clearly be the best civ in the game.

Selection bias by players causing confounding in any analysis.

Ideally the player behavior we observe is correlated with the expected strength of that behavior. This is more or less the unbiased method of selecting a behavior. However most players do not know the strength of many strategies and are heavily biased toward particular ones. Or the practice they have with one is going to bias their selection.

Thus our marginal distribution, instead of reflecting unbiased strategy selections, will reflect biased selections. In particular, we would expect it to show that the civs which execute meta strategies well have high win rates.

The meta fundamentally confounds inference because it leads players to become significantly better at some strategies than others. Instead of the causal relationship being:

  • my civ, opposing civ, APM limits → strategies.
    It’s
  • my civ, opposing civ, APM limits, habits/practice/knowledge limits → strategies.

The first is player independent and as such allows for identification of civ design issues. The second is not player independent. Without a way to remove that confounder by subselecting your data to only people who have thoroughly explored a civ and thus have significantly less biased selections you will never be able to separate player issues from civ issues.

This is why more data doesn’t help. You need to fix the data generating process to get better data.
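A toy simulation of that last point, with invented numbers: suppose a civ’s best strategy wins 58% of the time, the habitual strategy wins 45%, and players only go for the best strategy in 10% of games. None of these figures come from real data; they are only there to show the mechanism.

```python
import random

def observed_winrate(games, p_pick_best, wr_best, wr_habit, rng):
    """Simulate the win rate seen in aggregate stats.

    Each game the player picks the civ's best strategy with probability
    `p_pick_best`, otherwise the habitual one. All parameters are
    hypothetical inputs for this sketch.
    """
    wins = 0
    for _ in range(games):
        winrate = wr_best if rng.random() < p_pick_best else wr_habit
        if rng.random() < winrate:
            wins += 1
    return wins / games

rng = random.Random(0)  # fixed seed so the run is repeatable
for games in (1_000, 10_000, 100_000):
    estimate = observed_winrate(games, 0.10, 0.58, 0.45, rng)
    print(f"{games:>7} games: observed win rate {estimate:.3f}")
```

With more games the estimate settles near 0.10 × 0.58 + 0.90 × 0.45 = 0.463, not near 0.58. Extra data removes the noise but leaves the gap created by how players choose strategies, so the civ looks weak even though a strong way to play it exists.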

It’s not noisy. Noise is deviant behavior. Let’s say civ A dominates civ B according to overall stats but some players consistently lose with civ A to civ B, that’s data noise. The data from low elo doesn’t correlate well with overall game play and balance in general, simply because of the lack of skill and good decision making. This is what @ApplaudedPoppy2 points out.
Up to 1100 elo, you might have about 60% of players, and Goths might have a huge play rate and win rate. Doesn’t mean they need a nerf. Doesn’t mean anything actually. As they play more and improve they’ll realize that it’s not as good as it seemed before. This is why those stats need to be ignored. They do provide a ton of information, but not information relevant for balance changes.


Well, I initially thought that before DE too. But just take a look at civs like Lithuanians, Bulgarians, Tatars, Cumans, Khmer and Sicilians. Lithuanians were one of the most popular in Redbull till they lost that starting bonus, after which they were never picked. Bulgarians were never picked or played before the blacksmith and siege workshop bonus, and after that they’re very popular. Tatars literally were never played in any competitive tournament till the castle age sheep bonus, and now they’re suddenly one of the top picks. The strongest validator of this popularity trend is Khmer - never picked except by Viper on voobly and DE till the farm bonus was introduced. And then suddenly one of the most popular and earliest drafted civs. Then after the nerfs, back down to being moderately seen.
So it’s not like Franks, Mayans, Vikings, Britons are popular only because of the practice. It’s mainly because they have more than one powerful eco and military bonus. Either those civs need to be nerfed to reduce the popularity or priority gap (unfortunately so many people are hurt when balance changes to their dear beloved Franks or Britons are proposed), or weaker civs need to be buffed to make them more competitive.


I think on top of that there are also a number of reasons… many longer term players stubbornly refuse to admit the game can be balanced more, accepting whatever balance is currently in effect as being good enough and simply not caring about whatever the player base is going through, likely also because most of them don’t actually play.

Others rely on whatever someone has said (usually a pro) to base their whole counter argument on. Whether they do or don’t understand what is going on becomes irrelevant as they try to find reasons to support what their fav caster has said…

There are a few members here whose only purpose seems to be to argue against any suggestions, OP is a perfect example of it.

One of the biggest hills he fought over was that Malay eles were the best eles in the game, with long sprawling posts (similar to this one) about how good Malay eles are… and then Malay eles were buffed (greater discount)


Imagine the level of narcissism to assume even a majority, never mind everyone, thinks exactly the same…

I’m not going to waste time contesting any more of this argument. It’s literally pointless…

And will continue to support and use stats to justify logical changes…since it’s worked before and will continue to work
