Power Creep, and a suggestion to measure it

By mostly buffing civs instead of nerfing, you make the game snowball more and more, because if someone has strong bonuses and gets to ride them, they crush. If all civs have equally “weak” bonuses, then snowballing with them is less likely and you get more enjoyable, balanced games.
This is ruining aoe2.

Incas, Persians, the reverted Slav nerf, Teutons, archers that ignore armor even if the civ overall sucks: these are all striking examples.

I kinda hate where aoe2 is going. Reminds me of LoL.

A few years back, a civ with full upgrades but no bonuses on its units would do kinda OK. I feel like today those same units would get crushed by super-buffed, discounted, or mega-strong UUs left and right.

If Goths were made today, they would be discounted and fully upgraded.

Composite Bowmen are not that strong. They are highly tuned to fight specific units, but get absolutely stomped on by a lot of units, mostly anything that comes out of the Archery Range.

They are specialists that pay for their power with weaknesses elsewhere. If this were true powercreep, they would have the same range as Arbalesters, but they don’t.

Some of you are acting like there are no nerfs going on at all in patches, which is just flat out incorrect. Plenty of civs get their bonuses nerfed/removed or options removed in each patch. In the last released patch, both Romans and Poles got nerfs, and in the PUP Persians have as well.

Oh, people are actually calling it that? That’s cute.

How is it affecting the game?

I think we basically have this already with Byzantines – they have a broad tech tree, no eco bonus, and niche unique unit and techs that don’t get used much. In terms of win rate, they tend to be either near the middle or slightly towards the bottom of the ranking.

This sounds actually pretty powerful to me. If this was used as a “yardstick” civ, I think you’d actually see more powercreep, at least initially.

You’re fundamentally misunderstanding the purpose of the “yardstick” civ. It isn’t meant to be “ok I want every civ to be as strong as this yardstick civ, now test every civ and make changes accordingly.” It’s “we need a civ that is as bland as possible and never ever changes, and as the other civs change we can measure them against the yardstick, so we can understand how they’re changing over time.”

Also IDK if the bonuses for the yardstick would be necessary. I would want the yardstick to be approximately as strong as the average civ, not because there’s anything intrinsically better about that specific power level, but I’d be concerned that a civ w/o any bonus would be so uncompetitive that it’d be curbstomped by any civ in the game regardless of which balance patch is being talked about.

So if you test, say, 2020 Japanese and 2023 Japanese, and each has a 99% win rate against the yardstick, it’s hard to know if any power creep has occurred. The difference in performance is so small against such a relatively weak opponent that it’s hard to see. It’d be like trying to compare the power of a handgun and a sniper rifle by shooting both at a piece of cardboard.
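The cardboard problem can be sketched with rough numbers. This is only an illustration: it assumes win chance follows an Elo-style logistic curve, which is an assumption for the sketch and not something measured from aoe2, and every rating below is invented.

```javascript
// Assumption: win probability follows an Elo-style logistic curve.
// All ratings are made-up illustration values, not measured data.
function winProbability(ratingA, ratingB) {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

const japanese2020 = 1800; // hypothetical rating
const japanese2023 = 1900; // hypothetical: 100 points of power creep

const weakYardstick = 1000;    // a yardstick far below the roster
const averageYardstick = 1800; // a yardstick near the average civ

// Against a very weak yardstick both versions win almost every game,
// so the gap between them is tiny.
const gapVsWeak =
  winProbability(japanese2023, weakYardstick) -
  winProbability(japanese2020, weakYardstick);

// Against a yardstick near the average civ the same creep is much larger.
const gapVsAverage =
  winProbability(japanese2023, averageYardstick) -
  winProbability(japanese2020, averageYardstick);

console.log(gapVsWeak.toFixed(4), gapVsAverage.toFixed(4));
```

Under these assumptions the same amount of creep moves the win rate far less against the weak yardstick than against the average one, which is the reason for wanting the yardstick near the average civ.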

If you think those bonuses are too strong, well, maybe. But it wouldn’t cause more powercreep, because the yardstick isn’t intended to dictate change, only to measure change.

I’m usually not one for dunking on the devs, but I have to concede this point. I’ve often wondered why they don’t do something vaguely similar for pathing: set up a scenario with dozens, maybe hundreds, of vill tasks and military units going to and fro, and just time the average completion of tasks. I don’t have a super specific setup in mind, but I figure if you inadvertently introduced a bug, it’d show up as an inexplicable uptick in average task completion times.
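A minimal sketch of that kind of check might look like the following. Everything here is hypothetical: the function names, the 5% tolerance, and the timing numbers are invented, and in practice the timings would have to come from replaying one fixed scenario on each build.

```javascript
// Hypothetical sketch: the timing arrays stand in for task completion times
// recorded from the same fixed scenario on two builds. Numbers are invented.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Flags a regression when the average task completion time rises by more
// than `tolerance` (a fraction, e.g. 0.05 for 5%) relative to the baseline.
function hasPathingRegression(baselineTimes, candidateTimes, tolerance = 0.05) {
  const baseline = mean(baselineTimes);
  const candidate = mean(candidateTimes);
  return (candidate - baseline) / baseline > tolerance;
}

const previousBuild = [12.1, 11.8, 12.4, 12.0]; // seconds per task, invented
const newBuild = [13.9, 14.2, 13.7, 14.0];      // seconds per task, invented

console.log(hasPathingRegression(previousBuild, newBuild));
```

The tolerance is there so that ordinary run-to-run variation does not trigger the alarm; a real setup would need to pick it from repeated runs of the same build.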

Introducing a bug in such an old engine is inevitable. What is not inevitable is to play a game, test and find out, and go back to the drawing board until things are fixed.

It’s a simple process of quality checking and there’s really nothing to defend here.

Incas have been unnecessarily buffed. Maybe comparing performance in tournaments plus win rates before and after the change is a good measure. Incas, as the example, were honestly fine even before the buff.

Same with Persians and Slavs getting their previous nerfs suddenly reverted.

Yes, you’re right, I don’t understand. I had interpreted the following:

to mean that you want to adjust every civ’s balance to try to get a 50% win rate against the yardstick civ.

What is the point of carrying out this time-consuming measurement process if you’re not going to take any action at all based on the results?

Presently we only have the ability to measure the strength of a civ relative to the other civs, but we don’t have any absolute measure.

If for example we have CivA, CivB, and the yardstick civ, CivY.

CivA = CivB + 2. CivA is a strong civ and CivB is a weak civ. But civs are probably best when around a specific “power level”. For completely arbitrary reasons let’s say that’s 15.

So a well tuned civ is a 15. Anything between 14-16 isn’t bad, everything else is not ideal.

But we don’t know how strong CivA or CivB are, only the difference in strength between them.

CivA = CivB + 2 could be true if CivA = 16 and CivB = 14, or if CivA = 420 and CivB = 418.

So then the yardstick civ is an educated guess at creating a civ whose power level is 15. CivY ≈ 15.

Let’s just assume that this is the best we’ll ever be able to do. We can’t 100% know if CivY = 15, but we still estimate CivY ≈ 15. CivY will still serve its purpose if it’s actually a 13 or a 17, or anything in between, so long as it’s reasonably close. If it were a 1 or a 50, it’d be useless.

Ok now we test CivA and CivB against CivY.

CivA = CivY + 3, therefore CivA ≈ 18, and CivB = CivY + 1, therefore CivB ≈ 16.

So now next year, after some changes to both balance and meta, we look again at CivA and CivB and see their relative win rates. We now find that CivA = CivB + 3. The difference between them has grown.

But is that because CivA has gotten better or because CivB has gotten worse? Without some fixed reference, this is all we’d know. The difference has grown, but it’d be hard to tease out why. Thankfully though, we have a test to answer this very question.

So I repeat the test and see that now CivA = CivY + 4, when previously CivA = CivY + 3. CivA has gotten stronger, whereas CivB has remained the same. So if I want to return CivA and CivB back to their old relative difference AND keep the average power of the civs in line, I should nerf CivA to bring it back down to CivY + 3, instead of buffing CivB to be equal to CivY + 2.

Note, that I didn’t try nerfing CivA to be equal to CivY, only to where it originally was.
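The bookkeeping in this example can be written out as a small sketch. It uses the offsets from the example (CivY + 3 and CivY + 1 in the first year, CivY + 4 and CivY + 1 in the second); how such offsets would actually be measured is left open, and the function name is made up for illustration.

```javascript
// Offsets relative to the yardstick (CivY), taken from the example above.
// How these offsets would be measured in practice is not specified here.
const offsetsYear1 = { CivA: 3, CivB: 1 };
const offsetsYear2 = { CivA: 4, CivB: 1 };

// Change of each civ against the fixed reference between the two years.
function driftAgainstYardstick(before, after) {
  const drift = {};
  for (const civ of Object.keys(before)) {
    drift[civ] = after[civ] - before[civ];
  }
  return drift;
}

const drift = driftAgainstYardstick(offsetsYear1, offsetsYear2);

// The relative gap alone (CivA minus CivB) grows from 2 to 3,
// which by itself cannot say which civ moved.
const gapBefore = offsetsYear1.CivA - offsetsYear1.CivB;
const gapAfter = offsetsYear2.CivA - offsetsYear2.CivB;

console.log(drift, gapBefore, gapAfter);
```

The drift values are what the fixed reference adds: the gap only says the difference grew, while the drift says CivA moved and CivB did not.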

You can see that if we erroneously buffed CivA we’d increase the average power of the civs, and you could do this many times over the history of the game. CivA = CivY + 3 now, CivY + 4 next year, and CivY + 5 the next. But because we now have a measure, you couldn’t accidentally do it. You could choose to do it I suppose, but you’d be doing it with intent, not because of insufficient data.

Now if you were very confident in the test AND the power level of CivY, you could hypothetically try balancing all civs against it. That isn’t what I’m advocating, but nothing would prevent you from doing that. All I’m advocating is measuring power creep to prevent its furtherance.

Now you could rebalance using historical data. If in 2020 the average civ was CivY + 1, but in 2023 the average civ is CivY + 3, you could rebalance if you thought 2020 power levels were better. But you aren’t rebalancing the civs based on CivY. CivY only exists so you can understand the power levels from a fixed reference point.

Probably ill-advised for me to respond to this but here we go.

I understand the idea of having something static to compare civs to, and I understand what powercreep is. I didn’t need a long-winded explanation of that, and I certainly didn’t need one packed full of fake mathematics. (I’m a mathematician, so the fake maths made me pretty cross.)

What I don’t understand is why you think it would be worthwhile to make measurements against the yardstick civ but then not make any decisions based on them. Although it turns out that’s not what you’re proposing:

Here you describe using comparisons with the yardstick civ to make balance decisions. But then you immediately contradict yourself:

Yes you are! In the previous sentence you’ve literally just described an approach to rebalancing civs based on the yardstick civ!

Given that what you seem to want to do is compare current civ balance with past civ balance, wouldn’t it make more sense to play off past versions of civs against the current ones?

I think the whole purpose is to NOT currently compare.
But the whole approach is doomed from the beginning.
When you want to use mathematical tools to refute a perception, you first need to precisely define the alternative. And it’s very silly to think you could just propose a very obscure, abstract concept and expect other people to do the work for you and build a mathematically consistent construct from it.
Especially when the intention is to refute.

And btw ofc you CAN construct these kinds of mathematical models. But that’s why it’s important to choose your evaluation methods wisely.

I then proceeded to at length describe exactly what I would do with that data.

After then having provided what you’d asked you then respond,

I’m a developer, and it’s useful to use what we call pseudo-code to quickly describe a concept, the flow of the logic, without going into all the nitty gritty. I certainly wouldn’t go up to another developer, have them outline a solution, then tell them “well you outlined it in pseudo-code, so that is the exact same thing as being conceptually bereft of merit”. I find it illuminating you didn’t object to the actual concepts, just the manner in which they were written down. Presumably, I can only infer, that was the most objectionable thing in your eyes, for otherwise I’d have to believe that you found this of little consequence but inexplicably thought it should be your opening argument anyways.

Um, actually this is the previous sentence, two sentences actually. I really struggle to understand how you so fundamentally misunderstood this as well. You can rebalance a 2023 civ based on the 2020 performance of that civ. You only understand the difference based on testing against CivY. The point isn’t to try to make the 2023 civ more like CivY. Becoming more like CivY does happen in that case, but it is incidental. Any change would make CivA either more or less like CivY. CivY is just as useful if it were stronger than CivA2023 and you weakened CivA to be more congruent with CivA of 2020. You wouldn’t measure an object with a ruler, find its length, decide actually it would be better at some other length (6", 8", 5", 9", whatever), make said adjustment, and then conclude the object in question has now been made more ruler-like. The only reason I want CivY to be reasonably close to the power level of current civs is for the same reason you wouldn’t measure a grain of rice with a yardstick, or the length of a city block with a ruler.

Admittedly this wasn’t in the original post, so presumably you didn’t notice it. I think civ v civ would be better at understanding power creep on a per-civ basis, but civ v benchmark would be better at understanding power creep roster-wide. But again, I’m not a statistician, so perhaps my intuition is faulty on this specific part. Basically, instead of comparing the 35 civs that were out at the release of DE against a common yardstick, it’d be against 35 different yardsticks, their 2020 variants. I didn’t think that’d be useful for a roster-wide average delta, but you could do a mean that way. Again, I think; could be wrong, not a statistician. Maybe the civ2023 v civ2020 test is better in all ways conceivable. Presumably not, few things are, but I just don’t know enough to conclude definitively. If (civA > civB && civB > civC) === (civA > civC) always held, then you probably could average the various 2023civ vs 2020civ results, but that isn’t the case, which is why I believe the results couldn’t be averaged. But for like the fourth time, maybe I’m wrong on this point, and 2023civ vs 2020civ results could be averaged and be meaningful.
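For what the "mean that way" would look like mechanically, here is a rough sketch. The win rates are invented and the function name is hypothetical; whether the resulting average is statistically meaningful is exactly the open question raised above, and the code does not settle it.

```javascript
// Invented results: share of games each 2023 civ wins against its own
// 2020 version. These are placeholders, not real data.
const selfMatchWinRate = { Japanese: 0.52, Britons: 0.5, Incas: 0.57 };

// Mean edge over 50% across the roster; a positive value means the 2023
// versions beat their 2020 selves on average.
function averageEdge(results) {
  const rates = Object.values(results);
  const total = rates.reduce((sum, r) => sum + (r - 0.5), 0);
  return total / rates.length;
}

console.log(averageEdge(selfMatchWinRate).toFixed(3));
```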

What?
All I see is very crowded pseudo-intellectual blubberish.
Very obscure, vague constructs that are not at all linked to any development work.

The only mathematics I see are superficial basic idioms that have no significance. They don’t show a method, they are just idioms. Not wrong, but without any functionality.

What is written between the lines in the blubberish is the interesting part. Because what you imply is that any change in the power level of a civ over time would signal “power creep” to the devs. Effectively preserving the current relative power levels of all civs for all time.
And ofc you now say you didn’t say that. But you actually did. You just needed that blubberish to not say it explicitly.
And I doubt you are a dev who actually works in coding. It’s way too artificially obscured for that.

Not true.

Strange how I pre-empted that criticism with some code right here. Note the double ampersand is logical “and”; in php a single & is for passing variables into a function by reference, so that changes made inside the function apply to the caller’s variable (in js a single & is bitwise “and”). If you were writing js you could just define the variable in the global scope and use it anyway; the & is more of a php thing, tho theoretically you could just use global in a php function so you wouldn’t need the ampersand to pass your argument by reference. Also note how I used triple =. = is assignment, == is equal (tho that’s a bit loose, as things can be of different types, so 1 and “1” can be equal, and in php so can null and “”), but I used ===, which is identical, so the data types have to match: 1 == “1” but 1 !== “1”. Presumably those variables are of type int or float, depending on what the tester function would return. The exclamation point means not, so !== means not identical. You might make the mistake of putting the ! in front of ===, but that’d be incorrect. Considering I didn’t have any characters before my variables, it looks like this pseudo-code is more congruent with js. php variables need a $ before them. I suppose they could be php constants. This code doesn’t really do anything tho. It just does a comparison, and the true or false it returns isn’t used or captured in this example. And honestly this comparison would probably be more useful as an xor, where exactly one of the two things being evaluated must be true, and if both are true the overall statement returns false; but if we again assume this is js and not php, then there isn’t a logical xor operator. You could write it where you compare each half to true or false, then combine with && and || (that’s how you write “or”), and compare the other way around.

if ((((civA > civB && civB > civC) === true) && ((civA > civC) === false)) || (((civA > civB && civB > civC) === false) && ((civA > civC) === true))) {
  // do a thing
}

Oh, don’t mind the double slash, that’s just for quick commenting. In php you could alternatively use a pound sign, and anything between /* and */ is commented out too. IDK, I prefer the double slash. Well, that is for js and php. In CSS only the /* */ form works. HTML is completely different, using <!-- and -->. Maybe it’s because it’s kind of similar to an element? Maybe that’s why html comments are formatted that way. I’ve actually never thought of that before.
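For the js version, a shorter way to get the same either-or check is to compare the two boolean halves with !==, since for two booleans that is true exactly when one is true and the other is false. A small sketch, with placeholder power numbers invented for the example:

```javascript
// civA, civB, civC are placeholder power numbers, invented for this sketch.
const civA = 3;
const civB = 2;
const civC = 4;

const chainHolds = civA > civB && civB > civC; // false here: civB is not above civC
const directHolds = civA > civC;               // false here: civA is not above civC

// For two booleans, `!==` is true exactly when one is true and the other
// is false, which matches what php's `xor` does.
const exactlyOne = chainHolds !== directHolds;

console.log(exactlyOne);
```

This only works as an xor because both sides are already booleans; with mixed types `!==` would be comparing type as well as value.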

going back to php tho, you could re-write that evaluation to

if(($CivA > $CivB && $CivB > $CivC) xor ($CivA > $CivC)){
//do a different thing, but this time in the server side language php.
//also note the $ before the variable names.
}

if we wanted to get super wild in php you can have variable variable names.

$a = "CivA"; //don't wanna forget that semi-colon at the end there.
$b = "CivB";
$c = "CivC";

Then if you use $$a for example, the first variable will evaluate to “CivA”, so then you get $CivA.

if(($$a > $$b && $$b > $$c) xor ($$a > $$c)){
//doing the same as above but in a needlessly roundabout way.
}

If you take objection to the actual concepts at hand, I have no qualms about discussing those, but I don’t think ad hominem attacks have served this discussion well.

Maybe you could try to formulate what you actually want to achieve here, then.

For me it’s obvious what you want to achieve here.
Ofc you can just say I’m wrong. But you still fail to state WHAT you actually want to achieve when you say the others are wrong in how they interpret you.

And I see no reason to put any work into something when it’s not even clear what should be achieved with it. If there is nothing to achieve with it, why do you want it done?

And btw I had a look at the stats. Most civs that didn’t receive massive changes have been stagnant since aoestats went live again. Only a few had changes in between patches due to map and meta shifts. But those usually weren’t long-lasting.
Even civs like Britons have been as bad as they are atm for years.

Maybe there is a minimal powercreep. But it’s so marginal that it’s completely undetectable due to the massive noise from all kinds of different influences.
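To give a feel for how much plain sampling noise there is, here is a rough sketch using the normal approximation for a win rate. It covers only sampling noise, not map or meta shifts, and the game counts and the half-point creep are invented for illustration.

```javascript
// Approximate 95% margin of error for an observed win rate over `games`
// independent games (normal approximation to the binomial).
function marginOfError(winRate, games) {
  return 1.96 * Math.sqrt((winRate * (1 - winRate)) / games);
}

// Invented sample sizes; real ladder counts per civ and patch will differ.
const smallSample = marginOfError(0.5, 2000);
const largeSample = marginOfError(0.5, 200000);

// A hypothetical creep of half a percentage point in win rate.
const creep = 0.005;

// true means the creep is smaller than the noise and cannot be told apart from it.
console.log(smallSample > creep, largeSample > creep);
```

Under these made-up numbers, a half-point shift is hidden at a couple of thousand games and only starts to stand out at a few hundred thousand, before any map or meta effects are even considered.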

I’ll admit, I feel like I’ve explained myself already at length, repeatedly, in numerous ways, with diverse examples…I’ll also admit I also feel that way when I’m explaining something to my fiancee and 20 minutes later she’s ready to strangle me cause she re-phrases what I was explaining in two sentences and wonders why it took me 20 minutes to explain something so simple.

Let’s give it another go.

I think decisions are better when better data is in hand. What you choose to do with that data is based on the data and your overall intent.

If you want to stop power creep, you can.
If you want to completely revert power creep, you can.
If you want to alter the rate of power creep, you can.
If you want to re-balance at some other arbitrary level of power, you can.

No one is a slave to the data. You CAN do anything with that data.

There might even be answers to balance/power creep questions we don’t even presently have the requisite insight to ask for lack of this data.

That’s Level 1 of why, I guess. Knowledge and data are good, because they help make better decisions.

I believe that there is some power level where collectively the aoe2 civs best complement the rest of the game. Weaker and the game isn’t as fun; stronger and the game is also not as fun. If civs had the weakest bonuses conceivable, the game would be boring. If civs had insane bonuses that bordered on being cheat codes, I think some of that depth would go away. I think there is a reason most people play on relatively standard settings. Sure, a 256x tech game might be fun once in a while, but I don’t think you’d want to play 1000 games like that. Of all the conceivable permutations of aoe2, there is one where fun and replayability are maximized.

Level 2 of why. I want the game to be as good as it can possibly be.

Honestly I don’t KNOW where that is. The theoretically optimal civ may be weaker than AoK balance, or 10x stronger than current. Even this sentence assumes the civs are stronger now than they were back in AoK. No one can definitively prove that to be the case. Such a fundamental question about a game we’ve all played for decades, one that is tweaked until the cows come home to give the best possible gameplay, can’t presently be answered.

I personally suspect we’ve surpassed the optimal civ power level, but IDK by how much. That’s just a hypothesis tho, a guess if you will. I could be sooooooo wrong. Honestly I really don’t care about what the answer is. If your deity of choice came from the heavens and told us the optimal civ power level was where the average 2023 civ was winning 20% of games against the theoretically perfect balance, I’d have no issue with that. I don’t have a particular horse in this race. I wouldn’t be surprised if my word choices throughout this topic have been influenced by my assumptions (namely that some nerfs are probably optimal), but fundamentally I don’t propose this so that we can start nerfing more intelligently. This is so we have the data to design the game more intelligently, whatever that specific action may be.

Level 3 of Why. I don’t think the game is at present as good as it could be.

I think we should try to determine what is the power level at which the civs best complement the game then strive for that. I don’t know how you design something with intent w/o some way to measure what you’ve designed and also not know if you’ve succeeded at what you set out to do. Even if you KNEW what power level civs should be, how would you measure it? How would you know all the civs are now at that power level?

Alternatively maybe I’m completely wrong, and instead what would be best for the game is some quantifiable rate (linear, exponential, logarithmic, something else, doesn’t matter) of power creep over time, to keep people engaged. I’m not convinced by this line of reasoning, but this is something you could do. However again how do you quantify if what you’ve designed, your intended power creep, is on point? Again you need some data that measures power creep, not data that measures balance.

Or, or, or, maybe both of those are wrong, and what is actually the best is that power creep is optimal when somehow proportionate to the number of civs. Maybe as you add each civ, it’s harder and harder to differentiate them, so each civ on average needs stronger but more niche bonuses. I’m agnostic towards this line of reasoning. In the limit this would eventually be true. You can’t have aoe2 have 1000 civs at their current power level and all of them feel unique. I believe that presently there is still enough design space that using power creep is not necessary, but each subsequent civ will require more creativity than the last on average. Whatever is the truth there, again how would you know you’ve achieved your civ number to power creep optimal ratio if you can’t measure power creep?

Level 4, I’m not even convinced, where power creep is concerned, we know for sure what would make aoe2 as good as it could be.

Regardless of what is actually the truth, there is something optimal out there, and we definitely, 100% guaranteed, won’t know what it is unless we can first measure power creep, and we won’t know if we’ve arrived at that optimum power creep if we can’t measure it.

Level 5, data would help us answer what optimal power-creep design within aoe2 even is then allow us to measure our progress toward that optimal design. And presently we don’t have the requisite data.

I’ll be honest, IDK if I’m 100% sold on conceptualizing my reasoning in “levels”. Maybe they’re steps, maybe they’re complementary points. Maybe it’s a cycle. Don’t read too much into the word levels. I had to call them something. Call them pizzas for all I care lol.

If that still leaves things insufficiently clear, I’d ask you to imagine how difficult balancing civs would be if you didn’t have access to civ win rates.

Now further suppose we lived in a reality where it wasn’t even clear to the fanbase what optimal civ balance should be like? (As an aside, I think our fanbase has concluded that every civ having a 50% winrate against every other civ is optimal. I don’t think that is remotely possible especially amongst different maps. We won’t make the game bland and make every civ the exact same, it’s a secondary goal, but it’s a north star toward which we strive.)

Hypothetically, maybe it’s optimal that some civs are worse and some are better so long as no one knows which are which? Maybe civs should be very balanced within a region, but either better or worse against another region? Maybe civs should get cyclically stronger and weaker? Maybe civs that are further along in the timeline should beat older civs. Maybe civs from older DLCs should lose to newer ones (this isn’t some sly reference to the PTW claims about the recent DLCs, just a hypothetical). Now imagine you were tasked with re-balancing the game so that it was as good as it could be. How would you proceed? I suspect what would be very helpful is some good data.

We are very much in a similar place with power creep IMO. I know if I were tasked with “re-powering” civs to make aoe2 as good as it could be, I’d want some good data.

To rephrase as succinctly as possible, I believe average civ “power level” is changing over time but it’s not changing intentionally. I think it’s changing accidentally due to a present inability to measure “power level”. Having an aspect of the game changing unintentionally will eventually produce an adverse result. I’m fine with changes or no changes in civ power level, so long as it is intentional and measurable. This suggestion is to give us some ability to measure power level.

I suspect a fixed and slightly lower power level would be best for the game, but have no issue with another approach so long as it’s supported by the data and the fan base.

Welcome to a preview of marriage. I have no experience with this sort of thing, but based on how my parents act, misunderstandings like this are going to be very common. Get used to it.

I’ll explain why I object to what you’ve written, and you can decide for yourself whether this is about the content or how it’s written.

My first objection is that you used a numerical scale without defining it. The closest you came to a definition was this:

But civs are probably best when around a specific “power level”. For completely arbitrary reasons let’s say that’s 15.

I initially assumed that probably meant that you were defining CivY = 15, but later that turned out not to be the case:

We can’t 100% know if CivY = 15, but we still estimate CivY ≈ 15. CivY will still serve its purpose if it’s actually a 13 or a 17, or anything in between.

So your “definition” of your scale is based on a single point, the supposedly ideal power level of a civ, that might not even exist, and even if it does, you have no way of knowing what it is.

But even if that ideal level does exist, and you have some way of identifying what it is, you still haven’t defined a scale – you’ve just defined what 15 means on your scale. So when you say things like

CivA = CivB + 2

or

CivA = 16 and CivB = 14

These statements have no meaning. You haven’t defined, either directly or indirectly, what 14 or 16 mean on this scale, or what a difference of 2 means. At best, you have defined what 15 means, and that’s it.

My second objection is that you’re making the assumption that the power level of a civ can be described in a meaningful way by a single number. That’s a huge assumption, and a quick glance at the current winrate data suggests that it is unlikely to be true. For example, Bohemians are currently the number 1 civ on the ladder, so presumably they have the highest power level, or one of the highest. But they have their worst win rate against Bengalis, who overall are one of the lowest ranked civs.

Even if the example above is a quirk caused by having too little data, it’s easy to imagine – for example – a situation in which Civ A usually beats Civ B, Civ B usually beats Civ C, and Civ C usually beats Civ A.
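That kind of cycle is easy to check for once you have head-to-head win rates. The sketch below uses invented win rates for three placeholder civs; the names and the 50% threshold are assumptions for illustration only.

```javascript
// Invented head-to-head win rates: winRate[x][y] is how often x beats y.
const winRate = {
  A: { B: 0.6, C: 0.4 },
  B: { A: 0.4, C: 0.6 },
  C: { A: 0.6, B: 0.4 },
};

// x "usually beats" y when it wins more than half of their games.
const beats = (x, y) => winRate[x][y] > 0.5;

// True when x beats y, y beats z, and z beats x.
function isCycle(x, y, z) {
  return beats(x, y) && beats(y, z) && beats(z, x);
}

// A single power number per civ would need p[A] > p[B] > p[C] > p[A],
// which no assignment of numbers can satisfy.
console.log(isCycle("A", "B", "C"));
```

Whenever such a cycle shows up in real matchup data, any single-number ranking has to misrepresent at least one of the three matchups.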

The result is that the vast majority of what you wrote is meaningless. You might try to argue that you were only using the scale figuratively – but, for example, if I interpret what you’ve written figuratively, I don’t know whether

CivA = CivB + 2

means “Civ A is slightly more powerful than Civ B”, or “Civ A is significantly more powerful than Civ B, but the difference is still within an acceptable level”, or “Civ A is significantly more powerful than Civ B, and the difference is not within an acceptable range”, or “Civ A is significantly more powerful than Civ B, but whether the difference is acceptable depends on other factors”, or… I could go on. The point is that your numerical statements are not precise, but also don’t have a clear figurative meaning.

I actually have a third objection, now that I know that you’re a developer and so are presumably quite numerate. There’s a good chance that you’re trying to make your argument look more convincing by making it look like mathematics. It’s not actually mathematics, but that wouldn’t matter to most readers, who probably don’t have much mathematical experience anyway. Obviously I don’t know if this is your actual motive, but it seems very likely to me.

Anyway, that’s me done with this thread. I agree with your fiancee that your explanations are overly long, and I don’t have the time or inclination to read the rest of them. It’s a shame, because I thought the topic of this thread was potentially interesting, but our interactions have ended up universally negative, from both our perspectives.
