Thursday, August 28, 2014

Arena – Man vs. Monster

Adding Monsters to the Simulated Arena of Battling D&D Fighters, So As to Assess Average Advancement and Ability Scores at High Levels


Previously, I presented a simulation of a D&D gladiatorial "Arena", where we generated a random population of 10,000 original D&D fighters and paired them in battles to the death for a few generations, to see what kind of level, hit point, and ability distributions would arise (link).

Note that this gave rise to a fairly predictable system: Once one fighter graduated from 1st to 2nd level, almost all of their later fights would be against 1st level opponents (96% or more of the population), whom they would almost surely defeat and gain XP from (generally 10 xp × 20 prize factor = 200 xp), reliably increasing in level from that point on. This gave rise to a fairly smooth set of numbers in level and ability distributions over 1,000 cycles of combat (what we might guess as 250 years of arena competitions).

One obvious limitation of that simulation is that normal D&D play does not predominantly pit PCs against NPCs, but rather against a mix of mostly nonhuman monsters. It would take additional programming work to add that capacity to the simulation.

New Method

At this point I've added modules to the Java program for the simulation to allow input of a variety of basic monster types and determination tables from CSV text tables. (As a side-benefit, this also allowed use of more exact class XP tables, which were previously approximated by a simple formula.) Now, each combat consists of a given gladiator (fighter) pairing off against a monster of a random level as determined by the tables in OD&D Vol-3, p. 10 (see below). The level of the NPC fighter is used for "Level Beneath the Surface", that is, as though the character were exploring a dungeon equal to their own level. Only the simplest melee-type monsters are included from the following tables: Kobolds, Goblins, Skeletons... and so on up to Superheroes, Lords, and Giants. The program doesn't currently handle any special abilities, so the gladiators do not face off against any monsters with poison, paralysis, petrification, energy drain, breath weapons, spell casting, etc. (For a listing of the specific monsters and game statistics used, see the file Monsters.CSV, linked at the bottom of this post). In order to better resemble standard D&D play, I changed the fighters' presumed armor from chain & shield to non-magical plate & shield (the exact type didn't matter when all opponents are equal in this way, but now it certainly does against predefined higher-level monsters).


Here are some of the chief lessons from this exercise. (1) There is enormously more variation in the system: a 2nd level fighter can be paired off against anything from a Zombie to a Lord or a Giant. (2) The wandering monster tables in Vol-3 are far too dangerous for this exercise as written, and almost no one can advance beyond 3rd level with this system; for example, when a fighter encounters a Lord or Giant at 2nd or 3rd level, that fighter is almost surely destroyed. (3) Experience awards are many times more variable; for example, killing a Giant gains about 5400 xp including prize award (compare to the predictable 200 xp from most fights in the old system). If a 2nd level fighter does manage to accomplish this (through a set of cosmically fortunate rolls, or abnormally low monster hit points, say), then that would be enough to jump over two levels immediately (if it were not capped at a one-level jump). This, then, returns us to point #1: there is enormously more variation in the system.

So when I started this simulation at the old parameters (10,000 population, 2,000 cycles), the most common thing was for no one in the entire population to be above 3rd level at the end. But sometimes, there would be one lucky star who managed to graduate past the danger area, and then continued cruising to 10th, 20th, or even 45th level! Of course, anyone in that situation is clearly an outlier that can't tell us anything about overall distributions or ability averages.

Clearly I needed a much larger population (to get better data about higher levels), and a smaller cycle length (to prevent the lucky few from shooting off the end of the scale and becoming deities). The simulation runs below are thus done for a population of 100,000 and only 200 cycles (what we might guess as 50 years of real-life gladiator combat). Here are the results of that, confronting the "normal" random monster tables in OD&D:

As you can see, extreme violence is inherent in the system. Of the 100,000 population, only about 2% have survived to 2nd level, and less than a half-percent have made it to 3rd level or above. The numbers are flat and single digits from 9th-15th level, so that surely looks like random noise to me, and the ability score averages swing up and down without any pattern at that point (the NPCs got there through dumb luck, not any particular ability advantage). So this doesn't tell us much, and I don't think that we expect D&D character levels to be so intensely constrained as all that.

However, before I go on, I must call out the one figure who achieved 16th level in that particular simulation: What a character! Strength 17, Constitution 16, hit points 92, he's clearly the beefiest fighter in the list. And also: Intelligence 5, Wisdom 6, a drooling barely-aware moron (IQ 50?). Charisma 10, so not a completely dislikable sort. Even with those physical abilities, I'd say he could only get to this point through a whole gauntlet of insane (really, really dumb) luck along the way. What if the most powerful NPC fighter in your campaign world was this same, blessed-by-the-gods, illiterate mass of muscle? Call him "Arnold" or "Jean-Claude" or "Groo" if you like.

Anyway, this excursion into nonstop brutality against most of our gladiators is not exactly what I wanted. So I went looking for a simple way to re-interpret use of those tables (have solo fighters play against a lower dungeon level, lower monster level, etc.). The easiest way seemed to be this: simply subtract 1 or more pips from the d6 roll on the monster level determination table. In fact, that's what I already have noted for my games, so it seemed like an obvious choice.

As it turns out, subtracting 1 from the d6 die makes a fairly small difference; achieved levels might increase by about one, is all (perhaps 2nd level fighters get stomped by a Superhero or a Minotaur instead). Even subtracting 2 is not a lot different. Here's what happens when I subtract 3 from that initial die-roll (i.e., limit results 1-3 only): 
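To make the effect of that modifier concrete, here is a minimal sketch in Python (not the Java code used for the actual simulation). It assumes that any modified result below 1 is simply treated as 1, which is one reading of "limit results 1-3 only"; the function names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction


def modified_level_roll(d6: int, pips_subtracted: int) -> int:
    """Subtract pips from a d6 roll; results below 1 count as 1 (assumed rule)."""
    if not 1 <= d6 <= 6:
        raise ValueError("d6 must be between 1 and 6")
    return max(1, d6 - pips_subtracted)


def roll_distribution(pips_subtracted: int) -> dict[int, Fraction]:
    """Exact probability of each modified result, enumerating all six faces."""
    counts = Counter(
        modified_level_roll(face, pips_subtracted) for face in range(1, 7)
    )
    return {result: Fraction(n, 6) for result, n in sorted(counts.items())}


if __name__ == "__main__":
    for pips in (0, 1, 2, 3):
        print(f"d6-{pips}:", roll_distribution(pips))
```

Under that assumption, the d6-3 roll gives a result of 1 two-thirds of the time, and 2 or 3 one-sixth of the time each, so the top rows of the monster level table never come up.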

At this point, I think you at least start to have something that looks like a potentially legitimate D&D fighter population: about 80,000 1st-level fighters, 14,000 (about one-sixth) at 2nd level, a third of that at 3rd level, a third of that at 4th level, and then diminishing reductions after that. There is a clearly increasing pattern to the favored abilities (Str, Dex, Con) that we can use to gauge proper values for new PCs or NPCs -- advancing a bit more slowly than in the former Man-vs-Man case, which makes sense because the gladiators are not contending directly with each other (in which case those abilities are the only distinguishing factor), whereas now the overall luck in monster draw is more telling.

The other thing I like about this is that, very broadly, it replicates the figures stipulated by Gygax in OD&D Vol-2 for proportion of a group of men that are higher-level leaders (see: Bandits, p. 5, and back-referenced in other places). Consider the following comparison:

While not perfect, the numbers are about the right order of magnitude. (And they look better if we add in the simulation numbers at 3rd level and 7th level to complete the picture.) It suggests that this is a place where we can choose to throw our anchor for the population distribution, before assessing ability scores achieved by the selection method. (Note that the more we modify the die roll, the greater the advancement proportions become. If we subtract 4 from the level die-roll, numbers at higher levels increase, and form a better match for the AD&D numbers where Gygax somewhat inflated leader proportions. For brevity, I'll omit showing that comparison here.)

Before I conclude, I'll make a few points about the need to soften those wandering monster tables. I've long seen the need for that already in my own games (link1, link2), and the same was indeed carried out by all later writers, including Holmes, Moldvay, and Gygax himself in AD&D (arguably over-compensating in the DMG), even though some writers have expressed positive views of how tough the original tables are (see the comments under link2 above).

But I must emphasize how incredibly charitable I'm being in all my interpretations towards our NPC gladiators, even in the original simulation that massacres almost every one of them. There are no monsters included here that have poison, paralysis, petrification, breath weapon, hit only by magic, etc. (any one of which could destroy a lone fighter of practically any level). I've very gently interpreted monster statistics at every turn (see Monsters.CSV below); for example, all the giant animals are given just a single attack for 1d6 damage, white apes are given just 2 attacks (when it could justifiably be 4), NPC fighters are given no bonuses whatsoever for abilities, feats, or magic items, etc. Hit dice are all still 6-sided as per the Original D&D boxed set. There is no penalty to XP for battling creatures under your own level (which arguably would be worse for solo fighters, as opposed to parties that can gang up on one monster of like level for full XP). And yet for all that, the system is practically a sure-fire killer unless we soften the initial roll for monster level (or the like) in each pairing.


What do you think of that? Is it what you expected when we switched from man-vs-man to man-vs-monster? Can you think of any more justifiable way of determining random pairings of D&D fighting gladiators versus monsters? Is it helpful for the game?

Want the data files and Java code used for the simulation? See here:


  1. This and the previous are both wonderful mental exercises. I love using data to draw conclusions about the structure or composition of a game world. However, I can't help but wonder if there's something missing from the simulation.

    Pit a PC fighter against an NPC fighter in an arena and the fight will play out like the simulation suggests. Add in some terrain or obstacles, and the dynamics change. Give one side a motivation other than killing the opponent outright... or maybe the PC has a magic weapon... or maybe one of them is ill, suffering from filth fever...

    In other words, there are too many variables. You removed monsters with special abilities because you can't accurately reduce their combat significance to equations. The same applies to intelligence and creativity. NPCs are people just as the PCs are - in the context of the game world. They're capable of doing things that the simulation can't account for. Yet the very possibility - of breaking the "rules" of the simulation - means that the final distribution might be more or less than the numbers indicate.

    Is there a way to add those variables to the simulation? I'd like to think that there is...

    1. Well, almost all of that is true. The actual reason I didn't include special abilities is due to limited time for programming development. That's always a balancing issue in any statistical study: the value of the partial information obtained, vs. the cost in time and resources of carrying out the study. So we could certainly add all those other things, at increasing expense, and still always be in a situation where more could be done.

      In summary: Possibly diminishing returns.

    2. ... Which is not to say that if there's a great interest I wouldn't add more details to the simulation.

  2. Ah! The distress of abridgements for brevity. At least, are the quantities of the succeeding levels of the "-4" results also about a third of the previous level, or are they around half?

    This does show more or less what I'd expected to see. I took a brief glance and noticed that if the dungeon level is used at half (round up) of the fighter's level, that might approximate the reduction in the d6 roll. Is there an easy way to test that variation (my coding skills are not great, and nonexistent in Java)?

    Ozymandias: I expect that the assumption in play here is that cleverness and guile work to each party's advantage, on average, and so cancel out for the purposes of statistical exercises like this one.

    1. Great suggestions, those are excellent ideas! With the -4 modifier, the first couple steps are about 1/3, then a couple at 1/2, and then the relation continues to diminish after that.

      When I run it with dungeon level = 1/2 fighter level (round down, no modifier to the roll), there are actually many more fighters stuck at 1st level (94K out of 100K), 3K at 2nd level, and then steps of about 1/2 after that. I'm guessing that's because no one gets the big lucky XP boost out of 1st level? And the number of leaders winds up much lower than the Vol-2 bandit specifications (about 1/4 what's predicted there).

      See this PDF, esp. pages 3-5. Thanks for the suggestions!

    2. If you're rounding down, what chart are the 1st levels rolling on? I picked "round up" so that they'd get to roll on the dungeon level 1 chart. Basically, the progression I imagined goes levels 1-2 roll on dungeon level 1 (up to monster chart 4), 3-4 on dungeon level 2, 5-6 on dungeon level 3, 7-10 on dungeon level 4-5, and so on.

      Hm, if 1st levels are rolling on dungeon level 1, maybe having to face up to monster chart 4 might have a deleterious effect. This does bring to mind, though, that most characters explore dungeons in parties. Perhaps, then, the reduction of the d6 roll might be the best approximation of that unless and until it could be modified for parties encountering monsters on expeditions (yeah, this is starting to get elaborate, and possibly not worth the effort when simply reducing the d6 roll can approximate encountering critters in parties, both by reducing the strength of the critter per fighter and the experience awarded similarly).

      On the whole, after consideration, I am thinking that your method of subtracting 3 or 4 from the monster dice result is the best approximation that suits most or all of the important variables. So, barring further complicated coding that will likely only verify what you already have, I think that you've probably reached a reasonable approximation of the results, confirming for the most part Gygax's (and Arneson's?) intuitive approximations. Go figure that an insurance adjustor would manage that!

    3. Good ideas. I let the program round down (truncate) just because that's what computer division does by default, with a cap to use the 1st row at a minimum.

      Here's what happens if you round up instead (levels 1-2 roll on dungeon level 1, etc.): It actually gets even more top-heavy (fewer high-levels), because the 2nd level fighters are that much further away from the lucky big XP scores.
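The two rounding conventions being compared here can be sketched as follows. This is an illustrative Python snippet rather than the actual Java program, and the function names are hypothetical; the floor of 1 in the round-down case reflects the "cap to use the 1st row at a minimum" mentioned above.

```python
def dungeon_level_round_down(fighter_level: int) -> int:
    """Half the fighter's level, truncated, but never below dungeon level 1."""
    return max(1, fighter_level // 2)


def dungeon_level_round_up(fighter_level: int) -> int:
    """Half the fighter's level, rounded up (levels 1-2 -> 1, 3-4 -> 2, ...)."""
    return (fighter_level + 1) // 2


if __name__ == "__main__":
    print("fighter  down  up")
    for level in range(1, 11):
        print(f"{level:7d}  {dungeon_level_round_down(level):4d}  "
              f"{dungeon_level_round_up(level):2d}")
```

The two versions only differ at odd fighter levels from 3rd upward, where rounding up sends the fighter one dungeon level deeper.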

      But this does suggest other ways of modifying the "dungeon level" input. Like, I'm fond of 3E's formula for modifying difficulty by -/+2 per half-or-doubling of party size, which I've found to be surprisingly robust in the past. So if average party size was 4 then a solo fighter would be at -4 difficulty levels, or if average party is 8 then -6 difficulty. When I run this subtraction method through simulation, the model does seem best at around the -4 subtraction level, where the numbers do tend to decrease by about half per level. (Although I think the -3 of the d6 roll is a somewhat better model.)
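As a rough sketch of that party-size rule: if every doubling or halving of party size shifts difficulty by 2 levels, then the shift for an arbitrary size is 2 times the base-2 logarithm of the size ratio. The Python below is only an illustration of that arithmetic; the function name and the idea of extending the rule to non-power-of-two sizes are assumptions, not anything taken from a rulebook.

```python
import math


def difficulty_shift(party_size: int, baseline_size: int) -> float:
    """Difficulty adjustment of +/-2 per doubling/halving relative to a baseline party.

    Negative values mean the party should face easier encounters.
    """
    if party_size < 1 or baseline_size < 1:
        raise ValueError("party sizes must be at least 1")
    return 2 * math.log2(party_size / baseline_size)


if __name__ == "__main__":
    for baseline in (4, 8):
        shift = difficulty_shift(1, baseline)
        print(f"solo fighter vs. baseline party of {baseline}: {shift:+.0f} levels")
```

This reproduces the two cases mentioned: a solo fighter is at -4 against an assumed party of 4, and at -6 against an assumed party of 8.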

      I've updated the link above with these runs, see pages 5-14. Also I added a "ratio" column to make it a bit more obvious where the steps are decreasing by a half or a third or whatever.

    4. That's excellent stuff. With those ideas in mind, then it does seem, again, as though -3 or -4 works best as a simulation. What it comes down to is that a certain percentage (20% or 40%) of level 1s will make it to level 2, and then some fraction make it to each higher level. It might be worth running the simulation a number of times (I am thinking on the order of 500 separate runs should do it, since that would be a total of 100,000 cycles to match the size of the world of fighters, which should almost completely eliminate random fluctuations; actually, it should work with fewer repetitions of the experiment, but I'm not sure of exactly how many - any statisticians want to weigh in?) and averaging the results of the various ratios per level.

    5. Well, of course, I'm a statistics professor myself. :-) At the 95.44% confidence level, the maximal margin of error for a proportion like this is E = 1/sqrt(n) = 1/sqrt(100,000) = 0.3%, or about +/-300 men in any category (or less if we use the actual sample proportion in any one category, but this serves as an upper bound). Statistically there's no need for separate runs, just crank up the sample size to wherever you're happy with the margin-of-error.
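For anyone who wants to check that bound, here is a small Python sketch. It uses the standard worst case for a proportion (p = 0.5) with z = 2, which corresponds to roughly the 95.44% confidence level and reduces to 1/sqrt(n); the function name is just for illustration.

```python
import math


def max_margin_of_error(n: int, z: float = 2.0) -> float:
    """Worst-case margin of error for a sample proportion (assumes p = 0.5)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(0.25 / n)


if __name__ == "__main__":
    n = 100_000
    e = max_margin_of_error(n)
    print(f"n = {n}: E = {e:.4%}, or about +/-{e * n:.0f} men")
```

Because the margin shrinks with the square root of n, multiplying the population by 10 only cuts the margin by a factor of about 3.2.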

      Although I think I misinterpreted the results of the half-level-round-up method above; it's a bit more top-heavy because the 3rd level fighters are now running into 5th-level monsters (minotaurs and superheroes) and getting cut down at that point. But the difference from round-down is fairly small.

    6. Ah, very good, and why didn't I remember that? :) Now the little bit of statistics I do know is coming back to me.

      OK, ±300 is a pretty huge margin, especially at the higher levels there, where sample sizes are around 200 (or less!), but wouldn't we calculate the margins on the order of 1/sqrt(200) for those? That'd be an error margin of around 7%, or ±14 or so up at the higher levels (10-12) under the -3 assumption. That's an assumed range of 186-214 (matching the observed numbers, close enough), which leaves a ratio range for levels 11 and 12 of about ±0.15. For myself, I'd like to see that range narrowed a bit. If the sample size were increased 10x, then that should result in a variation of ±2.24% or ±44.7, giving a range of 1955.3-2044.7, or a ratio range of ±0.046 or so. I ask because the ratios are, I think, very nearly the most important result of this (since they can be plugged into a spreadsheet to give results of value for a given setting, for fighting men at least).

      Now, what will be the real project for the future is to figure what variables affect the results, so that we can build a model (a Monte Carlo simulation being out of the question, as we've previously discussed) that might include spellcasters and perhaps other classes.

      Hm. Thieves could perhaps be simulated by giving them a chance to avoid combats based on their abilities and thus gain experience from the imputed treasure without fighting. Which brings up the possibility of such things occurring for fighting men, now that I think on it. Not that I can think right away of how to simulate that statistically. Sorry to open another can of worms! D&D is not a simple game, which is perhaps why we love it so.

    7. Well, how you'd really compute the margin of error is by E = z(alpha/2) * sqrt(p^ * (1 - p^) / n). At the 95% confidence level, z(alpha/2) = 1.96, p^ is the sample proportion, and n is the sample size (100,000 here). For example, in a sample bin of 200, it works out to E = 0.0003 = 0.03%, or about +/-30 men out of 100,000.
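That worked example can be reproduced with a few lines of Python. This is only a sketch of the standard normal-approximation formula quoted above, with a made-up function name; it takes the bin count and the total sample size as inputs.

```python
import math


def margin_of_error(count: int, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for the proportion count/n."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need 0 <= count <= n and n > 0")
    p_hat = count / n  # sample proportion for this level
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


if __name__ == "__main__":
    n = 100_000
    e = margin_of_error(200, n)
    print(f"E = {e:.5f} ({e:.3%}), or about +/-{e * n:.0f} men out of {n}")
```

For a bin of 200 out of 100,000 this comes to a little under 0.0003, i.e. just short of the "+/-30 men" figure once rounded up.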

      So I did a run of the d6-3 model with 1 million men involved, and linked below you can find the results. Includes the margin of error and 95% C.I. for actual population range based on this Monte Carlo simulation. (e.g., it's 95% likely that the number of 12th level fighters per million is somewhere between 1690 and 1854, etc.)

    8. That does seem to clear up most of the difficulties, except that annoying anomaly in CON at levels 10-11. Judging by the previous results, it seems that both should be 15. That's why there are meta-analyses!

      I should add that I'm actually somewhat surprised that there is such an uneven result in the numbers of characters at each level. I'd expected quite a bit more smoothness in the result, even if not perfectly smooth. I wonder if the AD&D optional rule increasing "to hit" chances by 1 per level instead of 2 per 2 levels would smooth things out at all. I doubt it, since the general tendency of the increase is still the same. I forget if D&D includes the multiple attacks vs. low hit die monsters rule that AD&D has in the alternative combat system. I know that it was assumed in the Chainmail system.

      I'm sorry to keep bringing up issues. I hope that they improve the simulation, though I am not actually sure that it can be improved much further.

    9. Yeah, the simulation already includes the +1 per level increment to-hit. It doesn't include any extra attacks for these fighters.

      My guess is that around level 10-11 the mean Constitution is right around 14.5, and one or the other in this run happened to come out a shade higher or lower than the population by standard sampling error. (I'm almost surprised there aren't more of those visible.)

      If you look at the stepwise ratios in graph form, it becomes evident that they're following a linear upward trend; i.e., the chance of continued survival increases with advancing level (which is sensible). I wouldn't expect it to be perfectly smooth simply due to the coarseness of the d6-based monster level table (and arbitrary power of monster on each level chart).

      See the last link which I updated with a chart for the stepwise ratios. Thanks for the questions!

    10. Wait, where am I? Oh, sorry, I forgot for a minute! This is Delta D&D, which has its own precise methods. The numbers should generally hold for other D&Ds, of course.

      I see that there is a general upward trend in the ratios, but I am trying to account for the fluctuations. With a sample size that large, it seems as though it should be smoother, rather than jumping around like it does (why 91% at 7th level? Or, for that matter, 50% at 6th?) Then I notice that the trend sort of plateaus from 7th through 12th level, jumping around 89%±12%. I want to understand why it does that. (I am currently assuming that the sudden drop at 13th level is due to other factors related to the simulation parameters. Does the simulation allow for higher levels and none of the SimFighters make it, or is that a hard cutoff?) I figure that, by understanding why the simulation gives the sort of results it does for Fighting Men, we can work out roughly what the other classes should look like, even if we can't simulate them for various reasons, taking into account their varying experience point charts and abilities.

      No problem! As you can probably tell, this subject interests me considerably.

    11. Looking at the -3 charts for the 100K and 1M runs, I see the following correlations:

      Level   100K     1M
      2        18%    17%
      3        31%    30%
      4        29%    28%
      5        69%    71%
      6        52%    50%
      7        88%    91%
      8        85%    83%
      9        68%    77%
      10       86%    85%
      11      101%   101%
      12      101%    87%
      13        3%     4%

      So, I think I am most concerned about the big variations at 9th and 12th levels, but more importantly why the numbers go up and down so reliably. That isn't simple variation due to sampling error, that's consistency (except for 9th and 12th levels, which I expect are due to sampling error, which might be smoothed away by further runs to increase the sample size).
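The comparison above can be checked mechanically. The Python sketch below copies the two ratio columns from the table and flags any level where the runs differ by at least some threshold; the 5-point threshold and the function name are arbitrary choices for illustration, not part of the simulation.

```python
# Stepwise ratios (percent) by level, copied from the table above.
RATIOS_100K = {2: 18, 3: 31, 4: 29, 5: 69, 6: 52, 7: 88,
               8: 85, 9: 68, 10: 86, 11: 101, 12: 101, 13: 3}
RATIOS_1M = {2: 17, 3: 30, 4: 28, 5: 71, 6: 50, 7: 91,
             8: 83, 9: 77, 10: 85, 11: 101, 12: 87, 13: 4}


def large_gaps(first: dict[int, int], second: dict[int, int],
               threshold: int) -> dict[int, int]:
    """Levels where the two runs differ by at least `threshold` points."""
    return {
        level: second[level] - first[level]
        for level in sorted(first)
        if abs(second[level] - first[level]) >= threshold
    }


if __name__ == "__main__":
    # The threshold of 5 percentage points is an arbitrary illustration.
    print(large_gaps(RATIOS_100K, RATIOS_1M, threshold=5))
```

With that threshold only the 9th and 12th level ratios stand out; every other level agrees between the two runs to within 3 points.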

    12. Glad it caught your interest! The simulation does handle arbitrarily high level advancement (e.g., the run at a normal d6 roll produced some 16th-level fighters). Generally what happens is some fighters survive to higher levels, and can then reliably cruise along earning steady XP; the limit is however many cycles it runs (currently 200 combats). If we let it run more they'd achieve any level you like (e.g., the initial 2000-cycle run that produced 45th level fighters).

      Notice that if you compare those step ratios to the "monster determination" chart, they sort of track each other. Level 6 is conspicuous because it's the only tier that suddenly opens up a new monster level on 2 die pips (with -3 modifier max is 3; and either 2 or 3 reveals a level-5 monster, impossible at lower levels); so this sudden upsurge in difficulty would be expected to cut down the survival rate to a low level. Level 7 is the same tier, so once you survive 6th level, you won't see any new threats at 7th, and survival swings back up. So that coarseness in the encounters gets reflected here. Also effects of XP awards and increasing requirements play a role.

    13. Yeah, going over it, I noticed some of that. That doesn't explain the 9th level drop (which occurs, though we don't yet have a clear idea of how much), though, and only partially explains why 11th and possibly 12th increases slightly over the previous level. The deep dive at 13 I imagine has to do with the maximum number of encounters that can occur and therefore the maximum amount of XP.

      It might be interesting to know the average "age" of characters at the various levels, by counting the number of encounters they engage in. Which brings up the idea of giving different SimFighters different numbers of encounters per year, and letting them go through a typical adventuring career of however many years. But I'm not sure that there's a simple way to do that, or even a simple method to set the length of a "typical adventuring career" or to determine the distribution of average number of encounters per year.

    14. I agree that "aging out" characters after a particular time is one of the things I think would be interesting to add to the simulation, if I get time and there's other people interested. I'm prone to possibly using what Gygax stipulated in the old Conan article; not simply death or minor ability-score losses, but steady loss of levels with advancing age.

  3. Awesome, I love these tests.

    It would be interesting to track what the lowest level fighter was to kill each of the monsters. Does a 1st level guy ever kill whatever the toughest creature he can face is?

    1. I'm pretty sure the answer to that is definitely "yes". At some point in a long run someone runs into an 8 hit point giant and can kill it with one sword blow, for example.

  4. Great stuff, this.

    Most dungeon-diving isn't performed solo, though. (As you allude to above...) Usually you gang-up.

    So I wonder if there isn't a sliding scale that relates (the presumably inverse-relation of) the number of adventurers to the number of pips subtracted.
    ... sort of a Lanchester's Laws for Dungeon Diving.

    1. Good point. See above for discussion under faoladh's post: I have found in the past that the 3E formula of +/-2 challenge levels for each doubling or halving of party numbers works surprisingly well. So that argues for running a fighter of level N at dungeon level N-4 or something, which actually does make for a halfway decent model.

  5. I'd be interested in seeing how the results change once the effects of Charisma are included. That'd give the fighter a party to properly tackle dungeon levels equal to his level, plus we could see how Charisma differs by level.

    Did you vary the number of monsters based on the level on which they're encountered? I'd imagine that would impact the results. If you add Charisma later, remember that larger parties attract more monsters.

    1. The simulation is currently always one fighter versus one monster. If we were to simulate larger parties, but those parties fight more monsters, then it seems like that would just cancel out and effectively have one fighter per monster anyway.

    2. I think that, in general, the modifier to the d6 roll simulates the matter of multiple party members and multiple monsters.

    3. The additional monsters would keep the larger party size in check, but not completely negate its effect. Actually, the additional fighters may become a hindrance at higher levels, since they won't progress as quickly as the leader, and replacements would be at a lower level than survivors.

  6. Slightly off topic, but mathematical, so hopefully you'll appreciate it.

    The often used mapping of Int onto IQ by multiplying by 10 does not match how IQ works. In IQ, the mean is 100, and the standard deviation is 15. The StdDev on 3d6 is approximately 3, so a much better mapping is to peg Int 10 at IQ 100, and then adjust up and down by 5 for every 1 Int.

    This means that your "drooling, barely aware" fighter actually has an IQ of 75, and a (wisdom quotient?) of 80. Not great, but hardly crippling (although as a fencer I'd say Int and Wis are probably far more useful, and strength is really only good for hacking through the thick hide or shell of a particularly tough enemy).

    Interestingly, this also means that the Int ranges of up to about 30 as seen in 3e much better resemble IQs seen in the real world.
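The two mappings under discussion can be put side by side in a short Python sketch. It enumerates all 216 outcomes of 3d6 to get the exact spread, then applies the "times ten" rule and the linear rule proposed in this comment (Int 10 pegged at IQ 100, 5 IQ points per point of Int). The function names are illustrative only.

```python
from itertools import product
from statistics import mean, pstdev

# Every possible 3d6 total, each outcome equally likely.
ROLLS_3D6 = [sum(dice) for dice in product(range(1, 7), repeat=3)]


def iq_times_ten(score: int) -> int:
    """The traditional rulebook mapping: IQ = Int x 10."""
    return 10 * score


def iq_linear(score: int) -> int:
    """The commenter's mapping: Int 10 = IQ 100, +/-5 IQ per point of Int."""
    return 100 + 5 * (score - 10)


if __name__ == "__main__":
    print(f"3d6 mean = {mean(ROLLS_3D6)}, std dev = {pstdev(ROLLS_3D6):.3f}")
    for score in (3, 5, 6, 10, 18, 30):
        print(f"score {score:2d}: x10 -> {iq_times_ten(score):3d}, "
              f"linear -> {iq_linear(score):3d}")
```

Note that the true mean of 3d6 is 10.5 rather than 10, so pegging Int 10 at IQ 100 is itself a small simplification.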

    1. Thanks for posting! On that particular assessment I actually come down on the other side of the issue, as I've written about in the past.

      To summarize the main points: (1) multiple D&D rulebooks explicitly state that Int × 10 = IQ; (2) there's no need for the fantasy population to have the same distribution (stdev) as the real world; (3) you otherwise miss several categories of real-world IQ deficiency; (4) you have problems converting D&D animal Ints of 1 or 2 to comparatively high IQs; (5) the classic D&D max Int of 18 likewise misses several categories of real-world high IQs.