2017-09-18

Turnbull's MonsterMark System

Following on from the discussion of systematized EHD (equivalent hit dice) last week, let's look at a much earlier attempt at the same idea. In the first three issues of White Dwarf magazine, Don Turnbull presented a measurement he called "The Monstermark System". This would be through the summer and fall of 1977, that is, exactly 40 years ago as I write this. (Thanks to Stephen Lewis for the tip-off to these articles!)

In the third article in the series, Turnbull writes:
 Although it has been said by quite a few D&D addicts that the Greyhawk system of experience points, which is based on monsters' hit dice, is too stingy I don't think this is something which can be considered in isolation...  So, circuitously, back to experience points. In my view they are intended to reflect risk. A character gets experience for meleeing with a monster because there is a finite, non-zero, risk that he will be killed or at least suffer wounds which could contribute to his eventual death. He gets experience for gold because he has taken risks to grab it... He should not, however, get experience for finding a magic sword or that seven-spell scroll since these things will assist him in getting experience by other means... Since the whole point of the Monstermark is to measure the risk inherent in tackling a particular monster, experience points should bear a linear relationship to M...
I fully agree with those observations, and my motivation for EHD is exactly the same: to provide a measure of risk, from which we can support a simple, linear calculation for experience points. We both assume a protagonist fighter with a fixed armor type, shield, and a sword; we both give the fighter one attack per round. Now, the basis of his system is this: for the default fighter, compute the amount of damage he would expect to take fighting the monster (assuming the combat never ended from the fighter's death). In this case, the calculation is done by first computing the number of rounds the monster would expect to live (D); and then multiplying that by the expected damage per round (analogous to the DPS -- damage-per-second -- statistics in MMORPGs) for an overall aggression level (A). In the first article, Turnbull presents it like this:
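Going only by the verbal description above, and not by Turnbull's own notation, the base calculation can be sketched roughly as follows. The function names, parameters, and sample numbers are all hypothetical, and dividing hit points by expected damage per round is a simplifying assumption for "rounds the monster would expect to live".

```python
def expected_rounds_alive(monster_hp, fighter_hit_chance, fighter_avg_damage):
    """D: rounds the monster is expected to survive against one attack per round.

    Assumption: total hit points divided by the fighter's expected damage
    per round (this ignores overkill on the final blow).
    """
    return monster_hp / (fighter_hit_chance * fighter_avg_damage)


def aggression(monster_hp, fighter_hit_chance, fighter_avg_damage,
               monster_hit_chance, monster_avg_damage):
    """A: expected damage the fighter takes over the monster's expected lifetime."""
    d = expected_rounds_alive(monster_hp, fighter_hit_chance, fighter_avg_damage)
    damage_per_round = monster_hit_chance * monster_avg_damage
    return d * damage_per_round


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
d = expected_rounds_alive(monster_hp=9, fighter_hit_chance=0.5, fighter_avg_damage=4.5)
a = aggression(9, 0.5, 4.5, monster_hit_chance=0.25, monster_avg_damage=3.5)
print(d, a)
```

With these made-up inputs the monster lasts 4 rounds and deals an expected 3.5 points of damage in that time, so A = 3.5.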


This seems like a solid, undeniably valid base measure of monster risk level. As long as the monster has no special abilities. Which is, as you know, almost none of them. As soon as a monster has special abilities, then Turnbull is forced to step out of the methodical expected-value analysis and revert back to a purely discretionary set of multipliers, hoping to estimate the power of various abilities, to get the final MonsterMark score (M). As he writes, "All this is very subjective and I would be surprised not to meet with different views, but the following bonus relationships seem to give results which instinctively 'feel' right:"


Now, if you take nothing at all but one thing away from this blog, I hope that it's this: these kinds of a la carte scoring systems for game entities are always a lost cause. The inter-relationships of different abilities and powers are too complicated to be encapsulated in such a system; the true acid test can only be made by systematic playtesting (which is very hard).

Consider a few short counterexamples -- A giant rat given magic-to-hit defense is effectively unbeatable by the PCs it normally fights; but a very old red dragon, given the same ability, would have little effect against its high-level opponents (surely wielding magic weapons already). If ghouls have possibly paralyzing attacks, then it makes a huge difference if they have one attack for 1d6 damage, versus three attacks for 1d2 damage (even with nearly the same expected damage). Centipedes and carrion crawlers, with a base damage of zero, even with poison or paralysis, would generate a product that is still zero by this multiplicative system. And so on and so forth.
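Two of these counterexamples are easy to check with a little arithmetic. The sketch below assumes a flat 50% hit chance for the ghoul and an arbitrary special-ability multiplier; both are hypothetical values for illustration, not figures from Turnbull's articles.

```python
from fractions import Fraction


def average_roll(sides):
    """Average result of one die with the given number of sides."""
    return Fraction(sides + 1, 2)


def chance_of_any_hit(hit_chance, attacks):
    """Probability that at least one of several independent attacks lands."""
    return 1 - (1 - hit_chance) ** attacks


HIT_CHANCE = Fraction(1, 2)  # assumed, for illustration only

# Ghoul variant 1: one attack for 1d6. Variant 2: three attacks for 1d2 each.
damage_one_attack = 1 * average_roll(6)     # total if the attack hits
damage_three_attacks = 3 * average_roll(2)  # total if all three hit

# Each hit forces a paralysis save, so what matters is how often
# the victim is touched at all in a round.
exposure_one = chance_of_any_hit(HIT_CHANCE, 1)
exposure_three = chance_of_any_hit(HIT_CHANCE, 3)

# A multiplicative bonus applied to a base damage of zero stays zero.
HYPOTHETICAL_MULTIPLIER = 10
zero_damage_score = 0 * HYPOTHETICAL_MULTIPLIER

print(damage_one_attack, damage_three_attacks)  # 7/2 9/2
print(exposure_one, exposure_three)             # 1/2 7/8
print(zero_damage_score)                        # 0
```

Under these assumptions the damage totals are of the same order (3.5 versus 4.5), but the three-attack ghoul touches its victim in 7 rounds out of 8 instead of 1 in 2, which is the part a damage-based score cannot see.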

Nevertheless, Turnbull pushes forward with the tools he has, first presenting a table of basic humanoids without special abilities (of which there's really only a half-dozen), and then separate tables for various other categories of monsters from OD&D, the Greyhawk supplement, and a few magazine articles current at the time. For a few examples of his M scores: orcs get 2.2, ogres 29.9, trolls 158.4, and red dragons 675.5 (by comparison, I give those creatures EHD values, respectively, of 1, 4, 9, and 32; and no, I don't think that going into decimals here is a great idea). Ultimately he recommends giving XP of 10 times his M score, which is generally about double the low Greyhawk XP awards for these sample creatures (whereas I still prefer 100 times the EHD level, in the spirit of Vol-1).
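For a side-by-side look at the two XP rules just mentioned, here is a small sketch using only the four sample creatures quoted above. The dictionary layout and function names are my own for this illustration.

```python
# (Monstermark M, EHD) for the four sample creatures quoted in the text.
SAMPLES = {
    "orc": (2.2, 1),
    "ogre": (29.9, 4),
    "troll": (158.4, 9),
    "red dragon": (675.5, 32),
}


def xp_from_monstermark(m):
    """XP at 10 times the Monstermark score, rounded to a whole number."""
    return round(10 * m)


def xp_from_ehd(ehd):
    """XP at 100 times the EHD level."""
    return 100 * ehd


for name, (m, ehd) in SAMPLES.items():
    print(f"{name:12s} 10 x M = {xp_from_monstermark(m):5d}   100 x EHD = {xp_from_ehd(ehd):5d}")
```

The two rules land in the same general range for these creatures, with 100 x EHD higher at the low end (orc, ogre) and 10 x M higher at the high end (troll, red dragon).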

There are 73 monsters for which Turnbull & I both are willing to give measurements. Consider the correlation between our assessments:


That's not very close at all. The data points are scattered all over the place, not close to any regular relationship; knowing one measure only allows you to predict about 50% of the variation in the other measure. On average, Turnbull's Monstermarks are about 20 times what I find for EHD levels, but that doesn't tell us much. He assumes plate armor for fighters whereas I assume chain (for reasons given last week), but that can't explain the low correlation either. Let's look at some specific cases for why this is.
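For anyone who wants to check the "about 50% of the variation" figure themselves, that number is the squared Pearson correlation (R-squared) between the two measures. The sketch below computes it by hand. The sample pairs are only the handful of (EHD, MM) values quoted in this post, not the full 73-monster data set, so the result will not match the figure above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (EHD, Monstermark) pairs quoted in this post; a small subset only.
PAIRS = [
    (1, 2.2), (4, 29.9), (9, 158.4), (32, 675.5),   # orc, ogre, troll, red dragon
    (25, 128), (13, 56), (14, 120), (9, 22),        # basilisk, medusa, carrion crawler, harpy
    (39, 440), (33, 420),                           # vampire, treant
    (14, 758), (18, 707), (20, 700),                # fire lizard, 10-headed hydra, mind flayer
]

ehd = [p[0] for p in PAIRS]
mm = [p[1] for p in PAIRS]
r = pearson_r(ehd, mm)
print(f"r = {r:.2f}, R-squared = {r * r:.2f}")
```

R-squared is the share of variance in one measure that a straight-line fit on the other can account for, which is the sense in which one score "predicts" the other.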

The most obvious problem for Turnbull is this: The Monstermark system cannot handle area effect abilities at all. His model tries to do accounting on the hit points from breath weapons (in the 2nd article), but he steadfastly assumes just a single deathless fighter in melee against a given monster; so, if a red dragon breathes fire, then only damage to that one fighter is accounted. But that doesn't reflect the true risk or utility of area-effect weapons like that; our PCs don't adventure in solitude but in groups of some size. The examples of dragon combat in both OD&D and AD&D show three PCs being incinerated at once from a single breath attack; so the damage/risk multiplier should really be at least several times higher than Turnbull counts. Likewise, petrification weapons get no distinction for delivery by touch or wide-area gaze -- the cockatrice (touch), medusa (gaze), and basilisk (both!) each get an identical 2.5 multiplier for their abilities. This alone probably accounts for a massive skewing in many of his scores, downward from the true risk level. In contrast, my Monster Metrics program runs up to 64 opposition fighters simultaneously against any given monster, and they suffer appropriately from area or gaze weapons.

Some examples where the Monstermarks seem clearly too low:
  • Basilisk (EHD 25, MM 128), with its combined touch-and-gaze petrification, which only gets the same multiplier as a cockatrice does. 
  • Medusa (EHD 13, MM 56), likewise with her area-effect gaze petrification.
  • Carrion Crawler (EHD 14, MM 120); as noted above, the multiplication system from zero damage should come out to zero, so I think he just made this up from whole cloth (note the round number). 
  • Harpy (EHD 9, MM 22), with her mass charm song ability, shouldn't be weaker than an ogre.
Another rather egregious issue is this, although it affects only two creatures: Summoning abilities are entirely left out of the accounting. As noted before, we find these abilities to be among the most potent in the game! But the Monstermark system actually overlooks them entirely, giving no bonus at all for them.
  • Vampire (EHD 39, MM 440), given no summoning abilities.
  • Treant (EHD 33, MM 420), which actually appears in Turnbull's first table of "simple human-type monsters" without any special abilities, and yet its tree-controlling ability allows it to effectively triple its own brute strength. (As an aside, consider a vampires-vs-treants scenario, in which we find two of the most powerful opposition monsters in the game due to their parallel summoning abilities.)
Meanwhile, there are some other monsters with nothing but brute strength that appear too highly scored -- like the Fire Lizard (EHD 14, MM 758), and Hydra with 10 heads (EHD 18, MM 707) -- but I think that this is only an artifact of the special ability monsters being relatively too low. Also, the Mind Flayer's score seems ridiculous (EHD 20, MM 700), granted that he doesn't even note its mind blast power, and was probably again just a raw guess (another suspiciously round number).

Now, there are two other cases that literally jumped off the chart above, such that I felt compelled to remove them as outliers -- and on inspection they are rather obviously in error. These were:
  • Roper (EHD 16, MM 3,750). This is clearly a mistake. Turnbull notes the creature in part 2, p. 15: "These calculations make the Ropers the most fearsome beasts we have met so far; I don't recall ever meeting them down a dungeon, and I devoutly hope I never will." The problem, if I'm reading his attack notation correctly, is that he's applied the Roper's 5d4 damage factor -- which should be just for its mouth -- to every single one of its 6 ranged tentacle attacks. That really would be horrifying! While the Roper is a tough customer, it obviously shouldn't be worth the same as 5 or 6 Red Dragons; that doesn't pass any kind of sanity check.
  • Flesh Golem (EHD 21, MM 1,920). In this case, the problem is that Turnbull shows a radically different AC for the monster than I see in the books: My copy of Sup-I (with correction sheet) gives it AC 9, as does the AD&D Monster Manual. Turnbull shows it as having an AC of -1, which is obviously the diametrical opposite. I'm not sure where he got that from, maybe from a wild guess before the Sup-I correction sheet was available to fill in that statistic?
There were some other things I had to leave out of the analysis, such as those other golems and elementals that are hit by only +2 or better magic weapons, which have undefined EHD in my model. Turnbull gives medium and large elementals a score of 1,000-2,000, stone golems nearly 13,000, and iron golems just shy of 33,000 (but again their ACs are treated as much harder than in the rulebooks, namely AC -3 and -5, so there are multiple reasons to leave them out of our comparison).

In conclusion, while the motivations are exactly the same, the scores that Turnbull & I come up with are radically different, effectively incommensurable. (If you want the full data, my Monster Database from last week has Turnbull's MonsterMarks entered in hidden column Q.) Of course: while Turnbull's instinct was noble, he didn't have the immense computing power all around us to simulate playtests the way we can today. Now, maybe someone will come back to critique my work in another 40 years -- someone who has access to a complete game engine with all the special abilities, full wizard spell selection, mixed-class PC party simulator, and hard Artificial Intelligence to optimize the best tactical choices on each side -- and in that light my suggestions might look totally naive. We can only hope for such continuity and progress.

24 comments:

  1. I think one of the things I most appreciate about these posts is that you don't have a "I got this all figured out" approach.

    Replies
    1. It would be hard to not take away some humility from all the prior work that other people have done. :-)

  2. I commend Turnbull for his attempt in 1977 but it falls short. Of course you have the advantage of computer support to create simulation programs. I rely on your data a lot for the encounters I run in my hosted games. great job, and interesting comparison.

  3. Nitpicker brigade: the expected damage of 3d2 is 4.5, higher than 1d6's 3.5.
    That's all I've got though, I love this post.

  4. The challenge of deciphering the late Mr. Turnbull's equations was a useful challenge for a 12-year-old struggling with mathematics in 1977.

    Replies
    1. Similarly, stuff like this is probably what spawned my math degree. :-)

  5. I misread the post title at first as something about a "Monstershark" system and got a little excited, and then a little let down.

    Not that this isn't cool; just that it can't beat a Monstershark system.

    Replies
    1. LOL. Someone needs to get on that. Watch out for monstersharknados.

  6. Treants don't have blood and are literally made out of stakes so there would be no competition. I guess the vampires would just fly away! (Ents - the perfect vampire hunters?)

  7. Seeing your EHD research I wonder if the spells didn't need a research to see if their spell level is correct, balanced, whatever you want to call it...

    Replies
    1. Maybe. Hard for all the non-damaging/non-combat options, I would think.

  8. Many thanks for all your work on this and for making your monster database available. The monstermark articles were great in their day, but I agree with you on the inevitable shortcomings. A multiplier for special abilities just can't ever work: a (hypothetical) monster with save-or-die poison doing 2HP damage is in reality not twice as dangerous as one with the same stats doing 1HP damage, whatever multiplier you use. The damage is almost incidental for most cases, compared to the chance of death from a failed save, which is the same for both. Worse problems with the lich at the other end of the complexity scale.

    Turnbull also knew this, as you note with other examples, he basically made up a MM for the wight and then revised it in response to a letter in a later issue.

    He also didn't use a factor of 10 for EXP, but used it as an example, coyly not telling us what he used in his famous dungeon. Having said that, it's not too bad.

    Looking back, although I really enjoyed the articles at the time, they were always more of an interesting basis for discussion than a practical way of working out EXP for monsters, which was always small change anyway compared to treasure awards. Their great value to me was as a proto-Monster Manual: a really convenient listing of monster stats all in a consistent table and all in one (or two) places to look up in a hurry. Your MonsterDatabase file may be its electronic heir!

    Replies
    1. Excellent points! Thanks for the wishes. :-)

  9. On the topic of the wight, I notice that they get an EHD=3. (The ghoul gets EHD=4; I agree with that, it's a 2HD killer with Greyhawk stats.)

    I don't know, but wights feel scarier/riskier to me than a normal 3HD monster. It must be the level-drain. I can see that it may not be a decisive factor in fighting a one-off battle, but it is the long-lasting nature of the damage afterwards, I suppose, how you are weakened even after you win. Plus the chance of converting new troops, do the level-drainers do that in your program as well as the summoners?

    Anyway, I'm not sure how to take account of that, but you may have some better thoughts than I do.

    Replies
    1. ..oh, and obviously this all changes if a cleric is present. I know you don't use them, but with them it's all a bit binary: easy win or, if not, just run like mad before it touches you.

    2. Those are also good catches. My metric is purely based on the chance of overwhelming the opposition fighters in a single combat. So you're right that "long-term suffering" as from level-drain or mummy rot (there's a post coming specifically on that) may not show up on the radar super well. Even so, my current thinking is that an immediate hors de combat from paralysis does seem worse than getting, say, one-third the way there from level drain. Even if psychologically everyone hates level drain, maybe getting outright killed is worse.

      Also the simulator doesn't handle victims coming back as undead. It's interesting that my intuition has never been that this happens mid-combat -- now I see it's undefined throughout OD&D and AD&D, except for the vampire in the MM (specifies "1 day after the creature is buried"), which I guess is where I developed that expectation. Also if people get charmed (by vampires, et al.), then they're likewise just out of the combat in the current simulator.

      Good point about the clerics, of course the threat level I show is blind to that issue. As you say, among the list of reasons I dislike them is how they pretty much shut down the scariness of undead.

      (Side note: Consider that my OD&D ghouls only get 1 attack per round, the 3-attack ghoul in Greyhawk/AD&D is even worse than that!)

    3. Also: Rust Monsters (just ran into this as an issue today).

  10. There's a simple way to get around those pesky abilities and magic that seem to skew our expectations of how dangerous a monster should be...

    Award XP for dealing and receiving damage in combat.

    Boom. Problem solved.

    Replies
    1. I think that's just the same mistake Turnbull made as a basis for his system? (Aside from the accounting nightmare.)

      That would seem to encourage people going out and slaughtering big, less-dangerous things like herd animals, and being negligibly awarded for perilous low-hp monsters like cockatrice. Etc.

    2. As written, perhaps. But there are other tweaks that can be made. For example, bump up the hit points of all monsters. Not a lot; a small boost will have a significant impact; the result being that a brownie or leprechaun, which have spellcaster levels, will have a small number of hit points but it won't be, like, 2hp.

      The "accounting nightmare" isn't all that much more difficult than anything else we do with this game. Especially if you put it in the players' hands. You want XP? Keep track of your damage during combat.

      There is something about the save or die effects of some creatures; but in the case of a cockatrice, even if we adjust its hit points up a little bit, we need to consider the creature's role in the world. It's more of a nuisance. It's dangerous and it will wreak havoc on a local ecosystem, but unless the PCs are traveling through total wilderness, it's fair to say that the local inhabitants have already rid themselves of the beast.

      In other words, I think the argument that we'd be encouraging people to go out and slaughter less-dangerous creatures is spurious if we examine more closely what PCs are actually going to pursue in the course of a session.
