Overview of the Diplomacy scoring conundrum

Diplomacy is one of the most played and researched of modern designer boardgames. Regardless, many interesting theoretical issues remain. One I’ve been occupying myself with is scoring games – or more generally, evaluating player performance. I have some vague notion that this’ll be useful when we have tournaments here in Finland, but mostly I just find this issue an interesting theoretical problem. It’s so challenging, in fact, that I don’t have any ready-made answers – I can formulate the question, but I don’t have a perfect response.

Formulating the question

The general, vague form of my question is this: when you’ve played a game of Diplomacy, who won and who lost, and can anything more be said of the player performance? Furthermore, if we have a number of players participating in a tournament, how much and what sort of play do we need to find out the best player of the bunch – and ideally, the ranking of the rest as well?

To be more specific, we have a number of interconnected questions here:

  1. The rules of Calhamer Diplomacy state that the game can end either in one player winning and everybody else losing, or in a consensual tie amongst the still surviving players at any point of the game. Is this ideal? Can anything more be said of the player performance? Is a victory a more desirable outcome than a tie, is a scarcer tie (more losers, fewer tied players) more desirable than a wider one, is getting killed late more desirable than early?
  2. Diplomacy is rarely played to the end due to the length of the game. If we have to end the game early, can anything be said of the player performance? Given that we want to have tournament environments and we have to be able to compete with unfinished games, how do we move from the perfect Calhamer arrangement into a compromise solution that allows us to score player performance without actually resolving the game?
  3. Given answers to questions 1 & 2, assuming that we are playing a tournament with several rounds of Diplomacy play, how do we combine the results into an overall one? Do we give numerical scores to individual games and then manipulate those numbers? How many games are needed to find a substantial winner out of N players? How many games are needed to rank all players in some meaningful manner?
In addition to these basic questions I am myself concerned with the following generalized issues:

  4. Given that we are able to run tournaments where players play a preset number of games of Diplomacy (question number 3 sorted out, essentially), can we generalize this into an environment where each player plays a variable amount? If one player participates in one game while another plays two, how do we stack these performances in relation to each other?
  5. Given that we have figured out question 3, can we generalize this into a system wherein the tournament consists of a number of Diplomacy scenarios (in the sense of this post) instead of sequential plays of the Calhamer scenario? What if these scenarios include different numbers of centres or players or other variables?
  6. Finally: given answers to questions 4 & 5, what would a universal scoring system look like? Such a system would need to take as input the performance results of N players playing a variable number of games in different groupings. The system wouldn’t need to always provide perfect results, as the input might be incomplete, but we would have to be able to know how to achieve such results with further input.

Despite my language here I don’t want to suggest that the answers to these questions necessarily flow top-down – in fact, it seems probable to me that the last question, if it can be answered meaningfully at all, would need to be considered with each choice on the lower steps. Perhaps these questions are best considered as each internalizing the question before it as a special case – the special cases can be answered in different ways, but as we require more general solutions the choices valid for lower steps of the pyramid break down.

Now, some thinking on these matters has certainly gone down before. Especially questions 1-3 are very practical concerns for Diplomacy tournaments every week all over the globe. As far as I know nobody else is concerned with questions 4-6, but luckily I don’t need to account for my Diplomacy time to anybody else. Before going into my own musings on those generalized questions and playing variants in a tournament, let’s look into what has been said of the first three questions:

1: What should one try to achieve in Diplomacy?

My inspiration for writing this post was that I recently reread Objectives Other Than Winning, a 1974 piece by Allan Calhamer himself. Calhamer writes about a then-recent practice among a certain subset of Diplomacy players: the tendency to settle for a “second place” achievement in the game on the basis of centre count. I find myself in full agreement with Calhamer on this topic, and his argumentation is very cogent: Diplomacy, when played to the finish, only recognizes a winner or a draw. There is no “second place”.

However, I also fully understand why crediting other goals has become commonplace among the players of the game: Diplomacy is a hard game, long and exhausting. It is psychologically easy to judge a player’s performance in the game in different ways even when the rules have no criteria for it. After playing a whole night it might be somewhat dissatisfying that only one player gets official recognition for his play – the game designer can scream until he’s blue in the face, but that does not take away the second place finisher’s satisfaction in at least having beaten the other five players at the table.

Personally, though, I condemn recognizing any results apart from a win or a draw in a Diplomacy game played to the finish – allowing a player to intentionally play for the second place (based on centre count or whatever) without strict censure breaks the game, as it becomes trivial for the leading player to promise the second position to his most dangerous enemy. He will even keep such a promise, as it detracts not at all from his own success.

If players desire to differentiate between performances a bit more, then I suggest looking at elimination dates; the structure of the game is such that I find no problem in claiming that a player who got eliminated earlier played a “worse” game than one who survived longer in the game. Surviving on the board should be a prime concern for all players anyway, as you can’t win after you’ve become eliminated.

Another viewpoint is that Diplomacy is, despite its wargame stylings, a semi-cooperative game. Each draw, and draws are actually rather common in well-played games, is a cooperative victory for the players participating in it. Players even have a chance to “improve” the victory by eliminating non-crucial participants from the draw to sharpen it – this might not matter for a single game in isolation, but when comparing results between several games (to which I’ll come soon), it’s clear that a smaller draw is a stronger result.

2: How to evaluate an unfinished game?

OK, so I don’t see much ambiguity in question one, Calhamer’s right all the way. However, #2 is a much, much more complex beast. When time constraints force us to cut a game of Diplomacy short, as happens most of the time in real life, what can be said of the performances of the players?

I should address the most important thing immediately, and that is centre count: practically all modern tournament scoring systems count centres on the board to find out which player did better and which did worse after a set number of turns of play. After having judged several tournaments under these sorts of systems I find this scoring playable but ultimately unsatisfactory. There are two issues with it:

  1. Centre count does not reflect the goals of the game perfectly. Thus it influences the way the game is played.
  2. An arbitrary cut-off for the game influences the play in major ways at the end – so much so that I’m tempted to consider tournament Diplomacy, with its mad last-year centre grab, a variant rule set for the game, and not necessarily one that benefits it.

The problem here is that if we want to be absolutely faithful to the logic of the game, then an unfinished game provides us with no data on player performance – after all, if the game had continued, any player not eliminated could have gone on to win it. From this viewpoint the only way to use unfinished games as data points is to only count eliminations of players and play however many games it takes to rank the players on the basis of who gets eliminated. Not only would this be prohibitively slow, but it would also strongly favour Powers positioned to avoid early fall – and, of course, it’s clear that the game’s purpose is hardly served if the only concern of each player is survival, not dominance.

Centre count is deservedly the dominant form of scoring unfinished games, considering that centres are something like 80% of the success in high-level games. It might be that an approximate solution is inherently the only possibility in this matter. Almost the only other solution that even comes to mind for me is to use judges to score the game – a judge could take a glance at the board and tell the players who won, who became second, etc. all based on relative strategic positions. Almost always this’d produce the same results as centre count, but philosophically it’s quite different.

The fundamental problem with centre count calculation is that the purpose of skill in Diplomacy is to balance tactical gains with diplomatic losses. Thus a typical mid-game position has a dominant Power being resisted by weaker Powers around it. When the game is frozen and the centre count taken, the materially dominant Power benefits from having its strengths considered, while players with a more laid-back and careful play suffer; this might come as a surprise for players who only play time-limited games, but in full games of Diplomacy a grab of material must by necessity be balanced by the diplomatic considerations.

From the viewpoint of diplomatic balancing, one could well consider compensating limited games with some sort of inbuilt subgame that kicks in near the end of the game and allows the players to force the game to reflect their overriding diplomatic concerns around the time the game is frozen and scores tallied. A sort of peace conference or congress, I’d imagine it – perhaps the players could form voting blocs to deduct points from their enemies, with votes based on centre count, to get simplistic about it… or players might get the opportunity to form a progressive series of “unbreakable” alliances during the last couple of years in the game to reflect the organic concerns they have when the game finally ends. Something like that might be worthwhile to explore, although it might also be too complex to justify itself in a pure Diplomacy tournament.

However individual, unfinished games are scored, it seems that such a scoring could only gain validation by staying within the spirit and goals of the full game.

3: Building tournament systems

Question #2 is really the stumbling block in these matters, but I want to look into one extra possibility: almost all Diplomacy scoring systems provide us with numerical scores for each game played, and for good reason – it’s much easier to compare games, combine the results of several games and thus produce more data to figure out player rankings when the results are numerical. However, what if they weren’t?

In principle we could play elimination tournaments: instead of scoring two or three games and seeing who played best overall, have each game drop players from the tournament until you only have enough for a final table, then play that and see who gets dropped and who doesn’t. An elimination tournament, when played with complete games or near so, could be extremely Calhamer-faithful; one could decide that all players participating in a draw continue in the tournament, for instance, while any elimination drops a player. The final round could give us a bunch of winners if it ended in a draw: anybody still standing at that point would be an equal victor of the slaughter that the tournament metaphorically became. The weakness of this set-up is, of course, that elimination is a slow business that leaves the eliminated players with less interest in the tournament and play after their defeat.

In principle I am very much in favour of having a top table in tournaments – this is basic procedure in most of Europe, but I understand Americans just compute scores to find out an overall winner for the tournament. In principle I find it more satisfying to put the players who compete for the top positions up against each other, though. Another mathematically pretty favourable practice around here is a 2/3-tournament format wherein there are three initial rounds, after which the best two results of each player from those rounds determine which players get to the top table, which is played as a fourth round.
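The 2/3 format described above can be sketched in a few lines of Python. This is only an illustration under assumptions that the post does not state: each round is taken to yield one numerical score per player, the top table is given seven seats, and ties are broken alphabetically; the function name `top_table` and the sample names are hypothetical.

```python
from typing import Dict, List


def top_table(round_scores: Dict[str, List[float]], seats: int = 7) -> List[str]:
    """Pick the top-table players from the best two of each player's round scores."""
    # Sum the two best results of every player; a player with fewer than
    # two rounds simply has fewer results to sum.
    totals = {
        player: sum(sorted(scores, reverse=True)[:2])
        for player, scores in round_scores.items()
    }
    # Highest total first; the alphabetical tie-break is an assumption.
    ranked = sorted(totals, key=lambda player: (-totals[player], player))
    return ranked[:seats]


if __name__ == "__main__":
    rounds = {"Anna": [3.5, 0, 7], "Ben": [2, 2, 2], "Cai": [0, 7, 0]}
    print(top_table(rounds, seats=2))
```

Dropping the worst result means that one disastrous board does not by itself keep a player from the top table.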

Ultimately, though, the tournament system is not nearly as tricky an issue as the scoring system. One depends on the other.

An effort at universal scoring

Now, getting back to my own concern, scoring arbitrary length variant scenario tournaments: most existing Diplomacy scoring systems depend on centre count, which makes them largely useless from the viewpoint of variant scenarios: different numbers of centres and potentially different dynamics in gaining and losing them make it difficult to compare results. Different numbers of players open up the question of challenge: it is more difficult to win an 8-player game than it is a 5-player game, but how much so?

As a rough sketch, here are some basic ideas for scoring universal Diplomacy:

  • A game performance is more definitive and valuable as ranking information when the game was more complete (that is, played closer to the end), had more players and saw more of them eliminated, and when the player survived longer in it.
  • When counting victory points in our universal tournament, we can determine that the definite solo victory of a Diplomacy scenario for N players is simply worth N points. This is intuitively obvious and simple, a victory is always more definite when it is achieved against more players, assuming that all of those players have equal opportunity for victory. In a full-length game of high-level Diplomacy this is pretty much always the case due to the self-balancing nature of the game, so we don’t need to know anything more than the number of players that participated in a game.
  • More intricately, we can determine that a K-way draw (considering a solo victory as a 1-way draw) allows all draw members the same number of victory points. I have two simple notions here: we could give each player N/K points, thus splitting the solo victory from above into equal parts for each participant of the draw. Or we could determine that the value of the draw is equal to N-K for each player. This would lessen the difference between a draw and a solo victory considerably; with the first method a 2-way draw in Calhamer Diplomacy would be worth 3.5 points vs. 7 points for a solo, while in the latter method the numbers would be 5 vs. 6. Both methods rank sharper draws as more valuable than weaker ones, which they should.
    To choose between those two principal ideas, let’s compare some games. Which is more valuable, winning an 8-player game or a 3-way draw in a 10-player game? The former system gives us 8 vs. 3.33, the latter 7 vs. 7. I am inclined to lean for the latter interpretation of value in some ways: in both games the winner(s) managed to eliminate 7 other players without being themselves among those eliminated. On the other hand, a 3-way draw avoids the end-game crunch of solo victory, which makes it less prestigious and definitive. So I think I’m siding with the split solution.
  • The actual difficult part comes with how to handle incomplete games. These are always less definite than complete ones, so it stands to reason that they should be worth fewer points. How many fewer? Because it’s just about impossible to develop a general function for estimating the state of finish in an ongoing Diplomacy game, I’m going with a moral measure: the game approaches the state of being finished as the players put more work into playing it. In other words: the more years the players play, the more authoritative the results are, even if the players do not manage to resolve the game. In practice this could come to play as a percentage multiplier for the score totals: a game played for x years might get f(x) as the multiplier, where f is some function that approaches 1 asymptotically from below. Thus when we have two otherwise identical board positions, but one group has tried longer to find a resolution to the situation, the longer-suffering group is entitled to a larger share of points.
  • The other half of the incompleteness conundrum is the issue of how to split the available points among the players. As I describe above, I don’t like centre-counting too much… my inclination would be to at least try a solution wherein the current leading player (by centre count) tries to form a majority coalition (by centre count again) which gets points like in a draw (less the non-finish penalty from the previous step), and should he fail, the next player in line could try, and so on – if nobody manages to form a majority coalition, then everybody loses and nobody scores from the game. I’d be surprised if this were an ideal solution, considering the opportunities for metagaming, but it might beat pure centre-counting in some circumstances – I especially like how your centre count doesn’t directly turn into points but instead just makes you a more likely candidate for membership in a majority coalition. As points are divided evenly between coalition members, the negotiators have a motivation to keep the coalition as small as possible, which means taking only significant players – but there is enough freedom to drop somebody who annoyed you in the game, or to refuse to get into a coalition with a bigger partner who’d need you to get that majority. This last bit is the one I’m most suspicious about, as this last choice in the game isn’t constrained by diplomacy in the way other choices are; that’s a recipe for metagaming. Perhaps I’d need to have the players make their coalition choices a couple of turns before game end, or something like that.
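The two draw valuations from the list above are easy to compare side by side. The following is a minimal Python sketch; the function names `split_value` and `difference_value` are made up for illustration, and exact fractions are used so that 10/3 is not rounded.

```python
from fractions import Fraction


def split_value(n_players: int, k_draw: int) -> Fraction:
    """N/K: the solo value N shared equally by the K draw members."""
    if not 1 <= k_draw <= n_players:
        raise ValueError("draw size must be between 1 and the number of players")
    return Fraction(n_players, k_draw)


def difference_value(n_players: int, k_draw: int) -> int:
    """N-K: every draw member scores the number of players left outside the draw."""
    if not 1 <= k_draw <= n_players:
        raise ValueError("draw size must be between 1 and the number of players")
    return n_players - k_draw


if __name__ == "__main__":
    # (players, draw size) pairs taken from the comparisons in the text.
    for n, k in [(7, 1), (7, 2), (8, 1), (10, 3)]:
        print(f"N={n} K={k}: split={float(split_value(n, k)):.2f} "
              f"difference={difference_value(n, k)}")
```

Note that under the N-K rule a solo victory comes out as N-1 rather than N, which is why the text quotes 6 rather than 7 for a Calhamer solo under that method.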

This way we have a pretty complete scoring system for Diplomacy, and it’s a system that doesn’t care about the number of players or centres or even whether the individual games are short or full-length. Now I’ll just need to get some volunteers to playtest a variant tournament; could be fun when you wouldn’t necessarily know at all what sort of map and how many players you’d face in a given game.
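To make the incomplete-game part of the sketch concrete, here is one possible reading in Python. Everything specific in it is an assumption: the post only asks that f approach 1 from below, so `completeness` uses x / (x + h) with an arbitrary half-point h = 5 years; a "majority" is read as strictly more than half of the owned centres; and only a single proposed coalition is evaluated, not the whole sequence of proposers.

```python
from typing import Dict, Optional, Set


def completeness(years_played: float, half_point: float = 5.0) -> float:
    """An assumed f(x): 0 at the start, 0.5 at half_point years, always below 1."""
    if years_played < 0 or half_point <= 0:
        raise ValueError("years must be non-negative and half_point positive")
    return years_played / (years_played + half_point)


def coalition_scores(centres: Dict[str, int], coalition: Set[str],
                     years_played: float) -> Optional[Dict[str, float]]:
    """Score one proposed coalition, or return None if it lacks a centre majority.

    `centres` lists every player who started the game (eliminated ones with 0).
    """
    if not coalition or not coalition <= set(centres):
        raise ValueError("coalition must be a non-empty subset of the players")
    total = sum(centres.values())
    held = sum(centres[player] for player in coalition)
    if 2 * held <= total:
        return None  # no majority: the next player in line would get to try
    # Members share N points as in an N/K draw, scaled by the completeness multiplier.
    share = completeness(years_played) * len(centres) / len(coalition)
    return {player: (share if player in coalition else 0.0) for player in centres}


if __name__ == "__main__":
    board = {"A": 10, "B": 8, "C": 6, "D": 5, "E": 5, "F": 0, "G": 0}
    print(coalition_scores(board, {"A", "B"}, years_played=5))
    print(coalition_scores(board, {"A", "C"}, years_played=5))
```

If every proposal returns `None`, nobody scores from the game, as described in the last bullet.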


5 Responses to “Overview of the Diplomacy scoring conundrum”

  1. Haipperi Says:

    Ok, I tried to formulate mathematically what is said above (or some of it).

    In the following equation,

    N = total number of players
    K = number of players in the draw (for solo K = 1)
    C = total supply centers in the game

    Score for every player in the draw would be

    (17/3) x (N x (N-1)) / (K x C)

    The scaling factor (17/3) is there to set the value of a solo victory on the standard board to 7.

    – Games with only one player are always scored 0.
    – More players means increasingly more points.
    – Fewer centers on board makes game more difficult and thus allows slightly higher score.
    – Total score is divided evenly among all players in draw. This equation does not tell how the players in draw are selected.

    I tried a couple of solutions to measure “completeness” of the game, but I was not satisfied.
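As a quick check of the formula in this comment, here is a small Python sketch. The function name `haipperi_score` is made up for illustration; the arithmetic is exactly (17/3) x (N x (N-1)) / (K x C), evaluated with exact fractions.

```python
from fractions import Fraction


def haipperi_score(n_players: int, k_draw: int, centres: int,
                   scale: Fraction = Fraction(17, 3)) -> Fraction:
    """Per-player score of a K-way draw among N players on a board of C centres."""
    if k_draw < 1 or centres < 1:
        raise ValueError("draw size and centre count must be positive")
    return scale * Fraction(n_players * (n_players - 1), k_draw * centres)


if __name__ == "__main__":
    print(haipperi_score(7, 1, 34))  # solo on the standard 34-centre board
    print(haipperi_score(7, 2, 34))  # 2-way draw on the standard board
    print(haipperi_score(1, 1, 34))  # a one-player "game"
```

On the standard board (N = 7, C = 34) a solo comes out at exactly 7 and a one-player game at 0, as the comment states.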

  2. Eero Tuovinen Says:

    Interesting. I’ve considered the number of supply centers as a factor in variant design as well, but I haven’t come to any conclusions about it. Does it really make a game more difficult if it has more supply centers? Perhaps it does; at least the game will last longer and feature more tactical inertia. How about, does the variant scenario deserve more prestige for having more supply centers? The question, it seems to me, is whether rewarding any internal attributes of the scenario itself will bias practical play towards such scenarios; if you get more points for larger scenarios by center count, then such scenarios become more desirable. This might not be unreasonable as a minor factor.

    As for the incompleteness issue, I agree that it doesn’t bend readily into a quantified factor. And even if we had a “completeness percentage” for a given game halted midway, it’s still an open question how the players would be sorted into “winners” (participants in the draw) and “losers”. The difficulties seem insurmountable compared to the near universally accepted center-count solution.

    One measure of completeness I’ve been considering is solo victory center limit vs. the center counts of the leading players at the moment the game ends. When we look at the progress of a game of Diplomacy, the real reason that the game progresses towards an end-state at all is the natural inclination of centers to accumulate in the hands of fewer players. In a way we can say that a game of Diplomacy has not “progressed” anywhere if you play ten years and still have every player on the board at roughly equal center counts. This is, insofar as I understand, the only real “entropy” in the system of an on-going Diplomacy game; it can be reversed to a degree by concerted effort, but as most games do reach an end-state (by draw at some times), apparently the forces at work are powerful enough to make centers move into fewer and fewer hands most of the time.

    Of course only the center count of the leading player doesn’t really express how complete the game is, as much depends also on whether he has serious challengers on the board or if he’s alone in his position; one might argue that the game of Diplomacy is the more complete the more blatant a player’s domination of the board is in comparison to the second player. When this domination is so extreme that several other players are required to match his center count, the game has moved to an end-game and is nearing an end.

    An expression for how complete a given game of Diplomacy is could then perhaps be constructed like so:

    P1 = highest center count among the Powers
    P2 = second highest center count among the Powers
    P3 = …so on to some Pn

    SUM (from P1 to Pn as Px) [
    SUM (over P[x+1] to Pn as Py) [
    Px – Py
    ]]

    In other words, sum the differences between the center counts of all Powers towards the Powers smaller than them. The larger this number is, the more concentrated the centers are in the hands of the few and therefore the game is closer to being over. This function has the property of always increasing when a stronger Power gains a center at the expense of any weaker Power, and always decreasing when a weaker Power gets stronger at the expense of a stronger one. In Calhamer Diplomacy at the beginning of the game this number is at 6, due to Russia having one center more than each of the six others. When two players are at 17 centers each in Calhamer Diplomacy, the number reaches its height at 170; this is higher than it would be in a typical solo victory, which is admittedly a bit of a problem. Center count differential simply doesn’t account for the tactical realities of draws. But then, perhaps it doesn’t have to, as it’s only being used as a rough measure of how finished the game is.

    The above function can be turned into a percentage by finding the maximum value for a given variant scenario and dividing the result by it. For Calhamer Diplomacy this is 170, which means that our recently finished (and drawn 6-way) postal game would be considered roughly 72% finished by this function. Doesn’t mean much with a finished game, but perhaps something like this would be useful in evaluating an unfinished one.
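The double sum and its percentage form can be written out as a short Python sketch. The function names `concentration` and `completeness_percent` are hypothetical, and the maximum of 170 is the figure given above for Calhamer Diplomacy, passed in as a parameter instead of being derived.

```python
from typing import Iterable


def concentration(center_counts: Iterable[int]) -> int:
    """Sum of the differences between every Power and each Power smaller than it."""
    ordered = sorted(center_counts, reverse=True)  # P1 >= P2 >= ... >= Pn
    return sum(
        ordered[x] - ordered[y]
        for x in range(len(ordered))
        for y in range(x + 1, len(ordered))
    )


def completeness_percent(center_counts: Iterable[int], maximum: int = 170) -> float:
    """The concentration as a percentage of the variant's maximum value."""
    return 100 * concentration(center_counts) / maximum


if __name__ == "__main__":
    two_leaders = [17, 17, 0, 0, 0, 0, 0]
    print(concentration(two_leaders))         # 170
    print(completeness_percent(two_leaders))  # 100.0
```

Eliminated Powers are entered as zeros, so that the gap between the survivors and the dead counts towards the total.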

    Whatever the gauge used for the game’s completeness, there is one important requirement: it has to at least encourage good play, and ideally encourage nothing at all. This particular gauge seems to encourage winning players to reduce the weakest Powers in center count and increase the center count of the strongest Powers, these being the ways to increase the completeness score. Reducing small Powers is not a problem, as that’s desirable for a sharp tie anyway; promoting large Powers is a bit more problematic, as under this math a strong second player would be encouraged to feed weak Powers to the strongest player to get a better completeness score (and thus more points in the eventual tie). This is counter to common wisdom that would have a player try to constrain the growth of the leading Power.

  3. Haipperi Says:

    In my formula the total number of supply centers is used as a rough measurement for different map variants. THE MORE centers on board, THE LESS points are awarded for winning the game. I think that PURE-variant is the most difficult variant to win (7 players and 7 centers), and if somebody is playing a two-player variant with 100 supply centers on board, the victory should not be rewarded as high as winning a standard game.

    Maybe the completeness of the game could be simplified a little? The overall score for everybody should be multiplied by factor [Center count leader has] / [Center count needed for solo]

    For solo victory this would be 1 in any variant and for any other end-result less than 1. Standard game would have value of 0.22 at start.

  4. Eero Tuovinen Says:

    I don’t know that Pure is the most difficult variant to win – it’s only that if you only count a solo victory as a “win”. The correct play in Pure, however, is to call for a draw, I think; this is not difficult at all. From this viewpoint more centers tends to equal more difficulty at least to some degree, as the strategic consequences of tactical calls are less obvious. Ultimately I don’t think that center count is a very good gauge of challenge in a variant, though – the complexity tops out at a dozen centers or so with two players, really, unless the variant recognizes more terrain and troop types apart from fleets and armies.

    That’s not a bad simplification of the completeness measure at all. It still encourages players who’ll participate in whatever method is chosen for enforcing a draw to feed centers to the leading player. Perhaps a better function would look into how many players have been eliminated or made marginal. Something like counting how many Powers have declined from their starting size and how much, perhaps? Summing together one point for each center a Power lacks from its original count means that the completeness score only goes up when a Power loses centers below its original strength. This way you’re encouraged to finish off other Powers instead of feeding the largest Power.
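The two gauges discussed here, the leader ratio from the previous comment and the decline count proposed above, can be put side by side in a small Python sketch. The function names and the sample position are made up for illustration; the starting strengths are those of Calhamer Diplomacy.

```python
from typing import Dict


def leader_ratio(leader_centers: int, solo_limit: int) -> float:
    """Centers held by the leader divided by the centers needed for a solo."""
    return leader_centers / solo_limit


def decline_score(start: Dict[str, int], current: Dict[str, int]) -> int:
    """One point for every center a Power is below its starting strength."""
    return sum(max(0, start[power] - current.get(power, 0)) for power in start)


if __name__ == "__main__":
    start = {"Austria": 3, "England": 3, "France": 3, "Germany": 3,
             "Italy": 3, "Russia": 4, "Turkey": 3}
    # Hypothetical mid-game position: Austria is out and Italy is down to two.
    current = {"Austria": 0, "England": 5, "France": 7, "Germany": 6,
               "Italy": 2, "Russia": 8, "Turkey": 6}
    print(round(leader_ratio(4, 18), 2))      # opening position of the standard game
    print(decline_score(start, current))
    current["Russia"] += 1                    # the leader picks up a neutral center
    print(decline_score(start, current))      # unchanged: growth alone adds nothing
```

Because growth above the starting size is ignored, the decline count moves only when some Power is pushed below its original strength, which is the incentive argued for above.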

    It might be that ultimately it’s better to try to make do without a completeness score of any sort – I’m not that convinced that something like the completeness of a given game can be measured numerically. For competition purposes a simple count of how many turns have been played has virtues as a replacement: the number of turns does not directly tell us whether the game has in fact progressed, but it does tell about the opportunities the players have had to reach a conclusive result. This is what I argue for in the original post: a game that has lasted longer is more complete on moral grounds even if it has factually stayed unresolved.

    Different variants bring something of a problem into this situation, of course, in that some variants are slower than others in resolving play. It would suck to play in a variant where your Power can only start acting effectively in year ’08. In practice this is not a real problem, though, as nobody’s saying that the scoring system has to make bad variants playable. Board positions in a single variant should be randomized or drafted in real play anyway to allow the players themselves to determine their perceived strength.

    As for what a time-based completeness function could look like… this is a somewhat arbitrary claim, but one could decide that an unfinished game is “complete” when it has been played for years equal to the solo limit minus the starting strength of the weakest Power. This assumes that perfect play has a Power gain at least one center per year on average, which is of course completely arbitrary. This way a game of Calhamer Diplomacy that has lasted 15 years is “100% complete”, and each year can be said to represent 7% of the game’s total score, barring an early draw.
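The time-based rule above reduces to a one-line function. The sketch below assumes a linear rise capped at 100%; the comment does not say what happens after the "complete" year, so the cap is an assumption, and the name `time_completeness` is hypothetical.

```python
def time_completeness(years_played: float, solo_limit: int, weakest_start: int) -> float:
    """Fraction of a 'full' game played, where full length = solo limit - weakest start."""
    full_length = solo_limit - weakest_start
    if full_length <= 0 or years_played < 0:
        raise ValueError("need solo_limit > weakest_start and non-negative years")
    # Assumed cap: a game cannot be more than 100% complete.
    return min(1.0, years_played / full_length)


if __name__ == "__main__":
    # Calhamer Diplomacy: solo at 18 centers, weakest Powers start with 3.
    print(time_completeness(15, 18, 3))           # 1.0
    print(round(time_completeness(1, 18, 3), 3))  # 0.067, the "7%" per year
```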

    Whatever the measure of completeness, the function really needs to give relatively high scores pretty early in the game. The completeness function would need to go up fast enough to prevent the players from calling for an unnecessary draw when they stop playing – a draw at 100% completeness could be better than an incomplete game (with a presumably sharper set of draw participants) if the completeness score at the end of the game were very low.

    Clearly this very complex issue requires some thought still. Might be that there is no satisfactory way of resolving all of these problems at once.

  5. Sam Says:

    I think most dip players agree that solo victories should be worth more points (total) than draws – that is, if a solo victory is worth N points, a k-way draw should give strictly less than N/k points per surviving player.
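This criterion can be tested mechanically against the two draw valuations from the post. The sketch below takes N as the solo value, as the comment does, and checks whether each member of a K-way draw gets strictly less than N/K for every K from 2 to N; the function names are hypothetical.

```python
from fractions import Fraction
from typing import Callable


def split_value(n_players: int, k_draw: int) -> Fraction:
    """N/K per draw member, the first method in the post."""
    return Fraction(n_players, k_draw)


def difference_value(n_players: int, k_draw: int) -> Fraction:
    """N-K per draw member, the second method in the post."""
    return Fraction(n_players - k_draw)


def meets_criterion(value: Callable[[int, int], Fraction], n_players: int) -> bool:
    """True if every K-way draw (K >= 2) pays each member strictly less than N/K."""
    return all(
        value(n_players, k) < Fraction(n_players, k)
        for k in range(2, n_players + 1)
    )


if __name__ == "__main__":
    print(meets_criterion(split_value, 7))       # False: exactly N/K, not strictly less
    print(meets_criterion(difference_value, 7))  # False: a 2-way draw pays 5 > 3.5
```

By this test neither method from the post satisfies the criterion for a seven-player game, so adopting it would call for a steeper penalty on draws than either of them applies.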

