Double vs Single Swiss (Round 2): Lies, Damn Lies, and Statistics

Having taken a fun diversion into a choose-your-own-adventure (CYOA) post last time (thanks to the 7 of you who responded!), I want to return to the core math question. I have seen some people circulating more fundamental questions about what we value in a tournament format, and I think those are more important than the math. I’ll reiterate them here (along with my opinions), but hopefully other articles addressing them will be out soon.

The essential premise behind all of these articles is that double-sided Swiss (DSS) has some structural flaws, notably that 2-4-1s and IDs become very prevalent, and that these flaws are intrinsic to the format. Really, the only way to alleviate these issues is a dramatic shakeup of the tournament structure. I turned to single-sided Swiss because chess, a slightly asymmetric game, uses that format for many of its open tournaments. But there will always be specific issues that should be tailored to our game and our community’s desires. To me, the fundamental question is: how far do we want to go to make sure people play both of their decks evenly?

In the survey, and in some informal conversations, people have generally voiced a preference for everyone playing each of their decks the same number of times, potentially pairing up and down a good number of places to make that happen. What I’ve found is that you need to be comfortable pairing across a gap of ±2 wins, and in rare cases 3 wins, to prevent people from finishing with an uneven number of games on each side. Additionally, in small tournaments re-pairing (a rematch) needs to be allowed, provided the two players haven’t already played each other on those same sides. What I think I showed in my second article is that these different pairing strategies have only a minor impact on how often any given player wins or loses a tournament. However, they will lead to circumstances where players get screwed over (as demonstrated in the CYOA article). This is a tricky issue, and it might just be something we try one way for a while, revisiting it if we find a frequent or recurring problem. In that vein, I’m hoping to start running some single-sided Swiss tournaments post-Gateway release/next year so that we can get real data.
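To make those constraints concrete, here is a minimal Python sketch of a pairing check. The `Player` fields, the `can_pair` helper, and the `max_gap` default of 2 are hypothetical names chosen for illustration; they are not taken from Cobra or any other tournament software. The rules it encodes are the ones described above: a window of ±2 wins, rematches allowed only with sides swapped, and never asking a player to take the side they have already played more often.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    name: str
    wins: int = 0
    corp_games: int = 0
    runner_games: int = 0
    # (opponent name, side this player took) for every game already played
    history: set = field(default_factory=set)

    def side_balance(self) -> int:
        """Positive if the player has played more Corp than Runner games."""
        return self.corp_games - self.runner_games


def can_pair(corp: Player, runner: Player, max_gap: int = 2) -> bool:
    """Return True if `corp` may play Corp against `runner` this round (hypothetical rules)."""
    # Pair up or down by at most `max_gap` wins.
    if abs(corp.wins - runner.wins) > max_gap:
        return False
    # A rematch is fine, but not with the same sides as last time.
    if (runner.name, "corp") in corp.history:
        return False
    # Never ask someone to play the side they have already played more often.
    return corp.side_balance() <= 0 and runner.side_balance() >= 0


if __name__ == "__main__":
    a = Player("A", wins=3, corp_games=2, runner_games=3)
    b = Player("B", wins=1, corp_games=3, runner_games=2)
    print(can_pair(a, b))  # True: 2-win gap, A owes a Corp game, B owes a Runner game
    print(can_pair(b, a))  # False: B has already played more Corp games
```

Widening `max_gap` to 3 covers the rare cases mentioned above; a real pairing engine would search over all candidate pairings rather than checking one pair at a time.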

But I titled this post Double vs Single, so I should get to that point. I wanted to test statistically whether there is any difference between single- and double-sided Swiss in their ability to accurately measure player skill. I had some tables, but I don’t think those were the best way of visualizing and understanding the differences between the two. So I took a set of players, ran them through DSS and SSS tournaments, and recorded their placing in each, which let me compare the two outcomes visually. I used my “standard” testing setup: 32 players in 4 rounds of double-sided Swiss (the current OPP suggestion) and 32 players in 6 rounds of single-sided Swiss (which I hypothesize would take the same length of time), and compared how often each person finished at each rank. In the graphs below, the vertical black line represents the person’s true skill (i.e. they have a winning win rate against everyone to the right and a losing one against everyone to the left), and the blue area and red line represent the number of times they achieved each rank in SSS and DSS respectively.
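A stripped-down version of that experiment can be sketched in Python. This is not the code behind the graphs: the logistic win model, the `scale` parameter, and the pairing (re-sort by wins each round, pair neighbours, with no rematch avoidance, byes, side balancing, or tiebreakers beyond a coin flip) are all simplifying assumptions. It covers only the single-sided case and assumes an even number of players.

```python
import math
import random
from collections import Counter


def win_prob(skill_a: float, skill_b: float, scale: float = 1.0) -> float:
    """Assumed logistic model: P(a beats b) rises smoothly with the skill gap."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b) / scale))


def standings(wins: list[int], rng: random.Random) -> list[int]:
    """Player indices ordered by wins (most first), ties broken at random."""
    order = list(range(len(wins)))
    rng.shuffle(order)                   # random order first...
    order.sort(key=lambda p: -wins[p])   # ...then a stable sort keeps it within ties
    return order


def run_sss(skills: list[float], rounds: int, rng: random.Random) -> list[int]:
    """One single-sided Swiss event; returns each player's finishing place (1 = first)."""
    wins = [0] * len(skills)
    for _ in range(rounds):
        order = standings(wins, rng)
        # Pair neighbours in the standings: 1st vs 2nd, 3rd vs 4th, ...
        for a, b in zip(order[0::2], order[1::2]):
            winner = a if rng.random() < win_prob(skills[a], skills[b]) else b
            wins[winner] += 1
    place = [0] * len(skills)
    for position, p in enumerate(standings(wins, rng), start=1):
        place[p] = position
    return place


def placement_counts(skills: list[float], rounds: int, trials: int, seed: int = 0) -> list[Counter]:
    """How often each player finishes in each place over many simulated events."""
    rng = random.Random(seed)
    counts = [Counter() for _ in skills]
    for _ in range(trials):
        for p, place in enumerate(run_sss(skills, rounds, rng)):
            counts[p][place] += 1
    return counts


if __name__ == "__main__":
    skills = [float(32 - i) for i in range(32)]  # player 0 is strongest
    counts = placement_counts(skills, rounds=6, trials=1000)
    top8 = sum(counts[0][k] for k in range(1, 9))
    print(f"strongest player made a top-8 cut in {top8 / 1000:.1%} of events")
```

Each `Counter` in the result is one player’s placement histogram, which is the quantity the graphs plot; a DSS simulator with the same interface would let the two histograms be compared directly.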

Over 10,000 simulations, there is basically no appreciable difference. In this sample there is a small bias where the top few players make the cut slightly less often (a difference of under 5%). This bias does seem somewhat persistent across scenarios.

This led me to question whether those differences were statistically meaningful. I want to thank neuropantser and DoomRat for helping me come up with a strategy for testing that (if something went wrong, it’s my fault, not theirs). The final strategy was:

  • Make a pool of players
  • Simulate those players playing 1 tournament of DSS and 1 tournament of SSS
  • Compute Spearman’s rank correlation coefficient against the true ranking for each format separately (essentially player strength vs. finishing position)
  • Find the difference between the two Spearman coefficients
  • Do this a thousand times (both with the same pool and with shuffled pools), then look at the 2.5th and 97.5th percentiles of those differences
  • If both percentiles were above 0 (or both below 0), that implied a systematic difference at the 95% confidence level; if 0 lay between them, there was no statistically significant difference between the two formats.
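The procedure in the list above can be sketched as follows. The simulator callables `run_dss` and `run_sss` are hypothetical placeholders for whatever tournament model you use (each takes a list of skills, strongest first, and returns every player’s finishing place), the Gaussian skill pool is an assumption, and this version draws a fresh pool on every repetition rather than also testing a fixed pool. Because finishing places contain no ties, Spearman’s coefficient can use the closed form 1 - 6·Σd² / (n(n² - 1)).

```python
import random
import statistics
from typing import Callable

# Hypothetical simulator signature: (skills strongest first, rng) -> finishing place per player.
Simulator = Callable[[list[float], random.Random], list[int]]


def spearman(true_rank: list[int], finish: list[int]) -> float:
    """Spearman's rank correlation for two tie-free rankings of the same players."""
    n = len(true_rank)
    d2 = sum((t - f) ** 2 for t, f in zip(true_rank, finish))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def compare_formats(n_players: int, run_dss: Simulator, run_sss: Simulator,
                    reps: int = 1000, seed: int = 0) -> tuple[float, float, bool]:
    """Return the 2.5th/97.5th percentiles of rho(DSS) - rho(SSS) and whether 0 lies outside them."""
    rng = random.Random(seed)
    true_rank = list(range(1, n_players + 1))  # player i has true rank i + 1
    diffs = []
    for _ in range(reps):
        # Assumed skill model: a fresh normally distributed pool, sorted strongest first.
        skills = sorted((rng.gauss(0.0, 1.0) for _ in range(n_players)), reverse=True)
        rho_dss = spearman(true_rank, run_dss(skills, rng))
        rho_sss = spearman(true_rank, run_sss(skills, rng))
        diffs.append(rho_dss - rho_sss)
    cuts = statistics.quantiles(diffs, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    low, high = cuts[0], cuts[-1]
    return low, high, (low > 0 or high < 0)
```

As a sanity check, passing the same simulator for both formats should give an interval of (0, 0) and no significant difference, while a simulator that reproduces the true order compared against one that shuffles the field at random should come out significant.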

I did this for 10, 20, 40, 60, 80, and 120-person tournaments, and in every case the null hypothesis (that there is no difference between the two formats) could not be rejected. I wanted some way to represent that visually, so below is a group of plots. Each panel shows a different number of players, and the x-axis is the number of rounds played.

These show that with 6 rounds the median correlation is slightly lower than with 8, but as the large interquartile boxes show, there is a good amount of variation, so statistically there is no significant difference between 6 rounds of SSS and 8 rounds of DSS. In general, you can play fewer rounds of single-sided Swiss without losing much information about player strength compared to DSS.

To me, these plots and other explorations lead to the following conclusion: Single Sided Swiss seems like a better format than Double Sided Swiss, because it avoids DSS’s structural problems while showing no (or only very minor) loss in accuracy. When thinking about pairing strategies, it would be best to pair so that people play their sides evenly rather than to pair players of similar skill (so in the final round, everyone who has played one more Corp game should play the closest-ranked player who has played one more Runner game). Even though this will lead to some “easier” wins, balancing the sides played seems to be the priority for the people I’ve talked to.
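For the final round, that rule can be sketched in Python. The function name and inputs are hypothetical, and the sketch assumes every player arrives exactly one game out of balance (as happens going into the last round of a 6-round event if sides have been balanced so far). Matching the Corp-owing and Runner-owing groups in standings order is the simplest reading of “closest-ranked”; for two sorted lists it also minimizes the total rank gap across all pairings. Rematch history is ignored here.

```python
def final_round_pairings(standings: list[str], balance: dict[str, int]) -> list[tuple[str, str]]:
    """Pair players so that everyone ends the event with equal Corp and Runner games.

    standings: player names, best record first.
    balance:   corp_games - runner_games for each player before the final round.
    Returns (corp_player, runner_player) pairs.
    """
    if any(abs(balance[p]) != 1 for p in standings):
        raise ValueError("sketch assumes every player is exactly one game out of balance")
    # One more Runner game so far -> must play Corp now, and vice versa.
    owes_corp = [p for p in standings if balance[p] < 0]
    owes_runner = [p for p in standings if balance[p] > 0]
    if len(owes_corp) != len(owes_runner):
        raise ValueError("side surpluses do not cancel out")
    # Match the two groups in standings order; for sorted lists this minimises the total rank gap.
    return list(zip(owes_corp, owes_runner))


if __name__ == "__main__":
    standings = ["A", "B", "C", "D", "E", "F"]
    balance = {"A": 1, "B": 1, "C": -1, "D": 1, "E": -1, "F": -1}
    for corp, runner in final_round_pairings(standings, balance):
        print(f"{corp} (Corp) vs {runner} (Runner)")
```

In this example A and B both owe a Runner game, so they cannot meet; A is paired down to C instead, which is exactly the kind of “easier” win the trade-off accepts.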

Eliminating 241s entirely (short of mass collaboration) and reducing the frequency of IDs seem like significant wins for tournament play. IDs will still happen, but they will be easier to calculate and will require a better record to achieve, because fewer extra games are being played. I want to explore how draws (intentional and unintentional) influence my models eventually, but my next project is learning enough Ruby to fork Cobra and make a working single-sided Swiss system there, so this may be my last post for a while.

If you have questions about this post or other posts, or things you want me to address, contact me with the info below or find me on Stimslack (Ysengrin).
