Perfect Bracket Coding Challenge: the Rules

In trying to program the first perfect bracket, it’s important to set some ground rules. Depending on what is or isn’t allowed, the challenge could become either impossible or too easy. The rules also hold me (and anyone else who wants to try the challenge) accountable: I can’t claim “that’s what I meant the whole time” whenever there’s an ambiguity in what the code does or when it is run.

The original purpose of this blog was to provide a public record of the challenge, so that, if I succeed, it’s possible to verify that I didn’t cheat. I was initially reluctant to post anything about the challenge too far before next year’s tournament, out of fear that a better programmer might come along and beat me at my own challenge. I ultimately realized that (1) that would be really cool and (2) that would mean that at least one other person is reading my blog, which would be really cool.

With those ideas in mind, here are the rules to the challenge. These are the rules that I will follow, and they can also serve as guidelines for anyone else who wants to try it. The general rule of thumb is that the program is trying to simulate a single fan who just happens to be absurdly fast at filling out brackets.

The Bracket

First, I’ll clarify what a bracket should look like. A bracket, as I’ll choose to define it, is a string of 63 1’s and 0’s, in which each digit represents the winner of a single game of the tournament*. The first 32 digits represent the outcomes of the first round games, the next 16 digits represent the second round, etc., with the last digit representing the outcome of the final. Within a round, the games on the left half of the bracket are listed from top to bottom, followed by the games on the right half of the bracket listed from top to bottom.

*As in most bracket challenges, I will not be including play-in games in my brackets. If one of my brackets predicts that a team from a play-in will win one or more games in the first round or later, those count as wins for either team that wins the corresponding play-in. This is the same as the rule used in most bracket challenges, where brackets can be filled out before the play-in games are played.

If ‘0’ is listed for a game, that means that the team listed higher in the bracket is predicted to win, and if a ‘1’ is listed, that means that the team listed lower in the bracket is predicted to win. In the final, a ‘0’ predicts a win by the team from the left half of the bracket, while a ‘1’ predicts a win by the team from the right side of the bracket. Since different sources sometimes depict the bracket in different ways, the “official bracket” used for the challenge will be the one that appears in the NCAA tournament selection show.

I couldn’t think of a reasonable way to word all of that, so here’s a diagram of a bracket, with the games numbered from 1 to 63, and the teams corresponding to 0 and 1 listed for each game:

And, as an example, here is the 2021 men’s bracket. We can verify that it corresponds to the following bracket string:

011010000000000101001000011010100111111100010100110111011010111
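To make the encoding concrete, here is a minimal Python sketch that checks a bracket string and splits it into rounds using the layout described above (32, 16, 8, 4, 2, and 1 games). The names split_rounds, ROUND_SIZES, and EXAMPLE_2021 are just for illustration and are not part of the rules.

```python
# Games per round, first round through the final (sums to 63).
ROUND_SIZES = [32, 16, 8, 4, 2, 1]

# The 2021 men's example string from this post.
EXAMPLE_2021 = "011010000000000101001000011010100111111100010100110111011010111"


def split_rounds(bracket):
    """Validate a bracket string and return its digits grouped by round."""
    if len(bracket) != sum(ROUND_SIZES):
        raise ValueError(f"expected 63 digits, got {len(bracket)}")
    if set(bracket) - {"0", "1"}:
        raise ValueError("a bracket may contain only '0' and '1'")
    rounds = []
    start = 0
    for size in ROUND_SIZES:
        # Each slice holds one round, listed left half first, then right half.
        rounds.append(bracket[start:start + size])
        start += size
    return rounds
```

Calling split_rounds(EXAMPLE_2021) returns six strings, one per round, the last of which is the single digit for the final.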

Brackets will be submitted in one or more text files, named bracket_file_0.txt, bracket_file_1.txt, etc., in which each line consists of a 63-character bracket string.
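As a rough sketch of that file format, the following Python function writes brackets one per line and starts a new numbered file whenever the current one fills up. The function name and the lines_per_file cap are assumptions for illustration; the rules above only fix the file names and the one-bracket-per-line layout.

```python
from pathlib import Path


def write_bracket_files(brackets, out_dir, lines_per_file=1_000_000):
    """Write brackets one per line into bracket_file_0.txt, bracket_file_1.txt, ...

    lines_per_file is an assumed cap; the rules do not specify a file size.
    Returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    count = 0
    try:
        for bracket in brackets:
            # Open the first file, or roll over once the current file is full.
            if handle is None or count == lines_per_file:
                if handle is not None:
                    handle.close()
                path = out_dir / f"bracket_file_{len(paths)}.txt"
                handle = path.open("w", encoding="ascii")
                paths.append(path)
                count = 0
            handle.write(bracket + "\n")
            count += 1
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Because brackets can be any iterable, including a generator, the strings are written as they are produced rather than being held in memory all at once.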

I realize that a string of 1’s and 0’s isn’t really the same thing as a bracket, but I think it maintains the important characteristics. It doesn’t have the same physical structure or the names of the teams, but it still has the property that one string corresponds to exactly one bracket, so anyone can look at one of these strings before the tournament and know exactly what has to happen for that “bracket” to be perfect. It’s also not that different from an actual bracket challenge, where users have to choose which boxes to check rather than having to enter team names. This will also allow the code to run faster, and thus generate more brackets, so I think the tradeoff is worth it.

Timing

As stated above, the code should be simulating a single fan who happens to be extremely fast at filling out brackets. The challenge will follow the timing of the ESPN bracket challenge. Even though I won’t be competing in the challenge (the billions of brackets would probably crash their servers), the code cannot be run until the challenge opens on Selection Sunday, and the code must be halted before the challenge closes prior to the start of the first round on Thursday (Friday for the women’s tournament).

Since brackets will have 1’s and 0’s instead of the actual teams, there are some opportunities to “cheat” in generating brackets. In theory, it would be possible to create a bunch of brackets (or parts of brackets) ahead of time, using heuristics like advancing 1 seeds most or all of the time. This is not allowed. No brackets or parts of brackets may be generated prior to the opening of ESPN’s bracket challenge. Doing so would fail the “really fast fan” test, since it would correspond to a fan filling out part of their bracket before the challenge opened.

However, any basketball-related analysis may be done beforehand. Any external models (KenPom, FiveThirtyEight, etc.) may be used, and participants are welcome to do any modeling they want. If, for example, a participant wants to calculate a rating for each team, or compute a probability of any one team beating any other team, they are welcome to do these computations and store the results beforehand. Any single fan could do this, so it’s allowed in this challenge.
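As one illustration of how stored results could be used once the challenge opens, here is a minimal Python sketch that turns a table of head-to-head probabilities into a bracket string. Everything in it is an assumption for the example: win_prob is a hypothetical 64-by-64 table where win_prob[a][b] is the chance that the team in slot a beats the team in slot b, and slots 0 to 63 are taken to follow the bracket order described earlier (left half top to bottom, then right half top to bottom).

```python
import random


def sample_bracket(win_prob, rng):
    """Simulate one tournament and return it as a 63-digit bracket string.

    win_prob[a][b] is an assumed, precomputed probability that slot a beats
    slot b. rng is a random.Random instance.
    """
    alive = list(range(64))  # slots still in the tournament, in bracket order
    digits = []
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            upper, lower = alive[i], alive[i + 1]
            # '0' means the team listed higher in the bracket wins.
            upper_wins = rng.random() < win_prob[upper][lower]
            digits.append("0" if upper_wins else "1")
            next_round.append(upper if upper_wins else lower)
        alive = next_round
    return "".join(digits)
```

Only the probability table would be computed ahead of time here. The bracket itself is produced when sample_bracket is called, for example with sample_bracket(win_prob, random.Random()).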

This should hopefully cover most of the things that people would want to do to complete the challenge. For anything else, the “really fast fan” test should serve as a good guideline to what is reasonable.

Computation Limits

Any code used should be able to run on a personal computer during the ~88 hours that the ESPN tournament challenge is open. I’m not an expert on processors, but hopefully that’s a reasonable enough guideline. I’ll be completing the challenge on my 2020 MacBook Pro with an M1 chip.
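For a sense of scale, this short Python sketch works out how many brackets fit in the window at a given speed. The 88-hour figure comes from the rule above; the brackets-per-second rate is a made-up input, not a measurement of any real machine.

```python
HOURS_OPEN = 88                    # approximate window from the rules
SECONDS_OPEN = HOURS_OPEN * 3600   # 316,800 seconds
TOTAL_BRACKETS = 2 ** 63           # every possible 63-digit string


def brackets_in_window(brackets_per_second):
    """Brackets a single sequential process could write at an assumed rate."""
    return brackets_per_second * SECONDS_OPEN


def fraction_covered(brackets_per_second):
    """Share of all possible bracket strings written at that assumed rate."""
    return brackets_in_window(brackets_per_second) / TOTAL_BRACKETS
```

At a hypothetical one million brackets per second, that comes to 316.8 billion brackets, which is still only about 3 out of every 100 million possible strings.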

Furthermore, no parallelization may be used. Brackets must be filled out one at a time (although writing one bracket, modifying it slightly, and then writing the new bracket is allowed). Writing brackets in parallel would be a violation of the “really fast fan” test, although I suppose the personal computer requirement is just a limit on how fast the fan can be.
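The "write one, modify it slightly, write the next" pattern might look something like the following Python sketch. The function names are hypothetical, and the order in which games are flipped is an arbitrary choice for the example, not a recommended strategy.

```python
def flip_game(bracket, game_number):
    """Return a copy of bracket with one game's pick flipped.

    game_number is 1-based (1 to 63), matching the game numbering in this post.
    """
    if not 1 <= game_number <= len(bracket):
        raise ValueError("game_number out of range")
    i = game_number - 1
    flipped = "1" if bracket[i] == "0" else "0"
    return bracket[:i] + flipped + bracket[i + 1:]


def one_at_a_time(start):
    """Yield start, then one-digit variations of it, strictly in sequence."""
    current = start
    yield current
    # Walk from the final back to the first game, flipping one pick per step.
    for game_number in range(len(start), 0, -1):
        current = flip_game(current, game_number)
        yield current
```

Each bracket that one_at_a_time yields differs from the previous one in exactly one digit, and nothing runs in parallel. Note that flipping an early-round digit also changes which teams meet in later games, since later digits only say whether the higher or lower team wins.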

If there are any questions about the rules, or any loopholes I might be missing, feel free to leave a comment, and I’ll try to address them.
