A common refrain at sport climbing competitions is “they’re not competing against each other. They’re competing against the route.” This is often used in explaining the unique nature of the sport, and why it tends to lead to a surprising amount of collaboration among the competitors. If competitive success relies not only on athletic performance, but on figuring out a novel climb, then sharing information can often provide a competitive advantage.
This also provides a unique challenge in predicting a climbing competition. Most models for predicting sporting outcomes involve assigning some numerical strength to each competitor and then somehow comparing these strengths to predict performance in a competition.
In predicting a competition in bouldering or lead climbing, such a model would probably work fine. If climber A climbs higher than climber B on about 70% of lead routes, then any model that gives climber A a 70% chance of beating climber B seems pretty fair. The final rankings in the competition don’t care about how much higher A can climb or where climber B ends up on the wall, so a predictive model need not care either.
With the current combined format used for the Olympics, however, the effect of the route becomes impossible to ignore. Every top, zone, and hold on a lead route has a measurable impact on the final results, and there is a fixed relationship between boulder and lead scores. Predicting a boulder and lead competition relies not only on predicting what the climbers will do, but what the routesetters will do.
Background/Scoring System
The 2024 Olympics feature a combined boulder and lead climbing event. While the previous Olympics featured a combined event with scoring based on climbers’ ranks in each event, the points system for this year’s event assigns points based on performance on each boulder and lead route.
Each boulder is worth 25 points. A climber gets 5 points for reaching the first of two intermediate zone holds, a total of 10 for reaching the second zone hold, and a total of 25 for reaching the top of the boulder. The climber loses 0.1 points for each failed attempt before reaching their highest-scoring part of the boulder.
The lead route consists of at least 40 hand holds, the last 40 of which are worth points. Of those 40, the first 10 are worth 1 point each, holds 11-20 are worth 2 points each, holds 21-30 are worth 3 points each, and holds 31-40 are worth 4 points each. An additional 0.1 point is awarded if the climber successfully makes progress off of their final hold.
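To make the arithmetic concrete, here is a minimal sketch of both scoring rules in Python (the function names and argument conventions are mine, not the official rulebook's):

```python
def boulder_score(zone1: bool, zone2: bool, top: bool, failed_attempts: int = 0) -> float:
    """Score one boulder: 5 points for zone 1, 10 total for zone 2, 25 total
    for the top, minus 0.1 per failed attempt before the high point."""
    base = 25 if top else 10 if zone2 else 5 if zone1 else 0
    return base - 0.1 * failed_attempts if base else 0.0

def lead_score(holds_reached: int, plus: bool = False) -> float:
    """Score one lead route over its last 40 holds: holds 1-10 are worth 1
    point each, 11-20 worth 2, 21-30 worth 3, 31-40 worth 4, plus 0.1 for
    making progress off the final hold reached."""
    score = sum(1 + (hold - 1) // 10 for hold in range(1, min(holds_reached, 40) + 1))
    return score + (0.1 if plus else 0.0)
```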
Thus, boulder and lead are each worth a maximum of 100 points, for an overall maximum of 200. At the Olympics, the top 8 athletes after the semifinals advance to the final, where points are reset and climbers get another 4 boulders and another lead route, for another chance at up to 200 points.
The goal of the model is to determine each climber’s ability on both boulder and lead, and then to simulate a large number of boulders and lead routes to estimate the chances of each climber performing well at a competition.
The Model
The model is divided into 4 parts: rating climbers’ boulder ability, rating climbers’ lead ability, simulating boulders, and simulating lead routes.
Each climber is given an Elo rating in both boulder and lead. Elo ratings are a system invented for chess in which competitors’ ratings go up after wins, especially against stronger opponents, and down after losses, especially against weaker opponents. In the spirit of “competing against the route,” ratings are also given to boulders and lead routes.
Boulder Ratings
Every climber in the first competition in my dataset (the 2008 World Cup in Hall, Austria) was given an initial boulder rating of 1500. Every subsequent climber was also given a 1500 rating prior to their first competition.
Each boulder is divided into 2 sections (or 3, if the boulder has 2 zones). A climber is credited with a “win” against each section that they complete, a “loss” against each section that they fail to complete, and no result against a section that they fail to reach. For example, if a climber reaches the low zone on a Boulder & Lead combined boulder but not the high zone, they get a “win” against the low zone and a “loss” against the high zone, but they are not penalized again for failing to reach the top, since they never got to the high zone. Attempts were not considered in assigning ratings, so a flash counts the same as a top in 10 attempts for the purpose of these ratings.
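A small sketch of that crediting rule, encoding a climber's result as the number of sections they completed (this encoding is mine):

```python
def section_results(sections_completed: int, n_sections: int = 3) -> list:
    """Win/loss result against each section of a boulder (Z1, Z2, top):
    1.0 = win (completed), 0.0 = loss (reached but not completed),
    None = no result (never reached, so no penalty)."""
    results = []
    for section in range(n_sections):
        if section < sections_completed:
            results.append(1.0)
        elif section == sections_completed:
            results.append(0.0)
        else:
            results.append(None)
    return results

# Reached the low zone but not the high zone: win, loss, no result.
assert section_results(1) == [1.0, 0.0, None]
```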
At a boulder competition, each section of each boulder is given a “performance rating,” a concept also taken from chess. This corresponds roughly to what the section’s Elo rating would have to be in order for you to expect the climbers in the competition to have the success rate that they actually had against it. A section’s performance rating is calculated by taking the average Elo rating of the climbers who attempt the section, and then adding a rating difference term based on the percentage of climbers that successfully complete it. For example, if half of the climbers who attempt the section successfully complete it, the rating difference term is 0. If all of them fail, the term is 800, and if they all succeed, the term is -800. Rating difference terms were taken from the FIDE (world chess federation) handbook.
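As a sketch of that calculation; here the FIDE lookup table is approximated by the standard logistic rating-difference formula, clipped to ±800, so the exact numbers will differ slightly from a true table lookup:

```python
import math

def performance_rating(attempter_ratings: list, n_successes: int) -> float:
    """Average rating of the climbers who attempted the section, plus a
    rating difference term: 0 if half succeed, +800 if all fail, -800 if
    all succeed (harder sections rate above the field)."""
    avg = sum(attempter_ratings) / len(attempter_ratings)
    p = n_successes / len(attempter_ratings)   # success rate on this section
    if p == 0:
        return avg + 800
    if p == 1:
        return avg - 800
    diff = 400 * math.log10((1 - p) / p)       # logistic stand-in for the FIDE table
    return avg + max(-800.0, min(800.0, diff))
```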
Once each section of each boulder is given a rating, climbers’ ratings are adjusted based on what they are able to climb in the competition. If a climber successfully completes a section, their Elo rating is increased by

$$K \left(1 - \frac{1}{1 + 10^{(S - C)/400}}\right)$$
If they fail, their rating is decreased by

$$K \cdot \frac{1}{1 + 10^{(S - C)/400}}$$
Here, C is the climber’s rating and S is the section’s rating. K is a constant that determines how quickly climbers’ ratings change with each competition. A higher K means that climbers’ ratings will depend more heavily on recent competitions. The boulder model currently uses K=32.
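In code, the update is the standard Elo rule applied per section (a sketch):

```python
def elo_update(climber: float, section: float, won: bool, k: float = 32.0) -> float:
    """New climber rating after one result against one boulder section.
    A win adds k*(1 - expected); a loss subtracts k*expected, where
    expected is the Elo-model probability of completing the section."""
    expected = 1 / (1 + 10 ** ((section - climber) / 400))
    return climber + k * ((1.0 if won else 0.0) - expected)
```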
These rating adjustments were run on every World Cup, World Championships, and Olympic Qualifier Series (OQS) event from 2008 to today (wherever data were available) to produce the boulder ratings.
Lead Ratings
Lead ratings were computed in a similar way, but with some key changes. Rather than assigning a rating to each section of the lead wall, each hold was given a rating corresponding to the difficulty of getting to that hold from the start of the route. In determining a performance rating, a hold is credited with “beating” all of the climbers who did not reach it, and with “losing” to all of the climbers who did. The extra 0.1 points are not considered in generating lead ratings.
Thus, a route with 40 holds is given a sequence of 40 nondecreasing ratings going from the bottom to the top. The best-performing climber(s) are credited only with a “win” against the last hold they completed, and the worst-performing climber(s) only with a “loss” against the first hold they failed to reach. Every other climber is credited with a “win” against the last hold they reached and a “loss” against the following hold. Because each climber has at most 1 win and 1 loss per lead route, compared to as many as 12 (4 boulders × 3 sections) in a boulder round, a K value of 64 is used in updating lead ratings.
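A sketch of one reading of that assignment, given each climber's highest hold reached (the encoding and edge-case handling are my interpretation of the description above):

```python
def lead_results(high_points: list, n_holds: int = 40) -> list:
    """For each climber, return (win_hold, loss_hold): a win against their
    highest hold and a loss against the next one. The best performer(s)
    get no loss; the worst performer(s), including anyone who reached no
    hold, get no win."""
    best, worst = max(high_points), min(high_points)
    results = []
    for hp in high_points:
        win = hp if hp > worst else None
        loss = hp + 1 if hp < best and hp < n_holds else None
        results.append((win, loss))
    return results
```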
Boulder Simulation
Simulating a boulder round requires some idea of how difficult each top and zone tends to be in each qualifying, semifinal, and final round. Since the only recent competitions with multiple zones are Boulder and Lead combined competitions, these were the only competitions used in generating boulder simulations.
To simulate a semifinal round, all boulders from the semifinals of the 2022 Morioka World Cup, the 2023 Combined World Championships, and OQS events were given performance ratings. The average and standard deviation of the performance ratings were taken for all of the Z1s, all of the Z2s, and all of the tops. To simulate a single boulder, a Z1 rating is generated by sampling from a normal distribution with the mean and standard deviation corresponding to the mean and standard deviation of the Z1s from previous semifinal rounds. A Z2 rating and a top rating are sampled similarly to get the 3 difficulty ratings necessary to define a single boulder. This is repeated for every boulder in the semifinal round, and then again for the final round (as well as the qualifying round if the competition being simulated has a qualifying round).
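A sketch of that sampling step (the mean/standard-deviation pairs below are illustrative numbers only, standing in for values estimated from the past rounds mentioned above):

```python
import random

def simulate_boulder(stats: dict) -> dict:
    """Sample one boulder: a rating for Z1, Z2, and top, each drawn from a
    normal distribution fit to past performance ratings for that round type."""
    return {sec: random.gauss(mu, sigma) for sec, (mu, sigma) in stats.items()}

# Illustrative numbers only, not the fitted values:
semifinal_stats = {"Z1": (1450, 120), "Z2": (1550, 130), "top": (1680, 140)}
semifinal_boulders = [simulate_boulder(semifinal_stats) for _ in range(4)]
```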
To simulate a climber attempting to climb a boulder, we first randomly determine if they climb Z1. If they succeed, we randomly determine if they climb Z2, and if they succeed again, we randomly determine if they reach the top. Once a climber reaches a section, their probability of successfully climbing the section is

$$\frac{1}{1 + 10^{(S - C)/400}}$$
where S is the section’s Elo rating and C is the climber’s Elo rating. Again, attempts are not considered, so a climber is given 5 points for reaching zone 1, 10 for reaching zone 2, and 25 for reaching the top. The simulated scores are thus probably slightly higher than they should be, but this effect is hopefully small enough, and similar enough for each climber, that it doesn’t give anyone too much of an advantage or disadvantage.
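In code, the attempt simulation chains those three checks (a sketch, reusing the sampled section ratings from above):

```python
import random

def attempt_boulder(climber: float, boulder: dict) -> int:
    """Simulate one climber on one boulder and return their score:
    sections are tried in order, and the first failure ends the attempt."""
    points = {"Z1": 5, "Z2": 10, "top": 25}
    score = 0
    for section in ("Z1", "Z2", "top"):
        p_success = 1 / (1 + 10 ** ((boulder[section] - climber) / 400))
        if random.random() > p_success:
            break
        score = points[section]   # totals, not increments: 5, then 10, then 25
    return score
```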
There are many problems with a method like this, including the fact that these difficulties are not independent (a hard zone 1 often leads to an easier zone 2, and some rounds are systematically too hard or too easy across the board), but this approach seems like a good start.
Lead Simulation
Simulating a lead route was much trickier. The ratings for each hold can’t be simulated independently, because the ratings have to be nondecreasing. While hold 10 might have a rating of 1700 on one route, and hold 11 might have a rating of 1500 on a different route, those ratings can’t both occur on the same route.
The approach is as follows: to simulate a semifinal lead round, get the ratings for each of the last 40 holds of every World Cup, World Championship, and OQS semifinal since 2021 (if a route has fewer than 40 holds, just copy the rating of the first hold until there are 40 ratings). Then get the mean and standard deviation of the ratings for each hold number from 1 to 40. To simulate the rating of hold 1 on a lead route, do the same thing as in the boulder simulations: sample from a normal distribution with mean and standard deviation matching those of the hold 1 ratings from every route considered.
To simulate other holds, we can’t use normal distributions, since that could result in a decrease in rating from one hold to the next. Instead, we use a distribution that decays like an inverse cube (differences between successive hold ratings seemed to behave this way). Specifically, we sample the difference between hold i+1 and hold i from a distribution with density

$$f(x) = \frac{2m^2}{(x + m)^3}, \qquad x \ge 0$$
This distribution has mean m, which is set equal to the average rating of hold i+1 minus the average of the following two things:
- The average rating of hold i
- The rating of hold i on the route being simulated
This ensures that each hold has approximately the right difficulty on average, while preventing the possibility of having too many easy holds or too many hard holds in a row. Some more adjustments are made (m has to be at least 3, the change in hold rating has to be at most 500, and all ratings are adjusted at the end so that they have the right mean).
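A sketch of sampling one hold-to-hold increase by inverting the CDF of that density (the floor on m and the cap on the increase are the adjustments mentioned above; the final re-centering step is omitted):

```python
import math
import random

def sample_hold_increase(m: float) -> float:
    """Draw a nonnegative rating increase from the density 2*m^2/(x+m)^3,
    which decays like an inverse cube and has mean m, by inverting its
    CDF F(x) = 1 - (m/(x+m))^2."""
    m = max(m, 3.0)                          # m has to be at least 3
    u = random.random()                      # uniform in [0, 1)
    increase = m * (1 / math.sqrt(1 - u) - 1)
    return min(increase, 500.0)              # change in hold rating capped at 500
```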
Once a route with 40 simulated hold ratings has been generated, each climber must get a simulated result. To do this, the probability of the climber falling before each hold is computed (this starts close to 0 and increases as the route goes on, until it is close to 1). The climber is then assigned a random number between 0 and 1, and whichever two hold probabilities that number falls between determine the climber’s simulated high point. For example, if the climber’s random number is 0.6, their probability of falling before hold 21 is 0.59, and their probability of falling before hold 22 is 0.62, then the climber’s simulated highest hold reached is 21.
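A sketch of that lookup; since each hold's rating measures the difficulty of reaching it from the start, the fall probabilities come straight from the Elo expectation (the names are mine):

```python
import random

def simulate_lead_attempt(climber: float, hold_ratings: list) -> int:
    """Return the simulated highest hold reached. Each hold's rating already
    encodes the difficulty of getting there from the start, so the Elo
    expectation gives the probability of reaching each hold directly."""
    u = random.random()
    for hold, rating in enumerate(hold_ratings, start=1):
        p_fall_before = 1 - 1 / (1 + 10 ** ((rating - climber) / 400))
        if u < p_fall_before:
            return hold - 1            # fell before this hold
    return len(hold_ratings)           # passed every hold: reached the top
```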
Again, this is done repeatedly to simulate each semifinal and final round.
Finishing the Simulation
To simulate the Olympics and get probabilities, we simulate 100,000 boulder and lead semifinal rounds. In each, the top 8 climbers are put into a simulated final round to determine final placings. Probabilities are then determined based on the number of simulations in which each result occurred.
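Schematically, the outer loop looks like this (run_semifinal and run_final are stand-ins for the boulder and lead machinery above):

```python
from collections import Counter

def simulate_olympics(climbers, run_semifinal, run_final, n_sims=100_000):
    """Monte Carlo over the whole event: run a semifinal, advance the top 8
    with points reset, run a final, and tally who wins."""
    wins = Counter()
    for _ in range(n_sims):
        semi_scores = run_semifinal(climbers)                    # name -> points
        finalists = sorted(semi_scores, key=semi_scores.get, reverse=True)[:8]
        final_scores = run_final(finalists)
        wins[max(final_scores, key=final_scores.get)] += 1
    return {name: n / n_sims for name, n in wins.items()}        # win probability
```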
Final Thoughts
This writeup went on much longer than I expected. This model could definitely use some improvements (adding dependence between boulders and trying to handle easy or hard lead routes come to mind), but it hopefully gives a rough sense of likely outcomes for the Olympics and beyond. Simulating a competition like this has been a really fun technical challenge, and I have really appreciated the feedback and interesting discussion that I’ve gotten from this project. If you have any more feedback, you can reach me at mathandcheez@gmail.com.
Update 2024/08/03
I’ve made a few small updates to the model. I’ve added recent World Cup events to the Elo ratings, as well as the European, Asian, and North American Olympic qualifying events (I previously only had World Championship and World Cup events). I also modified some parameters and made some slight changes to the simulations so that climbers’ performances are slightly correlated, which gives favorites slightly higher probabilities of winning.