Thank you to all who submitted feedback on the 2020 Judging System. We have considered everyone’s feedback and made changes where we thought appropriate. We would like to take this opportunity to summarize the feedback we received, the changes we have made in response, and the areas where we concluded that no change was necessary.
As a starting point, we would like to address a few of the questions raised about specific results from Frisbeer and Holyjam. Although the committee takes no position on whether the results of those tournaments were “correct,” we examined each result raised by commentators to see whether it was caused by the changes introduced in the 2020 Judging System. What we found was that, in general, those results can be explained entirely by the raw scores. In other words, the results would have been the same if the same raw scores had been entered into the Existing Judging System. Put differently, any “error” in the results was judge error, not judging system error.
For instance, many commentators asked about the placement (4th) of Fabio Nizzio and Tom Leitner in the Open Pairs Finals at Frisbeer. Their placement accords entirely with their raw scores. Even though they won execution by nearly two points, they received raw scores in the middle or bottom half in difficulty, variety, and AI. If those raw scores are put into the Existing Judging System, Fabio and Tom place no higher. To be clear, we are not saying they were or were not judged correctly, but the 2020 Judging System did not itself cause the outcome.
Many commentators also asked about the placement (6th) of Andrea Rimatori and Stefan Dunkel in the Holy Jam Open Pairs Finals. As before, their placement accords entirely with their raw scores. They received raw scores in the bottom half in difficulty, variety, and AI, and even their execution score was only in the middle. Simply put, no multiplier, weighting change, or other system change can take raw scores that are no better than middling in every category and put that team on the podium.
We do acknowledge, however, that the system interface may have contributed to some judging error. Some judges complained that it was difficult to see their prior scores for reference while entering new scores, and many more did not realize that they could see their prior scores by swiping up on their screens. The next iteration of the 2020 Judging System will make it much easier for judges to reference prior scores. Indeed, subject to other design constraints, we hope that in the future prior scores will always be visible.
Others have raised concerns about judges feeling disconnected from their scores because some scores are adjusted by multipliers. Again, we note that, based on our analysis, none of the results commented on from Frisbeer or Holyjam would have changed in the absence of these multipliers, given the raw scores. We will ensure, however, that future iterations of the judging system show judges both their raw and their adjusted scores more clearly. But we remind everyone that the multipliers exist because, for some categories, judges’ raw scores do not reflect community consensus on how freestyle should be judged. For instance, the original 1.5 difficulty multiplier was introduced because the community felt that the spread in difficulty scores did not reflect actual differences in the level of difficulty displayed by teams, which effectively diluted the weight of difficulty relative to the other categories. To a certain degree, then, the multipliers exist to “correct” judges, so we do not want judges to focus too heavily on their “adjusted” scores, because doing so would undermine the purpose of the multipliers.
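To illustrate how a multiplier shifts weight toward difficulty, here is a minimal sketch that assumes a simplified additive total (difficulty + execution + AI) and uses made-up scores for two hypothetical teams. The additive model, the team names, and every number are assumptions for illustration only; the actual system's aggregation is more involved.

```python
# Hypothetical illustration of how a difficulty multiplier changes category weight.
# The additive total below is a SIMPLIFICATION, not the actual judging formula.

DIFFICULTY_MULTIPLIER = 1.5  # the original multiplier mentioned in the text

# Made-up raw scores for two hypothetical teams.
TEAMS = {
    "Team A": {"difficulty": 6.0, "execution": 9.0, "ai": 9.0},
    "Team B": {"difficulty": 8.0, "execution": 8.0, "ai": 7.5},
}


def total(scores: dict, diff_mult: float = 1.0) -> float:
    """Simplified total: multiplied difficulty plus execution plus AI."""
    return diff_mult * scores["difficulty"] + scores["execution"] + scores["ai"]


def ranking(diff_mult: float) -> list:
    """Team names ordered from highest to lowest simplified total."""
    return sorted(TEAMS, key=lambda name: total(TEAMS[name], diff_mult), reverse=True)


if __name__ == "__main__":
    for mult in (1.0, DIFFICULTY_MULTIPLIER):
        totals = {name: total(s, mult) for name, s in TEAMS.items()}
        print(f"multiplier {mult}: {totals} -> {ranking(mult)}")
```

With these invented numbers, the 2-point raw difficulty gap between the teams becomes a 3-point gap, which is enough to reverse their order. The point is only that the multiplier increases difficulty's share of the overall spread; it says nothing about any real result.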
Another concern that has been raised is the weight of “AI” (as understood in the existing system) in the 2020 Judging System, particularly now that variety has been separated into its own category. Some have questioned whether four categories (general impression, music choreography, teamwork, and variety) deserve so much weight. Others have noted that while these categories may be extremely important, they are also the most subjective. We agree with these criticisms. The next iteration of the 2020 Judging System will reduce the weight of AI relative to execution and difficulty. Further testing will be required to find the “best” weight, but our goal is for the spread of difficulty and AI scores to be, on average, about the same. (We recognize, however, that in a given round the spread of one may be significantly smaller or larger than the spread of the other, depending on how the teams played.)
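One way to turn the “equal average spread” goal into a number is sketched below. It assumes that spread means the population standard deviation of a category's scores across the teams in a round, and that the difficulty scores passed in already include any multiplier. The function names and the example scores are hypothetical, not part of the 2020 Judging System.

```python
from statistics import mean, pstdev


def spread(scores: list) -> float:
    """Spread of one category's scores across the teams in a round.

    ASSUMPTION: population standard deviation. Range (max - min) would work the
    same way here, because both scale linearly with a positive weight.
    """
    return pstdev(scores)


def ai_weight_to_match(difficulty_rounds: list, ai_rounds: list) -> float:
    """Hypothetical helper: weight w so that w * AI has the same average spread as difficulty.

    Each argument is a list of rounds; each round is a list of team scores.
    Since spread(w * x) == w * spread(x) for w > 0, w is the ratio of the averages.
    """
    avg_diff_spread = mean([spread(r) for r in difficulty_rounds])
    avg_ai_spread = mean([spread(r) for r in ai_rounds])
    return avg_diff_spread / avg_ai_spread


if __name__ == "__main__":
    # Made-up scores for one hypothetical round of four teams.
    difficulty = [[5.0, 6.0, 7.0, 8.0]]
    ai = [[3.0, 5.0, 7.0, 9.0]]
    print(ai_weight_to_match(difficulty, ai))  # ≈ 0.5
```

In this invented round the AI scores are spread twice as widely as the difficulty scores, so the sketch suggests weighting AI by about 0.5. In practice the ratio would be averaged over many rounds, which is why individual rounds can still differ.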
The next iteration will also require every judge to enter a general impression score at the end of each team’s routine. The weight of general impression will remain the same, but we hope that asking all nine judges for their general impression will reduce some of the randomness and variance in what is arguably the most subjective evaluation in the judging system.
Finally, in the next iteration AI will also include form as a component, which will reduce the weight of each of the other individual components of AI.
In difficulty, we have modified the difficulty gradient to give teams more flexibility in how they allocate their difficulty. At the end of the routine, all difficulty scores will be sorted from highest to lowest. In a 3-minute routine, the top 12 scores will first be scaled nonlinearly and then averaged together to produce a final score from 0 to roughly 15. Additional difficulty scores will be processed in a similar manner, but at a gradually decreasing rate. This means that in a 3-minute routine, the 13th highest score will add roughly 0.3 times its value, the 14th highest score roughly 0.28 times its value, and so on. Teams will therefore continue to earn difficulty marks beyond their top 12, but at a much lower rate, which discourages them from “packing” difficulty by performing as many combinations as possible.
On the technical side, we expect the following improvements in the next iteration. First, the judging system will have a LAN/offline mode so that it does not rely on an internet connection to operate. Second, calculation sheets will be much easier to read and understand. Third, we will add short “primer” videos that judges will be required to watch before every round, reminding them what to look for while judging each category (for instance, the execution video may include brief examples of the different types of deductions to look for). This last change is part of a broader judging education effort and should also help reduce certain cognitive biases.
The 2020 Judging System will next be used at Jammers and the Italian Open. We will continue to make revisions and improvements with each iteration. We value everyone’s continued feedback; you can send it to email@example.com.