The committee would like to explain some of its preliminary changes to the judging system and to address some of the feedback we have received. We ask, however, that the community reserve final judgment until the new system undergoes further testing and until the players have experienced it themselves. We are still at the beginning of a two-year process and there is much more work left to be done.
We also want to reiterate that the motivation behind building a new judging system is to create something that is easier for tournament directors to use without compromising fairness. We hope that it will allow any tournament director to use the system without technological sophistication or special equipment (We also note that the system will be designed so that judges will be able to switch to paper in the middle of a routine if there is any technical failure). It is not intended to represent a radical shift from the existing system, but some changes have been made to reduce redundancy, balance workload, and simplify the judging interface. The changes are not designed to favor any particular style or strategy. These changes and the reasons behind them are detailed below.
Changes to the competition manual will not be made until the system is ready for use at a Pro or Major FPA event.
Before discussing the specific changes, we want to first address some confusion about the process of implementing the new system. This committee was formed, after an FPA-member vote, to modify the judging system. The formation of this committee was discussed in various forums, including an in-person, live-streamed meeting at the 2018 FPAW. Before the vote occurred, a proposal was sent out to the membership and posted on the FPA website explaining that if the committee was formed, it would create a new judging system. That proposal can be found here: http://www.freestyledisc.org/new-judging-proposal-membership-vote-forthcoming/
The democratic membership vote empowers the committee to make changes to the FPA judging system without going through the usual process outlined in the FPA bylaws (We note, however, that our process includes all the safeguards of the FPA process: membership input, board approval, a testing period, and a waiting period for the new system to be implemented). This alternative process was used to allow the committee to dedicate the time and technical work necessary to build the new system without the risk of wasted effort. With that said, the FPA Board could reverse our changes through the existing FPA bylaws or another democratic vote at some time in the future.
The committee is composed of a fair cross-section of the FPA community, representing a diversity of experience, location, and viewpoint. Significantly, not everyone on the committee voted for a new judging system.
The committee has been meeting weekly for the last four months. This new system will be tested and modified based on feedback from the membership. It will not be used at a Pro or Major FPA event until 2020. The next testing will occur at Frisbeer and HolyJam. There will be many other opportunities to test the system going forward before implementation at a Pro or Major FPA event.
We will be hosting two Q&A sessions on the current iteration of the new judging system in the near future. More information about those will be sent to the membership soon.
Preliminary Changes and Rationales
All of the committee’s decisions were informed by our critical goals: transparency, reproducibility, objectivity, accuracy, simplicity, and efficiency. We also focused on preserving each aspect of the existing system. Every category under the old judging system is accounted for in the new judging system.
Execution is now paired with several AI categories: general impression, teamwork, music, and form. (In response to feedback from the membership, form will continue to be a separate category in the new judging system, but it will not be added back until after HolyJam). This change was intended to better balance the workload between judges. It significantly reduces the work performed by “AI” judges and it increases the importance of “execution” judges. It also ensures that all nine judges on the panel will be responsible for evaluating some subjective aspect of the routine rather than only those in difficulty and AI. Note that the weight of these categories has not changed from the previous system (The committee may consider adjusting weights at a later date and welcomes feedback from the membership on how each category should be weighted).
We have also added a catch percentage component to execution. This change is meant to take advantage of a possibility created by technology and resolve a perceived unfairness with the current judging system. The catch percentage component modifies the execution deduction to reflect the number of combinations in a routine. Under the current system, a team with 50 combinations and three drops will receive the same execution score as a team with 10 combinations and three drops. Under the new system, the team with 50 combinations will have a higher execution score to reflect the higher number of combinations and higher catch percentage. This will be achieved through a multiplier. The exact weight of the multiplier has not yet been determined. The number of combinations is not separately counted by any judge. Instead, it is drawn from the number of phrases calculated by the difficulty judge, as explained below.
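To make the 50-versus-10 comparison concrete, here is a minimal sketch of how a catch percentage could feed a multiplier. The committee has not settled the multiplier or its weight, so the formula, the function names, the base score of 8.0, and the default weight of 0.5 below are illustrative assumptions only, not the committee's method.

```python
def catch_percentage(phrases: int, drops: int) -> float:
    """Share of phrases that ended in a catch rather than a drop."""
    if phrases <= 0:
        raise ValueError("a routine needs at least one phrase")
    if not 0 <= drops <= phrases:
        raise ValueError("drops must be between 0 and the number of phrases")
    return (phrases - drops) / phrases


def adjusted_execution(base_score: float, phrases: int, drops: int,
                       weight: float = 0.5) -> float:
    """Scale a base execution score by a catch-percentage multiplier.

    HYPOTHETICAL: this formula and the default weight are placeholders.
    The actual multiplier has not yet been determined.
    """
    multiplier = 1.0 - weight * (1.0 - catch_percentage(phrases, drops))
    return base_score * multiplier


# Both teams have three drops, so they start from the same base score
# (8.0 is an arbitrary illustrative value).
team_long = adjusted_execution(8.0, phrases=50, drops=3)
team_short = adjusted_execution(8.0, phrases=10, drops=3)
```

Under these assumptions, the team with 50 phrases ends up with the higher execution score, and setting the weight to 0 reproduces the current behavior, in which both teams score the same.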
The changes to AI are mostly organizational. As noted above, general impression, teamwork, music, and form have been moved to execution. The only category that remains in “AI” alone is variety. This does not mean that variety is now one-third of a team’s score. Its final weight will be determined at a later date. Variety has been isolated and assigned its own judges because it is one of the most difficult and time-consuming categories to judge.
We have also split up variety into a real-time quantity score and a retrospective quality score. This change is meant to add clarity and structure to variety. The quantity score requires the judge to press an “increment” button each time a new element is introduced into a routine. Every new throw, move, catch, direction, side, or other component involving the disc receives an increment. This is easier than it sounds. In our testing, committee members’ quantity scores were remarkably consistent.
The quality score is similar to the existing variety score. The quality component of the variety score is a single score that reflects three considerations: (1) variety of move types demonstrated, (2) the amount of skill in each move type demonstrated, and (3) Both Spins All Angles (BSAA) aptitude. The first factor–the variety of move types demonstrated–measures the number of moves performed across different categories of moves, like throws, catches, rolls, skids, turnovers, tips, kicks, and brushes. It is distinguishable from the quantity score because the focus is not just on the variety of moves (which could occur in a single category like throws), but the variety of moves that show different skill sets. The second factor–amount of skill in each move type demonstrated–measures the depth and difficulty demonstrated in each move category. It is meant to prevent players from applying a “checklist” approach to variety by doing a series of simple moves in each category. Although it overlaps with difficulty, it rewards difficulty achieved through different skill sets (We note that if variety is given more weight under the new system, it will be in consideration of this difficulty component). The third factor–BSAA aptitude–measures the balance of a routine along three dimensions: (1) clock-counter, (2) upside down-right side up, and (3) vertical-flat. A few examples help explain the purpose of these changes:
Ex. Team A does 25 unique skids. Team B does 25 unique moves including skids, turnovers, centerwork, rolls, kicks, guides, and throws. Team A and B would have the same quantity score, but Team B would have a higher quality score because Team B had 25 moves across different move-types and skillsets.
Ex. Team A does 10 unique clock skids and 10 unique counter skids. Team B does 20 unique clock skids. Team A and B have the same quantity score, but Team A’s quality score will be higher than Team B’s because Team A had more variety along the clock-counter dimension.
Ex. Team A does a UD backhand throw, an under-the-leg UD flat pull, a two-handed turn-over (back to right-side up), and a double-spinning barrel gitis. Team B does a UD behind-the-back throw, a UD juice pull, a UD behind-the-back pull, and a UD flamingo. Team A and Team B both have a quantity score of 4 and (depending on other factors) similar difficulty scores, but Team B has a higher quality score because its difficulty was incorporated into its varied components (UD).
The committee also determined that flow should no longer be separately scored. We found that flow was highly correlated with, and overlapped, existing categories, most notably execution, and so, to reduce redundancy, the committee “removed” flow as a separate score. The committee will make clear in changes to the competition manual that flow is a component of difficulty, general impression, teamwork, and even form.
Difficulty has been switched to a modified phrasal judging system. Under the current system, a three-minute routine is split into 12 difficulty blocks. Under the new system, difficulty judges judge each combination and only the top 12 combinations count towards the difficulty score. If there are fewer than 12 combinations, the average score of all combinations will be used. A four-minute routine will use the top 16 combinations, but the committee may change the number of counted combinations based on testing and further feedback.
Any combinations outside of the top 12/16 are not judged for difficulty (in other words, they are dropped), but they are used to calculate a team’s catch ratio (Note that most teams will require most of their routine to complete their top 12/16 combinations). These combinations are also judged, of course, for the purposes of other categories like teamwork and general impression. The beginning and end of a combination are left to the discretion of the judges, but we anticipate that a phrase will consist of everything between a throw and a drop/catch (a speed flow sequence with 4 catches and 1 drop, for instance, would consist of 5 phrases). We do not anticipate that players will attempt to “game” difficulty by performing only a few long and difficult combinations. That strategy poses a number of risks given the introduction of the catch percentage ratio and the variety quantity score. In addition, the judging manual will make clear that the length of a combination is only one component of difficulty. If, however, gaming proves to be an issue, the committee will consider introducing a minimum combination number or other safeguards.
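The top-12/16 rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the judging program itself: the function name is hypothetical, and we assume here that the counted combinations are averaged, which the committee has only specified for routines that fall short of the counted number.

```python
def difficulty_score(phrase_scores: list[float], counted: int = 12) -> float:
    """Average the highest-scoring phrases, up to `counted` of them.

    ASSUMPTION: the top phrases are combined by a simple average.
    `counted` would be 12 for a three-minute routine and 16 for a
    four-minute routine.
    """
    if not phrase_scores:
        return 0.0
    top = sorted(phrase_scores, reverse=True)[:counted]
    # With fewer than `counted` phrases, `top` is simply every phrase,
    # so this is the average of all combinations.
    return sum(top) / len(top)


# Twelve strong combinations plus twenty weaker ones: the weaker ones
# are dropped for difficulty purposes.
full_routine = difficulty_score([9.0] * 12 + [5.0] * 20)

# Only two combinations: both are averaged.
short_routine = difficulty_score([6.0, 8.0])
```

In this sketch the twenty weaker combinations do not lower the difficulty score. They would still count toward the catch ratio and the other categories.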
The shift to phrasal judging serves many purposes: (1) judges do not need to remember what moves happened in a block; (2) judges do not need to decide where to “count” a combination that overlaps two blocks; (3) players have more flexibility in building routines because they do not need to spread the difficulty in each block but can front- or back-load their difficulty; (4) difficult combinations in close proximity are not redundant because they are each separately scored (rather than averaged); (5) difficult combinations are not diluted by non-difficult combinations in close proximity because they are separately scored (rather than averaged); (6) no judge is separately required to count the number of combinations for the purposes of execution; (7) players are not penalized for including more creative or stylistic elements at the expense of difficulty elements as long as they have sufficient difficult combinations.
We have also decided to replace the difficulty multiplier with a non-linear point scale (in difficulty only). This is to ensure that difficulty scores are not only sufficiently weighted but also sufficiently distinguished or spread. The degree of nonlinearity has not yet been determined. This change will ensure that players are encouraged to attempt highly difficult combinations. It reflects the fact that achieving difficulty scores on the higher end of the 1-10 scale is exponentially harder than achieving difficulty scores on the lower end of the 1-10 scale (in other words, it is much more difficult to hit a 10 combination instead of a 9 combination than it is to hit a 2 combination instead of a 1 combination).
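As one illustration of what a non-linear point scale could look like, the sketch below maps a 1-10 difficulty score onto a power curve. The choice of a power curve, the function name, and the exponent of 1.5 are all assumptions made for this example, since the degree of nonlinearity has not yet been determined.

```python
def difficulty_points(score: float, exponent: float = 1.5) -> float:
    """Convert a 1-10 difficulty score into points on a convex curve.

    HYPOTHETICAL: both the power-curve shape and the exponent are
    placeholders. Any exponent above 1 widens the gaps at the top of
    the scale.
    """
    if not 1 <= score <= 10:
        raise ValueError("difficulty scores run from 1 to 10")
    return score ** exponent


# Compare the reward for moving up one step at each end of the scale.
top_step = difficulty_points(10) - difficulty_points(9)
bottom_step = difficulty_points(2) - difficulty_points(1)
```

With any exponent above 1, the step from a 9 to a 10 is worth more points than the step from a 1 to a 2, which is the property the new scale is meant to have.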
Consecutivity marks, which were introduced in the existing system as a teaching tool, have been removed from the new system. This simplifies the interface and removes redundancy (difficulty scores should already reflect consecutivity).
Responses to Feedback
There have been some questions about how multiple disc routines will be judged, particularly with respect to catch ratio, phrasal judging, and variety. The answer is simple if unsatisfying: Judges are expected to judge what they see and nothing else. Inevitably, judges will not be able to judge every difficulty phrase or variety increment, but that is part of the cost-benefit analysis of a multiple disc routine. The same is true under the existing system: Judges can focus on only one thing at a time. The new system attempts to be more transparent about this limitation.
Another issue that has been raised involves the perceived complexity of the system. Although there are at least some members of the committee that believe simple rank judging would suffice, we have honored our commitment to conform as closely as possible to the existing judging system, which is more complex. We hope, however, that our reallocation of judging categories ensures that each judge is individually tasked with less than he or she was under the existing system. We also believe that much of any added complexity is handled by the judging program rather than the judges themselves.
Some members have asked whether the dial system will be part of the new judging system. It will not. But there will be real-time judging components (i.e., difficulty, variety quantity, and execution) that audience members will be able to track during competition.
We want to again emphasize that our changes will not radically change the way routines are judged. As stated above, the changes are not designed to favor any particular style or strategy. We do not believe that anyone will win more or lose less as a result of this new system. All the changes apply equally to everyone.
We welcome additional feedback from the community. But again, we ask that everyone reserve final judgment until the new system undergoes further testing and until they have experienced it themselves.