Who really are the leading tryscorers for 2021?

Some more lockdown fun with some statistical modelling on try scorers.

As mentioned in a previous post, I’ve been working on a try scoring model to try and identify who really are the best try scorers.

After quite a bit of work involving 1,711 games, 3,292 players and over 350,000 data points from the 2013-20 seasons, I have developed a good model that predicts the number of tries scored based on:

Position
Team Tries
Runs
Metres gained
Line Breaks
Tackle breaks

The model has a goodness of fit rating (Multiple R) of 97.17% with a standard error of +/- 0.94 tries, so it’s very accurate in predicting how many tries a player should score based on the above factors.

The logic to rate try scorers is to use the model’s formula to work out how many tries a player should have scored based on the performance factors above. Then subtract that expected figure from the number of tries they have actually scored to get a Tries Above/Below Expectation figure. From that, you can calculate the % of tries they have scored above or below expectation.

For example, if the model suggested that Player X should have scored 10 tries this season based on their Position, Team Tries, Runs, Metres gained, Line Breaks and Tackle breaks, but they have only scored 8 tries, then their Tries Above/Below Expectation would be -2. Expressed as a % of their expected tries this would produce a rating of -20% (-2 divided by 10, multiplied by 100).

This % rating allows you to compare players who have scored different numbers of tries.
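
A minimal sketch of the rating arithmetic described above, assuming the % is taken against expected tries, as the worked example states. The function name `tries_vs_expectation` is made up for illustration and is not part of the model itself.

```python
def tries_vs_expectation(actual_tries: float, expected_tries: float) -> tuple[float, float]:
    """Return (tries above/below expectation, that figure as a % of expected tries)."""
    if expected_tries <= 0:
        # A percentage of a zero or negative expectation is not meaningful.
        raise ValueError("expected_tries must be positive to express a percentage")
    diff = actual_tries - expected_tries
    pct = diff / expected_tries * 100
    return diff, pct


# Player X from the example: expected 10 tries, actually scored 8.
diff, pct = tries_vs_expectation(actual_tries=8, expected_tries=10)
print(f"{diff:+.1f} tries, {pct:+.1f}%")  # -2.0 tries, -20.0%
```

Because the result is a ratio, a player on 6 tries and a player on 20 tries end up on the same scale.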

The position factor was calculated from the distribution of tries across all positions from the 2013-20 seasons. This effectively compensates players who play in positions that historically don’t score a lot of tries. So if a player scores more tries than average for that position, their rating will be higher. The model works similarly for the other performance factors as well.

So without further ado, onto the ratings below.

I used a cut-off of 6 tries because that is what the NRL uses for their current leading try scorers, plus with a small number of tries the model can produce some volatile (eg fluke) results.

NRL Leading Try Scorers 2021: up to Round 18

| Rank | Name | Tries | Expected tries | Tries Above/Below Expectation | % Tries Above/Below Expectation |
|---|---|---|---|---|---|
| 1 | Sam Walker | 6 | 3.1 | 2.9 | 92.40% |
| 2 | Angus Crichton | 7 | 3.8 | 3.2 | 83.80% |
| 3 | Stephen Crichton | 6 | 3.6 | 2.4 | 65.50% |
| 4 | David Nofoaluma | 12 | 7.6 | 4.4 | 58.90% |
| 5 | Kevin Proctor | 6 | 3.8 | 2.2 | 56.80% |
| 6 | Reimis Smith | 11 | 7.0 | 4.0 | 56.00% |
| 7 | Cody Walker | 10 | 6.5 | 3.5 | 54.10% |
| 8 | Matthew Dufty | 10 | 7.4 | 2.6 | 34.50% |
| 9 | Kalyn Ponga | 6 | 4.5 | 1.5 | 34.00% |
| 10 | Jason Saab | 16 | 12.1 | 3.9 | 32.30% |
| 11 | Isaiah Papali’i | 7 | 5.4 | 1.6 | 29.70% |
| 12 | Adam Doueihi | 7 | 5.6 | 1.4 | 24.10% |
| 13 | Brett Morris | 11 | 9.2 | 1.8 | 19.00% |
| 14 | David Fifita | 12 | 10.2 | 1.8 | 18.20% |
| 15 | James Tedesco | 7 | 5.9 | 1.1 | 17.70% |
| 16 | Tom Trbojevic | 14 | 12.0 | 2.0 | 16.40% |
| 17 | Mikaele Ravalawa | 9 | 7.9 | 1.1 | 14.20% |
| 18 | Nathan Cleary | 8 | 7.0 | 1.0 | 13.80% |
| 19 | Xavier Coates | 8 | 7.1 | 0.9 | 12.30% |
| 20 | Josh Addo-Carr | 19 | 17.4 | 1.6 | 9.30% |
| 21 | Latrell Mitchell | 6 | 5.5 | 0.5 | 8.60% |
| 22 | Joseph Manu | 6 | 5.6 | 0.4 | 6.80% |
| 23 | Ryan Papenhuyzen | 8 | 7.5 | 0.5 | 6.60% |
| 24 | Brandon Smith | 9 | 8.7 | 0.3 | 2.90% |
| 25 | Murray Taulagi | 10 | 9.8 | 0.2 | 2.10% |
| 26 | Viliame Kikau | 8 | 8.0 | 0.0 | -0.40% |
| 27 | Jordan Rapana | 9 | 9.1 | -0.1 | -0.90% |
| 28 | Maika Sivo | 15 | 15.8 | -0.8 | -5.20% |
| 29 | Clinton Gutherson | 12 | 12.7 | -0.7 | -5.50% |
| 30 | Ken Maumalo | 8 | 8.6 | -0.6 | -7.40% |
| 31 | Charlie Staines | 13 | 14.1 | -1.1 | -7.60% |
| 32 | Ben Murdoch-Masila | 6 | 6.5 | -0.5 | -8.10% |
| 33 | Sitili Tupouniua | 8 | 8.8 | -0.8 | -8.90% |
| 34 | Brian Kelly | 6 | 6.6 | -0.6 | -9.00% |
| 35 | Daine Laurie | 7 | 7.8 | -0.8 | -10.60% |
| 36 | Dane Gagai | 8 | 9.0 | -1.0 | -11.10% |
| 37 | William Kennedy | 8 | 9.2 | -1.2 | -12.60% |
| 38 | George Jennings | 11 | 12.7 | -1.7 | -13.40% |
| 39 | Matt Ikuvalu | 11 | 12.7 | -1.7 | -13.60% |
| 40 | Sebastian Kris | 6 | 7.0 | -1.0 | -14.60% |
| 41 | Matt Burton | 12 | 14.1 | -2.1 | -14.80% |
| 42 | Justin Olam | 8 | 9.5 | -1.5 | -15.80% |
| 43 | Connor Tracey | 10 | 12.1 | -2.1 | -17.10% |
| 44 | Kyle Feldt | 7 | 8.7 | -1.7 | -19.40% |
| 45 | Josh Morris | 7 | 8.7 | -1.7 | -19.50% |
| 46 | Alex Johnston | 24 | 30.2 | -6.2 | -20.60% |
| 47 | Blake Ferguson | 6 | 7.8 | -1.8 | -23.40% |
| 48 | Tommy Talau | 8 | 10.7 | -2.7 | -25.30% |
| 49 | Brad Parker | 7 | 9.5 | -2.5 | -26.00% |
| 50 | Corey Thompson | 7 | 9.5 | -2.5 | -26.60% |
| 51 | Taane Milne | 6 | 8.2 | -2.2 | -26.60% |
| 52 | Campbell Graham | 6 | 8.5 | -2.5 | -29.30% |
| 53 | Alexander Brimson | 6 | 8.5 | -2.5 | -29.60% |
| 54 | Ronaldo Mulitalo | 6 | 8.6 | -2.6 | -30.00% |
| 55 | Brian To’o | 9 | 13.0 | -4.0 | -30.80% |
| 56 | Daniel Tupou | 8 | 11.7 | -3.7 | -31.70% |
| 57 | Cody Ramsey | 6 | 9.3 | -3.3 | -35.40% |
| 58 | Reuben Garrick | 14 | 22.7 | -8.7 | -38.20% |
| 59 | Reece Walsh | 6 | 9.7 | -3.7 | -38.30% |
| 60 | Jahrome Hughes | 9 | 14.8 | -5.8 | -39.10% |
| 61 | Hamiso Tabuai-Fidow | 6 | 11.3 | -5.3 | -47.00% |
| 62 | Nicho Hynes | 6 | 11.8 | -5.8 | -49.10% |

I know some people will look at this list, see some good try scorers like our own Alex Johnston and Maika Sivo down the list in negative territory, and say this is garbage. But what this model shows is that for the amount of runs, metres, tackle breaks and line breaks they have made, plus the opportunities given to them by virtue of their position, they really should have scored more tries, and so they are rated as underperforming.

Bombed tries will negatively impact a player’s rating because the player usually makes a combination of metres, tackle breaks and line breaks before failing to score the try.

Conversely, players with limited opportunities that “punch above their weight” and score tries will rate highly. Kevin Proctor, a prop who has scored 6 tries, is the prime example here.

Hi Govettsleap,
This looks great and no doubt you have spent countless hours dedicated to this. But can I ask the AJ question, if in 2020 he was top tryscorer with 23 for the season, in 2019 it was Sivo with 20, an expected count of 30 tries would be amazing would it not?
I mean if he didn’t do his hamstring he is probably a chance at 30 but is 30 for the season not a little higher than normal?
And why so many more than any other player including the Foxx?

I guess the broad answer here GTR is that scoring tries in rugby league is not all about the individual and their tries scored.

RL is a team game and tries scored by individuals are mostly not scored in isolation. They are the result of the team providing opportunities to score as well as the individual’s ability to convert those opportunities into tries.

Try scoring opportunities are unequally distributed through positions and teams.

It’s these variables and inequalities that the model compensates and penalises players for, to provide a clear picture of who actually is scoring more or less than their opportunities warrant.

Playing on the end of a high class backline, AJ is privileged with lots of opportunities to score tries. The model says that, given his opportunities across a range of performance factors, he has scored fewer than you would expect.

Is the model flawed then?

Not according to the statistical standards of multivariate linear regression modelling. It achieves a very high goodness of fit and all the coefficients (variables) are highly significant (meaning the chance of the goodness of fit being a fluke is lottery odds). For stat nerds, the significance of the model is in excess of 99.9999%, which means the chance of the model performance being a fluke is less than about 1 in a million.

It’s easy to be enamoured by a raw try scoring tally or even the better tries per game rate. But the reality is that looking at individual raw statistics in a team game is misleading. The issue of actually who is the best try scorer is more complex and nuanced than who has scored the most tries (for reasons explained above).

How have other variables such as rule changes, refereeing outcomes etc been included in the analysis?

It’s all getting too complex. I think you just need to watch some porn. :stuck_out_tongue_winking_eye:

Good question Bugs. Not really (apart from incorporating 2020 data into the model) and not yet.

What I have done here is develop a formula to predict total tries scored by players based on selected performance data from the 2013-20 seasons.

What I plan to do soon though is run separate regressions (the formula making process) for 2013-19 and 2020-21 seasons and then compare the formulas and their performance to see if there are any significant changes there.

I also plan to use the 2020-21 seasons as a holdout sample, where I use the formula derived from 2013-2019 dataset to make try scoring tally predictions for the 2020-21 seasons and then analyse how accurate these predictions were. This analysis will tell me if the rule changes made to the game during 2020-21 have significantly affected the relationship between those performance factors used in the model and try scoring or not.

I’ve kind of already done this for the 2021 season results above in a rough way but I want to do this more formally for two seasons rather than one.

Finally, last night I reworked and slightly improved the model by adding minutes played and post contact metres to the mix. Adding both of these variables slightly (but significantly) improved the model’s accuracy.

Good idea. See you all in an hour!

An hour? Are you still on dial up internet? :joy:

I normally have a little nap after the action.

Urgh… I love maths and stats also, but you have an overfitting issue here with considering too many variables.

I don’t think this is useful or insightful as it’s inaccurate.

It should tell you something that all the wingers are apparently doing poorly according to your model?

Also using the integer value they “missed” or “exceeded” by to rank seems incorrect, you should be using a % vs expected.

Also sample sizes are way too small to be statistically relevant.

Raxi,

Overfitting
You could be right. But the only way to determine this is to do a test on a holdout sample (eg 2013-19 dataset versus the 2020-21 dataset). The results of that test will prove if overfitting is happening or not. I’ll probably do this over the next few days.

Wingers performance indicates questionable integrity of model?
Maybe, maybe not. Only way to determine this is to do a significance test between the population and wingers. I’ll try and do this over the next day or so.

Integer value versus % value
I think you are misreading my results above. I actually am ranking players by % and not by the integer value.

This one’s open
image

Good responses.

Wingers - Maybe / maybe not means you aren’t sure.

Data Set - As a rough rule of thumb, your model should train on at least an order of magnitude more examples than trainable parameters. Simple models on large data sets generally beat fancy models on small data sets. I think you have a fancy model with a small data set, as you have too many parameters vs tries scored.

So the takeaway from this is that, as you have so many parameters and barely any tries to work with, the best measure here is to strip it down and literally consider only:

-total player tries
-the minutes they played
-team’s total tries while they were on the field, as a try they couldn’t score by virtue of not being on the field should not factor.

How this is then presented is an interesting question. I can think of a few ways.

There is no absolute science; comparing positions and roles is hard. 99.9% accuracy is interesting; see already the issues with positions. Why should wingers be penalised if the expectation is they score lots of tries? What’s more interesting is showing the positive outliers per position, rather than trying to compare a winger with a second rower.

Thanks for the advice Raxi,

Sample size and parameters
From my research on this, it seems there is no clear answer and it all depends on the complexity of the problem you are trying to solve with the model. Whilst there are minimums suggested, these aren’t really useful because I’d like to try and achieve maximum accuracy without overfitting.

I have 12,182 tries in the dataset. So that’s 4 orders of magnitude, which based on your rule of thumb, suggests 3 variables to predict total tries scored by individual players.

Approach
I’m going to rebuild this model from scratch, adding one variable at a time and each time testing the model against the holdout sample. If the holdout sample’s Adjusted R^2 value increases compared to the previous run and the added variable’s P-value is <.05 , I’ll keep that variable then move on to the next variable until I reach the point where the holdout sample’s Adjusted R^2 value isn’t increasing.
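
A rough sketch of that add-one-variable-at-a-time loop, in plain Python so it runs without extra libraries. It only covers the holdout Adjusted R^2 check; the P-value test mentioned above is left out, and it tries every candidate once rather than stopping at the first one that fails to help. The data, the column names (`line_breaks`, `noise`) and all the function names are invented for illustration and are not the real dataset or spreadsheet.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def adjusted_r2(coefs, rows, y):
    """Adjusted R^2 of already-fitted coefficients, scored on (rows, y)."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    n, k = len(y), len(coefs) - 1
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)


def columns(data, names):
    """Pull the named predictor columns out of a list of row dicts."""
    return [[row[name] for name in names] for row in data]


def forward_select(train, holdout, target, candidates):
    """Keep each candidate only if it raises Adjusted R^2 on the holdout sample."""
    chosen, best = [], float("-inf")
    y_train = [row[target] for row in train]
    y_hold = [row[target] for row in holdout]
    for name in candidates:
        trial = chosen + [name]
        coefs = fit_ols(columns(train, trial), y_train)  # fit on training seasons only
        score = adjusted_r2(coefs, columns(holdout, trial), y_hold)
        if score > best:
            chosen, best = trial, score
    return chosen, best


def make_rows(indices):
    """Synthetic stand-in data: tries track line_breaks; noise is unrelated."""
    return [
        {"line_breaks": i, "noise": (i * 7) % 5,
         "tries": 2 + 3 * i + (0.5 if i % 2 else -0.5)}
        for i in indices
    ]


train, holdout = make_rows(range(10)), make_rows(range(10, 16))
chosen, best = forward_select(train, holdout, "tries", ["line_breaks", "noise"])
print(chosen, round(best, 3))
```

Scoring on the holdout rows rather than the training rows is the point of the exercise: a variable that only helps on the data it was fitted to does not get kept.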

From what I’ve read, this is the recommended approach to guard against overfitting.

Team tries while the player was on the field
That’s a good suggestion but it will be a challenge to source and manipulate this data.

I can scrape when tries were scored from the NRL website. I also already have interchange data. So from those two sources I could create time-on-field records for each player, then cross-reference these against try scoring times to come up with an adjusted team tries total for each player.
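
The cross-referencing step could look something like the sketch below. It assumes each player’s time on field has already been turned into (on minute, off minute) stints and that try times are whole match minutes; both layouts, and the function name, are guesses for illustration rather than the actual data format.

```python
def team_tries_while_on_field(stints, team_try_minutes):
    """Count team tries scored during any of the player's on-field stints.

    stints: list of (on_minute, off_minute) pairs, inclusive at both ends.
    team_try_minutes: match minutes at which the player's team scored.
    """
    return sum(
        1
        for minute in team_try_minutes
        if any(on <= minute <= off for on, off in stints)
    )


# Hypothetical interchange player: on for 0-25 and again for 50-80.
stints = [(0, 25), (50, 80)]
team_try_minutes = [10, 30, 55, 79]
print(team_tries_while_on_field(stints, team_try_minutes))  # 3
```

The try in the 30th minute is not counted because the player was on the bench at the time.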

Positions
To come up with numerical values for positions I did the following:

  1. Calculated the % of tries scored by each singular position (eg if wingers scored 24% of tries then the % value here would be 12% because there are two players in this position).

  2. Set the lowest ranking position (Interchange) at 1.

  3. Scaled each position relative to the Interchange % value from 1. For example, if the Interchange position scored 3% of tries and the wing position scored 12% of tries then the wing factor would be 4.

So what this variable does is reward players who score tries at a higher rate and penalise players that score at a lower rate than their position average.
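
The three steps above can be sketched as follows. The try shares and player counts are made-up numbers chosen to match the wing and Interchange example, and the sketch takes whichever position has the lowest per-player share as the baseline, which is an assumption.

```python
def position_values(try_share_pct, players_per_position):
    """Scale each position's per-player try share against the lowest-scoring position."""
    # Step 1: share of tries per single player in each position.
    per_player = {
        pos: try_share_pct[pos] / players_per_position[pos] for pos in try_share_pct
    }
    # Step 2: the lowest per-player share becomes the baseline value of 1.
    baseline = min(per_player.values())
    # Step 3: every other position is expressed as a multiple of that baseline.
    return {pos: share / baseline for pos, share in per_player.items()}


# Hypothetical shares: two wingers score 24% of tries, four interchange players 12%.
shares = {"Wing": 24.0, "Interchange": 12.0}
slots = {"Wing": 2, "Interchange": 4}
print(position_values(shares, slots))  # {'Wing': 4.0, 'Interchange': 1.0}
```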

I think this approach is sound, but given your querying of the wing results, I’ll go back and check the calculations on the values for this variable and also how the wing residuals plot against the fitted value (expected tries) to see if there are any patterns that suggest bias here.

Once I finish this work I can then also present the results per position as you suggest, for those that don’t like the adjustments made by the Position variable.

Some updated ratings based on my recent work.

The process

A brief rundown of the process…

  1. Based on Raxi’s advice, I stripped back the model to bare basics, focusing on metrics that only provide try scoring opportunity and not skill:
  • Game Minutes Played (GMP: 4800 minutes played = 1 GMP)
  • Position Value (PosVal: based on the average % distribution of tries by position)*
  • Team Tries (TT: total tries scored by the team in games the player played in)

So this means I deliberately left out line breaks and tackle breaks.

[*] Note: a weighted average for this value was calculated for players that have played a mix of positions.

The logic is that by only forecasting on opportunity, you get the player’s skill level mostly contributing to the results above or below expectation.

  2. Derived a forecasting formula via multivariate linear regression using the 2013-20 results (Adjusted R Squared value [goodness of fit] = 74.7%)

  3. Applied this forecasting formula against 2021 data up to Round 19 to produce two statistics:

(i) Tries scored +/- expectation
(ii) % of tries scored +/- expectation (tries scored +/- expectation / tries scored * 100)

  4. These raw results included “noisy data”: extreme results produced by players with a small number of games and tries. These were obviously “fluke” results, but to weed them out I needed a process to sift the noise from the signal. So, in short, I applied a statistical test on games played and tries scored, working from the highest numbers downwards to find the point where the results became more noise than signal.

Those points were 8 games and 7 tries.

  5. Raxi’s (good) suggestion of screening team tries for the time players were on the field is yet to be implemented (it will be a mission in Excel to pull that one off)
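
For anyone wanting to see the opportunity inputs from step 1 in concrete form, here is a small sketch. The 4800-minute scale comes from the definition above; weighting the position value by minutes played in each position is only an assumption (the note just says a weighted average was used), and the position values and minutes are invented.

```python
MINUTES_PER_GMP = 4800  # from the definition above: 4800 minutes played = 1 GMP


def game_minutes_played(minutes):
    """Convert raw minutes played into GMP units."""
    return minutes / MINUTES_PER_GMP


def weighted_position_value(minutes_by_position, pos_val):
    """Average the position values, weighted by minutes spent in each position.

    Weighting by minutes is an assumption; weighting by games would work the same way.
    """
    total = sum(minutes_by_position.values())
    return sum(pos_val[pos] * mins for pos, mins in minutes_by_position.items()) / total


# Hypothetical player: 600 minutes on the wing, 200 minutes at centre.
pos_val = {"Wing": 4.0, "Centre": 3.0}  # illustrative values only
minutes = {"Wing": 600, "Centre": 200}
print(game_minutes_played(sum(minutes.values())))  # 800 / 4800
print(weighted_position_value(minutes, pos_val))   # 3.75
```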

2021 Try Scoring Ratings (up to Rnd 19)

I have produced two sets here to satisfy different preferences.

Some people would simply like to know who has scored more tries than expected, regardless of how many tries they have scored compared to other players. This is the Tries +/- Expectation table.

Other people may be interested in who has scored more tries than expected, relative to the number of tries they have scored. This is the Tries +/- Expectation % table. This statistic measures who is most “punching above their weight” in try scoring.

Personally, I prefer the Tries +/- Expectation % statistic because it removes the greater opportunity wingers and fullbacks have to score more tries.

Tries +/- Expectation %

| Rnk | Name | Tries | Exp tries | Tries +/- Exp | % Tries +/- Exp |
|---|---|---|---|---|---|
| 1 | Brandon Smith | 9 | -0.6 | 9.6 | 107.22% |
| 2 | David Fifita | 13 | 2.3 | 10.7 | 82.62% |
| 3 | Sitili Tupouniua | 10 | 2.5 | 7.5 | 75.22% |
| 4 | Viliame Kikau | 8 | 2.2 | 5.8 | 71.93% |
| 5 | Nathan Cleary | 8 | 2.7 | 5.3 | 66.62% |
| 6 | Angus Crichton | 8 | 2.9 | 5.1 | 63.35% |
| 7 | Isaiah Papali’i | 7 | 2.7 | 4.3 | 61.66% |
| 8 | Jahrome Hughes | 9 | 3.5 | 5.5 | 61.23% |
| 9 | Tom Trbojevic | 15 | 7.0 | 8.0 | 53.50% |
| 10 | Cody Walker | 11 | 5.3 | 5.7 | 51.81% |
| 11 | Alex Johnston | 24 | 13.7 | 10.3 | 42.78% |
| 12 | Matt Burton | 12 | 7.0 | 5.0 | 41.91% |
| 13 | Josh Addo-Carr | 21 | 13.4 | 7.6 | 36.11% |
| 14 | Reimis Smith | 12 | 7.9 | 4.1 | 34.24% |
| 15 | Clinton Gutherson | 12 | 8.3 | 3.7 | 30.63% |
| 16 | Daly Cherry-Evans | 7 | 5.5 | 1.5 | 21.52% |
| 17 | Reece Walsh | 7 | 5.6 | 1.4 | 20.54% |
| 18 | Matthew Dufty | 10 | 8.4 | 1.6 | 16.25% |
| 19 | Maika Sivo | 16 | 14.2 | 1.8 | 11.42% |
| 20 | Justin Olam | 9 | 8.1 | 0.9 | 10.14% |
| 21 | Adam Doueihi | 8 | 7.2 | 0.8 | 9.70% |
| 22 | Latrell Mitchell | 7 | 6.5 | 0.5 | 6.64% |
| 23 | Jason Saab | 16 | 16.2 | -0.2 | -1.02% |
| 24 | Charlie Staines | 14 | 14.2 | -0.2 | -1.55% |
| 25 | James Tedesco | 7 | 7.1 | -0.1 | -1.90% |
| 26 | Matt Ikuvalu | 11 | 11.8 | -0.8 | -7.09% |
| 27 | Reuben Garrick | 14 | 15.1 | -1.1 | -8.21% |
| 28 | William Kennedy | 9 | 9.8 | -0.8 | -8.65% |
| 29 | Dane Gagai | 8 | 8.7 | -0.7 | -9.35% |
| 30 | Connor Tracey | 11 | 12.3 | -1.3 | -11.71% |
| 31 | Josh Morris | 7 | 8.1 | -1.1 | -15.33% |
| 32 | Joseph Manu | 7 | 8.2 | -1.2 | -16.57% |
| 33 | George Jennings | 11 | 12.9 | -1.9 | -16.99% |
| 34 | Stephen Crichton | 7 | 8.6 | -1.6 | -22.36% |
| 35 | David Nofoaluma | 12 | 16.1 | -4.1 | -34.09% |
| 36 | Tommy Talau | 8 | 11.0 | -3.0 | -37.59% |
| 37 | Brian To’o | 10 | 13.9 | -3.9 | -39.03% |
| 38 | Jordan Rapana | 10 | 14.1 | -4.1 | -40.68% |
| 39 | Daine Laurie | 7 | 10.0 | -3.0 | -42.18% |
| 40 | Murray Taulagi | 10 | 14.9 | -4.9 | -49.07% |
| 41 | Brad Parker | 7 | 10.6 | -3.6 | -50.84% |
| 42 | Ken Maumalo | 10 | 15.1 | -5.1 | -51.11% |
| 43 | Mikaele Ravalawa | 9 | 13.7 | -4.7 | -52.04% |
| 44 | Daniel Tupou | 9 | 14.5 | -5.5 | -61.58% |
| 45 | Xavier Coates | 8 | 13.6 | -5.6 | -69.75% |
| 46 | Corey Thompson | 7 | 12.6 | -5.6 | -80.70% |
| 47 | Kyle Feldt | 8 | 15.1 | -7.1 | -88.88% |
| 48 | Ronaldo Mulitalo | 7 | 13.7 | -6.7 | -96.25% |

Tries +/- Expectation

| Rnk | Name | Tries | Exp tries | % Tries +/- Exp | Tries +/- Exp |
|---|---|---|---|---|---|
| 1 | David Fifita | 13 | 2.3 | 82.62% | 10.7 |
| 2 | Alex Johnston | 24 | 13.7 | 42.78% | 10.3 |
| 3 | Brandon Smith | 9 | -0.6 | 107.22% | 9.6 |
| 4 | Tom Trbojevic | 15 | 7.0 | 53.50% | 8.0 |
| 5 | Josh Addo-Carr | 21 | 13.4 | 36.11% | 7.6 |
| 6 | Sitili Tupouniua | 10 | 2.5 | 75.22% | 7.5 |
| 7 | Viliame Kikau | 8 | 2.2 | 71.93% | 5.8 |
| 8 | Cody Walker | 11 | 5.3 | 51.81% | 5.7 |
| 9 | Jahrome Hughes | 9 | 3.5 | 61.23% | 5.5 |
| 10 | Nathan Cleary | 8 | 2.7 | 66.62% | 5.3 |
| 11 | Angus Crichton | 8 | 2.9 | 63.35% | 5.1 |
| 12 | Matt Burton | 12 | 7.0 | 41.91% | 5.0 |
| 13 | Isaiah Papali’i | 7 | 2.7 | 61.66% | 4.3 |
| 14 | Reimis Smith | 12 | 7.9 | 34.24% | 4.1 |
| 15 | Clinton Gutherson | 12 | 8.3 | 30.63% | 3.7 |
| 16 | Maika Sivo | 16 | 14.2 | 11.42% | 1.8 |
| 17 | Matthew Dufty | 10 | 8.4 | 16.25% | 1.6 |
| 18 | Daly Cherry-Evans | 7 | 5.5 | 21.52% | 1.5 |
| 19 | Reece Walsh | 7 | 5.6 | 20.54% | 1.4 |
| 20 | Justin Olam | 9 | 8.1 | 10.14% | 0.9 |
| 21 | Adam Doueihi | 8 | 7.2 | 9.70% | 0.8 |
| 22 | Latrell Mitchell | 7 | 6.5 | 6.64% | 0.5 |
| 23 | James Tedesco | 7 | 7.1 | -1.90% | -0.1 |
| 24 | Jason Saab | 16 | 16.2 | -1.02% | -0.2 |
| 25 | Charlie Staines | 14 | 14.2 | -1.55% | -0.2 |
| 26 | Dane Gagai | 8 | 8.7 | -9.35% | -0.7 |
| 27 | William Kennedy | 9 | 9.8 | -8.65% | -0.8 |
| 28 | Matt Ikuvalu | 11 | 11.8 | -7.09% | -0.8 |
| 29 | Josh Morris | 7 | 8.1 | -15.33% | -1.1 |
| 30 | Reuben Garrick | 14 | 15.1 | -8.21% | -1.1 |
| 31 | Joseph Manu | 7 | 8.2 | -16.57% | -1.2 |
| 32 | Connor Tracey | 11 | 12.3 | -11.71% | -1.3 |
| 33 | Stephen Crichton | 7 | 8.6 | -22.36% | -1.6 |
| 34 | George Jennings | 11 | 12.9 | -16.99% | -1.9 |
| 35 | Daine Laurie | 7 | 10.0 | -42.18% | -3.0 |
| 36 | Tommy Talau | 8 | 11.0 | -37.59% | -3.0 |
| 37 | Brad Parker | 7 | 10.6 | -50.84% | -3.6 |
| 38 | Brian To’o | 10 | 13.9 | -39.03% | -3.9 |
| 39 | Jordan Rapana | 10 | 14.1 | -40.68% | -4.1 |
| 40 | David Nofoaluma | 12 | 16.1 | -34.09% | -4.1 |
| 41 | Mikaele Ravalawa | 9 | 13.7 | -52.04% | -4.7 |
| 42 | Murray Taulagi | 10 | 14.9 | -49.07% | -4.9 |
| 43 | Ken Maumalo | 10 | 15.1 | -51.11% | -5.1 |
| 44 | Daniel Tupou | 9 | 14.5 | -61.58% | -5.5 |
| 45 | Xavier Coates | 8 | 13.6 | -69.75% | -5.6 |
| 46 | Corey Thompson | 7 | 12.6 | -80.70% | -5.6 |
| 47 | Ronaldo Mulitalo | 7 | 13.7 | -96.25% | -6.7 |
| 48 | Kyle Feldt | 8 | 15.1 | -88.88% | -7.1 |

Comments
Some people may query players having a negative try expectation rating. This is because there is a constant (the intercept, which is negative) in the formula, and until players rack up enough opportunity data (GMP and TT) to overcome this constant they will have a negative expected tries rating.

Brandon Smith and David Fifita are exceptional players, playing in positions that historically don’t score as many tries as outside backs. They are both great at breaking tackles and turning those runs close to the line into tries. This has translated into high ratings on both scales.

Factors

Team Tries relationship
The Team Tries factor turned out to be a weak, negative factor of -0.018. This means that for every 100 team tries scored, the expected tries forecast for a player was reduced by 1.8 tries.

So as an example, if you had two players, A & B, who had equal opportunities in other areas, but A’s team had scored 100 team tries and B’s team had scored 50 tries, the expected tries of player B would be only 0.9 tries higher than player A.

This basically means that (good tryscoring) players playing in weaker teams that score fewer tries than better teams have a slight advantage.
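
The A versus B comparison above works out like this. Only the -0.018 coefficient is taken from the model; the function name is made up, and the other terms of the formula are ignored because they are assumed equal for both players.

```python
TEAM_TRIES_COEF = -0.018  # Team Tries coefficient reported above


def team_tries_effect(team_tries):
    """Contribution of the Team Tries term to a player's expected tries."""
    return TEAM_TRIES_COEF * team_tries


effect_a = team_tries_effect(100)  # player A's team: 100 tries
effect_b = team_tries_effect(50)   # player B's team: 50 tries

# With everything else equal, the gap in expected tries is just the gap in this term.
print(round(effect_b - effect_a, 2))  # 0.9
```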

Runs & Metres
I tested these against holdout samples (2013-19 vs 2020-21) and they did not improve the forecasts.

Other possible improvements

Try location
Where teams score their tries laterally (left, middle, right) could be incorporated into the model so that players in positions where their team scores either more or less can be appropriately compensated or penalised. I’ve seen summary data of this statistic before but am not sure if it’s available on a game by game basis.

Tackles in opposition 20
I think this opportunity statistic would be worth testing. Runs overall turned out to be a weak, insignificant indicator, mainly because most runs are made where there is little or no try scoring opportunity; so lots of noise, little signal. The TO20 statistic may provide more signal than noise though.

Appreciate your work mate. Can you help me with my tax return?

Might have done the formula for the tik tok guy to predict case numbers.

I did advanced stats as a management elective about 20 years ago. I chose cricket as my basis to develop the skills as I watched casually and didn’t hold too much interest. It became an obsession. If I went for a beer or dinner, everyone refrained from talking cricket. I used to have nightmares about bell curves. The lecturer would put a question on the boards to discuss and ask me not to comment. Good luck getting sane again. :mask:

Yep, total garbage.