Some Referees Just Blow Our Mind (Referees and Your Club)

What an absolutely stupid fucking article to write.



These statistics on their own are pretty much meaningless because they don’t take into account the probability of winning the matches under each referee.

A couple of simple examples to demonstrate my point.

Team A has a 35% win record under Referee X over 10 matches. This looks like Referee X is terrible for Team A. But Team A’s average probability of winning these matches under Referee X is only 36%. So they are winning at about the rate they should be under Referee X.

Team B has a 65% win record under Referee Y over 5 matches. This looks like Referee Y is pretty good for Team B. But Team B’s average probability of winning these matches under Referee Y is 75%. So they are winning at a lower rate under this referee than they would normally be expected to.
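To put numbers on examples like the two above, here is a minimal Python sketch that compares a team’s actual win rate under a referee with its average pre-match win probability. The function name and the sample data are made up for illustration (20 matches are used so that 35% is a whole number of wins), and the probabilities are assumed to come from a betting market or a ratings model.

```python
def win_rate_vs_expected(results, win_probs):
    """Compare the actual win rate with the average expected win probability.

    results   -- list of 1 (win) or 0 (loss), one entry per match
    win_probs -- list of pre-match win probabilities (0 to 1), in the same order
    """
    if not results or len(results) != len(win_probs):
        raise ValueError("need at least one match and one win probability per match")
    actual = sum(results) / len(results)
    expected = sum(win_probs) / len(win_probs)
    return actual, expected, actual - expected


# Hypothetical data: 7 wins from 20 matches, each with a 36% chance of winning.
results = [1] * 7 + [0] * 13
win_probs = [0.36] * 20
actual, expected, difference = win_rate_vs_expected(results, win_probs)
print(f"actual {actual:.0%}, expected {expected:.0%}, difference {difference:+.0%}")
```

A difference close to zero means the team is winning at about the rate it should, whatever the raw win record looks like.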

To analyse referees properly and detect any bias, you would do the following:

  1. Use a large sample of matches to reduce the margin of error. 5 seasons of NRL matches gives you a bit over 1,000 games.

  2. For each referee and each team they referee, calculate how many points more or fewer than expected the team scores in each match, based on the line market from the TAB. Also record the number of times each referee refereed each team. We’ll call this stat (points scored above or below what the market expected) the Residual.

  3. From the whole sample, calculate the Mean (average) and Standard Deviation (SD: a measure of how much the values in the sample vary) of the Residual.

  4. For each referee refereeing each team, calculate the Mean Residual of these matches. Then from these values, subtract the Residual Mean from Step 3. This gives you a figure of how much more or less each referee-team combination was from the Residual Mean in Step 3.

  5. For each value calculated from step 4, divide this by the SD of the whole sample in step 3. This gives you what’s known as a Z-score; how many standard deviations a referee-team combination is away from the mean of the whole sample.

  6. From the results of step 5, pick out those referee-team combinations where the Z-score is > 2 or < -2. These are the referee-team combinations that are suspect and worth investigating further.

  7. I won’t go into the detail here but the last step is to investigate the suspect combinations found in step 6 by working out whether this result was just a fluke or not. You do this by looking at the number of matches in each referee-team combination and then calculating the expected range of variance based on the number of matches in the referee-team combination sample.

A low number of matches in a referee-team combination sample would produce a higher expected variance and make it possible that this suspect combination was a fluke. There is a technique to produce a definitive answer on this.

A high number of matches in a referee-team combination sample would produce a lower expected variance and make it unlikely that this suspect combination was a fluke. The technique mentioned above would give you a definitive answer on these combinations, allowing you to say that this was no fluke and must be due to causes other than randomness (ie referee bias). Using this technique, you could say for a particular referee-team combination:

The chance of the results of Referee X refereeing Team Y being due to luck and not refereeing bias is 1 in XXX.

For referee-team combinations with an absolute Z-score of > 2 and a large sample, say 60 matches, this chance could easily be in the range of 1 in 100 to 1 in 1,000. That is, highly unlikely to be due to bad luck.
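Steps 3 to 7 can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions: the input is a list of hypothetical (referee, team, residual) tuples, the function and field names are invented, and because the technique for step 7 isn’t spelled out above, a plain z-test on the mean (dividing by SD / sqrt(n), with a normal approximation) is swapped in as one common way of allowing for the number of matches.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def referee_team_scores(matches):
    """Score each referee-team combination against the whole sample.

    matches -- iterable of (referee, team, residual) tuples, one per team per match
    """
    matches = list(matches)
    residuals = [residual for _, _, residual in matches]
    overall_mean = mean(residuals)   # step 3: Mean of the Residual
    overall_sd = pstdev(residuals)   # step 3: SD of the Residual
    if overall_sd == 0:
        raise ValueError("residuals do not vary, so Z-scores are undefined")

    # Group the residuals by referee-team combination.
    by_combo = defaultdict(list)
    for referee, team, residual in matches:
        by_combo[(referee, team)].append(residual)

    scores = {}
    for combo, values in by_combo.items():
        n = len(values)
        diff = mean(values) - overall_mean   # step 4
        z = diff / overall_sd                # step 5
        # Step 7 (assumed technique): the mean of n matches varies less than a
        # single match, so compare diff with the standard error SD / sqrt(n).
        z_n = diff / (overall_sd / math.sqrt(n))
        # Two-sided tail probability under a normal approximation.
        p_value = math.erfc(abs(z_n) / math.sqrt(2))
        scores[combo] = {"n": n, "diff": diff, "z": z, "z_n": z_n, "p_value": p_value}
    return scores
```

Here `p_value` is the chance of seeing a gap at least this large through luck alone if the referee had no effect, so `1 / p_value` gives a "1 in XXX" style figure. It makes no allowance for how many combinations are being tested: with a couple of hundred referee-team pairs, a few will look suspect purely by chance.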

It would be interesting to actually do this analysis…but getting all the data into a usable form takes time.


Thanks Govetts, my head now hurts!

Sorry Govetts, as they moved to one referee, the only sample that is relevant is this season. You can’t use previous seasons when there was one referee, as it would be inconsistent due to the age of the referees then and now, and the new referees. Plus this season also has rule alterations with the six again, so they would need to be factored in as well. As would the likelihood of any sin bins, i.e. are any referees more prone to give a sin binning than others, or is there a consistency of sin bins given after a flow of penalties?

You would also need to take into account if any particular player(s) tended to be penalised more frequently under a particular ref.

I should mention there is a potential flaw in this approach; by using the line betting markets to calculate the residual, bias may be masked in some referee-team combinations by the market comprehending this bias.

So an alternative would be to use a reliable set of power ratings: ratings calculated from the margins of victory in all preceding matches. These ratings would be independent of any market comprehension of referee bias.

But these ratings may not be as accurate as the market so ideally you would do both.
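As a rough illustration of what such power ratings could look like, the Python sketch below fits one rating per team plus a single home-ground advantage by least squares on match margins, using plain gradient descent. The model (predicted margin = home rating - away rating + home advantage), the function name, the learning rate and the sample matches are all assumptions for illustration, not a description of any particular rating system. To keep the ratings independent of the match being assessed, you would fit them only on matches played before it.

```python
def fit_power_ratings(matches, iterations=2000, learning_rate=0.05):
    """Fit team ratings and a home advantage to match margins by least squares.

    matches -- list of (home_team, away_team, home_margin) tuples, where
               home_margin is home points minus away points
    """
    if not matches:
        raise ValueError("need at least one match")
    teams = sorted({team for home, away, _ in matches for team in (home, away)})
    ratings = {team: 0.0 for team in teams}
    home_adv = 0.0
    n = len(matches)

    for _ in range(iterations):
        grad = {team: 0.0 for team in teams}
        grad_home = 0.0
        for home, away, margin in matches:
            # Error of the predicted margin for this match.
            error = (ratings[home] - ratings[away] + home_adv) - margin
            grad[home] += 2 * error / n
            grad[away] -= 2 * error / n
            grad_home += 2 * error / n
        for team in teams:
            ratings[team] -= learning_rate * grad[team]
        home_adv -= learning_rate * grad_home

    # Only rating differences matter, so centre the ratings on zero.
    average = sum(ratings.values()) / len(ratings)
    return {team: rating - average for team, rating in ratings.items()}, home_adv


# Hypothetical results: (home team, away team, home margin).
sample = [("A", "B", 8), ("B", "A", -2), ("B", "C", 8),
          ("C", "B", -2), ("A", "C", 13), ("C", "A", -7)]
ratings, home_adv = fit_power_ratings(sample)
print(ratings, home_adv)
```

The expected margin for a match is then the home rating minus the away rating plus the home advantage, and the Residual is the actual margin minus that figure.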

Two referees would make it harder but it still would be possible to do the analysis. It would just mean more combinations to analyse and potentially more variance.

You don’t need to look at sin bins, penalties or anything else; just how much better or worse than expected each team performs on average under each referee.

Love working on stats and ratings and use them a lot with my betting. I am not a huge gambler; my wagers are around the $10-20 mark, as I don’t have it in me to bet larger amounts. I use my stats and ratings in footy multis, and I only need a couple to come through each season to be hundreds in front.

I always come out on top. For example, the rare times I go to a casino, I only play craps or roulette as they statistically give me the best chances of winning, and I always end up with more than I start with when I use my strategies.

With the dogs, the favourite wins around 20% of the time, and dogs with odds greater than $10 win 10% of the time. Please do not use this to gamble, as there are other variables to account for in a race to rate it properly.

It is just a bit of fun that keeps my mind active at my age.

Yes, I use my power ratings (ratings derived with the MS Excel Solver function from margin of victory and a calculated home ground advantage) to have a little dabble at multi-bet margin combinations for a bit of fun.

I haven’t been doing it much this year because I’ve spent my time experimenting to see if I can improve my ratings with injury and NRL Fantasy data.

One way to handle two-referee matches mixed in with single-referee data would be to allocate each referee (main, pocket) a standard percentage of control over the match, and then apply these percentages to each match they control.

For example, let’s say we set the percentages as:
Main referee: 75%
Pocket referee: 25%

And there was a match these referees controlled and the team scored 10 pts more than expected. Then you would allocate the following to each referee:

Main: 7.5 pts (75% of 10)
Pocket: 2.5 pts (25% of 10)
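The same split can be written as a small Python helper. The function name and the check that the shares add up to 100% are additions for illustration only; the 75/25 split is just the example figure from above.

```python
def allocate_residual(residual, shares):
    """Split a match residual between referees by their share of control.

    residual -- points scored above (+) or below (-) expectation
    shares   -- dict mapping referee role to share, e.g. {"main": 0.75, "pocket": 0.25}
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must add up to 1")
    return {role: residual * share for role, share in shares.items()}


# Two-referee match: the team scored 10 pts more than expected.
print(allocate_residual(10, {"main": 0.75, "pocket": 0.25}))
# Single-referee match: the one referee takes the whole residual.
print(allocate_residual(10, {"main": 1.0}))
```

Treating a single-referee match as a 100% share lets both eras go through the same calculation.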

I thought it was stupid because it’s a lazy piece of journalism and straight out ref bashing.

We don’t need even more ammo for the crazies out there.

Refs have been abused and bashed in recent weeks in park footy.

This bullshit doesn’t help the mindset that a ref is against your side.

It’s just plain stupid and wrong.

I am the same, **bet within your means.**

I take Quaddies but it can be a long time between drinks. I do not take more than 50% of the market in each leg, so that when I get one it is a decent dividend.
I back a few league teams to win the comp, and in past years I have crushed back through Betfair, but as they take 10% I shall not be crushing with them anymore.

I have also backed the RL outsider to lead at half-time for an early payout, because sometimes the fav team takes it easy early in the contest.

Agree the article was stupid and wrong. I also think that on-field referees have a tough job, particularly single referees. I also think that most referees are probably not biased.

But I do have doubts about some referees when refereeing particular teams.

This is why I think you need to do a statistical analysis of their performance like I have suggested above. That way you will be able to conclusively either rule out any bias or identify where it is actually occurring, and in that case, that would then give the authorities an opportunity to address any occurrences of bias.

Not sure about team bias but player bias does seem to happen; they get a rep, a bit like Sam

people complain about Klein a lot

lo and behold if you look up the stats we get the most favourable penalty counts from him out of any club

I prefer Gerard Sutton or Atkins to Klein.