Heya all,

I've been reading for a while to try and find a formula that will give some kind of weighted link ranking based on both the average rating for a link and the number of votes. Obviously, just using average rating * number of votes doesn't work: with that formula a link with 300 votes averaging 2 scores 600, which is higher than the 500 scored by a link with 100 votes averaging 5. That's not the result I want and is a bad statistical representation.

In my case, currently I would like to use the Review Ratings (not the link votes, although I will probably tweak any global for that in the future) and the number of reviews written (again, not the number of votes per link) to come up with a ranking value. I think I've found a good formula. Who wants to give the global a go?

I'm borrowing the formula from The Internet Movie Database and some of the explanation from WOW Web Designs. I'd like to use the same "true Bayesian estimate" formula that the Internet Movie Database uses for calculating average ratings.

Code:

weighted rank (WR) = (v / (v+m)) * R + (m / (v+m)) * C

where:

R = average for the design (mean) = (Rating)

v = number of votes for the design = (votes)

m = minimum votes required to be listed in top 10

C = the mean vote across the whole dimension (all designs)
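As a rough sketch, the formula above could be written in Python as follows. The function name `weighted_rank` is made up for illustration, and the values m = 10 and C = 3.5 in the usage lines are assumptions picked only to rerun the 300-votes-versus-100-votes comparison from the start of this post.

```python
def weighted_rank(R, v, m, C):
    """WR = (v / (v+m)) * R + (m / (v+m)) * C

    R: average rating for the item, v: number of votes for the item,
    m: minimum votes required to be listed, C: mean vote across all items.
    """
    if v + m <= 0:
        raise ValueError("v + m must be positive")
    return (v / (v + m)) * R + (m / (v + m)) * C


# Assumed values for illustration only: m = 10 and C = 3.5.
many_low = weighted_rank(R=2, v=300, m=10, C=3.5)   # 300 votes averaging 2
few_high = weighted_rank(R=5, v=100, m=10, C=3.5)   # 100 votes averaging 5
print(round(many_low, 2), round(few_high, 2))       # 2.05 4.86
```

Unlike average * votes, the link with 100 votes averaging 5 now comes out ahead.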

This formula normalizes scores: it pulls a particular score (R) toward the mean (C) whenever the number of votes (v) is not well above m. In other words, if a design has only a few more votes than the minimum required to be listed in the top 10 (m), its average score is lowered a little if it is above the mean, or raised a little if it is below the mean. The fewer the votes, the stronger the pull toward C, which is how a Bayesian estimate treats a small sample.

Here is an example:

Code:

WR = (6 / 10) * 5.33 + (4 / 10) * 7.18 = 6.07
      |   |      |      |   |      |
      v  v+m     R      m  v+m     C

The formula adjusts the average rating of a relatively low-rated design from 5.33 to 6.07, since the number of votes (v=6) is only slightly above the minimum required votes (m=4) and the mean across the dimension (C=7.18) is quite high. If this design gets more votes in the future, the weight on R grows and the weight on C shrinks, so the weighted rank moves away from C and closer to the design's own average. The idea is that the more votes there are, the more representative the average rating is.
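A quick numeric check of how the weighted rank behaves as votes accumulate, using the numbers from the example (R = 5.33, m = 4, C = 7.18). This is only a sketch: it assumes the design's own average stays at 5.33 while v grows, which a real design would not do exactly.

```python
R, m, C = 5.33, 4, 7.18  # values taken from the worked example


def weighted_rank(R, v, m, C):
    # Same formula as in the post: a vote-weighted blend of R and C.
    return (v / (v + m)) * R + (m / (v + m)) * C


# Assumption: R is held fixed while the vote count v increases.
trend = [(v, weighted_rank(R, v, m, C)) for v in (6, 20, 100, 1000)]
for v, wr in trend:
    print(f"v={v:4d}  WR={wr:.2f}  gap to R={wr - R:.2f}")
```

The gap between WR and R shrinks with every step, so with enough votes the weighted rank is essentially the plain average.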

So (and I've been told I ask for much) I'm looking for a global that will apply this formula across a set of links. Obviously it should only consider links with v >= m for the final output, but use all reviews to calculate C. It will probably make sense to return the results to 2 decimals and, if there's a tie between some records, to sort them (second level) by the original average rating per link.
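To make the request concrete, here is a minimal Python sketch of the logic I have in mind. It is not the global itself. The function name `rank_links`, the dict-of-lists input shape and the sample ratings are all made up for illustration, and I'm assuming C is the mean of every individual review rating, including reviews of links that fall below m.

```python
def rank_links(links, m):
    """links: hypothetical dict of link name -> list of review ratings."""
    all_ratings = [r for ratings in links.values() for r in ratings]
    if not all_ratings:
        return []
    # C uses ALL reviews, even those belonging to links with fewer than m.
    C = sum(all_ratings) / len(all_ratings)

    rows = []
    for name, ratings in links.items():
        v = len(ratings)
        if v == 0 or v < m:
            continue  # only links with v >= m appear in the output
        R = sum(ratings) / v
        wr = round((v / (v + m)) * R + (m / (v + m)) * C, 2)
        rows.append((name, wr, R, v))

    # Sort by weighted rank, then break ties on the original average R.
    rows.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return rows


# Made-up sample data, for illustration only.
sample = {
    "link_a": [5, 5, 4, 5],
    "link_b": [4, 4, 5, 4, 4, 5, 4, 4],
    "link_c": [5, 5],              # below m, excluded from the output
    "link_d": [2, 3, 2, 3, 2, 3],
}
ranked = rank_links(sample, m=4)
for name, wr, R, v in ranked:
    print(f"{name}: WR={wr:.2f} (R={R:.2f}, v={v})")
```

Rounding before sorting means two links that tie at 2 decimals are ordered by their plain average, which is the second-level sort described above.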

Here's a global where Laura has done some related calculations already.

Safe swoops

Sangiro