Never Bet a Horse Named Joe: Update
Several years back, I tested a theory that horses with popular boys' or girls' names are overbet in parimutuel markets. My hunch was that bettors are more likely to bet on a horse if its name contains their own name (or that of their wife, son, daughter, etc.). I arrived at that hunch by extrapolating from a sample size of one - me.
What I found in that original analysis, using a limited dataset of races run in California, was weak evidence for my theory that fell short of statistical significance. However, I now have a much more robust dataset, consisting of nearly all races run in North America over the past four years.
With this new dataset of some 200,000 races, I ran a logistic regression on the probability of a horse winning a race using the following variables:
- logodds: the logarithm of the horse's betting odds, adjusted for the track take
- boysname: a binary variable indicating whether the horse's name contains one of the top 100 boys' names over the past 100 years
- girlsname: a binary variable indicating whether the horse's name contains one of the top 100 girls' names over the past 100 years
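As a rough illustration of how variables like these could be built, here is a minimal Python sketch. It is not the code behind the analysis, and it rests on several assumptions: the name list is a three-name placeholder rather than the real top 100, a name counts only when it appears as a whole word, the track take is removed by rescaling each race's implied probabilities to sum to 1, and the odds and the horse name "Dandelion" are invented for the example.

```python
import math
import re

# Placeholder list: stands in for the real top-100 boys' names, not reproduced here.
BOYS_NAMES = {"joe", "dan", "jack"}


def contains_name(horse_name, names):
    """Return 1 if any whole word of the horse's name is in `names`, else 0."""
    words = re.findall(r"[a-z]+", horse_name.lower())
    return int(any(word in names for word in words))


def take_adjusted_logodds(race_odds):
    """Convert one race's X-to-1 odds into take-adjusted log odds.

    Assumption: the take is removed by rescaling the implied win
    probabilities 1 / (1 + odds) so they sum to 1 within the race.
    """
    implied = [1.0 / (1.0 + odds) for odds in race_odds]
    total = sum(implied)  # exceeds 1 because of the track take
    fair = [p / total for p in implied]
    # Log of the odds against winning, so bigger values mean longer shots.
    return [math.log((1.0 - p) / p) for p in fair]


# Hypothetical three-horse race; these odds are invented for illustration.
hypothetical_odds = [0.8, 1.6, 4.0]
print(take_adjusted_logodds(hypothetical_odds))

# "Dandelion" is a made-up name showing that substrings do not count here.
for name in ["Dan the Go To Man", "My Boy Jack", "Dandelion"]:
    print(name, contains_name(name, BOYS_NAMES))
```

Whole-word matching is a design choice, not something taken from the original analysis. Substring matching would flag far more horses, including ones like the made-up "Dandelion" above.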
In a perfectly efficient market, the first two coefficients (intercept and logodds) would be 0 and -1, respectively. The fact that they aren't reflects the favorite-longshot bias, in which favorites are underbet relative to their true odds and longshots are overbet.
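To see why 0 and -1 are the efficient-market values, here is a small Python check. It assumes logodds is the natural log of the take-adjusted odds against the horse; the function name `model_win_prob` is made up for this sketch.

```python
import math


def model_win_prob(odds, intercept, slope):
    """Win probability from a logistic model with log odds as the only input."""
    z = intercept + slope * math.log(odds)
    return 1.0 / (1.0 + math.exp(-z))


odds = 4.0  # 4-to-1 after removing the take
implied = 1.0 / (1.0 + odds)  # the market's implied win probability, 20%
efficient = model_win_prob(odds, intercept=0.0, slope=-1.0)

# With intercept 0 and slope -1, the model simply hands back the market's number.
print(implied, efficient)
```

Any other pair of coefficients makes the model's probability differ from the implied one, and that gap is what the regression measures.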
The next term, boysname, is statistically significant, and its sign indicates that boys' names are overbet. As an example, assume a horse's odds imply a win probability of 20%. If the horse's name does not contain a boy's name, the true probability is 20.5% according to the favorite-longshot bias correction of the first two terms. But if the horse's name contains a boy's name, then the true probability drops to 19.6%.
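The two probabilities in that example are enough to back out roughly what the boysname coefficient must be, since in a logistic model the coefficient is the difference in log odds of winning with and without the flag. A quick Python sketch, using only the rounded 20.5% and 19.6% figures above (so the result is approximate, not the fitted value):

```python
import math


def logit(p):
    """Log odds of winning for a win probability p."""
    return math.log(p / (1.0 - p))


p_without = 0.205  # true win probability without a boy's name
p_with = 0.196     # true win probability with a boy's name

# Shift in log odds attributable to the boysname flag.
implied_boysname_coef = logit(p_with) - logit(p_without)

# Relative drop in win probability for the same betting odds.
relative_drop = (p_without - p_with) / p_without

print(round(implied_boysname_coef, 3))
print(round(relative_drop, 3))
```

That works out to a coefficient of about -0.056, or a win probability a little over 4% lower in relative terms for the same betting odds. Because the inputs are rounded to one decimal place, treat both numbers as ballpark figures.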
So, it turns out my original hunch was on the mark, at least for boys names.
Girls' names, however, fall short of statistical significance. The coefficient points to possible overbetting, but to a lesser extent than with boys' names.
So, if you're betting the races at Churchill Downs today, you may want to avoid bets on Dan the Go To Man, Dragon Drew, Shane Zain, or My Boy Jack.
Postscript
In my original post, I referred to Alan Woods, the former actuary who made millions using statistical models to bet horses in Hong Kong. You can read more about Alan Woods, and a lot more about his former partner, Bill Benter, in this excellent profile by Kit Chellel at Bloomberg.