PROJECT #1 – Predictive Analytics RESULTS!
“BEATING THE SPORTSBOOK”
PhD Candidate (Grad 2015)
Electrical / Biomedical Engineering
A Look at the Model’s Performance
So the 2014 test of the prediction model has wrapped up. How did we do?
Here are the summary statistics: the overall prediction accuracy was 63%. Hypothetical wagers totaled $8,316 and net payouts totaled $1,090, for an average ROI (per wager) of 13%. The payout for each event and the cumulative payout over the year are shown in the following chart.
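The 13% figure is net payout divided by total amount wagered. Here is a minimal Python sketch of that bookkeeping. It assumes each hypothetical bet is recorded as a stake plus a net result (the profit if the pick won, minus the stake if it lost). The `Bet` record and the `summarize` function are illustrative names, not part of Eric's system.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    # Hypothetical record type for one tracked wager.
    stake: float  # dollars wagered
    net: float    # profit if the pick won, -stake if it lost


def summarize(bets):
    """Return (accuracy, total wagered, net payout, ROI per dollar wagered)."""
    wins = sum(1 for b in bets if b.net > 0)  # a push (net 0) is not counted as a win
    wagered = sum(b.stake for b in bets)
    net = sum(b.net for b in bets)
    return wins / len(bets), wagered, net, net / wagered


# Plugging in the 2014 totals quoted above: $1,090 net on $8,316 wagered.
print(f"ROI: {1_090 / 8_316:.1%}")  # ROI: 13.1%
```

For example, `summarize([Bet(1000, 400), Bet(500, -500), Bet(800, 600)])` gives 2 wins out of 3, $2,300 wagered, $500 net, and an ROI of about 21.7%.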
The overall upward trend looks great, although there’s a frustrating run of break-evens and losses starting mid-October. The above chart shows cumulative payout if the maximum bet is fixed at $1,000. The next chart shows the results if the maximum bet is a percentage of the account balance, with an assumed $10,000 start balance.
So the account balance soars to over $40,000 before dropping back to a more reasonable $19,069 – an overall annual return of 91%. Compare that to a stock index fund like the S&P 500, whose 2014 annual return was around 14%.
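The two charts differ only in how the maximum bet is set. The following minimal Python sketch replays a list of bets under both sizing rules so the difference is concrete. The bet tuples, the `replay` function and the 10% fraction in the example are made-up illustrations. The post does not say what percentage of the balance the maximum bet was.

```python
def replay(bets, start=10_000.0, max_bet=1_000.0, max_fraction=None):
    """Replay a sequence of bets and return the balance after each one.

    Each bet is (confidence, decimal_odds, won), with confidence in [0, 1]
    scaling the maximum bet. If max_fraction is None the maximum bet is a
    fixed dollar amount; otherwise it is that fraction of the current balance.
    """
    balance = start
    history = [balance]
    for confidence, decimal_odds, won in bets:
        cap = max_bet if max_fraction is None else max_fraction * balance
        stake = confidence * cap
        balance += stake * (decimal_odds - 1) if won else -stake
        history.append(balance)
    return history


bets = [(1.0, 2.0, True), (0.5, 3.0, False)]  # made-up example bets
fixed = replay(bets)                           # ends at 10,500
scaled = replay(bets, max_fraction=0.10)       # ends at 10,450 (hypothetical 10% cap)

# Annual return from the start and end balances quoted above:
print(f"{19_069 / 10_000 - 1:.0%}")  # 91%
```

Because the proportional version stakes more as the balance grows and less as it shrinks, it compounds both winning and losing streaks. That behavior is consistent with the climb past $40,000 and the fall back to $19,069.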
I’ve had several people look at the downturn between October and December and say, “Yikes, what happened there?” There are several possible scenarios that could answer that question:
The worst-case scenario might be that the sport of MMA is evolving. Remember that the model is based on pattern recognition: it looks for patterns in past fight data that are useful in forecasting future fight outcomes. It’s possible that the nature of the sport is changing, so that patterns observed in past fights no longer apply to new ones. In this case the model would be obsolete and a new approach would be needed.
Actually, the worst-worst-case scenario is that the patterns observed in past data were not patterns at all, but rather chance occurrences and coincidences. This would mean the model was making predictions using false patterns, and was never capable of accurate predictions in the long term. The model’s good performance from January to October would seem to refute this scenario. Still, longer-running coincidences have been observed in sports; just look at the Super Bowl indicator.
The best-case scenario (and in my opinion the most likely) is that the downturn is a natural and inevitable consequence of the random element of the sport and the imperfect nature of the forecasts. There is an element of randomness to MMA fights. That’s why they’re exciting to watch: anything could happen, even in fights with a clear favorite. Randomness does not mean unpredictability, but it does mean you’ll get some (ok, many) predictions wrong. The prediction accuracy for the year was 63% – but that’s actually better than expected. My tests suggest the long-term accuracy should be around 56%. And when you’re getting 44% of your predictions wrong, it’s reasonable to expect runs of bad performance.
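To put a rough number on "runs of bad performance", here is a small dynamic-programming sketch. It computes the chance of at least one streak of k wrong predictions somewhere in n predictions. It treats every prediction as an independent event with the same accuracy (the 56% long-term figure, for example). That is a simplification: real fights and bet sizes vary, and a streak of wrong picks is not the same thing as a streak of losing weeks. The function name `prob_losing_run` is illustrative.

```python
def prob_losing_run(n, k, p_correct):
    """Probability of at least one run of >= k consecutive wrong predictions
    in n predictions, assuming each is independently correct with
    probability p_correct. Requires k >= 1."""
    q = 1.0 - p_correct
    # state[j]: probability that the current streak of misses has length j
    # and no streak of length k has happened yet
    state = [1.0] + [0.0] * (k - 1)
    reached = 0.0
    for _ in range(n):
        new = [0.0] * k
        new[0] = p_correct * sum(state)   # a correct pick resets the streak
        for j in range(k - 1):
            new[j + 1] = q * state[j]     # a miss lengthens the streak
        reached += q * state[k - 1]       # a miss that completes a k-streak
        state = new
    return reached


# e.g. chance of 5 misses in a row somewhere in 100 fights at 56% accuracy
print(prob_losing_run(100, 5, 0.56))
```

Comparing `prob_losing_run(100, 5, 0.56)` with `prob_losing_run(100, 5, 0.63)` shows how much more often long bad runs turn up at the lower, long-term accuracy.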
Despite the last three months, the overall performance for the year is encouraging. The model doesn’t perform as well as some individual bettors (I know a few who can make predictions more accurately), but does outperform the betting market as a whole. The chart below plots the model’s payout against the payout you would have seen if you had always bet on the favored fighter.
The strategy of always picking favorites is pretty much break-even in the long term, as we’d expect if the betting market is an efficient one. This project has been an exciting application of machine learning to find market inefficiencies.
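The "always bet the favorite" baseline is easy to reproduce from the odds alone. The following Python sketch assumes the odds are quoted in American (moneyline) format, as is common with US sportsbooks. Under that format, a 4-to-1 underdog is roughly +400. The function names are illustrative. The last lines show why a market can be efficient and still not quite break-even: the implied probabilities of the two sides add up to more than 1, and the excess is the bookmaker's margin.

```python
def american_to_decimal(odds: int) -> float:
    """Convert American (moneyline) odds to decimal odds, i.e. total return
    per $1 staked. Valid American odds are <= -100 or >= +100."""
    if odds > 0:
        return 1 + odds / 100      # underdog: +400 -> 5.0
    return 1 + 100 / -odds         # favorite: -250 -> 1.4


def implied_probability(odds: int) -> float:
    return 1 / american_to_decimal(odds)


def favorite_baseline(fights, stake=1.0):
    """Net profit from staking `stake` on the betting favorite in every fight.

    Each fight is (odds_a, odds_b, a_won) in American odds.
    """
    profit = 0.0
    for odds_a, odds_b, a_won in fights:
        dec_a, dec_b = american_to_decimal(odds_a), american_to_decimal(odds_b)
        fav_is_a = dec_a <= dec_b            # lower payout = market favorite
        fav_won = a_won if fav_is_a else not a_won
        fav_dec = dec_a if fav_is_a else dec_b
        profit += stake * (fav_dec - 1) if fav_won else -stake
    return profit


# Both sides at -110: implied probabilities sum to about 1.048, not 1.0.
# The excess is the bookmaker's margin (the "vig").
print(implied_probability(-110) + implied_probability(-110))
```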
The model performed well during the 2014 test. But there are a lot of ways it could be made better. There are a few avenues I’d like to pursue on this front – some are existing machine learning techniques that I’d like to implement, others are new ideas I’m working on.
In the immediate future I’ll be taking a break from MMA forecasting to do two things:
- Improve the prediction algorithm, and
- Repurpose the algorithm for use with other, more mainstream sports.
Oh yeah, and if I have time I’ll try to finish my PhD by my April goal… :)
In the meantime, if anyone wants to try making a business out of analyzing this or other sports (and has a large amount of money to kick in), please let me know!
Eric Chalmers / firstname.lastname@example.org
NOTE: If you have a similar predictive analytics project you’d like to test live here on www.datasciencereport.com like Eric did, please email Ted O’Brien at email@example.com
Some background on the project
I (Eric) completed a bachelor’s degree in electrical engineering at the University of Alberta in 2011. I interned at Halliburton’s electronics engineering department and Alberta Health Services’ research & technology department, mostly doing electronic design and embedded C programming. After working at Alberta Health I made the mistake… I mean, I decided to do a PhD 😉
The original goal of my PhD work was to develop an electronic device for use in scoliosis treatment: the device would automatically deliver the optimal treatment dosage to the patient. I soon realized, however, that doctors have only a vague idea of what the optimal dosage actually is. I realized that the ability to predict treatment outcome would go a long way toward understanding optimal treatment, because if you can forecast treatment success or failure, you also know what treatment will achieve success.
So I started learning pattern recognition and computational intelligence. I found predictive modelling captivating. I realized that with the right information and the right techniques, you could actually predict what would happen to a scoliosis patient two years in the future. The machine learning process itself could also reveal new information that had gone unnoticed before. I was hooked.
But doing predictive modelling in a biomedical research setting (i.e. keeping doctors and academics happy at the same time) can be challenging. The doctors have often been doing things the same way for decades; they value experience over analytics. Academics are generally rewarded for publishing; they value novelty & complexity over functionality. Simply designing an effective and efficient prediction model doesn’t quite satisfy anyone’s reward system.
So early in 2014 I started working on a model to predict the outcomes of mixed martial arts (MMA) fights. It was strictly a for-fun side project, free from the egg-headedness of academia. This was a breath of fresh air. It was a technical challenge, but you knew when the model was or wasn’t working: the betting market provides an easy way to test its predictions against those of thousands of MMA fans. The evaluation scheme here is simple – if your model makes better predictions than the average sports bettor, it will make money in the long run.
About the prediction model
The first version of the program was written partly in C# and partly in (gulp) Matlab. It consisted of two main components. The first component gathered publicly-available data from several online sources, storing it in a local database which now contains records of over 4000 fights and has varying amounts of information associated with each one. New data is collected routinely. The second component used this data and a computational-intelligence-based machine learning process to train the actual prediction model.
Figure 1: The prediction system gathers publicly-available data from various online sources. It stores this data in a local database, and uses it to train the prediction model. The model can then generate predictions for upcoming mixed-martial-arts fights.
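The post doesn't show any code for this pipeline. Here is a rough Python sketch of the two-component layout in Figure 1, using the standard-library `sqlite3` module as the local database. The table layout, column names and the functions `open_db`, `store_fights` and `training_rows` are all hypothetical. The real system was written in C# (and Matlab), and its database schema isn't described. The scraping step is left out because the online sources aren't named.

```python
import sqlite3

# Hypothetical schema: one row per fight.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fights (
    fight_id  INTEGER PRIMARY KEY,
    event     TEXT NOT NULL,
    date      TEXT NOT NULL,   -- ISO 8601 so string order = date order
    fighter_a TEXT NOT NULL,
    fighter_b TEXT NOT NULL,
    winner    TEXT             -- NULL until the result is known
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_fights(conn, records):
    """Component 1: save scraped fight records, overwriting any older copy."""
    conn.executemany(
        "INSERT OR REPLACE INTO fights VALUES "
        "(:fight_id, :event, :date, :fighter_a, :fighter_b, :winner)",
        records,
    )
    conn.commit()


def training_rows(conn, before):
    """Component 2: completed fights dated before `before`, oldest first.

    Cutting off at the prediction date keeps future results out of training.
    """
    return conn.execute(
        "SELECT date, fighter_a, fighter_b, winner FROM fights "
        "WHERE winner IS NOT NULL AND date < ? ORDER BY date",
        (before,),
    ).fetchall()
```

Filtering on `date < before` is one simple way to make sure a model trained for a given fight night only sees results that were known beforehand.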
The first prospective test of this model was January 25, 2014 – UFC on FOX 10. That night it made 8 predictions and got 7 correct, including a bet on Nikita Krylov, who was a 4-to-1 underdog but who defeated his opponent with a stunning head kick 25 seconds into the fight.
Figure 2: One of the model’s first predictions was that Nikita Krylov (right) would defeat Walt Harris (left) on Jan 25.
But the first model was quite volatile, getting most or all of its predictions right one week and then running 3 or 4 weeks of heavy losses. I was testing the model by tracking hypothetical bets placed on all its predictions, and while it was netting a small profit overall, it wasn’t much better than a high-interest savings account. Its predictions were about 60% accurate, which is actually less accurate than the betting odds themselves.
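It may seem odd that 60% accuracy can trail "the betting odds themselves". One natural way to score the market is to treat it as if it always picked the favorite. This sketch assumes that is roughly how the comparison works. It scores the model and the market on the same fights, using decimal odds. The function name and tuple layout are illustrative.

```python
def accuracy_vs_market(fights):
    """Return (model accuracy, market accuracy) over the same fights.

    Each fight is (decimal_odds_a, decimal_odds_b, model_picks_a, a_won).
    The market's "pick" is the betting favorite, i.e. the lower decimal odds.
    Pick'em fights (equal odds) count as a market pick for fighter B here;
    a fuller version might leave them out.
    """
    model_hits = market_hits = 0
    for dec_a, dec_b, model_picks_a, a_won in fights:
        model_hits += model_picks_a == a_won
        market_hits += (dec_a < dec_b) == a_won
    return model_hits / len(fights), market_hits / len(fights)
```

An underdog win like Krylov's counts as a hit for the model and a miss for the market. That is how a model can trail the favorites on accuracy and still profit, because the underdog hits pay several times the stake.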
The second version of the system is written almost completely in C#. It makes a few changes to the data-collection component, but uses an entirely new approach for the prediction model. While the first model was a computational behemoth – taking an almost prohibitive amount of time to make a prediction – the new approach tries to work toward what Arnold Zellner might call “sophisticated simplicity”. It still involves a lot of computation, but now predicts faster and a bit more accurately.
I’ve been testing the new model on fights starting in January 2014 (retrospectively for fights from January to April, and prospectively after that). A hypothetical bet is placed on each of the model’s predictions, and the resulting payout or loss is tracked assuming the best odds available at the time of the fight. At present the system focuses on fights in the UFC promotion. The predictions have been 69% accurate. The hypothetical bets totaled 49.3 “units” (one unit represents the maximum allowable bet) and net winnings totaled 9.8 units – an average return-on-investment of 20%.
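Tracking in units with the best available odds can be sketched like this. Each bet is settled at the highest decimal odds any book offered on the pick, and ROI is net units over staked units. The sportsbook names and odds in the example are made up, and `settle` and `roi` are illustrative helper names.

```python
def settle(stake_units, odds_by_book, won):
    """Net result, in units, of one bet placed at the best decimal odds offered.

    stake_units is a fraction of the maximum allowable bet (1 unit = max bet).
    """
    best = max(odds_by_book.values())   # highest decimal odds pays the most
    return stake_units * (best - 1) if won else -stake_units


def roi(results, stakes):
    """Return on investment: total net units won / total units staked."""
    return sum(results) / sum(stakes)


# Hypothetical sportsbook names and odds, for illustration only.
r = settle(0.5, {"book_a": 2.60, "book_b": 2.75}, won=True)   # 0.875 units

# The totals quoted above: 9.8 units won on 49.3 units staked.
print(f"{9.8 / 49.3:.1%}")  # 19.9%
```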
How this 4-Month (Sept 1 to December 31st) Prediction Challenge works:
The system makes a prediction for every fight where both fighters have 2 or more fights in the UFC, so it doesn’t make predictions for fighters who are newer to the UFC. Usually about 4 to 7 fights on a given night meet this 2-fight criterion, and the rest involve rookie fighters. For every prediction it places a bet whose size reflects its confidence in the prediction.
RULES: Eric is starting with a mythical bankroll of $10,000. Most UFC events take place on a weekend, and the system can bet up to $1,000 per fight. Odds are taken from www.bestfightodds.com. Predictions are made on Wednesday; results are posted Monday.
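The 2-fight rule and the $1,000 cap are simple to express in code. The post doesn't say how confidence maps to bet size, so this sketch swaps in the Kelly criterion for illustration. It is one standard sizing rule, not necessarily what Eric's system does. The `Matchup` record, `eligible`, `kelly_stake` and the model probability `p_win` are all hypothetical. In practice many bettors stake a fraction of the full Kelly amount, because full Kelly is quite volatile.

```python
from dataclasses import dataclass

MAX_BET = 1_000.0  # per-fight cap from the challenge rules


@dataclass
class Matchup:
    # Hypothetical record for one fight on the card.
    fighter_a: str
    fighter_b: str
    ufc_fights_a: int  # prior UFC bouts for each fighter
    ufc_fights_b: int


def eligible(m: Matchup, min_fights: int = 2) -> bool:
    """The 2-fight rule: skip any matchup involving a UFC rookie."""
    return m.ufc_fights_a >= min_fights and m.ufc_fights_b >= min_fights


def kelly_stake(p_win: float, decimal_odds: float, bankroll: float,
                cap: float = MAX_BET) -> float:
    """Kelly-criterion stake for a bet the model wins with probability p_win,
    clipped to the range [0, cap]. decimal_odds must be greater than 1."""
    b = decimal_odds - 1.0                      # net profit per $1 on a win
    fraction = (b * p_win - (1.0 - p_win)) / b  # Kelly fraction of bankroll
    return min(max(fraction, 0.0) * bankroll, cap)
```

For example, `kelly_stake(0.25, 5.0, 1_000)` returns 62.5. `kelly_stake(0.3, 5.0, 10_000)` hits the $1,000 cap. `kelly_stake(0.5, 2.0, 10_000)` returns 0.0, because at even odds a 50% pick has no edge.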
Week #6 Postmortem
Eric’s System picked 3 fights that met the criteria for this weekend’s (October 25th) main event, UFC 179 (Full Picks History). Eric’s pick of long shot Phil Davis, a nearly 4-1 underdog, to beat Glover Teixeira won; that was the pick of the night that saved this week. The other 2 selections (see top of page) lost. Eric did notice some bugs in the System; with those fixed, its picks would have changed to give us a +$2,100 weekend instead of the actual +$150.
The winning streak continues for the 5th straight week, as the PhD Candidate and his data science continue to “Beat the Sportsbook”. The account has grown to over $16,000 from the $10,000 starting bankroll, which is over 160% of the starting balance.
Final Result: Won $150
Week #5 Postmortem
Eric’s System found 3 fights that met the criteria for this weekend’s (Oct 4th) UFC Main Event (Full Picks History). This week, we used the analytical filter of John McAuliffe, a former MMA fighter (and intelligence officer), to provide final qualitative advice on the selections. John recommended Eric stay away from the Saffiedine/MacDonald fight, which proved wise. The 2 fights that were picked both won, including slight favorite Krzysztof Jotko over Tor Troeng and 5-1 underdog Rick Story over Gunnar Nelson (2-0).
A GREAT week, as the PhD Candidate and his data science continue to “Beat the Sportsbook”. The account now stands at over $16,000, up from the $10,000 starting bankroll, which is over 160% of the starting balance.
Final Result: Won $4,120
Week #4 Postmortem
Eric’s System found 3 fights that met the criteria for the weekend’s (Sept 27th) main event, UFC 178 (Full Picks History). Winning picks included Cerrone over Alvarez and Gamburyan over Gibson. Romero defeated Kennedy for the only bad pick of the night. The System was 2-1 for +$1,300 in winnings. Another good week, as the PhD Candidate and his data science continue to “Beat the Sportsbook”. The account now stands at over $12,000, up from the $10,000 starting bankroll, which is over 120% of the starting balance.
Final Result: Won $1,300
Week #3 Postmortem
Eric’s System found 2 fights that met all its criteria for the weekend’s (Sept 20th) main event, which was held in Japan, so the fights were early this morning Eastern Time (USA). He was 2-0 for +$1,820 in winnings, and both fights were over pretty quickly (under 3 minutes each). Another good week, as the PhD Candidate and his data science continue to “Beat the Sportsbook”. The account now stands at over $11,000, up from the $10,000 starting bankroll, which is over 110% of the starting balance.
Final Result: Won $1,820
Week #2 Postmortem
Eric’s System selected 5 Vegas-underdog UFC fighters for this past Saturday night’s (9/13/14) UFC event (see picks history). $1,000 was placed on each of the 5 fights. The net result was a good one (+$2,050), a 41% ROI on the $5,000 wagered, making up for last week’s loss (-$2,033). The most impressive pick was Andrei Arlovski, a 5-1 underdog!!
Final Result: Won $2,050
Week #1 Postmortem
Simply put, WEEK 1 was not a good weekend for the system’s picks (see picks history). It was upset city! It happens. The track record through over 8 months of fight predictions remains very strong, and aside from some minor tweaks to limit downside risk, the system will stay as is. We do intend to examine a Man+Machine approach in the near future, where Eric’s system makes its picks and an ex-MMA fighter/analyst applies his human knowledge to those picks and adjusts them. We’ll track those adjustments separately to see if they improve results.
Final Result: Lost $2,033
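Taking the weekly Final Result figures above at face value, the challenge account's running balance is a cumulative sum starting from the $10,000 bankroll. This short Python sketch recomputes it week by week and reports each balance as a share of the starting bankroll. The variable names are illustrative.

```python
from itertools import accumulate

START = 10_000
# "Final Result" for Week #1 through Week #6, as reported in the postmortems
weekly = [-2_033, 2_050, 1_820, 1_300, 4_120, 150]

# accumulate(..., initial=...) needs Python 3.8+
balances = list(accumulate(weekly, initial=START))
for week, balance in enumerate(balances[1:], start=1):
    print(f"Week #{week}: ${balance:,} ({balance / START:.0%} of the starting bankroll)")
# The last line printed is: Week #6: $17,407 (174% of the starting bankroll)
```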
Note: All picks to date (January to Dec 13th) can be viewed/downloaded here
Update Chart 12/19/2014 and New Picks to be posted Wednesday 12/24/2014
Important Note: This data science project is for fun and for experimenting with predictive analytics techniques and algorithms. Please be aware that Eric had a 20% ROI over the 8-month initial trial span, but in any given 1-month stretch the ROI could actually be negative. Use this information at your own risk.
Picks will be made on Wednesday of each week and running total will be made on Monday morning.