Put the team on your back: comment, subscribe, share, and enjoy.

### Estimating DraftKings Scores for NBA Players

Introduction

For an avid sports fan and a “numbers” guy (whatever that means), daily fantasy sports is kinda the perfect intersection of all the things I find awesome.

Here is basically how daily fantasy sports (DFS) work:

• All players playing today are available and have some cost, which is determined by the daily fantasy sports site
• You have some salary cap of fake dollars that you use to buy players
• Your goal is to assemble a team of players that will score the most points, given some scoring structure
• The top scoring teams then get paid out based on the tournament type and the payout structure.
• Here’s a breakdown of the scoring structure on DraftKings

Obviously, a natural question is whether or not it’s possible to predict player performance accurately enough to inform what players one should pick.

I’ve been building a library (still in active development) that tries to answer this question. It scrapes data from Basketball Reference, stores it in MySQL, and runs a simple linear regression on that data.

Data

As mentioned above, I use daily box score data scraped from Basketball Reference (example URL).

Model

Above is an example of predicted player performance for all NBA players playing on 4/10/2015 vs. their actual performance.

Things to note

• The training data set was from 3/1/2015 to 4/5/2015 and the date for predicted player performance (4/10/2015) is outside this training set.
• R-squared is around 0.47, meaning the model explains a bit less than half of the variance in actual scores and leaves the rest unexplained. However, this might have less to do with the structure of the prediction model than with the nature of the events being predicted. I’d argue that NBA player performances have large variances by nature.
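
For reference, here is a minimal sketch of how an R-squared like the one above can be computed from predicted and actual scores. The numbers are made up for illustration (they are not the 4/10/2015 data), the function name `r_squared` is my own, and I’m assuming the standard 1 - SS_res / SS_tot definition, which may not be exactly how the value above was produced.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after using the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error from always guessing the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up DraftKings scores for five players (illustration only).
actual = [42.0, 18.5, 30.0, 55.25, 12.0]
predicted = [35.0, 22.0, 28.0, 44.0, 20.0]
print(round(r_squared(actual, predicted), 3))
```

A value of 1.0 means the predictions match the actual scores exactly, while 0.0 means they do no better than guessing the average score for everyone.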

Let me quickly outline the very simple linear regression model I used to generate the above result.

First, I started with the following logic:

player performance = f(historical player performance, teammates, opponents, schedule)

Basically, I think about a player’s performance as dependent on four factors: how the player has been playing, which teammates are playing, which opponents are playing, and the player’s schedule.

Now, the specific variables I use to represent these four factors are the following:

• Weighted historical performance
    • I calculate this variable using the following weighting: last game’s performance + 0.6 x performance over previous 7 days + 0.3 x performance over previous 14 days + 0.1 x performance over previous 28 days
• Average DraftKings score that the opponent has conceded to the player’s position over the past 28 days
    • Example: Over the past 28 days, point guards have scored, on average, 28.6 points against the Houston Rockets
• Missing teammates’ DraftKings scores
    • Example: Today, there are 3 teammates that are sitting out for the Oklahoma City Thunder. They have averaged 12, 15, and 25 points on DraftKings, respectively. Thus, the number of “missing” points is 52.
• Is the game a back-to-back?
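
The four variables above can be sketched as one feature row per player per game. This is only an illustration: the function names are hypothetical, I’m assuming “performance over previous N days” means the average DraftKings score over that window, and the player history numbers are made up. The 28.6 and the 12, 15, and 25 come from the examples above.

```python
def weighted_history(last_game, avg_7d, avg_14d, avg_28d):
    """Weighted historical performance, using the weights quoted above."""
    return last_game + 0.6 * avg_7d + 0.3 * avg_14d + 0.1 * avg_28d


def missing_teammate_points(teammate_averages):
    """Sum of DraftKings averages for teammates who are sitting out."""
    return sum(teammate_averages)


def feature_row(last_game, avg_7d, avg_14d, avg_28d,
                opp_allowed_28d, teammate_averages, is_back_to_back):
    """Build the four regression inputs for one player on one game day."""
    return [
        weighted_history(last_game, avg_7d, avg_14d, avg_28d),
        opp_allowed_28d,                         # points conceded to the position
        missing_teammate_points(teammate_averages),
        1.0 if is_back_to_back else 0.0,         # back-to-back as a 0/1 flag
    ]


print(missing_teammate_points([12, 15, 25]))  # 52, matching the example above

# Hypothetical point guard facing Houston on the second night of a back-to-back.
row = feature_row(40.0, 38.0, 36.0, 35.0, 28.6, [12, 15, 25], True)
print(row)
```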

Improvements

• I should split up historical performance by week, and add each week’s performance as its own variable
• I should actually calculate these regressions on a player-by-player basis rather than aggregating all player information and then running the regression.
    • What I mean by this is that I should aggregate all historical data for a particular player, like Russell Westbrook, and then run a regression to predict Russell Westbrook’s next performance. Then move on to the next player, say, Tim Duncan, and do the same thing.
    • My reasoning is that I believe the impact of these variables (say, whether a game is a back-to-back) will vary greatly from player to player. For somebody that could very well be the human Energizer Bunny (see Westbrook, Russell) a back-to-back may have very little impact. For somebody who could pass as the new “Jake from State Farm” (see Duncan, Tim) back-to-backs could be harder to recover from.
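
To make the player-by-player idea concrete, here is a rough sketch that fits a separate regression for each player. It is deliberately simplified: it uses a single predictor (the back-to-back flag) instead of all four variables, the function names are hypothetical, and the scores for “Player A” and “Player B” are invented.

```python
from collections import defaultdict


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # No variation in x (e.g. the player never played a back-to-back).
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_per_player(rows):
    """rows: (player, is_back_to_back, dk_score). Returns one fit per player."""
    by_player = defaultdict(list)
    for player, is_b2b, score in rows:
        by_player[player].append((1.0 if is_b2b else 0.0, score))
    return {
        player: fit_simple_ols([x for x, _ in obs], [y for _, y in obs])
        for player, obs in by_player.items()
    }


# Invented game logs: (player, back-to-back?, DraftKings score).
rows = [
    ("Player A", False, 50.0), ("Player A", False, 52.0),
    ("Player A", True, 50.0), ("Player A", True, 51.0),
    ("Player B", False, 30.0), ("Player B", False, 32.0),
    ("Player B", True, 24.0), ("Player B", True, 26.0),
]
for player, (intercept, slope) in fit_per_player(rows).items():
    print(player, "baseline:", intercept, "back-to-back effect:", slope)
```

In this toy data the back-to-back slope is much larger for Player B than for Player A, which is exactly the kind of difference a single pooled regression would average away.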

### There was pretty much no correlation between +/- and fouls over the past 5 years

• Each data point represents a team’s +/- per 100 possessions and fouls per 100 possessions for each season from 2009 to 2014.
• Here’s how to interpret R-squared.
• What is +/- ?
    • Basically, for teams, it represents how many more or fewer points they scored per 100 possessions than their opposition.
• So why is this interesting? Because generally speaking, one would think that fouls and +/- would be inversely correlated. That is, as a team averages more fouls per 100 possessions, its performance relative to its opposition would suffer. However, we pretty much don’t see any evidence of this relationship.
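
Here is a small sketch of the two calculations behind a chart like this: normalizing a season total to a per-100-possessions rate, and measuring how strongly two series move together. The function names and the team-season numbers are made up for illustration. For a straight-line fit with one predictor, R-squared is simply the square of the Pearson correlation computed here.

```python
import math


def per_100_possessions(total, possessions):
    """Scale a season total (fouls, point differential, ...) to a per-100 rate."""
    return 100.0 * total / possessions


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up team-seasons: (total fouls, point differential, possessions).
team_seasons = [
    (1650, 310, 7800),
    (1720, -150, 7650),
    (1580, -40, 7900),
    (1805, 220, 8000),
    (1690, -290, 7700),
]
fouls = [per_100_possessions(f, poss) for f, _, poss in team_seasons]
plus_minus = [per_100_possessions(diff, poss) for _, diff, poss in team_seasons]

r = pearson_r(fouls, plus_minus)
print("r =", round(r, 3), "R-squared =", round(r * r, 3))
```

An r near -1 would be the inverse relationship one might expect, while an r near 0 is what “pretty much no correlation” looks like.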

• Example of other things that are inversely correlated:

### The Right Time To Send Emails

All the emails I’ve sent from my Gmail account plotted with the date the email was sent on the y-axis and the time of day the email was sent on the x-axis

Last night, I asked myself an interesting question: “Do I tend to send emails at the “right” time of the day?”  In other words, through trial and error, have I learned to send emails at certain times of the day when I know they are more likely to be read (and responded to)?

To answer this question, I created the above graph in R that plots every email I’ve sent through my Gmail account, sorted by the day and time of day I sent an email.

It turns out my emailing patterns do not reflect what current marketing research tells us are the best ways to reach email subscribers. At all.

In fact, here’s what current marketing research tells us about the optimal time to send emails.

First of all, is it really that important to send emails at the “right” time?

• MailChimp found that about 2% of emails get opened around 4AM while about 7% get opened around 4PM.

Image via MailChimp

• A study by GetResponse looked at 21 million email messages and found that sending newsletters between 8 a.m. and 10 a.m. and between 3 p.m. and 4 p.m. can increase open and click-through rates by 6%.
• In addition, GetResponse also found that about 24% of all email opens occur within the first hour of delivery, while only 5% occur four hours after delivery (and less than 1% occur a day after delivery).
• Considering that email opens are time-sensitive, it seems pretty clear that sending emails at the “right” time is important.

So then when should I be sending emails?

• Because email marketing differs by industry and company size, let’s start with some baseline email marketing statistics across different industries and different company sizes.
• As the previous section probably illustrated, the best times to send emails are in the morning and in the early afternoon.
• However, an Experian white paper found that the most unique opens, most unique clicks, highest transaction rate, highest revenue per email, and average order all occurred between 8PM and midnight.

Image via Experian

• Also, only 2% of all daily email volume is sent between 8PM and midnight, so if you find you’ve been facing a lot of competition for your customers’ attention, sending emails later in the evening might not be such a terrible strategy.
• More emails are opened during the mid-week, but more emails are also sent during the mid-week (with the highest volume on Tuesdays and Thursdays).

Image via MailChimp

Not very many emails are sent on weekends.
Image from GetResponse.

• Emails sent during the weekend tend to have lower open and click-through rates.
• Tuesdays have the highest open rates while Fridays have the highest click-through rates.

Image from GetResponse.

Summary

• From a marketing research standpoint, it’s quite clear that my emailing patterns are definitely not efficient or optimal.
• Generally speaking, sending emails in the morning or the early afternoon tends to have the most success.
• In fact, Fridays may be one of the best days to send emails as volume tends to be lower on Fridays compared to other weekdays, yet Friday has the highest click-through rate, and a relatively high open rate.
• However, it might not be such a bad idea to send emails at night either.
• Dave Chaffey at Smart Insights does a very good job summing up the pros and cons of sending emails on each day of the week based on his own experience and also the marketing data covered in this post.
• Saturday – Well this is the lowest volume day of the week, so you have the least competition!
• Sunday – I don’t understand why this is so high compared to Saturday – it does tend to be higher in web analytics than Saturday though.
• Monday – Relatively high, but often everyone is busy and the web analytics show volume is low. It makes sense in some markets like financial services where a decision is maybe made at the weekend and acted on on Monday.
• Tuesday – Traditionally the most popular day of the week for visits to a B2B site.
• Wednesday – Looks like an OK option since relatively low volume
• Thursday – Volume creeping up again as consumer mailers look to reach people before payday and the weekend
• Friday – Again a high volume since email may be read in work on Friday and at home over the weekend also – high competition though! FWIW We send our enewsletter on Friday since business folk may be winding down at the end of the week, although just as likely they’re chasing deadlines. We find readers browse it on the weekend.

How can you create your own personal email graph?

• I was inspired by this great blog post by Stephen Wolfram where he created a graph of the emails he has sent since 1990

This is what a third of a million emails looks like

• If you do want to do this in R, I recommend the edeR package, which can pull all the email information you need.
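
If you’d rather not use R, the same data can be pulled with Python’s standard library. The sketch below is not how the graph above was made. It assumes your mail has been exported to an mbox file (the path in the usage comment is hypothetical), and the function names are my own. It turns each message’s `Date` header into a (date, time of day) pair, which is all a scatter plot like the one above needs.

```python
import mailbox
from email.utils import parsedate_to_datetime


def date_headers_from_mbox(path):
    """Collect the raw Date header of every message in an mbox file."""
    return [msg["Date"] for msg in mailbox.mbox(path, create=False)]


def points_from_date_headers(headers):
    """Turn Date headers into (calendar date, hour of day as a decimal) pairs.

    The hour is kept in the sender's local time, as recorded in the header.
    """
    points = []
    for header in headers:
        if not header:
            continue  # message without a Date header
        try:
            sent = parsedate_to_datetime(str(header))
        except (TypeError, ValueError):
            continue  # unparseable date, skip it
        hour = sent.hour + sent.minute / 60 + sent.second / 3600
        points.append((sent.date(), hour))
    return points


# Works on any list of Date headers:
print(points_from_date_headers(["Mon, 13 Apr 2015 16:30:00 -0400"]))

# Usage with an exported mailbox (hypothetical path):
# points = points_from_date_headers(date_headers_from_mbox("sent_mail.mbox"))
```

From there, plot the hour on the x-axis and the date on the y-axis with whatever plotting library you like.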

### Points Per Half-Court Touch (Or Andre Drummond is a Beast)

• I knew Andre Drummond was killing it this season…but I didn’t know he was averaging 0.79 points per half-court touch! I’m just waiting for Detroit GM Joe Dumars to screw this up and trade Drummond for free agent Darko Milicic (sorry Detroit fans), three Jamaican bobsledders, and a lifetime supply of Krispy Kremes.
• The fact that Ryan Anderson and Klay Thompson are the only non-post players in the top-10 speaks to their ridiculous efficiency coming off catch-and-shoots in the half-court offense.
• Data from here. I only considered players who averaged at least 24 minutes per game; otherwise Ryan Kelly would be at the top of this list (and the only list that Ryan Kelly should be at the top of is the list of NBA players who should never be nicknamed R-Kelly, ever).

Because nobody should have to ever imagine Ryan Kelly hiding in their closet.
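
For the curious, the ranking above boils down to a ratio and a minutes filter. Here is a minimal sketch with invented players and numbers (only the 24-minute cutoff comes from this post), and the field names are my own assumptions about how the data might be laid out.

```python
def points_per_touch(points, touches):
    """Points scored per half-court touch."""
    return points / touches


def rank_by_points_per_touch(players, min_minutes=24.0):
    """Drop low-minute players, then sort by points per touch, best first."""
    eligible = [p for p in players if p["mpg"] >= min_minutes]
    return sorted(
        eligible,
        key=lambda p: points_per_touch(p["points"], p["touches"]),
        reverse=True,
    )


# Invented stat lines (illustration only).
players = [
    {"name": "Guard B", "mpg": 34.0, "points": 60.0, "touches": 100.0},
    {"name": "Big A", "mpg": 32.0, "points": 79.0, "touches": 100.0},
    {"name": "Bench C", "mpg": 6.0, "points": 10.0, "touches": 10.0},
]
for p in rank_by_points_per_touch(players):
    print(p["name"], round(points_per_touch(p["points"], p["touches"]), 2))
```

Without the minutes filter, “Bench C” would top the list on a tiny sample, which is the Ryan Kelly problem in miniature.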