With all this individual play data at my disposal, I've been looking at trends in play selection, particularly run vs. pass (i.e., I'm not looking into field goal / punt frequencies). My end goal is to come up with a way to roughly estimate under which scenarios a team would pass or run, and the certainty of that estimate. Once I have a model which works fairly well, it'll allow me to do a few things: (1) I can determine, mathematically, what is a run scenario vs. a pass scenario. Then, I can look at how well teams perform in each of these scenarios. For instance, off-hand: in a definite passing scenario (down by two scores with 2 minutes left, say), a team that runs the ball can probably get about, what, 5 yards or so? Basically, I can build a basic game theory grid: avg. pass yards vs. pass defense, avg. pass yards vs. run defense, etc. And (2) if I know when, on average, teams pass, I can determine which teams / coaches like to buck the trend, and see whether bucking the trend is successful or not.
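To make the game theory grid concrete, here is a minimal sketch of how it could be tabulated. The play tuples, the scenario labels, and the yardage numbers are all made up for illustration; in practice the scenario label would come out of the run/pass model once it exists.

```python
from collections import defaultdict

# Hypothetical plays: (offense call, scenario the defense expects, yards gained).
# These numbers are invented for illustration, not real play data.
plays = [
    ("pass", "pass", 4), ("pass", "pass", 6),
    ("pass", "run", 12),
    ("run", "pass", 5), ("run", "pass", 7),
    ("run", "run", 3),
]


def average_yards_grid(plays):
    """Average yards gained for each (offense call, expected scenario) cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [yards sum, play count]
    for call, scenario, yards in plays:
        totals[(call, scenario)][0] += yards
        totals[(call, scenario)][1] += 1
    return {cell: yards / count for cell, (yards, count) in totals.items()}


grid = average_yards_grid(plays)
for cell in sorted(grid):
    print(cell, grid[cell])
```

Each cell of the grid is just an average, so the same function works whether the scenario label is assigned by hand or by a fitted model.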
I'm going to model run / pass ratios for each down separately. Logically, I'm starting with 1st down. Provided below are average run ratios, broken down by starting field position.
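For reference, a run ratio by field position can be computed along these lines. This is only a sketch: the `(field_position, play_type)` tuple layout, the 5-yard bin width, and the sample plays are assumptions, not the actual data format behind the chart.

```python
from collections import defaultdict


def run_ratio_by_bin(plays, bin_width=5):
    """Share of runs among run/pass plays, grouped into field-position bins.

    plays: iterable of (field_position, play_type), where field_position is
    in yards (0-100) and play_type is "run" or "pass". Layout is assumed.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [runs, total plays]
    for field_position, play_type in plays:
        start = (field_position // bin_width) * bin_width
        counts[start][0] += play_type == "run"
        counts[start][1] += 1
    return {start: runs / total for start, (runs, total) in sorted(counts.items())}


# Made-up 1st-down plays, for illustration only.
first_down_plays = [
    (80, "run"), (82, "pass"), (81, "run"),
    (35, "pass"), (37, "run"),
    (3, "run"),
]
ratios = run_ratio_by_bin(first_down_plays)
print(ratios)
```

Binning smooths out the noise from yard lines with only a handful of plays; a narrower bin keeps more detail at the cost of jumpier averages.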
So you can see a second-order relationship between field position and this ratio. Also, it looks like the relationship changes around 65 yards, with the slope being more gradual from 65 to 0 yards. For my purposes, I'll model these two portions of the field separately (0-65, and 65-100).
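Splitting the field at 65 yards and fitting a second-order curve to each portion could look something like the sketch below. It uses a plain least-squares fit written out by hand so it has no dependencies; the function names and the synthetic curve are hypothetical, and the field position is rescaled to 0-1 only to keep the arithmetic well behaved.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - known) / m[i][i]
    return coeffs


def fit_two_segments(points, breakpoint=65, degree=2):
    """Fit one polynomial below the breakpoint and one at or above it."""
    low = [(x / 100, y) for x, y in points if x < breakpoint]
    high = [(x / 100, y) for x, y in points if x >= breakpoint]
    return (fit_polynomial(*zip(*low), degree),
            fit_polynomial(*zip(*high), degree))


def predict(models, field_position, breakpoint=65):
    """Evaluate whichever segment's curve covers this field position."""
    coeffs = models[0] if field_position < breakpoint else models[1]
    x = field_position / 100
    return sum(c * x ** i for i, c in enumerate(coeffs))


def synthetic(x):
    """Made-up run-ratio curve with a change in shape at 65 yards."""
    s = x / 100
    return 0.6 - 0.2 * s + 0.1 * s * s if x < 65 else 0.4 + 0.3 * s - 0.2 * s * s


points = [(x, synthetic(x)) for x in range(0, 101, 5)]
models = fit_two_segments(points)
print(predict(models, 30), predict(models, 80))
```

Fitting the two portions independently means the curves are not forced to meet at 65 yards; whether that matters depends on how the estimates get used near the break.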
Next, the relationship between time and run ratio.
There's a pretty big break at the end of each half, starting right around 10 minutes left. I can't tell if there's a big difference between the end of the half and the end of the game. For now, I'm going to treat both halves the same, except for the last ten minutes (which I'll break out separately).
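Breaking out the last ten minutes of each half amounts to a small bucketing function. Here is a sketch, assuming each play carries a quarter number and the minutes left in that quarter (the actual fields in the play data may differ), and treating both halves the same.

```python
def half_segment(quarter, minutes_left_in_quarter):
    """Label a play as falling in the first 20 or last 10 minutes of its half.

    Assumes regulation play only: quarters 1-4, 15 minutes each.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("regulation quarters only")
    # Quarters 1 and 3 still have a full quarter to play after them.
    extra = 15 if quarter in (1, 3) else 0
    minutes_left_in_half = minutes_left_in_quarter + extra
    return "last_10" if minutes_left_in_half <= 10 else "first_20"


print(half_segment(1, 12.0))  # early first quarter
print(half_segment(2, 9.5))   # late second quarter
print(half_segment(4, 2.0))   # two-minute drill
```

Because the label ignores which half it is, a play with two minutes left before halftime and one with two minutes left in the game land in the same bucket.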
And finally, here's the relationship between point spread and run ratio.
It's a pretty linear relationship that breaks down once the spread hits thirty points. Teams really give up when down by 40 or more. In fact, I'm just going to go ahead and throw out all the values beyond thirty points.
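Dropping the blowouts and fitting a straight line through what's left could be sketched like this. The 30-point cutoff, the sign convention (positive means the offense is ahead), and the sample points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_spread_trend(points, max_spread=30):
    """Fit run ratio against point spread, ignoring spreads beyond the cutoff."""
    kept = [(spread, ratio) for spread, ratio in points if abs(spread) <= max_spread]
    xs, ys = zip(*kept)
    return fit_line(xs, ys)


# Made-up (spread, run ratio) points: a clean line plus two blowout outliers.
points = [(s, 0.5 + 0.01 * s) for s in range(-30, 31, 10)]
points += [(-42, 0.9), (45, 0.1)]

slope, intercept = fit_spread_trend(points)
print(slope, intercept)
```

With the outliers filtered out, the fitted slope is the change in run ratio per point of spread, and the intercept is the run ratio in a tied game.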
So, in summary, here's what I'm looking at: field position above and below 65 yards (a second-order relationship in each portion); the first 20 and last 10 minutes of each half (linear for the first 20, second-order for the last 10); and point spreads up or down to 30 points (linear). In the near future, I'll develop some linear regressions through all of these scenarios, so stay tuned. I doubt the regressions will say anything earth-shattering, but, you know, I'll be able to say mathematically how often teams run the ball in the last two minutes.