Predicting college football game attendance: a statistical model – Callahan
Callahan’s James Meyerhoffer-Kubalik, senior business analyst, created a statistical model to understand the variables that drive college football game attendance, specifically Ohio State. Jan-Eric talks with James about his findings and how understanding variables and knowing what’s in control of those variables leads to insights that become the basis for predictive modeling and will drive results for your business.
Listen here: (Subscribe on iTunes, Stitcher, Google Play, Google Podcasts, Pocket Casts or your favorite podcast service. You can also ask Alexa or Siri to “play the Uncovering Aha! podcast.”)
Welcome to Callahan’s Uncovering Aha! podcast. We talk about a range of topics for marketing decision-makers, with a special focus on how to uncover insights in data to drive brand strategy and inspire creativity. Featuring James Meyerhoffer-Kubalik and Jan-Eric Anderson.
Jan-Eric: Hi, I’m Jan-Eric Anderson, head of strategy at Callahan.
James: And I’m James Meyerhoffer-Kubalik, senior business analyst at Callahan.
Jan-Eric: So we’re honored today to have James in the podcast booth with us and he’s here for a specific reason. A well known fact about James, at least well known at Callahan is that he is a die hard enthusiastic fan of The Ohio State University football team. So James has made a request to make an appearance in the studio and he wants to talk about some work that he’s done. And it’s only the type of work that James would do to understand what drives attendance at home football games at Ohio State. And this is fascinating. So I know you James, so it doesn’t surprise me at all that this would be something you’d be interested in. But just tell me a little bit about what were you doing, what did you do?
James: Oh, so the kind of the why behind it. So I was looking for things to analyze because I get bored, and college football is my passion, so I was trying to merge the two. And I was realizing that there are attendance models out there, but when people do them, they do them for the entire spectrum of 130 teams. So there is no individual model out there that represents any one university. College attendance has kind of been declining over the past four to five years, about 3%, so it was really about what’s really driving this, especially at Ohio State. Just to be selfish, because I’m a selfish person, so I kind of honed in on Ohio State just to figure out, hey, what factors will prove out to drive attendance and kind of tell the story with that.
Jan-Eric: Gotcha. So, and obviously in your work here at the agency, you are a statistician. You look at lots of data sets and different moving variables and look for relationships between the variables, and 99.9% of the time you’re doing that on behalf of our clients and understanding how it connects back to business. In this case it was essentially doing the same type of work, but the end game or the business was butts in seats at home football games. So I’d imagine there were quite a few variables you were looking at. So what are some of the variables that you looked at and was there other research that had already been done that you were able to kind of build some momentum off of?
James: Right. So the variables I looked at were kind of, I categorize them as game specific, and these are kind of like point differentials from the previous game, really trying to measure how finicky fans are based on a win or a loss, anything from who we played, what the matchup looked like, did Ohio State play USC, those kinds of game specific variables, to more team specific variables, which would… Like how old and rich is Ohio State’s football program compared to the others?
James: And then more university specific. So things specific about Ohio State being located in Columbus, the distance from Columbus to another campus and those kinds of variables. So I collected 39 total variables manually, so this took me quite some time. It was over the duration of 10 years I collected these variables and then really started to put them into my model and shake it all about just to kind of see what fell out.
Jan-Eric: And over that period of 10 years, was Ohio State’s team, the caliber of their team, was it fairly consistent? I mean they’re a powerhouse team, they’re…
James: I appreciate you saying that.
Jan-Eric: Typically they’re at the top of the rankings and always kind of in that national picture. But over this 10 year period, they were pretty consistently a contender, putting a good team on the field?
James: Yes, absolutely.
Jan-Eric: All right, so you’ve got that in play. Now, what were some of the variables teased out? What drives attendance?
James: Okay, so I’m going to start with the positive variables then I’m going to kind of go to the negative variables and I think it’s important to know both, whether it has a deterrence or a positive effect. So the top positive variable on game day attendance was based on the type of matchup. And so I used a binary, a one-zero variable, on if it was a top 10 matchup between Ohio State and whoever their opponent was.
James: So just, I’m going to read out the kind of the coefficient just to kind of give context to all those stats nerds out there listening. So for every time there was an occurrence, Ohio State was in a top 10 matchup, there was an additional 1,290 fans in the seats. The next one, near and dear to my heart, was the Michigan game. So I was able to control for that. So it’s binary. Once again, every time they played Michigan, on top of the other variables, I saw an additional 889 fans in the seats.
Jan-Eric: So can I stop you there, just to clarify. Bigger impact on attendance when Ohio State was in the top 10 and playing another top 10 opponent?
James: Right.
Jan-Eric: Bigger impact there than when Ohio State was playing Michigan, their biggest rival?
James: And because most of the times when they play Michigan, Michigan is also part of that top 10, so you’d really need to combine those two that-
Jan-Eric: So when Michigan’s in the top 10 then it would be more than another opponent?
James: Yeah.
Jan-Eric: Okay. Got you, got you.
James: The next one was just playing somebody from a Power Five conference and so that was an additional 342 people in stands. That’s binary, so based on occurrence. And the next one was the previous point differential. So what this says is for every point Ohio State beat somebody, but I assume they beat somebody, because I’d like to, was an additional 12 people in the stands. So if they beat them by 20, that’s 240 extra people in the stand.
Jan-Eric: From the week prior to the game [crosstalk 00:06:19], got you.
James: Yep, and that didn’t matter if it was a home or away, is the way I looked at that.
Jan-Eric: Got you.
James: So those were all the positive things. The negative things were more weather related. So one thing I dove deep into was weather, whether it snowed, whether it rained and kind of how people experienced that weather. So when precipitation happened, I did it on a binary basis. So when it occurred, there were 1,433 fewer people in the stands because of that.
Jan-Eric: Got you.
James: The next one was alcohol being sold. When I think of this, I think of the university trying to diversify and get with the times and selling alcohol to meet consumer needs during a game. But when that occurred, since it started to occur, every game there were 1,320 fewer people in the stands.
Jan-Eric: So let me clarify that again. Ohio State began to have alcohol sales in stadium.
James: Mm-hmm (affirmative).
Jan-Eric: Was that during that entire 10 year period of your research?
James: No. So that was probably-
Jan-Eric: It started midstream?
James: Yeah, midstream.
Jan-Eric: Okay. So once it started, you analyzed a shorter period of time there to see what kind of… And what you’re saying is that the introduction of alcohol sales had a negative impact on attendance.
James: Yeah.
Jan-Eric: And it’s interesting.
James: The rationale that I got from that, trying to say does correlation equal causation or vice versa, is that it’s really a… Sometimes it’s a family event and if I think… I have chihuahuas, so I don’t have kids, but if I think of taking kids to a game, that might be a negative experience knowing that there are people that are drinking, consuming alcoholic beverages and…
Jan-Eric: That’s interesting, valuable insight though if you’re head of ticket sales or looking at how to drive revenue at home games. The beer sales, incremental sales and revenue from sales of beer, does that offset… How many fewer seats did you say?
James: 1,320.
Jan-Eric: So is it enough to make up that and I would imagine it probably was.
James: Yeah.
Jan-Eric: But I don’t know. But that’s interesting.
James: Right, that’s the way, that would be a good way to look at it.
Jan-Eric: That’s interesting. Okay. So negative impacts on precipitation, you said beer sales.
James: The next one was, I tried to bring in competition. So you know with Ohio State, they’re the largest university in Ohio, one of the largest in the country. There’s not an NFL team, but I went to the closest NFL team that people in Columbus like, which is the Cleveland Browns, go Baker. So every time the Browns played in a two-day window before or after, if they played on a Thursday or if they played on a Sunday or a Monday, that would capture that total timeframe. You saw 253 fewer people in the stands and that’s a binary, so on occurrence.
Jan-Eric: With the home game.
James: Yeah, with a home game.
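A binary "competition" variable like the one James describes could be built roughly as follows. This is a minimal sketch, assuming the window is two days on either side of the Ohio State home game; the function name `browns_within_window` and the dates are hypothetical and do not come from James's data set.

```python
from datetime import date

def browns_within_window(osu_game: date, browns_games: list[date],
                         window_days: int = 2) -> int:
    """Return 1 if any Browns game falls within +/- window_days of the
    Ohio State home game, else 0 (assumed reading of the two-day window)."""
    return int(any(abs((b - osu_game).days) <= window_days
                   for b in browns_games))

# Illustrative dates only, not a real schedule.
saturday_home_game = date(2018, 10, 6)                  # a Saturday
browns_schedule = [date(2018, 10, 4), date(2018, 10, 14)]

flag = browns_within_window(saturday_home_game, browns_schedule)
print(flag)  # the Thursday game is two days before the Saturday game
```

With a Saturday kickoff, a two-day window on either side covers Thursday, Sunday and Monday, which matches the days James mentions.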
Jan-Eric: Got you.
James: And so I see it as people having to choose, make that consumer choice between going to a Browns or going-
Jan-Eric: Attend this or that.
James: Yep. So the next one was interesting. This one is incremental. So it’s the distance between Columbus, Ohio, where Ohio State plays, and wherever the other university is; if it’s Michigan, Ann Arbor. For every hundred miles, you lose 22 people in the stands.
Jan-Eric: Now, do you attribute that to the visiting team, the away team, the visiting teams’ fan base and their willingness to be there? Or does that relate to the recurring home crowd and less interest because it’s further away?
James: I think it has more to do with the away team because if you guys can pull up a map and look at Ohio, Columbus is dead center. You can get to Columbus from anywhere in the state in two and a half, three hours. So you don’t have much cost associated with going to a game. You don’t really need to book a room or anything like that. So it’s not as expensive. But when you’re talking, now you’re coming from USC in Los Angeles and that flight, because of the distance, is going to be longer. I think then costs start to play in the consumer’s mind for a game.
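The distance variable can be illustrated with a short calculation. This is a sketch under stated assumptions: it uses straight-line (great-circle) distance, whereas James does not say whether he measured driving or straight-line miles, and the city coordinates are approximate values supplied for illustration. Only the figure of 22 fewer fans per hundred miles comes from the conversation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def distance_effect(miles, fans_per_100_miles=-22):
    """Attendance change implied by the quoted coefficient."""
    return fans_per_100_miles * miles / 100

# Approximate city coordinates (assumed for illustration).
COLUMBUS = (39.9612, -82.9988)
ANN_ARBOR = (42.2808, -83.7430)
LOS_ANGELES = (34.0522, -118.2437)

for name, campus in [("Ann Arbor", ANN_ARBOR), ("Los Angeles", LOS_ANGELES)]:
    miles = great_circle_miles(*COLUMBUS, *campus)
    print(f"{name}: {miles:.0f} miles, {distance_effect(miles):.0f} fans")
```

Under these assumptions, a nearby opponent such as Michigan costs only a few dozen fans, while a cross-country opponent such as USC costs several hundred.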
James: So after that, the hundred-mile distance one, the next one I actually had to create a variable here and that was fun. So what I did is I had to create a rolling 12-game win-loss record for each team and the reason I did that is because when Ohio State’s in the top 10, it can be in the top 10 through 25, but after that you really lose sight of where rankings are. So to say, like if Ohio State was top 10 and played UCF, and UCF wasn’t ranked but was beating people, it’s still a good matchup.
James: So I needed a way to actually be able to bring that into the analysis where I can say, hey, UCF’s a good opponent, how would I represent that? And so that’s what I did. I took the rolling 12 games for Ohio State and took the difference of that with the rolling 12 games for like a UCF or whoever their opponent was.
Jan-Eric: So UCF, University of Central Florida, a school that has a team that has gotten a lot better but maybe hasn’t always had the national ranking that maybe it’s deserved. And so this tried to make up for that, which actually makes a lot of sense.
James: Right. The next one was, and this is the last one, was the kind of the history of your program. It’s kind of like that rolling 12-game win-loss variable, but it’s using the age of when that program came into existence. So like Rutgers is the most historic college football program, kind of like KU in basketball, to bring it back to Lawrence. So if you have two historic powerhouses, like an Oklahoma versus an Ohio State, that allure does something, it means something to fans, and it will actually show up in these models.
Jan-Eric: Regardless of rankings, regardless of current records or trailing 12 game records or anything like that. There’s something about, it’s the blue bloods.
James: The blue bloods.
Jan-Eric: Of the sport. Yeah, that makes sense.
James: So those were really all the variables that I controlled for and fell out. I did, like I said, look at other things, but that kind of, when you do a model like this, you kind of bring in everything and then you start to shake it out to see what sticks, what has a lower P value to really start to tell the story.
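The coefficients James reads out can be combined into a rough per-game estimate. This is a sketch only, assuming the effects are additive as in a linear regression. It covers just the coefficients quoted in the conversation; the intercept and the coefficients for the rolling-record and program-age variables are not given, so the result is a shift relative to an unknown baseline, not a predicted attendance. The dictionary keys and the example game are hypothetical.

```python
# Coefficients as quoted in the conversation (fans per unit of each variable).
EFFECTS = {
    "top10_matchup": 1290,     # binary
    "michigan": 889,           # binary
    "power_five": 342,         # binary
    "prev_point_diff": 12,     # per point in the previous game
    "precipitation": -1433,    # binary
    "alcohol_sold": -1320,     # binary
    "browns_in_window": -253,  # binary
    "distance_100mi": -22,     # per hundred miles to the opponent's campus
}

def attendance_shift(game: dict) -> float:
    """Sum the quoted effects for one game; missing variables count as 0."""
    return sum(coef * game.get(name, 0) for name, coef in EFFECTS.items())

# Hypothetical game: top 10 matchup against Michigan, after a 20-point win,
# alcohol on sale, dry weather, opponent roughly 200 miles away.
example_game = {
    "top10_matchup": 1,
    "michigan": 1,
    "power_five": 1,
    "prev_point_diff": 20,
    "alcohol_sold": 1,
    "distance_100mi": 2,
}

print(attendance_shift(example_game))  # 1397
```

Stacking the Michigan and top 10 effects in this way reflects James's remark that the two need to be combined when Michigan is also ranked in the top 10.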
Jan-Eric: Got you, and your process and going through this, I imagine it’s similar to what we do here is as we’re getting data into the intelligence platform, I mean data. You had to determine the variables that you wanted to analyze, right? You had to go gather that information from a variety of different sources and bring it together. But then bringing it together, you’ve got to structure the data in a way that you can analyze it, right? And start to look at how things stack on one another. Talk just a little bit about that process.
James: Right, so when we do this, we bring in all the data and then we say, look at the data and make sure that it makes sense. We do quality check on our data and then we harmonize it to bring it all to the same level before we analyze it. And then we, really next, what I do in my process is identify, if I am building a model for a client, what model is the best fit for what I’m trying to say? And for this, this is going to be a, for those statistics geeks out there, a Tobit model is usually used.
James: And what that model accounts for is there’s a ceiling because there’s a finite number of seats in a stadium. So that accounts for that. But in this case, Ohio State will sell past their capacity of 104,588 at the time I did this analysis. And because of that, you don’t need to use models like that, but it’s just kind of going through those thought processes. Like this model would be the best model to use. Are there any limitations where I couldn’t use that and then we just go to the next best model, is kind of our process.
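To show what the ceiling means in practice, here is a minimal sketch of the log-likelihood a Tobit model with an upper limit would maximize. It is not the model James fitted (he explains that he did not need one); it assumes normally distributed errors and treats any game at or above the ceiling as censored. The function names and the numbers in the example are illustrative.

```python
import math

def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_sf(z: float) -> float:
    """Standard normal upper-tail probability, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def tobit_loglik(observed, predicted, sigma: float, ceiling: float) -> float:
    """Log-likelihood for attendance that is right-censored at `ceiling`.

    observed:  recorded attendance per game
    predicted: the linear model's predicted demand per game
    sigma:     standard deviation of the error term
    """
    total = 0.0
    for y, mu in zip(observed, predicted):
        if y >= ceiling:
            # Sold out: we only know demand was at least the ceiling.
            total += math.log(norm_sf((ceiling - mu) / sigma))
        else:
            # Below the ceiling: ordinary normal density of the residual.
            total += math.log(norm_pdf((y - mu) / sigma) / sigma)
    return total

# Illustrative numbers in thousands of fans, not real data.
ll = tobit_loglik(observed=[98.0, 100.0], predicted=[97.0, 101.0],
                  sigma=2.0, ceiling=100.0)
print(round(ll, 3))
```

An ordinary regression would treat a sold-out game as if demand were exactly equal to capacity, while the censored term above credits the model for any predicted demand at or beyond it. When recorded attendance can exceed the nominal capacity, as James says happens at Ohio State, that distinction matters less.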
Jan-Eric: Got you. So you say you choose the model after you’ve got the data organized and then the results pop out and you’ve kind of just gone through what some of those positive drivers were as well as the negative drivers. And then the game from that point is to find the application of the learning. Obviously this was a passion project for you and something you’re doing outside of the work that we’re doing for clients here at Callahan. But I really appreciate you coming in to talk about it because not only is it timely, because we’re at the beginning of the college football season, but it’s a great metaphor for just the importance of having insight like this, where you know what are things that are going to impact business performance, and in this context, the business performance is ticket sales or game attendance at Ohio State.
Jan-Eric: It’s so important because it allows you not only to understand what’s going to drive results, but you can then establish benchmarks of what we should have seen in ticket sales or in business performance based on the conditions and variables that were at play. It also can become the basis for prediction models or to forecast what’s going to happen based on what we see coming up, and then again, in this case, you can predict what attendance may be and how it may fluctuate based on your understanding of these variables and then predicting what variables, what’s the weather going to look like and who are we playing and what’s the trailing 12 game record, etc., etc., etc.
Jan-Eric: So it’s a great metaphor to kind of hammer home the point of the value in this work and understanding these variables because it really can, once you get that insight to what drives results, you can understand as a marketer, as a CMO, what’s in my control of those variables that matter? What’s in my control? What’s not in my control and what can I be doing to try to take advantage of the insights? So I think it’s an entertaining example. Any other thoughts you want to share? Any predictions you have about Ohio State this year?
James: Oh well. Well we do have a new coach. Urban Meyer has left the coop, but I think we’ll actually be better. I think Ryan Day is a little bit more offense focused and this will be more aggressive and so we’ll beat Michigan, that’s for sure. So, but that’s my prediction.
Jan-Eric: Okay. Well, it seems now it’d be a good time to point out that James’s opinions are those of his own and not endorsed by this company nor this podcast.
James: That’s a good point.
Jan-Eric: Urban Meyer has been coach at Ohio State for how long?
James: Seven years.
Jan-Eric: So there has been a coaching change during that time period of your analysis?
James: Right.
Jan-Eric: So that’s interesting and I guess didn’t change much, didn’t change any of the variables.
James: And that’s because Ohio State, like KU basketball, is a luxury good. So it’s like the fluctuations aren’t going to be as great as you go to Ohio as opposed to Ohio State, where their fluctuations are going to be greater based on these. So, yeah. So because it’s a luxury good, you’re always going to have the people there in the seats willing to see that product, so.
Jan-Eric: Yeah. Well, it’s fascinating. James, it’s always entertaining. Thanks for coming in, enjoying your insight on Ohio State football. Thanks for listening to the podcast and I hope you found it entertaining.
You’ve been listening to the Uncovering Aha Podcast. Callahan provides data savvy strategy and inspired creativity for national consumer brands. Visit us at callahan.agency to learn more.