Bowl Championship Series: BCS
Issue
The Bowl Championship Series (BCS) ranking process is a failure by any measure. The good news is that it finally appears the powers-that-be are going to work out a playoff system. But what is the root cause of the problematic BCS rankings? Why don’t they work? And what type of numerical system might meet the needs of a college football ranking system?
Statistics: Or Lack of!
A cursory review of BCS statistics quickly identifies the main problem: the people who created these “methods” do not appear to use any form of statistics. Further limiting the public’s understanding of these data, the methods used to calculate rankings are not available. In other words, they have not been peer-reviewed in any meaningful way and instead rely on the “trust me” method!
We know the accuracy is questionable at best and scandalous at worst, since we never read or hear about odds, confidence intervals, error, probability or other common statistical references when these data are discussed. We also know intuitively that there is error around each number. If the error is not displayed, we can trust neither the numbers nor the authors, hence the ruckus around these rankings.
The Champ: Playoff
The great thing about a playoff for college football, like every other major sports league, is that you know the answer at the end. The best team on that day is the final one standing. End of debate. Rodney Harrison was recently asked who he liked in the NFL playoff and his answer was that it is hard to estimate since anything can happen in a playoff game. Well said. The challenge with a college playoff system is not that it wouldn’t work, because it would. Rather, it cuts the number of bowl games in half. Ouch, that is a lot of lost revenue!
The Champ: Numerical Calculation
I will disclose my bias for a playoff system since, as Rodney stated, anything can happen. But I believe there is likely a method that would, in fact, provide a numerical answer that most would agree with. First, the method needs to be made public, and it should be a method that has a history of success. “Odds” are, of course, one system, but in reviewing the odds estimates for the BCS championship game, there were many conflicting estimates with some odds makers suggesting a difference of only a point or two. In other words, it was too close to call.
Odds is an interesting process (better than the “look what I made up” numerical process), but probability estimates are the only real tool we have that could pick a winner. Odds and probability sound similar but in fact are quite different. The difference:
- Probability is used to express sensitivity, specificity and predictive value. It is the proportion of people in whom a particular characteristic, such as a positive test, is present.
- Odds is the ratio of two complementary probabilities. (PDF)
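The relationship between the two can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and "odds" here means odds in favor, p / (1 - p).

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability into odds in favor: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor back into a probability: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# A team given a 75% chance of winning is a 3-to-1 favorite.
favorite_odds = probability_to_odds(0.75)

# An even-money game (odds of 1) is a 50% probability.
coin_flip = odds_to_probability(1.0)
```

A point spread of only a point or two corresponds to a probability near 50 percent, which is why such a game is too close to call.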
Along the probability line is a process called Evidence Based Management (EBM) which uses Bayesian analysis.
Bayes’ theorem: a statistical principle for combining prior knowledge of the classes with new evidence gathered from data. (See Introduction to Data Mining, Chapter 5, pp. 228-229) (PDF)
EBM with Bayesian analysis states: what was thought before the test was done, combined with the test result, gives what is thought after the test result. In other words, what you thought you knew before the football contest, combined with the game itself, shapes what you think afterward: the “LSU is still No. 1” syndrome! It is this process that could provide an answer to who is No. 1 regardless of the date, time or opponent,* effectively removing the Rodney effect, but not likely the debate!
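A minimal sketch of a single Bayesian update may help make this concrete. All of the numbers below are hypothetical and chosen only for illustration. They are not estimates for any real team, and this is not a proposed ranking method.

```python
def bayes_update(prior: float,
                 p_evidence_if_true: float,
                 p_evidence_if_false: float) -> float:
    """Return the posterior P(H | E) using Bayes' theorem.

    prior               -- P(H), the belief before seeing the evidence
    p_evidence_if_true  -- P(E | H)
    p_evidence_if_false -- P(E | not H)
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return numerator / denominator


# Hypothetical: before the game we are 60% sure Team A is the better team.
prior = 0.60

# Evidence: Team A loses. Assume (for illustration only) that the better
# team loses a game like this 30% of the time and the weaker team 70%.
posterior = bayes_update(prior, p_evidence_if_true=0.30, p_evidence_if_false=0.70)
```

With these assumed inputs the belief that Team A is the better team drops below the prior but does not fall to zero: one game is evidence, not proof, which is the point of the "anything can happen" remark.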
Conclusion
I am not sure that the BCS question is all that important or worth a lot of time in the context of solving the world’s problems, but if we are going to do the math, let’s at least try to make the process transparent, thoughtful and based on some sort of peer-reviewed science. Frankly, that is the only way my team will EVER have a chance at a BCS championship!
*Note: I do not address “style” points: a non-sportsmanship concept.
JMP 9.0 – Applied Data Analysis
JMP (jump): The Sharpest Tool in the Shed
I have been a JMP fan ever since being introduced to the product through the University of Minnesota statistics department. I have used a number of statistical programs over the years, but JMP is a perfect fit for the wide range of data analysis work I perform for customers.
The Problem – What You Don’t Know
I have found that even simple data sets can, and do, hide their secrets effectively. In fact, it is amazing what we don’t know about even the most basic data sets unless the data is run through a statistical package. Here is an example (PDF) of state population estimates for 2009. There are only 50 data points. The tool used here is the basic and easy distribution analysis in JMP. Of the 50 states, four actually have populations considered outliers in the data set. The median population is about 4.1 million. It would seem that these four states would be easy to identify, but that’s not necessarily true. I thought Wyoming, with a small population at 544,000, would also be an outlier – but that’s not so. All this information is at your fingertips with the click of a button.
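For readers without JMP, the idea behind flagging outliers in a distribution can be sketched with the common box-plot rule: flag values more than 1.5 times the interquartile range beyond the quartiles. This is an assumption about the rule in play. JMP's own quantile calculation may differ in detail, and the values below are invented, not real state populations.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented, right-skewed sample: many small values and one very large one.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = iqr_outliers(sample)
```

In a right-skewed sample like this one, only the very large value is flagged. The smallest value is not, which mirrors the Wyoming observation above: a small population is not automatically an outlier.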
A public example of a more complex data set is collected by NOAA. Here we are taking a small sample (416K points) of Sea Level Pressure Data and plotting it. JMP 9.0 makes short work of this data set.
Setting Expectations
One of the best attributes of any statistical package is helping users understand their hypotheses, or assumptions about some aspect of the world. Each of us creates hypotheses every day as a part of life. Estimating commute times to work is one example. JMP allows us to not only think more clearly about these everyday data interactions, but to test them if we so desire. The hypothesis test is a statistical approach to testing a theory, according to The Economist, Numbers Guide. This test, however, is not necessary to increase the basic understanding of data you are responsible for, whether it is financial, engineering, manufacturing, marketing, medical or administrative.
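The commute example can be turned into a small hypothesis test. The sketch below computes a one-sample t statistic by hand using only the Python standard library. The commute times are invented, and the critical value of 2.262 is the usual two-sided 5 percent cutoff for 9 degrees of freedom, hard-coded here instead of looked up from a distribution.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return the one-sample t statistic for H0: mean == hypothesized_mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    return (mean - hypothesized_mean) / std_err


# Invented commute times in minutes for ten mornings.
commutes = [31, 28, 35, 33, 30, 36, 29, 34, 32, 32]

# Null hypothesis: the true average commute is 30 minutes.
t_stat = one_sample_t(commutes, hypothesized_mean=30)

# Two-sided 5% critical value for n - 1 = 9 degrees of freedom.
T_CRITICAL = 2.262
reject_null = abs(t_stat) > T_CRITICAL
```

A statistical package reports the same statistic along with an exact p-value, so the hand calculation is only meant to show what happens behind the button.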
JMP – World Applications
JMP’s latest magazine describes uses of the program. (PDF) It is used in clinical trials, consumer products, product development and, of course, manufacturing. In today’s competitive marketplace, analyzing your data with a statistical package has become a business fundamental – much like cash flow. It’s something every business should have and deploy as part of an effective business strategy.
South Carolina Lazy? I don’t think so!
Lazy: When Noise Interferes with the Signal
Recently the Post and Courier ran an article highlighting a Business Week analysis that said South Carolina was the eighth laziest state in the union! Typically, subjective words used to describe data pop a red flag that warns me of impending data misuse doom.
The Data Set
The American Time Use Survey (ATUS) measures the time people spend doing various activities such as work, childcare, housework, watching television, volunteering and socializing. Hence this is an activity survey, not a lazy survey. The data are collected by the Census Bureau and sponsored by the Bureau of Labor Statistics (BLS). I ran a query to understand the nature of the survey, data availability and error rates. I called in the big guns from Global Pragmatica LLC to assist in converting the data from an ASCIDAT file to my JMP statistical software package format. These folks are experts in scripting and were a huge help. Thank you!
These data are collected regionally but analyzed nationally. There is about a 90-percent chance, or level of confidence, that an estimate based on a sample will differ by no more than 1.6 standard errors from the “true” population value because of sampling error. No estimates are made for state level data, and one University of Minnesota analyst stated she was not aware of state level error estimates.
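The "1.6 standard errors" statement describes a 90-percent confidence interval. A rough sketch follows, assuming the usual normal multiplier of 1.645 (which the text rounds to 1.6). The estimates and standard errors are hypothetical, since the survey publishes no state-level error estimates.

```python
def confidence_interval(estimate, standard_error, z=1.645):
    """Return (low, high) for a normal-approximation confidence interval.

    z=1.645 corresponds to roughly 90% confidence.
    """
    margin = z * standard_error
    return estimate - margin, estimate + margin


def intervals_overlap(a, b):
    """Return True when two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical average minutes per day and standard errors for two states.
state_a = confidence_interval(estimate=20.0, standard_error=4.0)
state_b = confidence_interval(estimate=25.0, standard_error=4.0)

# Overlapping intervals are a warning sign that ranking one state above
# the other may not be supported by the data.
too_close_to_rank = intervals_overlap(state_a, state_b)
```

Overlap is used here as a conservative screen, not as a formal test of the difference between two estimates.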
It is inappropriate to analyze these data at the state level without calculating the error inherent in the data. If you did that, the analysis would be interesting but useless when comparing one state to another. Why?
Sports Activity Variable Analysis
For a test sample, I chose state-level geography, with sports as the activity variable. This category captures the respondent’s participation in sports, exercise and recreational activities. To extract the data from the system, I used a tool created by the University of Minnesota called the American Time Use Survey -X. The data need to be processed by a statistical package, in this case my JMP program. An analysis of people participating in sports activities indicates that South Carolina would rank 22nd out of 50 states in terms of average minutes spent participating in sports in a 24-hour period, which is not bad. However, upon further inspection of South Carolina’s 2009 detailed weighted data, the state could rank anywhere from 12th to 23rd, based on national error rates! (PDF) Unfortunately, since these are state data, the results are meaningless. That’s because the sample is simply too small, which is one of many buried statistical problems. This 2009 sample included a total of 200 people, of whom 166 recorded zero sports activity minutes. (PDF) In fact, the median is zero, which is another red flag for this data set. A review of other states’ data revealed the same issue. This is a fascinating national data set. But unfortunately, analysis of non-national geographies yields unreliable results.
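The zero median follows directly from the counts quoted above, as a short sketch shows. The 200 respondents and 166 zeros come from the text. The 30-minute figure for the remaining 34 people is a placeholder, because the actual nonzero values are not given here, and the calculation ignores survey weights.

```python
import statistics

TOTAL_RESPONDENTS = 200   # from the 2009 South Carolina sample described above
ZERO_RESPONDENTS = 166    # respondents reporting zero sports minutes

# Placeholder value: the real nonzero minutes are not given in the text.
PLACEHOLDER_MINUTES = 30

sample = ([0] * ZERO_RESPONDENTS
          + [PLACEHOLDER_MINUTES] * (TOTAL_RESPONDENTS - ZERO_RESPONDENTS))

median_minutes = statistics.median(sample)
share_zero = ZERO_RESPONDENTS / TOTAL_RESPONDENTS
mean_minutes = statistics.mean(sample)
```

Because more than half of the respondents report zero, the median is zero no matter what the other 34 people report. The state's average therefore rests entirely on a few dozen people, which is why ranking states on it is unreliable.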
Reverse Pivot Table: Matrix → xyz Format
I like to add a few technical tools now and again. Here is a sweet piece of programming that could save time converting a matrix to an xyz table. The surface plot on my web page can be created by converting matrix data to an xyz format.
Issue
The problem I often run into with Excel spreadsheets is that the data is defined in a matrix. Sometimes it is more convenient to reorganize the data with a pivot table in order to represent the data as xyz coordinates. At first glance it appears this should be an easy task, but without the right Excel module or the full version of SQL, forget it.
Solution
A solution to this problem is provided by The Spreadsheet Page: a reverse pivot table. The link does an excellent job of explaining the process. At the bottom, a VBA link allows you to copy the code into your Excel application. A big thank you to these guys for sharing this. It saved me many hours of work.
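For those working outside Excel, the same reshaping idea can be sketched in Python. This is not the VBA code from The Spreadsheet Page. It is a small illustration, with a made-up function name, that assumes the matrix has one label per row (y), one label per column (x), and a value (z) in each cell.

```python
def matrix_to_xyz(row_labels, col_labels, matrix):
    """Flatten a labeled matrix into a list of (x, y, z) tuples.

    row_labels -- one y value per matrix row
    col_labels -- one x value per matrix column
    matrix     -- list of rows, each a list of z values
    """
    if len(matrix) != len(row_labels):
        raise ValueError("need exactly one row label per matrix row")
    xyz = []
    for y, row in zip(row_labels, matrix):
        if len(row) != len(col_labels):
            raise ValueError("every row needs one value per column label")
        for x, z in zip(col_labels, row):
            xyz.append((x, y, z))
    return xyz


# Two rows (y = 10, 20) by three columns (x = 1, 2, 3).
points = matrix_to_xyz(
    row_labels=[10, 20],
    col_labels=[1, 2, 3],
    matrix=[[5, 6, 7],
            [8, 9, 10]],
)
```

Each cell of the matrix becomes one row of the xyz table, so a matrix with r rows and c columns yields r × c points, ready for a surface plot.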
You Cannot Prove the Null
This sounds like statistics, so you had better go to the next site, right? Stop before you do that. I think we can help. In 2009, I joined the SAS JMP® software user group and receive periodic updates and explanations on different aspects of statistics. Since using JMP®, my whole world has become significantly (no pun intended) less complicated.
The latest issue of JMPer Cable (PDF) (Issue 26, Winter 2010, pp. 6-9) has a short and informative article by Ramirez and Bailey on significance testing. Questions we analysts need to ask ourselves from a data standpoint include: Is there a difference? Do the data tell me anything? Is the simple comment “that’s interesting” good enough, or do we need to have that discussion with the marketing guys again? Regardless, this article explains the null hypothesis (no change) and the alternative hypothesis in a few easy-to-understand pages. The authors do this with short, informative examples and, more importantly, the intuitive computer display from the JMP® statistical package. This article is a conversation, not a lecture, allowing one to absorb concepts that frankly can be confusing.

