Bowl Championship Series: BCS
Issue
The Bowl Championship Series (BCS) ranking process is a failure by any measure. The good news is that it finally appears the powers-that-be are going to work out a playoff system. But what is the root cause of the problematic BCS rankings? Why don’t they work? And what type of numerical system might meet the needs of a college football ranking system?
Statistics: Or Lack of!
A cursory review of BCS statistics quickly identifies the main problem: the people who created these “methods” do not appear to use any form of statistics. Further limiting the public’s understanding of these data, the methods used to calculate the rankings are not available. In other words, they have not been peer-reviewed in any meaningful way – they subscribe to the “trust me” method!
We know the accuracy is questionable at best or scandalous at worst, since we never read or hear about odds, confidence intervals, error, probability or other common statistical references when referring to these data. We also know intuitively that around each number there is error. If the error is not displayed, we can trust neither the numbers nor the authors – hence the ruckus around these rankings.
The Champ: Play-off
The great thing about a playoff for college football, like every other major sports league, is that you know the answer at the end. The best team on that day is the final one standing. End of debate. Rodney Harrison was recently asked who he liked in the NFL playoff and his answer was that it is hard to estimate since anything can happen in a playoff game. Well said. The challenge with a college playoff system is not that it wouldn’t work, because it would. Rather, it cuts the number of bowl games in half. Ouch, that is a lot of lost revenue!
The Champ: Numerical Calculation
I will disclose my bias for a playoff system since, as Rodney stated, anything can happen. But I believe there is likely a method that would, in fact, provide a numerical answer that most would agree with. First, the method needs to be made public, and it should be a method that has a history of success. “Odds” are, of course, one system, but in reviewing the odds estimates for the BCS championship game, there were many conflicting estimates with some odds makers suggesting a difference of only a point or two. In other words, it was too close to call.
Odds is an interesting process (better than the “look what I made up” numerical process), but probability estimates are the only real tool we have that could pick a winner. Odds and probability sound similar but in fact are quite different. The difference:
- Probability is used to express sensitivity, specificity and predictive value. It is the proportion of people in whom a particular characteristic, such as a positive test, is present.
- Odds is the ratio of two complementary probabilities. (PDF)
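To make the distinction concrete, here is a minimal Python sketch of the conversion between the two. The function names and the 0.75 win probability are my own illustrative assumptions, not figures from any odds maker.

```python
def probability_to_odds(p):
    """Odds in favor: the ratio of p to its complement (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Invert the ratio: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Hypothetical example: a team given a 75 percent chance of winning.
p_win = 0.75
odds_in_favor = probability_to_odds(p_win)           # 3-to-1 in favor
round_trip = odds_to_probability(odds_in_favor)      # back to 0.75
print(f"probability {p_win:.2f} -> odds {odds_in_favor:.1f} to 1")
```

A probability is bounded between 0 and 1, while odds run from 0 to infinity, which is one reason the two cannot be read interchangeably.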
Along the probability line is a process called Evidence Based Management (EBM) which uses Bayesian analysis.
Bayes Theorem: a statistical principle for combining prior knowledge of the classes with new evidence gathered from data. (See Introduction to Data Mining, Chapter 5, pp. 228-229.) (PDF)
EBM with Bayesian analysis states: What was thought before the test was done, combined with the test result, yields what is thought after the test result. In other words: what you thought you knew before the football contest, then the game itself, then what you think afterward – LSU is still No. 1 syndrome! It is this process that could provide an answer to who is No. 1 regardless of the date, time or opponent,* effectively removing the Rodney effect, but not likely the debate!
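The before-and-after idea can be sketched with Bayes' theorem in a few lines of Python. Every number below is a hypothetical assumption of mine (a 60 percent prior belief that a team is the better one, and win rates of 70 percent and 30 percent for the better and lesser team); none of it comes from BCS data.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


# Hypothetical inputs, for illustration only.
prior_better = 0.60         # belief before the game that Team A is the better team
p_win_if_better = 0.70      # assumed chance the better team wins a given game
p_win_if_not_better = 0.30  # assumed chance the lesser team wins anyway

# Team A wins: what we think afterward.
posterior_better = bayes_update(prior_better, p_win_if_better, p_win_if_not_better)
print(f"before the game: {prior_better:.2f}, after the win: {posterior_better:.3f}")
```

Note that a single win moves the belief but does not settle it, which is the numerical version of "anything can happen in a playoff game."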
Conclusion
I am not sure that the BCS question is all that important or worth a lot of time in the context of solving the world’s problems, but if we are going to do the math, let’s at least try to make the process transparent, thoughtful and based on some sort of peer-reviewed science. Frankly, that is the only way my team will EVER have a chance at a BCS championship!
*Note: I do not address “style” points: a non-sportsmanship concept.
Gini Coefficient
Issue
The Gini Coefficient, developed by the Italian statistician Corrado Gini, is the most commonly used measure of inequality. The coefficient varies between 0, which reflects complete equality and 1, which indicates complete inequality (one person has all the income or consumption, all others have none). (The World Bank) We wanted to use this method to look at income distribution throughout South Carolina, but first we had to understand the formula.
At first glance, there is a fair amount of math needed to calculate the coefficient. Make no mistake, this is and can be a very complex formula, utilizing probability sampling, bootstrapping, confidence intervals and other statistical methodology. We, however, tried to keep it applied, and therefore used the most basic variation of the formula.
After sorting out the symbolism, we created a sample problem (PDF). The sample problem allowed us to work through the math in a structured process. The value of “doing the math” is that one gains an understanding as to how different variables affect the formula. The PDF contains two versions of the sample problem, one showing the formula and the other with plugged numbers. Note how, unlike most of the available examples, we show a calculation needed prior to using the formula: in this case, (dollar stratum) times (number of persons). That’s because the analyst may need to do a number of calculations prior to applying the formula.
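For readers who prefer code to a worked PDF, here is a minimal Python sketch of one common basic variation for grouped data: the trapezoid approximation of the area under the Lorenz curve. I am assuming this form for illustration; it may differ in detail from the formula in the sample problem. The strata and dollar values are invented, and the function name gini_grouped is hypothetical.

```python
def gini_grouped(strata):
    """Gini coefficient from (average_dollars, persons) pairs, one per stratum.

    Uses G = 1 - sum((X_k - X_{k-1}) * (Y_k + Y_{k-1})), where X is the
    cumulative share of persons and Y the cumulative share of income.
    """
    ordered = sorted(strata)  # lowest average income first
    total_persons = sum(persons for _, persons in ordered)
    # The preliminary calculation: dollars in the stratum TIMES number of persons.
    total_income = sum(dollars * persons for dollars, persons in ordered)
    if total_persons <= 0 or total_income <= 0:
        raise ValueError("need positive totals for persons and income")

    cum_pop = cum_inc = area = 0.0
    for dollars, persons in ordered:
        prev_pop, prev_inc = cum_pop, cum_inc
        cum_pop += persons / total_persons
        cum_inc += dollars * persons / total_income
        area += (cum_pop - prev_pop) * (cum_inc + prev_inc)
    return 1.0 - area


# Invented three-stratum example; only the top stratum's average changes.
base = [(10_000, 50), (50_000, 40), (200_000, 10)]
richer_top = [(10_000, 50), (50_000, 40), (400_000, 10)]
print(round(gini_grouped(base), 4), round(gini_grouped(richer_top), 4))
```

Changing only the average dollar value of the top stratum, with the head counts fixed, moves the coefficient noticeably, which is a quick way to see how much weight the upper tail carries.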
The Formula: Results
We applied the formula to the classic income distribution (wealth share) problem, using Census, Household and Family Income Report B19001, for each county in South Carolina. These data have 16 income strata. We found the formula is particularly sensitive to changes in the top two strata, not necessarily the number of persons, but the average dollar value. In other words, “the tail wags the dog” in this formula. The other critical piece of information needed is what value to assign the highest stratum. The census uses approximately $400,000 as an approximation for the average top-stratum dollar figure. They calculate this number using volumes of data, so it’s good enough for me.
After making our calculations, the formula really did reveal a number of interesting trends. One, the impact of the economy on higher wage earners – in the case of these data – is very delayed. In other words, higher income households continued to make money well into the latest recession. The other revealing attribute is the effect of a rising tide. A rising tide does in fact lift boats, but some higher than others and in the process it also sinks a few! In this case, households with higher incomes grew at a proportionally higher rate than those with lower incomes, and in some counties, household income (high and low) was hit particularly hard.
Conclusions
Now that you understand the formula, if you use these data, the Census Bureau has already done the Gini Coefficient income calculations for you! Yes, to my surprise, the Bureau has been doing this calculation since the 1990s. The file is B19083. It may sound like I have given you a shortcut, but now you have to figure out the new GUI American Community Survey interface. Good luck!
Acknowledgement: Thank you to the staff at the US Census Bureau for assisting me in understanding key drivers of the Gini Coefficient.
Trade Environmental Assessment Model (TEAM)
Environmental Economic Impacts
TEAM is a suite of software tools developed by Abt Associates for the Environmental Protection Agency (EPA) National Center for Environmental Economics to assess the environmental impacts of international trade agreements.
The ECONOMIC resolution is the 4-digit NAICS sector level.
The ENVIRONMENTAL resolution currently covers four separate environmental release and resource use categories: air, water, and carbon dioxide pollutants, along with energy consumption by fuel type.
The GEOGRAPHIC resolution is the state level.
This tool is an exciting, and likely the first, applied environmental economic impact calculation tool. Imagine for a moment a community considering the development of an industrial park with a certain mix of industries. Wouldn’t it be useful to estimate the environmental economic impact this development could have on a regional community? TEAM can assist in this evaluation.
The EPA has done an excellent job in development, analysis, and peer review. The model is based on standard economic methodology, grounded with sources such as Duchin and Steenge, Miller and Blair, Leontief, and the National Research Council among others.
Further, as a result of the model design, outputs dovetail with transportation, economic, land use, and environmental impact models as part of the final economic/environmental impact deliverables.
Utility Industry Economic Environmental Impact Example
This is an example of output created by the TEAM model. The example uses the sector 2211- Electric Power Generation, Transmission and Distribution to demonstrate a portion of the model functionality. This example is only scratching the surface and for introductory purposes only. In my view the possibilities for TEAM are limitless. (pdf)
I have added an example from Miller and Blair (2009), which demonstrates the matrix algebra used to create these coefficients. The process is intuitive and flexible, demonstrating the value of applied industry – environmental analysis. (pdf)
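As a rough illustration of the kind of matrix algebra involved, here is a two-sector input-output sketch in plain Python: total output is x = (I - A)^-1 f, and total emissions are the direct emission coefficients applied to x. All coefficients, demands and emission factors below are invented for illustration; they are not taken from Miller and Blair or from TEAM, and the function names are hypothetical.

```python
def leontief_inverse_2x2(a):
    """Return (I - A)^-1 for a 2x2 technical coefficient matrix A."""
    m11, m12 = 1.0 - a[0][0], -a[0][1]
    m21, m22 = -a[1][0], 1.0 - a[1][1]
    det = m11 * m22 - m12 * m21
    if det == 0.0:
        raise ValueError("(I - A) is singular")
    return [[m22 / det, -m12 / det],
            [-m21 / det, m11 / det]]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# Invented inputs: sector 0 = utilities, sector 1 = everything else.
A = [[0.2, 0.3],
     [0.1, 0.4]]               # dollars of input per dollar of output
final_demand = [100.0, 200.0]  # dollars of final demand by sector
emission_coeff = [0.5, 0.1]    # tons emitted per dollar of output

L = leontief_inverse_2x2(A)
total_output = mat_vec(L, final_demand)  # direct plus indirect output
total_emissions = sum(e * x for e, x in zip(emission_coeff, total_output))
print([round(x, 2) for x in total_output], round(total_emissions, 2))
```

The point of the inverse is that emissions are charged against all the output needed to satisfy final demand, including the inputs that sectors buy from each other.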
Grocery Stores: The Profitability Index
Issue
The Post and Courier recently posted an article titled Super Market Central that raises more questions than it answers. The article compares the number of grocery stores in affluent Mount Pleasant, S.C., with the number in North Charleston, S.C. One Mount Pleasant store owner was quoted as saying “We want to be in markets where there are households with families.” Actually, North Charleston and Mount Pleasant both have an average of 2.5 persons per household! (Source: City-data.com) After reviewing a number of data-intensive sites such as City-data.com and the U.S. Census, I was able to confirm many points in the article and, like the author, identify differences in these communities. Many, such as race and income, are obvious. However, this still does not explain the disparity between the numbers of grocery stores in one community versus another.
Empirical Research
It turns out this is a significant problem, not only here in the low-country, but across the United States. One source, “Closing the grocery gap in low-income areas,” identifies key issues. Other research from CA Food Policy Advocates suggests:
“One promising model, among others, that has emerged involves the conversion of existing corner stores, typically depending upon sales of alcohol, tobacco and sodas, into neighborhood groceries selling healthy foods. Because so many of the necessary costs — rent, utilities, space, and management possessing some degree of both business skills and familiarity with neighborhood preferences — already are present, the conversion can be relatively inexpensive and, in fact, provide the store with additional opportunities to be profitable. A viable neighborhood grocery store represents multiple policy gains, including food access, nutrition and fitness, transportation, community development and crime reduction.”
Unfortunately, the quote describes symptoms, but not the cause of the problem. The root cause is how capital is rationed to achieve the highest return. The Post and Courier article states margins for grocery stores (a basic commodity) are around 1.5 percent. This is pretty poor even by commodity standards. In fact, one would have to wonder why anyone would go into this business, especially a small business, as suggested above. There simply are not enough retained earnings to make a living! However, to understand why Mt. Pleasant has more grocery stores than North Charleston, we must look for the answer among the financial tools used to make capital allocation strategic decisions – in other words, to build new grocery stores.
The Profitability Index
Most firms’ capital budgeting process uses some sort of discounted cash flows, the most common being net present value (NPV). Although there are other methods, such as payback or average accounting return, we assume our grocery stores use a variation of NPV. In the capital budgeting process, a project is accepted if NPV is greater than 0 and rejected if it is less than 0. That basically means if the project is accepted it will make money (hopefully). We will assume that both North Charleston and Mount Pleasant grocery projects have positive NPVs. So far so good. Unfortunately, investment capital is limited, especially when risk is factored in. We therefore can choose only one project. To do that, we run both projects back through the profitability index.
Profitability Index (PI) = Present Value (PV) of cash flows subsequent to initial investment/ Initial Investment
Again, if PI is greater than 1 we accept the project; if it’s less than 1 we reject it. When using NPV, we make a go, no-go decision. However, when applying PI, projects are ranked according to the ratio of present value to initial investment. The project with the best potential return (greater than 1) is funded. It is Mount Pleasant in this case. The project is funded, as the article states, not because of corn flake sales, but because of special item sales, which less affluent customers avoid. Special item sales create a better return (profit) on capital invested in the Mount Pleasant location.
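The two-step screen, NPV first and then PI to rank, can be sketched in Python. The two projects, their cash flows and the 10 percent discount rate are hypothetical numbers I made up to show the mechanics; they are not estimates for either community.

```python
def present_value(cash_flows, rate):
    """Present value of end-of-year cash flows received in years 1, 2, 3, ..."""
    return sum(cf / (1.0 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def npv(initial_investment, cash_flows, rate):
    """Net present value: accept when greater than 0."""
    return present_value(cash_flows, rate) - initial_investment


def profitability_index(initial_investment, cash_flows, rate):
    """PI = PV of subsequent cash flows / initial investment: accept when above 1."""
    return present_value(cash_flows, rate) / initial_investment


# Hypothetical projects: (initial investment, yearly cash flows).
rate = 0.10
projects = {
    "project_a": (1000.0, [500.0, 500.0, 500.0]),
    "project_b": (2000.0, [900.0, 900.0, 900.0]),
}

# Rank by PI, highest first; with capital for one project, fund the top one.
ranked = sorted(projects,
                key=lambda name: profitability_index(*projects[name], rate),
                reverse=True)
for name in ranked:
    cost, flows = projects[name]
    print(name, round(npv(cost, flows, rate), 2),
          round(profitability_index(cost, flows, rate), 3))
```

Both hypothetical projects clear the NPV hurdle, so NPV alone cannot choose between them. The index breaks the tie by asking which one returns more present value per dollar invested.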
Conclusion
Both projects are in fact profitable. But one provides a slightly better return. At this point corporate culture also comes into play. For example, “what we did last week, which worked, will likely work in our next venture” … and so on. One can see this pattern in Mount Pleasant – the me too effect. This happens in part because firms generate positive NPVs because of prior investments, leveraging their current market position. An organization does need to make a profit, whether it is the small corner store or a large grocery chain. Without that profit, the store will cease to exist.
In the end a different model is needed (not currently in the domain of the typical grocery store) that incorporates social networking, transportation, specific product offerings, efficient security and product distribution. This comprehensive model leverages capital not only for the current project, but for indirect cash flows of future business ventures yet to be determined in the same locale. Extending the scope of the investment decision breaks the current boom-bust grocery store location cycle. The question is how to get business owners to adopt this perspective.
Hotel Impact on “Tourism”: Methodology for Estimating Economic Impact
Issue
This impact analysis was created by Julie Flowers, a retired statistician for the South Carolina Parks and Recreation Department, using Travel Industry of America (U.S. Travel Association) data. Julie is a pretty sharp statistician and does a nice job of outlining what is important when thinking about tourism impacts.
Analysis
Ms. Flowers uses IMPLAN to create a final economic impact based on TIA data. To create an economic impact, we need either spending or employment patterns by industry. As Ms. Flowers points out, there is no “tourism” industry. Our closest proxy is the Hospitality and Leisure super-sector. When it comes to hospitality, the big dog is lodging (hotels and motels). This is not apparent at first glance, but without this industry there is no tourism other than tenting! Within the hospitality industry it is clear that other businesses leverage lodging’s strength.
Hotel Impacts
TIA developed a strong method for collecting and analyzing hospitality data that is generally available to the public. What comes to light when exploring economic relationships within the travel industry is that for every dollar spent on lodging, $3.60 is spent on travel-related items – food, retail, recreation, etc. In employment, that number leaps to 5.3 jobs* for every job in the hotel industry. In other words, the hotel industry provides the anchor for other businesses to flourish. TIA data state that every $1 million in travel spending creates 13.5 jobs outside the hotel industry. Therefore, every million dollars spent in hotels likely generates over 70 jobs in hospitality-related industries!
Conclusion
Hospitality analysis requires a fair amount of data to create economic impacts, in this case supplied by TIA. Furthermore, TIA provides a solid methodology for justifying these spending patterns. Ms. Flowers’ process starts with an analysis of spending, then plugs the data into an economic analysis program, IMPLAN. With this transparent method further analysis is possible, uncovering the deeper relationships within this cluster. In this case, the end result provides a clearer picture of how supporting the hotel industry leads to significant gains in both employment and industry output within hospitality-related industries.
*Spreadsheet addition error in report.
Manufacturing: Decline or Revitalization?
Issue
The Post and Courier recently printed an article from the Associated Press on the national economy. It is an interesting article in that, unlike many articles of this type, there is a limited amount of talk and actually some interesting data. Unfortunately, most of the data points were taken out of context and in one instance actually misled the reader. Of particular interest are the manufacturing data.
Manufacturing Expansion – NOPE!
The data which were quoted appear to be from the U.S. Census, but are actually from the Federal Reserve Board.
“U.S. manufacturing output expanded in May at the slowest pace in 20 months”
Actually, manufacturing declined* by 0.4 percent. The Federal Reserve goes on to explain these data in more detail:
“In April (2011), manufacturing output fell 0.4 percent after increasing 0.6 percent in March. The rates of change for manufacturing were also revised down for both January and February; lower estimates for the production of cigarettes, petroleum products, pharmaceuticals, microprocessors, and military aircraft contributed to the downward revisions. The index for manufacturing in April was 4.6 percent above its year-earlier level. Capacity utilization for manufacturing moved down 0.4 percentage point to 74.4 percent, a rate 10.0 percentage points above its trough in June 2009 but still 4.6 percentage points below its average from 1972 to 2010.”
Analysis: Wish the Late 80s Were Back
When evaluating manufacturing, two important measurements are production and capacity utilization (CU). Production (Federal Reserve, St. Louis) had been increasing since the end of the recession. Because this trend was broken well before reaching production output established late in the past decade, April’s release was disturbing.
More troublesome, however, is the continued long-term slide in CU. Fortunately, we rebounded from the recession in this statistic too, but again the numbers seem to be leveling off. Most manufacturers operate best when they run between 80 and 83 percent of full capacity. Any number higher than this typically means that the manufacturer has to bring old, less efficient equipment on line. So although there is an increase in production, efficiencies actually drop. In addition, high CUs tend to dominate the business model, leaving other areas of the business to suffer, such as quality (think Toyota).
Unfortunately, this is not our current problem. The current state of production is low capacities resulting in machines sitting idle, workers being laid off and budgets being reduced – all of which are a real drag on the recovery. So how do we get back on track?
Solutions
Solutions to America’s long-term manufacturing decline were the subject of a paper by Timothy J. Bartik, “Thoughts on American Manufacturing Decline and Revitalization,” back in 2003. He outlines six ways to support manufacturers. We have noted these suggestions over the years, but maybe now, as a result of hitting a manufacturing ceiling, it is time to take a hard look at policies such as retraining, capital formation and access to information to improve this industry’s competitiveness.
For the best information on the economic indicators, see The Federal Reserve Bank of Richmond (National Economic Indicators)
*See Major Industry Groups Manufacturing (April)
Labor Market Information: An Overview
Issue
My friend and business associate Gary Crossley provides labor market information (LMI) to a variety of organizations nationwide. Recently he sent me one of his overview presentations to post on Moore Data.
Most analysts believe they have a grip on labor market data, but what Gary and I find is that this is not so. The reality is analysts tend not to stray very far from unemployment statistics, rarely giving any weight to other key data sets that fill in the labor market knowledge gap. Below is Gary’s presentation. I have provided the appropriate links to corresponding Bureau of Labor Statistics (BLS) web sites.
Presentation Analysis (PDF)
Slides 9-11: Employment: Provides a general definition for employed and unemployed along with basic calculations.
Slides 14,15: Quarterly Census of Employment and Wages (QCEW) discusses data sources and uses of data.
Slides 17,18: Current Employment Statistics (CES) discusses data sources and uses of data.
Slides 20 – 23: Occupational Employment Statistics (OES) discusses data sources, uses and programs.
Slides 25, 26: Local Area Unemployment Statistics (LAUS) discusses data sources and uses of data. For detailed calculation of unemployment see Unemployment Calculation Post.
Slide 28: Mass Layoff Statistics (MLS) discusses general program.
Slides 29, 30: These slides provide a link to the Bureau of Labor Statistics Handbook of Methods. This is a particularly good resource for analysts. The handbook not only describes method, but also what programs use these data as an engine to drive their information programs. The handbook also provides a detailed account of data limitations, which is a plus when determining appropriateness of data for various uses. Slide 30 provides a nice “cheat sheet” to the different data sets.
Slides 33 – 45: These slides list the different entities that provide data portals. Of course you can analyze these data yourself, but the challenge is understanding the nuances of the data so that one does not come to the wrong conclusion.
Slides 45 – 47: These slides touch on supply and demand, training, and military data. However, one of the more interesting and eye-opening data sources is the Census data set of military service-related disabilities. These data can be found at Veterans.
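The basic calculation behind the employed and unemployed definitions on slides 9-11 can be sketched in Python using the standard definition: the unemployment rate is the unemployed divided by the labor force (employed plus unemployed). The head counts are invented for illustration and the function name is hypothetical; the official LAUS estimates involve considerably more modeling than this.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a percentage of the labor force (employed + unemployed)."""
    labor_force = employed + unemployed
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return 100.0 * unemployed / labor_force


# Invented county-level counts, for illustration only.
employed = 950
unemployed = 50
print(f"unemployment rate: {unemployment_rate(employed, unemployed):.1f} percent")
```

People who are neither working nor looking for work are outside the labor force and appear in neither count, which is one reason the rate alone leaves a knowledge gap.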
Conclusion
LMI data are readily available on the web. Most competent analysts will use two or three data sets, attempting to triangulate to find the “answer.” Gary has provided a basic reference that will assist users in thinking about the different data sources and how they may help answer a labor market question. Thanks, Gary.
Economic Development: Analytic Hierarchy Process
Issue
Economic development continues to be a focus for states – especially here in South Carolina as the state strives to attract businesses. The most recent challenge for economic development agencies is the diverse visions of what economic development should provide for the community. These visions seem to have become more divisive as tensions increase due to a lack of success. Where once new jobs were the only factor to consider, new jobs now compete with environmental concerns, quality of life, wage rates, tax incentives and a host of business, community and regulatory concerns.
AHP
The Analytic Hierarchy Process (AHP) economic development targeting tool is one solution to meeting the needs of diverse perspectives. AHP is able to compare or rate varied perspectives, assisting policy makers in narrowing economic development strategies. The process was developed by Saaty. (Saaty, T. L., and Alexander, J.M. (1989) Conflict Resolution: The Analytical Hierarchy Approach. New York: Praeger.) Saaty’s approach is outlined in Targeting Regional Economic Development, Goetz et al. (2009).
Process
The process is a method that weights or prioritizes outcomes when several considerations are relevant. For example, when attempting to reconcile survey results taken from different constituents, the process uses pairwise comparisons of several outcomes. The goal of AHP, when comparing different criteria, is to determine the relative importance of each criterion in achieving the goal. The math includes solving a “weighting” problem using an eigenvector P of the pairwise comparison matrix, corresponding to an eigenvalue equal to K, the matrix order (Saaty 1980). Typically the Lanczos algorithm is used to calculate the eigenvector. In this case, the hierarchy of importance (ranging from equal importance to extreme importance) is limited to 9 levels to reduce error.
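A small Python sketch may help show what the eigenvector step produces. I use simple power iteration here rather than the Lanczos algorithm, purely to keep the example short and dependency-free. The three criteria and the pairwise judgments are hypothetical, and the function name ahp_priorities is my own.

```python
def ahp_priorities(matrix, iterations=100, tol=1e-12):
    """Priority weights (principal eigenvector, summing to 1) and the
    estimated principal eigenvalue of a pairwise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n))
                   for i in range(n)]
        total = sum(product)
        updated = [value / total for value in product]
        converged = max(abs(a - b) for a, b in zip(updated, weights)) < tol
        weights = updated
        if converged:
            break
    product = [sum(matrix[i][j] * weights[j] for j in range(n))
               for i in range(n)]
    eigenvalue = sum(product[i] / weights[i] for i in range(n)) / n
    return weights, eigenvalue


# Hypothetical judgments on the 1-to-9 scale: entry [i][j] says how much
# more important criterion i is than criterion j.
criteria = ["new jobs", "environment", "wage rates"]
comparisons = [
    [1.0,       2.0,       6.0],
    [1.0 / 2.0, 1.0,       3.0],
    [1.0 / 6.0, 1.0 / 3.0, 1.0],
]

weights, eigenvalue = ahp_priorities(comparisons)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.2f}")
print(f"principal eigenvalue: {eigenvalue:.3f}")
```

These judgments are perfectly consistent (2 times 3 equals 6), so the eigenvalue equals the number of criteria. With real survey data it comes out larger, and the gap is a measure of how inconsistent the judgments are.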
Results
Traditional industry targeting methods answer a variety of questions independently of one another. However, without some type of preference elicitation process, those results simply do NOT answer the question of which firm or industry is most attractive to the region. One of the earliest applications of AHP to solve this problem was Cox et al. (2000). The example below demonstrates how Cox used the process to rank regional expectations or needs that were most important to the community. These criteria were then compared to industry characteristics or used to develop a strategy of industry selection, rejection or negotiation parameters.

The Advent of the Algorithm
The Post and Courier recently printed an unusual column discussing “Deep Reading.” It is an excellent article by Laura Casey discussing a lost art.
“Deep Reading, or slow reading, is a sophisticated process in which people can critically think, reflect and understand the words they are looking at. With most this means slowing down – even stopping and rereading a page or paragraph if it doesn’t sink in…”
Deep reading seems out of sync with today’s “modern” communication tools, like Twitter, and it is. But what does this have to do with the algorithm? As a data guy, the algorithm is my life, and the only way to understand what makes it tick is deep reading! My choice for this subject is The Advent of the Algorithm, by David Berlinski.
We experience the algorithm everywhere in our daily lives – in the operation of the toaster, the dentist’s office, our vehicles and in most, if not all, of the communication tools we use daily. I reviewed definitions of the algorithm online, but none have the elegance of Berlinski, in his book, The Advent of the Algorithm:
“In the logician’s voice:
an algorithm is a finite procedure,
written in a fixed symbolic vocabulary,
governed by precise instructions,
moving in discrete steps, 1,2,3,…,
whose execution requires no insight, cleverness,
intuition, intelligence, or perspicuity,
and that sooner or later comes to an end.”
Where did this idea come from? It turns out the algorithm we all know is the brainchild of four people: Kurt Gödel, Alonzo Church, Alan M. Turing and Emil Post. Each contributed, in part, to the concept of the modern day algorithm, including functions, calculus of conversion and machines capable of manipulating symbols (computers).
Berlinski describes, in depth, the development of the algorithm concept. In one example, the Euler algorithm, he makes clear how important it is – for an analyst, anyway – to know, not just understand, the magic and flaws behind the algorithm. A case in point is the Numerical Solution For Ordinary Differential Equation, page 245.
“From a mathematical point of view, the original differential equation, contingent as it was upon the concept of the limit, has been replaced by a difference equation, one in which the derivative is approximated by a difference quotient, involving no limits whatsoever.”
The Euler algorithm demonstrates this method:
BEGIN Euler
  INPUT x0, y0, xf, h
  x := x0
  y := y0
  WHILE (x < xf) DO
    y := y + h * f(x, y)
    x := x + h
    OUTPUT x, y
  ENDDO
END Euler
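The pseudocode translates almost line for line into Python. This is my own sketch, not Berlinski's: I count the steps up front instead of testing x < xf, to avoid floating-point drift in x, and I return only the final point rather than printing every step. The test equation dy/dx = y with y(0) = 1 is my choice because its analytic solution, e^x, is known.

```python
import math


def euler(f, x0, y0, xf, h):
    """Approximate y(xf) for dy/dx = f(x, y), y(x0) = y0, with step size h."""
    steps = round((xf - x0) / h)  # fixed step count instead of WHILE x < xf
    x, y = x0, y0
    for i in range(steps):
        y = y + h * f(x, y)       # the difference quotient replaces the derivative
        x = x0 + (i + 1) * h
    return x, y


def growth(x, y):
    """Right-hand side of dy/dx = y; the analytic solution is y = e^x."""
    return y


exact = math.exp(1.0)
_, coarse = euler(growth, 0.0, 1.0, 1.0, 0.1)
_, fine = euler(growth, 0.0, 1.0, 1.0, 0.01)
coarse_error = abs(exact - coarse)
fine_error = abs(exact - fine)
print(f"h=0.1 error: {coarse_error:.4f}, h=0.01 error: {fine_error:.4f}")
```

Shrinking the step narrows the gap with the analytic answer but never closes it, since the algorithm only ever sees the curve at discrete points.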
This simple algorithm, however, provides critical insight into the weakness of the algorithm:
“…the difference between an analytic and algorithmic solution to an ordinary differential equation is sharp and it is inescapable. An analytic solution completely penetrates the future or the past; an algorithmic solution acts only over a finite interval of time and space. The analytic solution returns a differential equation to a continuous world; an algorithmic solution, to a world that is discrete.”
Now I understand why the brakes failed in my Toyota.
Note: some of the reviews of the book were not very flattering, but this is DEEP READING with the good stuff starting on page 205. Do you think you have what it takes to DEEP READ? Have at it!

