Bowl Championship Series: BCS

January 17, 2012 · Posted in statistics

Issue

The Bowl Championship Series (BCS) ranking process is a failure by any measure. The good news is that it finally appears the powers-that-be are going to work out a playoff system. But what is the root cause of the problematic BCS rankings? Why don’t they work? And what type of numerical system might meet the needs of a college football ranking system?

Statistics: Or Lack of!

A cursory review of BCS statistics quickly identifies the main problem: the people who created these “methods” do not appear to use any form of statistics. Further limiting the public’s understanding of these data is that the methods used to calculate the rankings are not publicly available. In other words, they have not been peer-reviewed in any meaningful way – they subscribe to the “trust me” method!

We know the accuracy is questionable at best or scandalous at worst, since we never read or hear about odds, confidence intervals, error, probability or other common statistical references when referring to these data. We also know intuitively that around each number there is error. If the error is not displayed, we can trust neither the numbers nor the authors – hence the ruckus around these rankings.

The Champ: Play-off

The great thing about a playoff for college football, like the playoffs in every other major sports league, is that you know the answer at the end. The best team on that day is the final one standing. End of debate. Rodney Harrison was recently asked who he liked in the NFL playoffs and his answer was that it is hard to estimate since anything can happen in a playoff game. Well said. The challenge with a college playoff system is not that it wouldn’t work, because it would. Rather, it cuts the number of bowl games in half. Ouch, that is a lot of lost revenue!

The Champ: Numerical Calculation

I will disclose my bias for a playoff system since, as Rodney stated, anything can happen. But I believe there is likely a method that would, in fact, provide a numerical answer that most would agree with. First, the method needs to be made public, and it should be a method that has a history of success. “Odds” are, of course, one system, but in reviewing the odds estimates for the BCS championship game, there were many conflicting estimates with some odds makers suggesting a difference of only a point or two. In other words, it was too close to call.

Odds is an interesting process (better than the “look what I made up” numerical process), but probability estimates are the only real tool we have that could pick a winner. Odds and probability sound similar but in fact are quite different. The difference:

  • Probability is used to express sensitivity, specificity and predictive value. It is the proportion of people in whom a particular characteristic, such as a positive test, is present.
  • Odds is the ratio of two complementary probabilities. (PDF)
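To make the second bullet concrete, here is a minimal sketch of converting between the two. The function names and the 0.75 figure are mine, chosen only for illustration; they do not come from any odds maker.

```python
def probability_to_odds(p):
    """Odds in favor: the ratio of two complementary probabilities, p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in the range [0, 1)")
    return p / (1 - p)


def odds_to_probability(odds):
    """Invert the conversion: odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Illustrative value only: a team given a 75 percent chance of winning
# is a 3-to-1 favorite.
p_win = 0.75
odds_in_favor = probability_to_odds(p_win)
print(f"probability {p_win} -> odds {odds_in_favor}:1")
print(f"odds {odds_in_favor}:1 -> probability {odds_to_probability(odds_in_favor)}")
```

The same information is carried either way; the two scales just answer slightly different questions.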

Along the probability line is a process called Evidence Based Management (EBM) which uses Bayesian analysis.

Bayes Theorem: a statistical principle for combining prior knowledge of the classes with new evidence gathered from data. (See Introduction to Data Mining, Chapter 5, pp. 228-229) (PDF)

EBM with Bayesian analysis states: What was thought before the test was done, combined with the test result, yields what is thought after the test. In other words, it ties together what you thought you knew before the football contest, the game itself, and what you think afterward – LSU is still No. 1 syndrome! It is this process that could provide an answer to who is No. 1 regardless of the date, time or opponent,* effectively removing the Rodney effect, but not likely the debate!
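A rough sketch of that before-and-after updating, using Bayes' theorem directly. Every number below is a made-up assumption (the prior belief and the win probabilities are not estimates for any real team), and the function name is my own.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) from a prior and two likelihoods."""
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1 - prior) * p_evidence_if_false
    if evidence == 0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return numerator / evidence


# Hypothesis: Team A is the better team (hypothetical inputs).
prior = 0.60               # what we thought before the game
p_win_if_better = 0.70     # chance A wins a given game if A really is better
p_win_if_not_better = 0.40 # chance A wins anyway if A is not better

# The "test result": Team A wins the game.
posterior = bayes_update(prior, p_win_if_better, p_win_if_not_better)
print(f"belief before the game: {prior:.2f}, after the win: {posterior:.2f}")

# A second win updates the belief again, with the old posterior as the new prior.
posterior_2 = bayes_update(posterior, p_win_if_better, p_win_if_not_better)
print(f"after a second win: {posterior_2:.2f}")
```

The point of the sketch is that a single game moves the estimate without settling it, which is the numerical version of "anything can happen."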

Conclusion

I am not sure that the BCS question is all that important or worth a lot of time in the context of solving the world’s problems, but if we are going to do the math, let’s at least try to make the process transparent, thoughtful and based on some sort of peer-reviewed science. Frankly, that is the only way my team will EVER have a chance at a BCS championship!

*Note: I do not address “style” points: a non-sportsmanship concept.

Transportation Economic Development Impact System (TREDIS)

December 6, 2011 · Posted in TREDIS

The Transportation Economic Development Impact System (TREDIS) is a product developed by the Economic Development Research Group, Inc. (EDR). It is an integrated framework for transportation planning and project assessment – designed to cover a wide range of applications, from looking at benefit/cost impacts of a single transportation investment, to analyzing the macroeconomic impacts of alternative long-range plans.

It models passenger and freight travel across all modes, and it assesses costs, benefits, and impacts across a range of economic responses and societal perspectives. To  integrate this range of features, TREDIS operates as four separate but interconnected modules:

  • Travel Cost
  • Market Access
  • Economic Adjustment, and
  • Benefit Cost

For more information see:

Trade Environmental Assessment Model (TEAM)

August 15, 2011 · Posted in environment

Environmental Economic Impacts

TEAM is a suite of software tools developed by Abt Associates for the Environmental Protection Agency (EPA) National Center for Environmental Economics to assess the environmental impacts of international trade agreements.

The ECONOMIC resolution is 4 digit NAICS sector level.

The ENVIRONMENTAL resolution covers four separate environmental release and resource use categories: air pollutants, water pollutants, carbon dioxide, and energy consumption by fuel type.

The GEOGRAPHIC resolution is the state level.

This tool is an exciting, and likely the first, applied environmental economic impact calculation tool. Imagine for a moment a community considering the development of an industrial park with a certain mix of industries. Wouldn’t it be useful to estimate the environmental economic impact this development could have on a regional community? TEAM can assist in this evaluation.

The EPA has done an excellent job in development, analysis, and peer review. The model is based on standard economic methodology, grounded with sources such as Duchin and Steenge, Miller and Blair, Leontief, and the National Research Council among others.

Further, as a result of the model design, outputs dovetail with transportation, economic, land use, and environmental impact models as part of the final economic/environmental impact deliverables.

Utility Industry Economic Environmental Impact Example

August 15, 2011 · Posted in environment

This is an example of output created by the TEAM model. The example uses sector 2211 – Electric Power Generation, Transmission and Distribution – to demonstrate a portion of the model functionality. This example only scratches the surface and is for introductory purposes only. In my view the possibilities for TEAM are limitless. (pdf)

I have added an example from Miller and Blair (2009), which demonstrates the matrix algebra used to create these coefficients. The process is intuitive and flexible, demonstrating the value of applied industry – environmental analysis. (pdf)
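For readers who want to see the general shape of that matrix algebra, here is a small sketch of a standard environmentally extended input-output calculation: direct release coefficients multiplied by the Leontief inverse. The two-sector technical coefficients and the release coefficients are invented for illustration; this is not the Miller and Blair example itself, and it is not TEAM's implementation.

```python
def leontief_inverse_2x2(a):
    """Return (I - A)^-1 for a 2x2 technical coefficients matrix A."""
    m = [[1 - a[0][0], -a[0][1]],
         [-a[1][0], 1 - a[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("(I - A) is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def total_release_intensities(direct, leontief):
    """Direct-plus-indirect releases per dollar of final demand, by sector."""
    n = len(direct)
    return [sum(direct[i] * leontief[i][j] for i in range(n)) for j in range(n)]


# Hypothetical inputs: sector 0 = utilities, sector 1 = everything else.
A = [[0.2, 0.3],
     [0.4, 0.1]]              # dollars of input per dollar of output
direct_release = [0.5, 0.2]   # e.g. kg of a pollutant per dollar of output

L = leontief_inverse_2x2(A)
intensities = total_release_intensities(direct_release, L)
print("Leontief inverse:", L)
print("Total release per dollar of final demand:", intensities)
```

Multiplying the resulting intensities by a change in final demand gives the total release attributable to that change, including the supply-chain effects that the direct coefficients alone would miss.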

Grocery Stores: The Profitability Index

July 6, 2011 · Posted in economics

Issue

The Post and Courier recently posted an article titled Super Market Central that raises more questions than it answers. The article compares the number of grocery stores in affluent Mount Pleasant, S.C., with the number in North Charleston, S.C. One Mount Pleasant store owner was quoted as saying “We want to be in markets where there are households with families.” Actually, North Charleston and Mount Pleasant both have an average of 2.5 persons per household! (Source: City-data.com) After reviewing a number of data-intensive sites such as City-data.com and the U.S. Census, I was able to confirm many points in the article and, like the author, identify differences between these communities. Many, such as race and income, are obvious. However, this still does not explain the disparity in the number of grocery stores between one community and the other.

Empirical Research

It turns out this is a significant problem, not only here in the low-country, but across the United States. One source, “Closing the grocery gap in low-income areas,” identifies key issues. Other research from CA Food Policy Advocates suggests:

“One promising model, among others, that has emerged involves the conversion of existing corner stores, typically depending upon sales of alcohol, tobacco and sodas, into neighborhood groceries selling healthy foods. Because so many of the necessary costs — rent, utilities, space, and management possessing some degree of both business skills and familiarity with neighborhood preferences — already are present, the conversion can be relatively inexpensive and, in fact, provide the store with additional opportunities to be profitable. A viable neighborhood grocery store represents multiple policy gains, including food access, nutrition and fitness, transportation, community development and crime reduction.”

Unfortunately, the quote describes symptoms, but not the cause of the problem. The root cause is how capital is rationed to achieve the highest return. The Post and Courier article states margins for grocery stores (a basic commodity) are around 1.5 percent. This is pretty poor even by commodity standards. In fact, one would have to wonder why anyone would go into this business, especially a small business, as suggested above. There simply are not enough retained earnings to make a living! However, to understand why Mt. Pleasant has more grocery stores than North Charleston, we must look for the answer among the financial tools used to make capital allocation strategic decisions – in other words, to build new grocery stores.

The Profitability Index

Most firms’ capital budgeting processes use some sort of discounted cash flow method, the most common being net present value (NPV). Although there are other methods, such as payback or average accounting return, we assume our grocery stores use a variation of NPV. In the capital budgeting process, a project is accepted if NPV is greater than zero and rejected if it is less than zero. That basically means if the project is accepted it will make money (hopefully). We will assume that both North Charleston and Mount Pleasant grocery projects have positive NPVs. So far so good. Unfortunately, investment capital is limited, especially when risk is factored in. We therefore can choose only one project. To do that, we run both projects back through the profitability index.

Profitability Index (PI) = Present Value (PV) of cash flows subsequent to initial investment/ Initial Investment

Again, if PI is greater than 1 we accept the project; if it is less than 1 we reject it. When using NPV, we make a go/no-go decision. However, when applying PI, projects are ranked according to the ratio of present value to initial investment. The project with the highest ratio (above 1) is funded. It is Mount Pleasant in this case. The project is funded, as the article states, not because of corn flake sales, but because of special item sales, which less affluent customers avoid. Special item sales create a better return (profit) on capital invested in the Mount Pleasant location.
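The NPV screen and the PI ranking can be sketched in a few lines. The two stores, their cash flows and the 10 percent discount rate below are entirely hypothetical; they are not figures for any actual Mount Pleasant or North Charleston project.

```python
def present_value(cash_flows, rate):
    """PV of cash flows received at the end of years 1, 2, 3, ..."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


def net_present_value(initial_investment, cash_flows, rate):
    """NPV = PV of subsequent cash flows minus the initial investment."""
    return present_value(cash_flows, rate) - initial_investment


def profitability_index(initial_investment, cash_flows, rate):
    """PI = PV of subsequent cash flows divided by the initial investment."""
    return present_value(cash_flows, rate) / initial_investment


# Hypothetical projects, amounts in thousands of dollars.
rate = 0.10
projects = {
    "store_a": {"initial": 160.0, "cash_flows": [110.0, 121.0]},
    "store_b": {"initial": 100.0, "cash_flows": [66.0, 72.6]},
}

results = {}
for name, p in projects.items():
    results[name] = {
        "npv": net_present_value(p["initial"], p["cash_flows"], rate),
        "pi": profitability_index(p["initial"], p["cash_flows"], rate),
    }
    print(name, results[name])

# Both projects pass the NPV screen; with capital for only one, fund the higher PI.
funded = max(results, key=lambda name: results[name]["pi"])
print("funded:", funded)
```

Both hypothetical stores clear the NPV test, so NPV alone cannot break the tie. The ranking by PI is what sends the money to one location over the other.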

Conclusion

Both projects are in fact profitable. But one provides a slightly better return. At this point corporate culture also comes into play. For example, “what we did last week, which worked, will likely work in our next venture” … and so on. One can see this pattern in Mount Pleasant – the “me too” effect. This happens in part because firms generate positive NPVs from prior investments, leveraging their current market position. An organization does need to make a profit, whether it is the small corner store or a large grocery chain. Without that profit, the store will cease to exist.

In the end a different model is needed (not currently in the domain of the typical grocery store) that incorporates social networking, transportation, specific product offerings, efficient security and product distribution. This comprehensive model leverages capital not only for the current project, but for indirect cash flows of  future business ventures yet to be determined in the same locale. Extending the scope of the investment decision breaks the current boom-bust grocery store location cycle. The question is how to get business owners to adopt this perspective.

Hotel Impact on “Tourism”: Methodology for Estimating Economic Impact

June 15, 2011 · Posted in local industry

Issue

This impact analysis was created by Julie Flowers, a retired statistician for the South Carolina Parks and Recreation Department, using Travel Industry Association of America (TIA, now the U.S. Travel Association) data. Julie is a pretty sharp statistician and does a nice job of outlining what is important when thinking about tourism impacts.

Analysis

Ms. Flowers uses IMPLAN to create a final economic impact based on TIA data. To create an economic impact, we need either spending or employment patterns by industry. As Ms. Flowers points out, there is no “tourism” industry. Our closest proxy is the Hospitality and Leisure super-sector. When it comes to hospitality, the big dog is lodging (hotels and motels). This is not apparent at first glance, but without this industry there is no tourism other than tenting! Within the hospitality industry it is clear that other businesses leverage lodging’s strength.

Hotel Impacts

TIA developed a strong method for collecting and analyzing hospitality data that is generally available to the public. What comes to light when exploring economic relationships within the travel industry is that for every dollar spent on lodging,  $3.60 is spent on travel-related items – food, retail, recreation, etc. In employment, that number leaps to 5.3 jobs* for every job in the hotel industry. In other words, the hotel industry  provides the anchor for other businesses to flourish. TIA data states that every $1 million in travel spending creates 13.5 jobs outside the hotel industry. Therefore, every million dollars spent in hotels likely generates over 70 jobs in hospitality-related industries!

Conclusion

Hospitality analysis requires a fair amount of data to create economic impacts, in this case supplied by TIA. Furthermore, TIA provides a solid methodology for justifying these spending patterns. Ms. Flowers’ process starts with an analysis of spending, then plugs the data into an economic analysis program, IMPLAN. With this transparent method further analysis is possible, uncovering the deeper relationships within this cluster. In this case, the end result provides a clearer picture of how supporting the hotel industry leads to significant gains in both employment and industry output within hospitality-related industries.

*Spreadsheet addition error in report.

 

SC State Unemployment March 2011

April 25, 2011 · Posted in unemployment

Issue

I do not review these data every month because unemployment data is more useful when the analysis focuses on the labor trends. Unfortunately, the Post and Courier is more interested in reporting hype and misinformation than telling us what the data actually says. This is a shame since they are wasting people’s time, including economist Frank Hefner’s, who I am sure pointed out what I am about to say, based on his comments:

“College of Charleston economist Frank Hefner said the unemployment rate does not tell the whole story. The recovery in the past year has been slow, he said, and fewer people are in the workforce, such as those individuals who are discouraged and no longer looking for work.”

The bottom line, which Dr. Hefner alluded to, is that there is nothing to be “elated” about in this jobs picture!

Incorrect Analysis: Again

Jeez. I constantly feel that I need to correct the Post and Courier on this point! Mixing data sets to fit the story misleads the reader. Adjusted and unadjusted unemployment numbers are two completely different data processes – apples and oranges. State adjusted employment for the month of March increased by 3,746. A little different from the 15,700 noted in the article! (PDF)

Unemployment Analysis: Current Employment Statistics Benchmark

This analysis uses data from the Bureau of Labor Statistics. As stated above, South Carolina gained 3,746 jobs in March. The major sticking point, however, is that the labor force dropped by 3,199 persons from February to March, and by almost 18,000 from March 2010. Three numbers come together to create the unemployment rate: labor force (LF), employment and unemployment. It is not possible to adjust one without adjusting one of the others. If we assume an LF scenario that is neither growing nor declining – very conservative considering South Carolina’s population is growing – we see the unemployment rate remained flat at 10 percent from February 2011 to March 2011. See PDF.
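Here is a small sketch of how those three numbers interact. The monthly changes (3,746 jobs gained, 3,199 persons leaving the labor force) come from the paragraph above, but the February starting levels are round placeholders I made up, so the rates it prints will not match the BLS figures in the PDF.

```python
def unemployment_rate(employed, labor_force):
    """Rate in percent, where unemployed = labor force - employed."""
    unemployed = labor_force - employed
    return 100.0 * unemployed / labor_force


# Hypothetical February levels (placeholders, not BLS data).
feb_labor_force = 2_000_000
feb_employed = 1_800_000

# Changes reported in the post.
jobs_gained = 3_746
labor_force_drop = 3_199

mar_employed = feb_employed + jobs_gained

feb_rate = unemployment_rate(feb_employed, feb_labor_force)
# Scenario 1: the labor force shrinks as reported.
rate_with_shrinking_lf = unemployment_rate(
    mar_employed, feb_labor_force - labor_force_drop)
# Scenario 2: the labor force is held constant.
rate_with_constant_lf = unemployment_rate(mar_employed, feb_labor_force)

print(f"February:            {feb_rate:.2f}%")
print(f"March, shrinking LF: {rate_with_shrinking_lf:.2f}%")
print(f"March, constant LF:  {rate_with_constant_lf:.2f}%")
```

Whatever the starting levels, a shrinking labor force pulls the rate down further than the job gain alone would, which is why a falling rate is not automatically good news.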

Regardless of the meager employment growth, some is better than none! However, employment changes by the minute in the state. So what does the final employment picture look like for March? The Current Employment Statistics program (adjusted) provides clues to the result of all those changes from month to month and year to year. These data explain why an accounting professional the Post and Courier interviewed may be challenged in finding employment. The business services industry, accountants included, actually declined in employment from the previous month. Even so, over the past year there has been an improvement of almost 20,000 jobs in this major sector. Unfortunately, 94 percent are not in the accounting field! Where was the growth? It turns out it is right where it has been and should be this time of year, in leisure and hospitality.

Conclusion

It appears that the current recovery, which is already lagging significantly behind other recoveries, is going to be slow at best. With an increase in commodities prices (essentially an excise tax on disposable income – e.g., fuel) and the loss of 10,500 jobs in government employment this past year, “elated” would not describe the way many people feel about the current state of the economy.

Labor Market Information: An Overview

March 21, 2011 · Posted in workforce information

Issue

My friend and business associate Gary Crossley provides labor market information (LMI) to a variety of organizations nationwide. Recently he sent me one of his overview presentations to post on Moore Data.

Most analysts believe they have a grip on labor market data, but what Gary and I find is that this is not so. The reality is analysts tend not to stray very far from unemployment statistics, rarely giving any weight to other key data sets that fill in the labor market knowledge gap. Below is Gary’s presentation. I have provided the appropriate links to corresponding Bureau of Labor Statistics (BLS) web sites.

Presentation Analysis (PDF)

Slides 9-11: Employment: Provides a general definition for employed and unemployed along with basic calculations.

Slides 14,15: Quarterly Census of Employment and Wages (QCEW) discusses data sources and uses of data.

Slides 17, 18: Current Employment Statistics (CES) discusses data sources and uses of data.

Slides 20 – 23: Occupational Employment Statistics (OES) discusses data sources, uses and programs.

Slides 25, 26: Local Area Unemployment Statistics (LAUS) discusses data sources and uses of data. For detailed calculation of unemployment see Unemployment Calculation Post.

Slide 28: Mass Layoff Statistics (MLS) discusses general program.

Slides 29, 30: These slides provide a link to the Bureau of Labor Statistics Handbook of Methods.  This is a particularly good resource for analysts. The handbook not only describes method, but also what programs use these data as an engine to drive their information programs. The handbook also provides a detailed account of data limitations, which is a plus when determining appropriateness of data for various uses. Slide 30 provides a nice “cheat sheet” to the different data sets.

Slides 33 – 45: These slides list the different entities that provide data portals. Of course you can analyze these data yourself, but the challenge is understanding the nuances of the data so that one does not come to the wrong conclusion.

Slides 45 – 47: These slides touch on supply and demand, training, and military data. However, one of the more interesting and eye-opening data sources is the Census data set of military service-related disabilities.  These data can be found at Veterans.

Conclusion

LMI data is readily available on the web. Most competent analysts will use two or three data sets, attempting to triangulate to find the “answer.” Gary has provided a basic reference that will help users think about the different data sources and how they may help answer a labor market question. Thanks Gary.

 

BEA RIMS II and Lucky Charms

February 8, 2011 · Posted in economic development

Recently I have had a number of questions concerning the Bureau of Economic Analysis (BEA) Regional Input-Output Modeling System, or RIMS II, data set. I use these data primarily for scoping, to determine whether there are any surprises in the economic study region that may help me formulate a plan. It performs superbly in this application. From the RIMS II handbook, p. 1:

“Using RIMS II for impact analyses has several advantages. RIMS II multipliers can be estimated for any region composed of one or more counties and for any industry or group of industries in the national I-O table. The cost of estimating regional multipliers is relatively low because of the accessibility of the main data sources for RIMS II. According to empirical tests, the estimates based on RIMS II are similar in magnitude to the estimates based on relatively expensive surveys. To effectively use the multipliers for impact analysis, users must provide geographically and industrially detailed information on the initial changes in output, earnings, or employment that are associated with the project or program under study. The multipliers can then be used to estimate the total impact of the project or program on regional output, earnings, or employment.”

RIMS is a solid input-output modeling system for the right phase of a project because it is able to provide final multipliers for many different industries. However, this is where the capability ends. It is like eating a bowl of Lucky Charms. You can find the marshmallows – yellow moons, orange stars and green clovers – but no meat and potatoes. In this case, the good stuff is the impact on affected local industries. With RIMS it is necessary to know that information up front, which is unlikely. From the handbook case study, p. 15:

“These changes consist of the decline in the purchases of goods and services that results from closing the military base and the decline in purchases by military personnel. For both types of purchases, the user must determine which purchases occur in the economic area and then must show these purchases in producers’ prices.”

So although we know something about direct industries – likely a guess determined through a review of an available budget – we have no way of knowing the relationships of these impacts to the broader economy other than the multiplier, which gives us an accurate, yet gross, estimate of impacts.

Below is an example of the RIMS II output, p. 18:

[Figure: RIMS II output]

Missing from the basic calculation are the indirect, induced and industry details such as taxes, proprietor income and the relationship of these data to other regions either within or outside the study area. In other words, we do not have a complete picture of the money flows as a result of a change in the economy. Although some calculations can be completed using Type I and Type II data, it is these missing details that fill out the playbook for a competent economic development analysis and subsequent plan.
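As a sketch of what can be done with the multipliers alone, assume the usual convention that a Type I multiplier captures direct plus indirect effects and a Type II multiplier adds induced effects. The multiplier values and the size of the direct change below are hypothetical, not taken from any RIMS II table.

```python
def total_impact(direct_change, multiplier):
    """Gross regional impact implied by a final-demand multiplier."""
    return direct_change * multiplier


def impact_breakdown(direct_change, type_i, type_ii):
    """Split a Type II total into direct, indirect and induced pieces."""
    return {
        "direct": direct_change,
        "indirect": direct_change * (type_i - 1.0),
        "induced": direct_change * (type_ii - type_i),
        "total": total_impact(direct_change, type_ii),
    }


# Hypothetical inputs for illustration only.
direct_change = 1_000_000.0  # dollars of change in final demand
type_i = 1.4                 # assumed Type I output multiplier
type_ii = 1.9                # assumed Type II output multiplier

breakdown = impact_breakdown(direct_change, type_i, type_ii)
for component, dollars in breakdown.items():
    print(f"{component:>8}: {dollars:,.0f}")
```

Even this split is still a gross estimate. It says nothing about which local industries receive the indirect and induced dollars, which is the missing detail.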

The Advent of the Algorithm

January 17, 2011 · Posted in method

The Post and Courier recently printed an unusual column discussing “Deep Reading.” It is an excellent article by Laura Casey discussing a lost art.

“Deep Reading, or slow reading, is a sophisticated process in which people can critically think, reflect and understand the words they are looking at. With most this means slowing down – even stopping and rereading a page or paragraph if it doesn’t sink in…”

Deep reading seems out of sync with today’s “modern” communication tools, like Twitter, and it is.  But what does this have to do with the algorithm? As a data guy, the algorithm is my life, and the only way to understand what makes it tick is deep reading! My choice for this subject is The Advent of the Algorithm, by David Berlinski.

We experience the algorithm everywhere in our daily lives – in the operation of the toaster, the dentist’s office, our vehicles and in most, if not all, of the communication tools we use daily. I reviewed definitions of the algorithm online, but none have the elegance of Berlinski, in his book, The Advent of the Algorithm:

“In the logician’s voice:

an algorithm is a finite procedure,

written in a fixed symbolic vocabulary,

governed by precise instructions,

moving in discrete steps, 1,2,3,…,

whose execution requires no insight, cleverness,

intuition, intelligence, or perspicuity,

and that sooner or later comes to an end.”

Where did this idea come from? It turns out the algorithm we all know is the brainchild of four people: Kurt Gödel, Alonzo Church, Alan M. Turing and Emil Post. Each contributed, in part, to the concept of the modern day algorithm, including functions, calculus of conversion and machines capable of manipulating symbols (computers).

Berlinski describes, in depth, the development of the algorithm concept. In one example, the Euler algorithm, he makes clear how important it is – for an analyst, anyway – to know, not just understand, the magic and flaws behind the algorithm. A case in point is the Numerical Solution For Ordinary Differential Equation, page 245.

“From a mathematical point of view, the original differential equation, contingent as it was upon the concept of the limit, has been replaced by a difference equation, one in which the derivative is approximated by a difference quotient, involving no limits whatsoever.

The Euler algorithm demonstrates this method:

BEGIN Euler
  Input x0, y0, xf, h
  x := x0
  y := y0
  WHILE (x < xf) DO
    y := y + h * f(x, y)
    x := x + h
    OUTPUT x, y
  ENDDO
END Euler

This simple algorithm, however, provides critical insight into the weakness of the algorithm:

“…the difference between an analytic and algorithmic solution to an ordinary differential equation is sharp and it is inescapable. An analytic solution completely penetrates the future or the past; an algorithmic solution acts only over a finite interval of time and space. The analytic solution returns a differential equation to a continuous world; an algorithmic solution, to a world that is discrete.”
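To see that discrete-versus-continuous gap in numbers, here is my own runnable sketch of the Euler method, not Berlinski's code. The test equation dy/dx = y with y(0) = 1 is my choice because its analytic solution, e^x, is known. One deliberate departure from the book's pseudocode: I loop over a fixed number of steps rather than testing x < xf, since repeatedly adding a floating-point h can leave x a hair below xf and trigger an extra step.

```python
import math


def euler(f, x0, y0, xf, h):
    """Approximate y(xf) for dy/dx = f(x, y), y(x0) = y0, with step size h."""
    n_steps = round((xf - x0) / h)  # fixed step count avoids float drift in x
    x, y = x0, y0
    for i in range(n_steps):
        y = y + h * f(x, y)         # replace the derivative with a difference
        x = x0 + (i + 1) * h
    return x, y


def growth(x, y):
    """Right-hand side of the test equation dy/dx = y."""
    return y


exact = math.e  # analytic solution e^x evaluated at x = 1
_, coarse = euler(growth, 0.0, 1.0, 1.0, 0.1)
_, fine = euler(growth, 0.0, 1.0, 1.0, 0.001)

print(f"analytic:  {exact:.6f}")
print(f"h = 0.1:   {coarse:.6f}  (error {exact - coarse:.6f})")
print(f"h = 0.001: {fine:.6f}  (error {exact - fine:.6f})")
```

Shrinking the step narrows the gap but never closes it. The algorithmic answer lives on a grid, while the analytic one does not.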

Now I understand why the brakes failed in my Toyota. :)

Note: some of the reviews of the book were not very flattering, but this is DEEP READING with the good stuff starting on page 205.  Do you think you have what it takes to DEEP READ? Have at it!
