SCE&G Weather Normalization Adjustment (WNA) Statistics

By: Scott Moore
March 9, 2012 · Posted in statistics

Issue

Last month I discovered, much to my surprise, that even though I pay my SCE&G utility bill in full and on time, I actually owe SCE&G money – at one point more than $90. We had spent a lot of money when we replaced our HVAC unit, turned down the hot water heater, purchased a more efficient refrigerator, and installed LED light bulbs. Behind the scenes, and not identified on my bill, is an adjustment that increases or decreases the actual amount owed based on a dubious calculation called the Weather Normalization Adjustment (WNA).

The most distressing part is that this is done without warning or allowing us to opt out of this practice. I noticed this not as a result of a high electric bill but as a kWh price that jumped from $0.1045 to $0.1433. When I called to inquire about this rate hike, I was told of this invisible calculation. And this has been going on for more than a year. The information provided by SCE&G through the energy analyzer is also not correct since it includes the WNA. In other words, we are not given the actual monthly electric cost because it is not displayed. Perplexed, dismayed, and downright angry, I decided to look into this further.

Power companies use what is called the Optimal Climate Normals method as a way to hedge against the variations that affect their costs, including weather, fuel costs, and swings in the economy. WNA extends this hedging method to the unsuspecting customer’s bill! WNA is an adjustment to a monthly electric bill based on how that month’s temperature varies from a long-run average, likely taken over 15 years.

This process is different from a balanced payment plan, which “zeros” at the end of the year (estimated annual bill divided by 12). The impetus for making these adjustments is the company’s real desire to lessen the burden on customers when there are significant temperature variances from normal that result in burdensome electric bills – especially for those on fixed incomes. Unfortunately, the process does not work. This is not because SCE&G folks are bad people or not trying. They are doing their best, but the statistics will not produce the results they desire within the pilot program of one year.

Statistics- Timeline Importance

Typically, the devil is in the details, and this case is no different. I contacted the U.S. Department of Commerce National Climatic Data Center and inquired about the appropriate process for using historical data to predict future weather patterns to balance to zero within one year. It turns out that it is inappropriate to use average historical data with this method to predict future temperature. Unfortunately, it just doesn’t work. The results are the proof. The pilot program was to last one year, but as of last month I still owed $60 … oops … while most others owe much, much more!

Assumptions vs. Pay the Bill!

Assumption: Stationary Time Series. This model apparently assumes the past average will continue into the future. Unfortunately, the impetus for the process change was an outlier (a colder-than-normal winter). This should have been the first clue not to take historical data to fix an anomaly. There is no indication that the opposite will occur any time soon to offset the observed outlier. It is possible that this could have been tested by not taking only the average temperature of the last 15 years, but by sampling 15 years of data from the last 60 years and evaluating the sample means. If nothing else, this would have made clear to decision makers the significant uncertainty in predicting future weather patterns with historical data. This knowledge may have resulted in a different strategic decision.
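The sampling test suggested above can be sketched in a few lines of Python. This is only an illustration, not anything SCE&G or the National Climatic Data Center uses: the function name, the 1,000 repetitions, and the 60-year temperature record are all assumptions made up for the example.

```python
import random
import statistics


def sample_mean_spread(annual_temps, sample_size=15, n_samples=1000, seed=0):
    """Draw repeated random samples of years and summarize their means.

    Returns (lowest sample mean, average of sample means, highest sample mean).
    """
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(annual_temps, sample_size))
        for _ in range(n_samples)
    ]
    return min(means), statistics.mean(means), max(means)


# Hypothetical 60-year record (degrees F), invented for illustration only:
# a 64.0 degree baseline with a slow upward drift of 0.02 degrees per year.
hypothetical_temps = [64.0 + 0.02 * year for year in range(60)]

low, centre, high = sample_mean_spread(hypothetical_temps)
print(f"15-year sample means range from {low:.2f} to {high:.2f}")
```

The wider the gap between the lowest and highest sample mean, the less any single 15-year average can be trusted as "normal."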

Assumption: Trending Data. The alternative to stationary data is trending data. The formula requires the random variable to sum to zero, which can only be expected when the data are stationary. However, based on National Oceanic and Atmospheric Administration data, there is a trend, understood or not. Therefore, the formula will not sum to zero. Hence my $60 debit. Further complicating the process is the fact that the model needs to be updated regularly, likely a time-consuming and expensive proposition. Taking all the changes into account, it will still be 10 or more years before one can expect a balanced bill.

Assumption: No Modeling Necessary. Complex statistical problems need to be modeled. Modeling assists the analyst in understanding how well the formula describes a situation or event. The hope is to avert a knee-jerk reaction based on erroneous assumptions or data. The best way to understand a new method is through a statistical model, which would provide a probability of the formula zeroing out a person’s electric bill after one year. This is highly unlikely, given the data, and the model would provide us with that information. So instead of having a one-year pilot program we might need a 20-year pilot program or, better yet, scrap the formula and create another solution, which is what I suggest. The statistical model by itself is outdated when it comes to forecasting weather.

Assumption: Balancing Costs Independent of Consumer Action. The WNA calculation is done on the kWh used. As an example, an abnormally cold temperature will kick in the WNA calculation, resulting in a consumer debit off the books! The result is that consumers believe they have a lower electric bill and thus turn the thermostat up, believing all is well. The unintended consequence is to use more energy, not less.

The WNA circumvents economic “substitution,” whereby consumers make a different choice to best meet their needs. In this case substitution could include decisions to turn the heat down, switch to a different energy source, turn the heat off in a vacant room, or take other actions designed to reduce energy use. In our household we thought all was OK. The result was that we turned up the heat, not knowing that the actual energy billed was more than what we were being charged. Energy use management does not work without full disclosure by the provider of the effects of the consumer’s actions, something SCE&G preaches on a daily basis over the airwaves!

Conclusion

My recommendation to SCE&G is to stop the program and have customers pay their debit bill over the next eight months before this program gets even more out of control than it already is. If the company wants to hedge, they can do it with their time and money – not mine. As for people who cannot pay their bill during extreme weather events, it is important to continue to work with those individuals to create a predictable balanced payment plan zeroing the balance over a predetermined number of months. Statistics can really assist in this process. The rest of us need to be allowed to opt out or join SCE&G’s balanced payment plan.

I chose to opt out, paid the balance of the WNA to SCE&G (likely the only SCE&G customer in the state that has a zero electric bill balance) and informed them I would happily pay the actual kWh usage in full in the future. I was informed that if I did that, they would send out a bill collector to collect my “funny money” WNA payment, which by the way, is supposed to balance after twelve months. I wonder what the statistics say about either of those independent events happening in the future!

Model Example Results

We ran a model, shown below, for demonstration purposes. A person might assume that a WNA balance should be close to zero most of the time, but this is not true. In fact, it can get quite far from zero. The following plot shows percentiles for the maximum positive deviation from zero in 12 periods. (This is based on 100,000 random sequences of length 12.) Notice that 15 percent of the time, you never have a positive sum (balance). Fifty-five percent of the time the maximum deviation is 2 or less, while 95 percent of the time, the maximum positive deviation is less than 7. Results for a different standard deviation (σ) are the same, but with the maximum sum values multiplied by σ.

Example Modeled WNA
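For readers who want to try something similar, here is a minimal Python sketch of a simulation along the lines described above. It is not the model we ran. It assumes each month's adjustment is an independent draw from a normal distribution with mean zero and standard deviation σ = 1, and the function names and seed are made up for this example.

```python
import random


def max_positive_deviation(n_periods, sigma, rng):
    """Largest running balance reached in one sequence (0.0 if never positive)."""
    running, peak = 0.0, 0.0
    for _ in range(n_periods):
        running += rng.gauss(0.0, sigma)  # assumed: zero-mean normal adjustment
        peak = max(peak, running)
    return peak


def simulate_peaks(n_sequences=100_000, n_periods=12, sigma=1.0, seed=2012):
    """Simulate many 12-period sequences and return their sorted peak balances."""
    rng = random.Random(seed)
    return sorted(
        max_positive_deviation(n_periods, sigma, rng) for _ in range(n_sequences)
    )


def share_at_or_below(sorted_peaks, threshold):
    """Fraction of sequences whose peak balance is at or below the threshold."""
    return sum(1 for p in sorted_peaks if p <= threshold) / len(sorted_peaks)


peaks = simulate_peaks()
for threshold in (0.0, 2.0, 7.0):
    print(f"max deviation <= {threshold}: {share_at_or_below(peaks, threshold):.1%}")
```

A threshold of 0.0 counts the sequences that never show a positive balance. Changing `sigma` rescales every peak by the same factor, which is the scaling property noted above.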

 

Bowl Championship Series: BCS

By: Scott Moore
January 17, 2012 · Posted in statistics

Issue

The Bowl Championship Series (BCS) ranking process is a failure by any measure. The good news is that it finally appears the powers-that-be are going to work out a playoff system. But what is the root cause of the problematic BCS rankings? Why don’t they work? And what type of numerical system might meet the needs of a college football ranking system?

Statistics: Or Lack of!

A cursory review of BCS statistics quickly identifies the main problem, which is that the people who created these “methods” do not appear to use any form of statistics. Further limiting the public’s understanding of these data is that the methods used to calculate rankings are not available. In other words, they have not been peer-reviewed in any meaningful way – and subscribe to the “trust me” method!

We know the accuracy is questionable at best or scandalous at worst, since we never read or hear about odds, confidence intervals, error, probability or other common statistical references when referring to these data. We also know intuitively that around each number there is error. If the error is not displayed, we know we can trust neither the numbers nor the authors – hence the ruckus around these rankings.

The Champ: Play-off

The great thing about a playoff for college football, like every other major sports league, is that you know the answer at the end. The best team on that day is the final one standing. End of debate. Rodney Harrison was recently asked who he liked in the NFL playoff and his answer was that it is hard to estimate since anything can happen in a playoff game. Well said. The challenge with a college playoff system is not that it wouldn’t work, because it would. Rather, it cuts the number of bowl games in half. Ouch, that is a lot of lost revenue!

The Champ: Numerical Calculation

I will disclose my bias for a playoff system since, as Rodney stated, anything can happen. But I believe there is likely a method that would, in fact, provide a numerical answer that most would agree with. First, the method needs to be made public, and it should be a method that has a history of success. “Odds” are, of course, one system, but in reviewing the odds estimates for the BCS championship game, there were many conflicting estimates with some odds makers suggesting a difference of only a point or two. In other words, it was too close to call.

Odds is an interesting process (better than the “look what I made up” numerical process), but probability estimates are the only real tool we have that could pick a winner. Odds and probability sound similar but in fact are quite different. The difference:

  • Probability is used to express sensitivity, specificity and predictive value. It is the proportion of people in whom a particular characteristic, such as a positive test, is present.
  • Odds is the ratio of two complementary probabilities. (PDF)
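The relationship between the two can be shown with a small Python sketch. The function names are made up for this example; the arithmetic is just the definition of odds as the ratio of a probability to its complement.

```python
def probability_to_odds(p):
    """Odds in favor of an event: p divided by its complement (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds in favor back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# A team given a 75 percent chance of winning has odds of 3 to 1 in its favor.
print(probability_to_odds(0.75))
print(odds_to_probability(3.0))
```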

Along the probability line is a process called Evidence Based Management (EBM) which uses Bayesian analysis.

Bayes Theorem: a statistical principle for combining prior knowledge of the classes with new evidence gathered from data. (See Introduction to Data Mining, Chapter 5, pp. 228-229.) (PDF)

EBM with Bayesian analysis states: What was thought before the test was done, combined with the test result, leads to what is thought after the test result. In other words, what you thought you knew before the football contest, plus the game itself, gives what you think afterward – the “LSU is still No. 1” syndrome! It is this process that could provide an answer to who is No. 1 regardless of the date, time or opponent,* effectively removing the Rodney effect, but not likely the debate!
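As a rough illustration of the before-and-after idea, here is a minimal Python sketch of a single Bayesian update. The function name and every number in it are hypothetical. This is not a proposed ranking method, only the theorem applied to one made-up belief.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical numbers: before the game we are 50/50 on whether Team A is the
# better team. Assume the better team wins 70 percent of the time.
belief = 0.5
belief = bayes_update(belief, p_evidence_if_true=0.7, p_evidence_if_false=0.3)
print(f"after one win:  {belief:.3f}")
belief = bayes_update(belief, p_evidence_if_true=0.7, p_evidence_if_false=0.3)
print(f"after two wins: {belief:.3f}")
```

Each result becomes the prior for the next game, which is how the estimate can keep moving over a season.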

Conclusion

I am not sure that the BCS question is all that important or worth a lot of time in the context of solving the world’s problems, but if we are going to do the math, let’s at least try to make the process transparent, thoughtful and based on some sort of peer-reviewed science. Frankly, that is the only way my team will EVER have a chance at a BCS championship!

*Note: I do not address “style” points: a non-sportsmanship concept.

Gini Coefficient

By: Scott Moore
January 10, 2012 · Posted in statistics

Issue

The Gini Coefficient, developed by the Italian statistician Corrado Gini, is the most commonly used measure of inequality. The coefficient varies between 0, which reflects complete equality and 1, which indicates complete inequality (one person has all the income or consumption, all others have none). (The World Bank) We wanted to use this method to look at income distribution throughout South Carolina, but first we had to understand the formula.

At first glance, there is a fair amount of math needed to calculate the coefficient. Make no mistake, this is and can be a very complex formula, utilizing probability sampling, bootstrapping, confidence intervals and other statistical methodology. We, however, tried to keep it applied, and therefore used the most basic variation:

Gini Formula

After sorting out the symbolism, we created a sample problem (PDF). The sample problem allowed us to work through the math in a structured process. The value of “doing the math” is that one gains an understanding as to how different variables affect the formula. The PDF contains two versions of the sample problem, one showing the formula and the other with plugged numbers. Note how, unlike most of the available examples, we show a calculation needed prior to using the formula: in this case, (dollars strata) TIMES (number of persons). That’s because the analyst may need to do a number of calculations prior to applying the formula.
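For readers without the PDF, here is a hedged Python sketch of one common basic variation for grouped data: one minus the sum, across strata, of (population share) times (cumulative income share plus previous cumulative income share). This may not be the exact variation in our sample problem, and the function name and numbers are made up. Note that the dollars TIMES persons step comes first.

```python
def gini_from_strata(strata):
    """Gini coefficient from (average_dollars, number_of_persons) pairs.

    Assumed formula: G = 1 - sum((X_k - X_{k-1}) * (Y_k + Y_{k-1})), where X is
    the cumulative share of persons and Y the cumulative share of dollars.
    """
    ordered = sorted(strata)  # lowest-income stratum first
    total_persons = sum(persons for _, persons in ordered)
    # The preliminary calculation: (dollars strata) TIMES (number of persons).
    total_dollars = sum(dollars * persons for dollars, persons in ordered)

    cum_persons = cum_dollars = 0.0
    area = 0.0
    for dollars, persons in ordered:
        prev_persons, prev_dollars = cum_persons, cum_dollars
        cum_persons += persons / total_persons
        cum_dollars += dollars * persons / total_dollars
        area += (cum_persons - prev_persons) * (cum_dollars + prev_dollars)
    return 1.0 - area


# Hypothetical two-strata example: 10 persons at $10 and 10 persons at $30.
print(gini_from_strata([(10, 10), (30, 10)]))
```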

The Formula: Results

We applied the formula to the classic income distribution (wealth share) problem, using Census, Household and Family Income Report B19001, for each county in South Carolina. These data have 16 income strata. We found the formula is particularly sensitive to changes in the top two strata, not necessarily the number of persons, but the average dollar value. In other words, “the tail wags the dog” in this formula. The other critical piece of information needed is what value to assign the highest stratum. The census uses approximately $400,000 as an approximation for the average top stratum dollar figure. They calculate this number using volumes of data, so it’s good enough for me.

After making our calculations, the formula really did reveal a number of interesting trends. One, the impact of the economy on higher wage earners – in the case of these data – is very delayed. In other words, higher income households continued to make money well into the latest recession. The other revealing attribute is the effect of a rising tide. A rising tide does in fact lift boats, but some higher than others, and in the process it also sinks a few! In this case, households with higher incomes grew at a proportionally higher rate than those with lower incomes, and in some counties, household income (high and low) was hit particularly hard.

Conclusions

Now that you understand the formula, if you use these data, the Census Bureau has already done the Gini Coefficient income calculations for you! Yes, to my surprise the Bureau has been doing this calculation since the 1990s. The file is B19083. It may sound like I have given you a shortcut, but now you have to figure out the new GUI American Community Survey interface. Good luck!

Acknowledgement: Thank you to the staff at the US Census Bureau for assisting me in understanding key drivers of the Gini Coefficient.

JMP 9.0 – Applied Data Analysis

By: Scott Moore
December 18, 2010 · Posted in statistics

JMP (jump): The Sharpest Tool in the Shed

I have been a JMP fan ever since being introduced to the product through the University of Minnesota statistics department. I have used a number of statistical programs over the years, but JMP is a perfect fit for the wide range of data analysis work I perform for customers.

The Problem – What You Don’t Know

I have found that even simple data sets can, and do, hide their secrets effectively. In fact, it is amazing what we don’t know about even the most basic data sets unless the data is run through a statistical package. Here is an example (PDF) of state population estimates for 2009. There are only 50 data points. The tool used here is the basic and easy distribution analysis in JMP. Of the 50 states, four actually have populations considered outliers in the data set. The median population is about 4.1 million. It would seem that these four states would be easy to identify, but that’s not necessarily true. I thought Wyoming, with a small population at 544,000, would also be an outlier – but that’s not so. All this information is at your fingertips with the click of a button.
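If you do not have a statistical package handy, the usual box-plot rule for flagging outliers can be sketched in Python. This assumes the 1.5 × interquartile range fences and one particular quartile method, so the results may differ slightly from what JMP reports. The function name and the numbers are made up and are not the state population estimates.

```python
import statistics


def box_plot_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical data, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(box_plot_outliers(sample))
```

With a skewed data set like state populations, the lower fence can easily fall below zero, which is why a very small value such as Wyoming's is not flagged.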

A public example of a more complex data set is collected by NOAA. Here we are taking a small sample (416K points) of Sea Level Pressure Data and plotting it. JMP 9.0 makes short work of this data set.

Setting Expectations

One of the best attributes of any statistical package is helping users understand their hypotheses, or assumptions about some aspect of the world. Each of us creates hypotheses every day as a part of life. Estimating commute times to work is one example. JMP allows us to not only think more clearly about these everyday data interactions, but to test them if we so desire. The hypothesis test is a statistical approach to testing a theory, according to The Economist, Numbers Guide. This test, however, is not necessary to increase the basic understanding of data you are responsible for, whether it is financial, engineering, manufacturing, marketing, medical or administrative.

JMP – World Applications

JMP’s latest magazine describes uses of the program. (PDF) It is used in clinical trials, consumer products, product development and, of course, manufacturing. In today’s competitive marketplace, analyzing your data with a statistical package has become a business fundamental – much like cash flow. It’s something every business should have and deploy as part of an effective business strategy.

South Carolina Lazy? I don’t think so!

By: Scott Moore
August 6, 2010 · Posted in statistics

Lazy:  When Noise Interferes with the Signal

Recently the Post and Courier ran an article highlighting a Business Week analysis that said South Carolina was the eighth laziest state in the union! Typically, subjective words used to describe data pop a red flag that warns me of impending data misuse doom.

The Data Set

The American Time Use Survey (ATUS) measures the time people spend doing various activities such as work, childcare, housework, watching television, volunteering and socializing. Hence this is an activity survey, not a lazy survey. The data are collected by the Census Bureau and sponsored by the Bureau of Labor Statistics (BLS). I ran a query to understand the nature of the survey, data availability and error rates. I called in the big guns from Global Pragmatica LLC to assist in converting the data from an ASCIDAT file to my JMP statistical software package format. These folks are experts in scripting and were a huge help. Thank you!

These data are collected regionally but analyzed nationally. There is about a 90-percent chance, or level of confidence, that an estimate based on a sample will differ by no more than 1.6 standard errors from the “true” population value because of sampling error. No estimates are made for state level data, and one University of Minnesota analyst stated she was not aware of state level error estimates.
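The 90-percent statement translates into a simple interval around any estimate. Here is a minimal Python sketch, assuming the sampling error is roughly normal. The function name, the 20-minute estimate, and the 2-minute standard error are hypothetical, not ATUS figures.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.90):
    """Normal-approximation interval: estimate plus or minus z standard errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.6 for a 90% level
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Hypothetical: an estimate of 20 minutes per day with a standard error of 2.
low, high = confidence_interval(20.0, 2.0)
print(f"90% interval: {low:.1f} to {high:.1f} minutes")
```

Without a standard error for the state, there is no way to fill in the second argument, which is exactly the problem with state level comparisons.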

It is inappropriate to analyze these data at the state level without calculating the error inherent in the data. If you did that, the analysis would be interesting but useless when comparing one state to another. Why?

Sports Activity Variable Analysis

For a test sample, I chose state level geography, with sports as a variable activity. This category captures the respondent’s participation in sports, exercise and recreational activities. To extract the data from the system, I used a tool created by the University of Minnesota called the American Time Use Survey-X. The data need to be processed by a statistical package, in this case my JMP program. An analysis of people participating in sports activities indicates that South Carolina would rank 22nd out of 50 states in terms of average minutes spent participating in sports in a 24-hour period – not bad. However, upon further inspection of South Carolina’s 2009 detailed weighted data, the state could rank anywhere from 12th to 23rd, based on national error rates! (PDF) Unfortunately, since these are state data, the results are meaningless. That’s because the sample is simply too small, which is one of many buried statistical problems. This 2009 sample included a total of 200 people, of whom 166 recorded zero sports activity minutes. (PDF) In fact, the median is zero, which is another red flag for this data set. A review of other states’ data revealed the same issue. This is a fascinating national data set. But unfortunately, analysis of non-national geographies yields unreliable results.

Real Estate and In-Migration

By: Scott Moore
July 30, 2010 · Posted in statistics

The Post and Courier covered a local real estate economist’s presentation on the Real Estate Recovery. Core to any real estate recovery is, of course, employment and wage growth. However, a key statistic overlooked in this presentation was migration patterns. I had mentioned in my June Unemployment post that areas such as Detroit were having problems as a result of a declining labor force. This map from Forbes graphically displays the migration problems Detroit is having.

But when you click on Berkeley, Charleston, or Dorchester counties, a picture of in-migration emerges. This is an important indicator of growth potential because people have jobs when they move here, have decided to collect transfer payments (retirement) in this region, or believe there is potential for work in the area.

Another important statistic this map displays is how our rural population is moving to metro areas (short black lines). This is important for two reasons: 1) unemployed people may have the opportunity to find work and 2) if they find work, the state increases its tax base while decreasing the demand for social services.

Unlike the economist quoted in the article, I predict our real estate growth will be better than the median national real estate growth, primarily because of in-migration. This is not to say it will be even close to the bubble years (when we had an unrealistic and unsustainable market), but we should see steady improvement as a result of our region’s possibilities.

I am bullish, for a change. I do believe we have significant control over our own growth since the most important contributors to growth and sustainability include education, health care, public safety, urban planning, convenience and infrastructure (including biking and walking trails), which all are within our control.

Thank you to Keihly Moore for her assistance with this article.

Reverse Pivot Table: Matrix → xyz Format

By: Scott Moore
July 8, 2010 · Posted in statistics

I like to add a few technical tools now and again. Here is a sweet piece of programming that could save time converting a matrix to an xyz table. The surface plot on my web page can be created by converting matrix data to an xyz format.

Issue

The problem I often run into with Excel spreadsheets is that the data is defined in a matrix. Sometimes it is more convenient to reorganize the data with a pivot table in order to represent the data as xyz coordinates. At first glance it appears this should be an easy task, but without the right Excel module or the full version of SQL, forget it.

Solution

A solution to this problem is provided by The Spreadsheet Page: a reverse pivot table. The link does an excellent job of explaining the process. At the bottom, a VBA link allows one to copy the code into your Excel application. A big thank you to these guys for sharing this; it saved me many hours of work.
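For those working outside Excel, the same idea can be sketched in a few lines of Python. This is not the VBA code from The Spreadsheet Page, just a minimal illustration of the reshaping. The function name and the sample labels and values are made up.

```python
def matrix_to_xyz(matrix, x_labels, y_labels):
    """Flatten a matrix (one row per y label) into a list of (x, y, z) rows."""
    if len(matrix) != len(y_labels):
        raise ValueError("need one y label per matrix row")
    rows = []
    for y, matrix_row in zip(y_labels, matrix):
        if len(matrix_row) != len(x_labels):
            raise ValueError("need one x label per matrix column")
        for x, z in zip(x_labels, matrix_row):
            rows.append((x, y, z))
    return rows


# Hypothetical 2 x 3 matrix: rows are years, columns are months.
table = matrix_to_xyz(
    [[10, 20, 30],
     [40, 50, 60]],
    x_labels=["Jan", "Feb", "Mar"],
    y_labels=[2009, 2010],
)
for row in table:
    print(row)
```

Each cell of the matrix becomes one row of the xyz table, ready for a surface plot or a database import.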

Survey Monkey

By: Scott Moore
March 2, 2010 · Posted in statistics

One of the easiest methods to collect survey data is Survey Monkey (SM). What many do not realize is that this tool is a cost-effective way to collect simple everyday samples. Who wants to go to lunch? Can you give us some feedback on the meeting? SM allows ten questions for free. It is surprising how much data (no relation to quality) one can capture in 10 questions.

I have also used SM for larger research projects. OK, yes, there are issues with online accessibility. As an example, in SC 40 percent do not have a computer at home, so one needs to know the subject and audience to ensure data is not unintentionally skewed. You all know the rules!

When this process is appropriate, I typically supplement the online survey with a phone call (a reminder, especially when time is an issue) and spend measured time confirming emails and contacts. This process, however, saves a significant amount of time in the end, especially if it is a survey that is repeated time and time again. SM output quality is quite high: as good as YOUR process. SM provides a simple but effective interface to do what you need.

Groundhog Day- Forecasting Made Simple!

By: Scott Moore
February 2, 2010 · Posted in statistics

Groundhog Day is one of my favorite holidays. As an impact economist and data statistician, I think this holiday represents the truth about forecasting. It has all the elements of forecasting in a simple-to-understand format. The good stuff is all there: the null and alternative hypotheses, geography, time, historical data sets, measurements applied against “strict” criteria, sources, and witnesses to boot! The result is clean and understandable to all. If the groundhog sees its shadow, expect six more weeks of winter; if not, the season will likely be a little shorter. Brings a grin to my face.

In all seriousness, it is a day when we need to thank the people who work very hard every day in the forecasting sciences. In particular, the nod goes this year to NOAA staff who forecast hurricanes in the South, tornadoes in the Midwest and fire dangers in the West. These efforts are real and intense, using sophisticated models driven off of extensive databases where engineering, science, statistics and social science all come together to try to warn us of events that can and do affect our lives. Thank you. As for Smokey, my groundhog mascot…no shadow today. However, if we could get all the groundhogs on Google, I wonder what that forecast would look like! I guess we need more research.

You Cannot Prove the Null

By: Scott Moore
January 21, 2010 · Posted in statistics

This sounds like statistics, so you had better go to the next site, right? Stop before you do that. I think we can help. In 2009, I joined the SAS JMP® software user group and receive periodic updates and explanations on different aspects of statistics. Since using JMP®, my whole world has become significantly (no pun intended) less complicated.

The latest issue of JMPer Cable (pdf) (Issue 26, Winter 2010, pp. 6-9) has a short and informative article by Ramirez and Bailey on significance testing. Questions we analysts need to ask ourselves from a data standpoint include: Is there a difference? Do the data tell me anything? Is the simple comment “that’s interesting” good enough, or do we need to have that discussion with the marketing guys again? Regardless, this article explains the null hypothesis (no change) and the alternative hypothesis in a few easy-to-understand pages. The authors do this with short informative examples and, more importantly, the intuitive computer display from the JMP® statistical package. This article is a conversation, not a lecture, allowing one to absorb concepts that frankly can be confusing.
