JMP 9.0 – Applied Data Analysis
JMP (jump): The Sharpest Tool in the Shed
I have been a JMP fan ever since being introduced to the product through the University of Minnesota statistics department. I have used a number of statistical programs over the years, but JMP is a perfect fit for the wide range of data analysis work I perform for customers.
The Problem – What You Don’t Know
I have found that even simple data sets can, and do, hide their secrets effectively. In fact, it is amazing what we don’t know about even the most basic data sets unless the data is run through a statistical package. Here is an example (PDF) of state population estimates for 2009. There are only 50 data points. The tool used here is the basic and easy distribution analysis in JMP. Of the 50 states, four actually have populations considered outliers in the data set. The median population is about 4.1 million. It would seem that these four states would be easy to identify, but that’s not necessarily true. I thought Wyoming, with a small population at 544,000, would also be an outlier – but that’s not so. All this information is at your fingertips with the click of a button.
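For readers without JMP, the idea behind flagging outliers can be sketched in a few lines of Python. This is only a sketch: it assumes the common 1.5 × IQR box-plot rule, and the populations below are made-up values in millions, not the 2009 state estimates.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical populations in millions (illustrative only).
sample = [0.5, 1.3, 2.9, 3.6, 4.1, 4.6, 5.7, 6.5, 9.9, 37.0]
print(iqr_outliers(sample))  # [37.0]
```

Note that the smallest value, 0.5, is not flagged. A small state can sit well inside the lower fence even when the largest states sit far outside the upper one.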
A public example of a more complex data set is collected by NOAA. Here we are taking a small sample (416K points) of Sea Level Pressure Data and plotting it. JMP 9.0 makes short work of this data set.
Setting Expectations
One of the best attributes of any statistical package is helping users understand their hypotheses, or assumptions about some aspect of the world. Each of us creates hypotheses every day as a part of life. Estimating commute times to work is one example. JMP allows us not only to think more clearly about these everyday data interactions, but to test them if we so desire. The hypothesis test is a statistical approach to testing a theory, according to The Economist Numbers Guide. This test, however, is not necessary to increase the basic understanding of data you are responsible for, whether it is financial, engineering, manufacturing, marketing, medical or administrative.
JMP – World Applications
JMP’s latest magazine describes uses of the program. (PDF) It is used in clinical trials, consumer products, product development and, of course, manufacturing. In today’s competitive marketplace, analyzing your data with a statistical package has become a business fundamental – much like cash flow. It’s something every business should have and deploy as part of an effective business strategy.
Productivity, Wages and Demand
Productivity and Wages: Successful Business Partners (PDF)
“Productivity measures how efficiently economic inputs are converted into output, which are the goods and services that business sells. So when more is produced with the same or less we can increase income (that is value added) and potentially increase profit.”
Productivity has one underlying assumption: demand. If there is no demand, then productivity is not a factor. If demand declines, similar to our current situation, productivity is everything. We see companies trying to deal with lack of demand by laying off large numbers of employees. Those left do work harder and longer hours, but likely are producing significantly less as a result of decreased demand. This is one of the reasons wages are flat: there is simply no way to increase prices even though everyone is working harder.
Productivity and Wages: Successful Business Partners
This paper provides a sample calculation which demonstrates how to think about wages and productivity when plugged into your demand formula.
Sources:
Bureau of Labor Statistics – Productivity
CEPR – Price Byte
South Carolina Lazy? I don’t think so!
Lazy: When Noise Interferes with the Signal
Recently the Post and Courier ran an article highlighting a Business Week analysis that said South Carolina was the eighth laziest state in the union! Typically, subjective words used to describe data pop a red flag that warns me of impending data misuse doom.
The Data Set
The American Time Use Survey (ATUS) measures the time people spend doing various activities such as work, childcare, housework, watching television, volunteering and socializing. Hence this is an activity survey, not a lazy survey. The data are collected by the Census Bureau and sponsored by the Bureau of Labor Statistics (BLS). I ran a query to understand the nature of the survey, data availability and error rates. I called in the big guns from Global Pragmatica LLC to assist in converting the data from an ASCII DAT file to my JMP statistical software package format. These folks are experts in scripting and were a huge help. Thank you!
These data are collected regionally but analyzed nationally. There is about a 90-percent chance, or level of confidence, that an estimate based on a sample will differ by no more than 1.6 standard errors from the “true” population value because of sampling error. No estimates are made for state level data, and one University of Minnesota analyst stated she was not aware of state level error estimates.
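To make the 90-percent statement concrete, here is a minimal Python sketch of the interval it implies. The multiplier 1.645 is the standard normal value that the text rounds to 1.6; the estimate and standard error in the example are hypothetical, not ATUS figures.

```python
def confidence_interval_90(estimate, standard_error, z=1.645):
    """Return the (low, high) bounds of an approximate 90 percent interval."""
    margin = z * standard_error
    return estimate - margin, estimate + margin

# Hypothetical: an average of 20 minutes with a standard error of 2 minutes.
low, high = confidence_interval_90(20.0, 2.0)
print(f"{low:.2f} to {high:.2f} minutes")  # 16.71 to 23.29 minutes
```

Without a standard error for a given state, there is nothing to plug into this calculation, which is the heart of the problem with state level comparisons.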
It is inappropriate to analyze these data at the state level without calculating the error inherent in the data. If you did that, the analysis would be interesting but useless when comparing one state to another. Why?
Sports Activity Variable Analysis
For a test sample, I chose state level geography, with sports as the activity variable. This category captures the respondent’s participation in sports, exercise and recreational activities. To extract the data from the system, I used a tool created by the University of Minnesota called the American Time Use Survey -X. The data need to be processed by a statistical package, in this case my JMP program. An analysis of people participating in sports activities indicates that South Carolina would rank 22nd out of 50 states in terms of average minutes spent participating in sports in a 24 hour period – not bad. However, upon further inspection of South Carolina’s 2009 detailed weighted data, the state could rank anywhere from 12th to 23rd, based on national error rates! (PDF) Unfortunately, since these are state data, the results are meaningless. That’s because the sample is simply too small, which is one of many buried statistical problems. This 2009 sample included a total of 200 people, where 166 recorded zero sports activity minutes. (PDF) In fact, the median is zero, which is another red flag for this data set. A review of other states’ data revealed the same issue. This is a fascinating national data set. But unfortunately, analysis of non-national geographies yields unreliable results.
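The zero median follows directly from the counts: with 166 of 200 respondents at zero, more than half the sample is zero no matter what the other 34 reported. A minimal Python sketch, where the 60 minutes assigned to each active respondent is made up purely for illustration:

```python
from statistics import mean, median

ZERO_RESPONDENTS = 166    # from the 2009 South Carolina sample
ACTIVE_RESPONDENTS = 34   # 200 total minus 166
ASSUMED_MINUTES = 60      # hypothetical value, not from the survey

minutes = [0] * ZERO_RESPONDENTS + [ASSUMED_MINUTES] * ACTIVE_RESPONDENTS

print(median(minutes))  # 0.0, regardless of ASSUMED_MINUTES
print(mean(minutes))    # the average is driven entirely by 34 people
```

Any state average built on this sample rests on a few dozen respondents, which is why a ranking based on it can swing so widely.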
Real Estate and In-Migration
The Post and Courier covered a local real estate economist’s presentation on the Real Estate Recovery. Core to any real estate recovery is, of course, employment and wage growth. However, a key statistic overlooked in this presentation was migration patterns. I had mentioned in my June Unemployment post that areas such as Detroit were having problems as a result of a declining labor force. This map from Forbes graphically displays the migration problems Detroit is having.
But when you click on Berkeley, Charleston, or Dorchester counties, a picture of in-migration emerges. This is an important indicator of growth potential because people have jobs when they move here, have decided to collect transfer payments (retirement) in this region or believe there is potential for work in the area.
Another important statistic this map displays is how our rural population is moving to metro areas (short black lines). This is important for two reasons: 1) unemployed people may have the opportunity to find work and 2) if they find work, the state increases its tax base while decreasing social services.
Unlike the economist quoted in the article, I predict our real estate growth will be better than the median national real estate growth, primarily because of in-migration. This is not to say it will be even close to the bubble years (when we had an unrealistic and unsustainable market), but we should see steady improvement as a result of our region’s possibilities.
I am bullish, for a change. I do believe we have significant control over our own growth since the most important contributors to growth and sustainability include education, health care, public safety, urban planning, convenience and infrastructure (including biking and walking trails), which all are within our control.
Thank you to Keihly Moore for her assistance with this article.
Deepwater Horizon Rig Employment
Off-Shore Moratorium
The economic impact of shutting down a deep water drilling rig is no doubt massively expensive. But employment impact is different. As a benchmark, the employment on the BP rig was 126 persons. It is reported that wages average about 100k per year for those workers. However, I could not find that person anywhere, except in the management ranks according to National OES data! Furthermore, most if not all support workers, according to the BLS, make less than 1/2 that unsubstantiated amount.
Interestingly, as soon as we get into a multiplier discussion, the numbers start off ridiculously high and go up from there. But a multiplier over two is not reasonable or supported in any research and especially not in this case. One primary reason is the service nature of the JOBS we are talking about, NOT the industry multipliers.
Rig Count
Of all the rigs out there, only 4% are off-shore! (Baker Hughes) Therefore few if any support persons are going to be affected by a stoppage of drilling, because 94 percent of their support services are not located off-shore. A potential economic impact is close to 400 million a year, which includes a multiplier. More likely, however, those wages are moved to another location, or just paid, the result being no impact. The reason is that most companies cannot afford to idle (lose) those skilled workers.
Simple calculation: number of rigs (33) × employment (126 per rig) × wages (100k)
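The arithmetic is easy to check. The Python sketch below multiplies the three figures exactly as written, with no multiplier applied; the variable names are mine.

```python
RIGS = 33                # number of rigs
WORKERS_PER_RIG = 126    # employment per rig (the BP rig benchmark)
AVERAGE_WAGE = 100_000   # reported average annual wage, in dollars

total_wages = RIGS * WORKERS_PER_RIG * AVERAGE_WAGE
print(f"{RIGS * WORKERS_PER_RIG:,} workers, ${total_wages:,} a year")
# 4,158 workers, $415,800,000 a year
```

If the 100k wage is overstated, as the OES data suggest, the total shrinks in direct proportion.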
Unemployment Definition (BLS)
Unemployment Data
Unemployment numbers are one of the few data sets that are reported and analyzed in the media. Unfortunately, most of the current media analysis is flawed because writers don’t understand the definition of unemployment as reported by the Bureau of Labor Statistics (BLS). Here is a link to that definition (pdf). This post is to help you understand the basic definition of unemployment.
Key Points for Everyday Analysis:
Civilian Labor Force (labor force): These are the people who are counted, age 16 and older. It does not include folks in institutions such as prisons, nursing homes, military, etc.
Employed: This term applies to anyone who did any work on the 12th of each month as a paid employee at a farm or business, worked 15 hours or more in a family business, or had a job but was on vacation, sick, absent due to bad weather, etc. Even those holding more than one job are counted only once.
Unemployed: People who weren’t employed on the 12th, but were available to work and were looking for work over the past four weeks.
Unemployment Rate Calculation: The ratio of unemployed to civilian labor force, expressed as percent.
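The definitions above reduce to a single line of arithmetic. Here is a minimal Python sketch, treating the labor force as employed plus unemployed; the counts are hypothetical and the function name is my own.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a percent of the civilian labor force."""
    labor_force = employed + unemployed
    return 100.0 * unemployed / labor_force

# Hypothetical area: 900 employed, 100 unemployed.
print(unemployment_rate(900, 100))            # 10.0

# 50 people start looking for work and none of them find a job.
print(round(unemployment_rate(900, 150), 1))  # 14.3
```

In the second case nobody lost a job, yet the rate rose by more than four points. The denominator moves too, which is why the labor force always has to be part of the analysis.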
Analysis Discussion:
What happens in the labor force makes a difference in the unemployment rate – specifically when people enter and exit. As an example, if more people enter the labor force than can find a job, the unemployment rate goes up.
Always consider the three classifications in the calculation: labor force, employment and unemployment. Focus on trends, not individual points. Compare trends, not points, from one year to the next. Think about what happens in the labor force during the year, such as a big layoff, teachers being hired in the fall, hurricanes. Review the Current Employment Statistics (CES) establishment survey data for clues to employment changes by industry.
Don’t confuse the neighborhood unemployment rate with the official BLS unemployment rate. True, if your neighbor is unemployed her unemployment rate is 100 percent, but this number has no correlation to the official unemployment rate.
Think of the unemployment rate as a tide. Thus, a drop of water tells you very little. Only by standing back and looking at the coastline can you discern the effects of water level change. You may not like the BLS definition, but the trend it produces is powerful.
How is unemployment calculated? See Unemployment Calculation Method and Documentation
Estimated Energy Use
Linked is a graphic showing the different sources and uses of energy in the United States. I first ran across this graphic in Input – Output Analysis (Miller and Blair) in their discussion of energy and economic impacts. I encourage you to “do the math” (addition or subtraction, in this case). These data are often referenced as sources and uses of energy. As an example, residential structures consume over 35 percent of the usable energy. This graphic details how that can be possible.
These data also bring into question the green energy movement, including jobs, energy savings and industry impact. It is clear to this analyst that conservation would have a greater impact on our energy use in the short term than new forms of green energy like wind or solar. This is not to say these new forms of energy generation are not important, only that there is the reality of our current energy system efficiency (see graphic) as a result of the system the United States currently deploys.
LAUS Unemployment Calculation Method and Documentation
Unemployment Method Description
Each month the Bureau of Labor Statistics (BLS) publishes national, state and local unemployment statistics. The results are reported in the local media, usually with a brief analysis along with a human interest story. Unfortunately, the story often does not match the data. One reason is the users are not familiar with the strict definition of unemployment as defined by the BLS. I would encourage anyone who has doubts about that definition to review it first before jumping into this detailed post of unemployment calculation. See definition of unemployment.
Statistics: Root of the Published Results
Calculating unemployment is a statistical process. You could stop here, but I encourage you to keep reading since this post gives the sources of those calculations and breaks it all down into bite-size (non-math) pieces. We will give a brief explanation of each part of the process with source documents, where available, and links, if necessary, to key terms.
Why is the unemployment calculation process so complex? There are two primary reasons: 1) timing and 2) cost. The series is published every month for a number of labor market regions. Wouldn’t it be great if we could go out and actually count the number of people who are employed or unemployed, and just for fun, determine how many people are in the labor force every month? This process would be labeled a census. In this country, that is done once every 10 years.
Even if we could compile and report the results each month, imagine the expense involved. The next best process then, is to survey (estimate) the population and estimate the number who fall into each category, along with some general demographic information. The process starts with a monthly employment survey, administered by the Census Bureau, named the Current Population Survey or CPS. The data from this survey are used by BLS in statistical models to calculate unemployment rates.
Background
Let’s keep in mind the unemployment rate published by general media is the U-3 rate. There are actually 6 rates that provide different estimates of unemployment. The U-3 is the middle estimate. In South Carolina, the U-1 rate was 5.6 percent and the U-6 rate was 18.4 percent, averaged between the third quarter of 2008 and the third quarter of 2009. The U-3 rate during this same time period was 10.6 percent. This tells us that the unemployment rate is exact, given a certain level of statistical accuracy based on specific criteria. The following statistical process looks at how the rate is developed, regardless of level.
Statistical Process: Four Primary Steps
Step One: CPS
Step one is the CPS. This is a national survey, completed in each state, done on a monthly basis.
Like any statistical survey sample, we know there is truth and error in the data. The question is, what is the true value and what is error, or noise? In a survey we need to model (statistically) the difference, allowing us to calculate the accuracy of our results in a consistent fashion. In this case, it’s for states and special regions. The BLS LAUS program uses the monthly Census data in a signal-plus-noise (SNP) model – actually two models – which when combined, estimate the true labor force for divisions and states. (Page 37 pdf)
The SNP model estimates also incorporate historical CPS auxiliary data. The end result is seasonal-, trend- and irregularity-adjusted employment/unemployment characteristics at the national level. (Page 37 pdf)
Step Two: Monthly Benchmark
In the past, large adjustments in employment/unemployment data were required at year’s end to match the national CPS sample because state monthly totals were not summing to the national CPS totals. That process has now been modified. The monthly data is bench-marked, real time, in two ways. First, census division models are constructed and controlled to the national CPS level, and second, state models are controlled to their appropriate census division estimates. We now have a statistical model of labor force, employment and unemployment for the nation, census regions, states and other special geographies. (Page 38 pdf)
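One simple way to picture “controlled to” is proportional scaling: every state estimate in a division is multiplied by the same factor so the states sum to the division total. The Python sketch below assumes that pro-rata approach purely for illustration; the function name, state names and numbers are hypothetical, and the documented BLS procedure may differ in detail.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so they sum to control_total."""
    raw_total = sum(estimates.values())
    factor = control_total / raw_total
    return {name: value * factor for name, value in estimates.items()}

# Hypothetical model estimates (thousands of unemployed) for one division.
states = {"State A": 480.0, "State B": 310.0, "State C": 190.0}  # sums to 980

# The division estimate, already controlled to the national CPS, is 1,000.
adjusted = control_to_total(states, 1000.0)
for name, value in adjusted.items():
    print(f"{name}: {value:.1f}")
```

Each state keeps its share of the division, and the adjusted figures add back to the division total every month instead of being corrected in one large step at year’s end.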
Summary: Steps One and Two
Clearly there is a fair amount of math within this process. However, in its simplest form, a survey is taken throughout the country by the Census Bureau for a number of different geographies each month. Larger regions are more accurate than smaller ones. Census regions total to the national CPS. The BLS then works with the CPS data to create state data that is controlled to the appropriate census region, providing consistency month to month with the national results. We now have an estimate of labor force, employment and unemployment at the state level that is consistent with the national CPS survey.
Keep in mind each step involves error. So it is important to remember that as good as this process is, variability is not completely eliminated. That is one reason that trend analysis is important when analyzing these data.
Step Three: Estimates for sub-State Labor Market Areas (LMA)
The third step estimates unemployment and employment for areas within a state, such as a metropolitan statistical area (MSA), county or city (sub-state). These typically are data that the media reports. Up until now our estimates have been for states, census regions and the nation as a whole.
With state level controls, local unemployment estimates are derived from local unemployment insurance (UI) statistics, based on two covered employee building blocks: 1) those with benefits and 2) those with exhausted benefits. These data allow for estimates of those unemployed and expected to be unemployed. New entrants and re-entrants cannot be estimated using this process. Instead, those data are estimated from national data based on demographics.
Local employment is estimated using the Current Employment Statistics (CES) and Quarterly Census of Employment and Wages (QCEW), or covered workers. These place-of-work estimates need to be adjusted to place-of-residence. This is accomplished with decennial census data. Data for each labor market area are adjusted to sum to the state total, calculated above. Finally, estimates for parts of Labor Market Areas (LMAs) are primarily computed using the number of claims versus local population. (Page 39 pdf)
Keep in mind that not all those in the labor force are estimated in this process. Primarily, two groups not covered are those in agriculture and “all other,” which includes self-employed workers.
Step Four: Year-End Benchmark Correction or Smoothing
Smoothing is a year-end process that collects and distributes any irregularities that are noted throughout the year that were not a part of the original series. Therefore, mid-year data, unlike final smoothed data from prior years, still needs to go through a smoothing process. Trend analysis, when comparing prior year data with current data, is recommended. This will reduce the risk of misinterpreting the variance between the two data sets as a result of computations alone. (Page 39 pdf)
Summary: Steps Three and Four
Generally step three uses local data to determine who is and who is not employed, but is still an estimate. Smoothing in step four is generally a clean-up process to make the data as robust as possible for future use.
Conclusion
The methodological sources I have provided are being updated from April 1997. The basic process (1997) is the same with the exception of the monthly benchmarking and year end smoothing, incorporated in 2010. One important note to this process is the results are only as good as the inputs. States that take their UI data collection seriously are more accurate and thus provide a better picture.
I want to thank the Southeast BLS Regional Analysis Team for the assistance in helping me understand and interpret the LAUS detailed statistical documentation.
Unemployment and Education
Unemployment hit 12.6 percent in South Carolina this past December. I believe that was predictable, so this should not be a surprise to anyone. My question is what does unemployment look like when viewed from a different perspective, such as college graduation rates? In particular, I am interested in technical schools, since they should provide new skills, within a relatively short time horizon, needed by employers to compete.
What I found was encouraging. I used Trident Technical School (TT) graduation rates for two-year programs and the Charleston MSA, as my unemployment geography. Integrated Postsecondary Education Data System (IPEDS) provides the data for TT, with a little help from TT’s graduate page, while the BLS provides data for unemployment. Note: I gave TT the benefit of the doubt since they included multiple degrees in their data…hmmm.
My hypothesis is that as unemployment increases, graduations lag but also increase. The recession started in December 2007. With a lag of two years, one should note more Associate Degree graduates, all other variables being constant.
TT has graduated on average ONLY three percent of enrollees (two-year programs), based on 150 percent of time to complete degree! In comparison Clemson is almost 70 percent (four-year) with other technical schools across the country graduating closer to 20 percent. The two-year degree is a subgroup of TT’s enrollment population, since others attend part-time, less than one year, night school, continuing education, etc.
The link is to the spreadsheet graph (pdf). Up front, this is insufficient data, but it does clear the “interesting” hurdle and may justify continued research or Moore Data! I used JMP to compile a back of the envelope model (pdf). Note the big jump from 2008 to 2009, but the model provides some hope (R-squared below .50) for this relationship. As unemployment increases, graduations are delayed but also increase! Of course there are other factors not in this analysis which affect the outcome, but this is one place to start to think about the process. If this analysis holds any shred of truth, it would suggest people are NOT waiting for the “old job” but getting after it, regardless of whether there is a job available.
One key piece of information which would assist in answering this question is the “SK” claims collected at the state level. This is an unemployment claim where the job is not expected to return, i.e. textiles. In other words, the job hunter is expected to find NEW and different employment. This analysis would point in the direction of a high number of “SK” claims. Hopefully the folks at the state are doing this analysis which would assist in predicting both educational needs and unemployment claims and more importantly, payouts, since they may be doing it for a while.
Survey Monkey
One of the easiest methods to collect survey data is Survey Monkey (SM). What many do not realize is that this tool is a cost effective way to collect simple everyday samples. Who wants to go to lunch? Give us some feedback on the meeting? SM allows ten questions for free. It is surprising the amount of data (no relation to quality) one can capture in 10 questions.
I have also used SM for larger research projects. OK, yes, there are issues with online accessibility. As an example, in SC 40 percent do not have a computer at home, so one needs to know the subject and audience to ensure data is not unintentionally skewed. You all know the rules!
When this process is appropriate, I typically supplement the online survey with a phone call (a reminder, especially when time is an issue) and spend measured time confirming emails and contacts. This process, however, saves a significant amount of time in the end, especially if it is a survey that is repeated time and time again. SM output quality is quite high: as good as YOUR process. SM provides a simple but effective interface to do what you need.

