Boeing’s Effect on Employment Distribution – Shannon-Weaver Diversity Index

November 1, 2010 · Posted in local industry

I was interested in the effect that Boeing employment (aircraft manufacturing) would have on Charleston County’s industry diversity, so I used the Shannon-Weaver Diversity Index, a measure of the extent to which a region’s employment is distributed among its industries. It ranges from 0 (perfect inequality, or no diversity) to 1 (perfect equality, or full diversity). The index is built into IMPLAN® Version 3.0, which reports the distribution of employment in any given study area.
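As a rough illustration of how an index like this behaves, here is a minimal sketch. It assumes the common normalized form of the Shannon index (entropy of the employment shares divided by the log of the number of industries), which gives the 0-to-1 range described above. IMPLAN’s exact formula is not shown in this post, and the function name and the employment counts are hypothetical.

```python
import math

def shannon_diversity(employment):
    """Normalized Shannon index for a list of industry employment counts.

    Assumption: normalization by ln(number of industries), so the result
    runs from 0 (all jobs in one industry) to 1 (jobs spread evenly).
    """
    n = len(employment)
    total = sum(employment)
    if n < 2 or total <= 0:
        return 0.0
    # Employment share of each industry; empty industries add nothing
    shares = [e / total for e in employment if e > 0]
    entropy = -sum(p * math.log(p) for p in shares)
    # Divide by the largest possible entropy for n industries
    return entropy / math.log(n)

# Hypothetical four-industry region, before and after adding
# jobs to its smallest industry
before = shannon_diversity([50, 30, 15, 5])
after = shannon_diversity([50, 30, 15, 9])
print(before, after)
```

In this sketch, adding jobs to an under-represented industry raises the index, and removing jobs from a dominant industry would do the same.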

A geographic region with a high level of diversity will typically be more robust and adapt more quickly to changes in the broader economy. Regions with a lower diversity index are more vulnerable to economic change: for example, a region dependent on one major employer such as a military base or a paper or textile mill. A shift in the economy that affects the dominant industry can result in either windfall employment gains or catastrophic losses.

Charleston County’s Shannon-Weaver Diversity Index is .68624 without Boeing employment. The index with Boeing, an estimated increase of 4,000 jobs, is .68836, a gain of about 0.3 percent. With the addition of Boeing jobs, Charleston County increases its industry diversity. If we do the same analysis with food and drinking places in Charleston County, except in this case decreasing jobs by 10,000, the index moves from .68624 to .69438, a gain of about 1.2 percent. That also reflects a more diverse industry base.
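The percentage changes can be recomputed from the index values quoted above. This is a minimal sketch that treats each change as relative to the baseline index; the function and variable names are my own.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

baseline = 0.68624         # index without Boeing employment
with_boeing = 0.68836      # index with an estimated 4,000 added jobs
fewer_food_jobs = 0.69438  # index with 10,000 fewer food and drinking jobs

boeing_change = percent_change(baseline, with_boeing)
food_change = percent_change(baseline, fewer_food_jobs)
print(round(boeing_change, 2), round(food_change, 2))
```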

This brief analysis shows that Charleston County is under-represented in aircraft manufacturing and over-represented in food and drinking places, given our industry mix. Economic development opportunities lie on the fringes of dominant industries that are looking to increase their supplier base or demand base, while emerging industries take advantage of established local industries that can service new ventures.

Biological Field Example Calculation

South Carolina Lazy? I don’t think so!

August 6, 2010 · Posted in statistics

Lazy:  When Noise Interferes with the Signal

Recently the Post and Courier ran an article highlighting a Business Week analysis that said South Carolina was the eighth laziest state in the union! Typically, subjective words used to describe data pop a red flag that warns me of impending data misuse doom.

The Data Set

The American Time Use Survey (ATUS) measures the time people spend doing various activities such as work, childcare, housework, watching television, volunteering and socializing. Hence this is an activity survey, not a lazy survey. The data are collected by the Census Bureau and sponsored by the Bureau of Labor Statistics (BLS). I ran a query to understand the nature of the survey, data availability and error rates. I called in the big guns from Global Pragmatica LLC to assist in converting the data from an ASCIDAT file to my JMP statistical software package format. These folks are experts in scripting and were a huge help. Thank you!

These data are collected regionally but analyzed nationally.  There is about a 90-percent chance, or level of confidence, that an estimate based on a sample will differ by no more than 1.6 standard errors from the “true” population value because of sampling error.   No estimates are made for state level data, and one University of Minnesota analyst stated she was not aware of state level error estimates.
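To make the 90-percent statement concrete, here is a minimal sketch of the range implied by an estimate and its standard error, using the 1.6 multiplier quoted above. The estimates and standard errors are hypothetical, and the helper names are my own.

```python
def confidence_interval_90(estimate, standard_error, multiplier=1.6):
    """Return (low, high) for estimate +/- multiplier * standard_error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin

def intervals_overlap(a, b):
    """True when two (low, high) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical average minutes and standard errors for two areas
area_a = confidence_interval_90(20.0, 2.5)
area_b = confidence_interval_90(17.0, 2.5)
print(area_a, area_b, intervals_overlap(area_a, area_b))
```

When the two ranges overlap, the data alone cannot say which area truly ranks higher. Without a published standard error for a state, that check cannot even be attempted.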

It is inappropriate to analyze these data at the state level without calculating the error inherent in the data. Without that error estimate, the analysis may be interesting, but it is useless for comparing one state to another. Why?

Sports Activity Variable Analysis

For a test sample, I chose state-level geography, with sports as the activity variable. This category captures the respondent’s participation in sports, exercise and recreational activities. To extract the data from the system, I used a tool created by the University of Minnesota called the American Time Use Survey-X. The data need to be processed by a statistical package, in this case my JMP program. An analysis of people participating in sports activities indicates that South Carolina would rank 22nd out of 50 states in terms of average minutes spent participating in sports in a 24-hour period. Not bad. However, upon further inspection of South Carolina’s 2009 detailed weighted data, the state could rank anywhere from 12th to 23rd, based on national error rates! (PDF) Unfortunately, since these are state data, the results are meaningless. That’s because the sample is simply too small, which is one of many buried statistical problems. This 2009 sample included a total of 200 people, of whom 166 recorded zero sports activity minutes. (PDF) In fact, the median is zero, which is another red flag for this data set. A review of other states’ data revealed the same issue. This is a fascinating national data set. But unfortunately, analysis of non-national geographies yields unreliable results.
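The zero median follows directly from the counts above: with 166 zeros among 200 respondents, the middle of the sorted sample falls among the zeros whatever the other 34 people reported. A minimal sketch, in which the non-zero minutes are placeholders rather than survey values:

```python
from statistics import median

def median_with_zeros(n_total, n_zero, nonzero_minutes):
    """Median of a sample with n_zero zeros; every other value is a placeholder."""
    sample = [0] * n_zero + [nonzero_minutes] * (n_total - n_zero)
    return median(sample)

# 200 respondents, 166 with zero sports minutes (counts from the post);
# 60 and 600 are arbitrary placeholder values for the remaining 34 people
print(median_with_zeros(200, 166, 60))
print(median_with_zeros(200, 166, 600))
print(166 / 200)  # share of respondents reporting zero minutes
```

The state average therefore rests on only 34 people, which is why it swings so widely.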

LAUS Unemployment Calculation Method and Documentation

May 17, 2010 · Posted in method

Unemployment Method Description

Each month the Bureau of Labor Statistics (BLS) publishes national, state and local unemployment statistics. The results are reported in the local media, usually with a brief analysis along with a human interest story. Unfortunately, the story often does not match the data. One reason is that readers are often unfamiliar with the strict definition of unemployment used by the BLS. I would encourage anyone who has doubts about that definition to review it before jumping into this detailed post on how unemployment is calculated. See definition of unemployment.

Statistics: Root of the Published Results

Calculating unemployment is a statistical process. You could stop here, but I encourage you to keep reading, since this post gives the sources of those calculations and breaks it all down into bite-size (non-math) pieces. We will give a brief explanation of each part of the process with source documents, where available, and links, if necessary, to key terms.

Why is the unemployment calculation process so complex? There are two primary reasons: 1) timing and 2) cost.  The series is published every month for a number of labor market regions. Wouldn’t it be great if we could go out and actually count the number of people who are employed or unemployed, and just for fun, determine how many people are in the labor force every month?  This process would be labeled a census. In this country, that is done once every 10 years.

Even if we could compile and report the results each month, imagine the expense involved. The next best process, then, is to survey a sample of the population and estimate the number who fall into each category, along with some general demographic information. The process starts with a monthly employment survey, administered by the Census Bureau, named the Current Population Survey or CPS. The data from this survey are used by BLS in statistical models to calculate unemployment rates.

Background

Let’s keep in mind that the unemployment rate published by the general media is the U-3 rate. There are actually six rates (U-1 through U-6) that provide different estimates of unemployment, and the U-3 is the middle estimate. In South Carolina, the U-1 rate was 5.6 percent and the U-6 rate was 18.4 percent, averaged between the third quarter of 2008 and the third quarter of 2009. The U-3 rate during this same time period was 10.6 percent. This tells us that an unemployment rate is exact only with respect to specific criteria and a certain level of statistical accuracy. The following statistical process looks at how the rate is developed, regardless of level.
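For reference, the headline rate is the number of unemployed people divided by the labor force, where the labor force is the employed plus the unemployed. Here is a minimal sketch with hypothetical counts; the other rates use different numerators and denominators and are not covered.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a percent of the labor force (employed + unemployed)."""
    labor_force = employed + unemployed
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return unemployed / labor_force * 100.0

# Hypothetical area: 894 employed and 106 unemployed (labor force of 1,000)
rate = unemployment_rate(employed=894, unemployed=106)
print(round(rate, 1))
```

People who are neither working nor looking for work are outside the labor force and do not enter this calculation at all.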

Statistical Process: Four Primary Steps

Step One: CPS

Step one is the CPS.  This is a national survey, completed in each state, done on a monthly basis.

As with any statistical survey sample, we know there is both truth and error in the data. The question is, what is the true value and what is error, or noise? In a survey we need to model the difference statistically, which allows us to calculate the accuracy of our results in a consistent fashion, in this case for states and special regions. The BLS LAUS program uses the monthly Census data in a signal-plus-noise (SNP) model (actually two models) which, when combined, estimate the true labor force for divisions and states. (Page 37 pdf)

The SNP model estimates also incorporate historical CPS auxiliary data.  The end result is seasonal-, trend- and irregularity-adjusted employment/unemployment characteristics at the national level. (Page 37 pdf)

Step Two: Monthly Benchmark

In the past, large adjustments in employment/unemployment data were required at year’s end to match the national CPS sample because state monthly totals did not sum to the national CPS totals. That process has now been modified. The monthly data are benchmarked in real time in two ways. First, census division models are constructed and controlled to the national CPS level; second, state models are controlled to their appropriate census division estimates. We now have a statistical model of labor force, employment and unemployment for the nation, census divisions, states and other special geographies. (Page 38 pdf)
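One simple way to picture “controlling” a set of estimates to a higher-level total is a proportional (pro-rata) adjustment, sketched below. This illustrates the idea only; it is not a reproduction of the BLS procedure. The area names and numbers are hypothetical.

```python
def benchmark_to_control(estimates, control_total):
    """Scale area estimates proportionally so they sum to control_total."""
    model_total = sum(estimates.values())
    if model_total <= 0:
        raise ValueError("estimates must sum to a positive number")
    factor = control_total / model_total
    return {area: value * factor for area, value in estimates.items()}

# Hypothetical model estimates (thousands of unemployed) for three states
# in one division, and a hypothetical division-level control total
state_models = {"State A": 480.0, "State B": 320.0, "State C": 160.0}
controlled = benchmark_to_control(state_models, control_total=1000.0)
print(controlled)
```

Each state keeps its share of the total, and the adjusted figures add up to the control total every month instead of being reconciled at year’s end.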

Summary:  Steps One and Two

Clearly there is a fair amount of math within this process. However, in its simplest form, a survey is taken throughout the country by the Census Bureau for a number of different geographies each month. Estimates for larger regions are more accurate than those for smaller ones. Census division estimates total to the national CPS. The BLS then works with the CPS data to create state data that are controlled to the appropriate census division, providing consistency month to month with the national results. We now have an estimate of labor force, employment and unemployment at the state level that is consistent with the national CPS survey.

Keep in mind each step involves error. So it is important to remember that as good as this process is, variability is not completely eliminated.  That is one reason that trend analysis is important when analyzing these data.

Step Three: Estimates for sub-State Labor Market Areas (LMA)

The third step estimates unemployment and employment for areas within a state (sub-state areas), such as a metropolitan statistical area (MSA), county or city. These are typically the data that the media report. Up until now our estimates have been for states, census regions and the nation as a whole.

With state level controls, local unemployment estimates are derived from local unemployment insurance (UI) statistics, based on two covered employee building blocks: 1) those with benefits and 2) those with exhausted benefits. These data allow for estimates of those unemployed and expected to be unemployed. New entrants and re-entrants cannot be estimated using this process. Instead, those data are estimated from national data based on demographics.

Local employment is estimated using the Current Employment Statistics (CES) and Quarterly Census of Employment and Wages (QCEW), or covered workers. These place-of-work estimates need to be adjusted to place-of-residence. This is accomplished with decennial census data. Data for each labor market area are adjusted to sum to the state total, calculated above. Finally, estimates for parts of Labor Market Areas (LMAs) are primarily computed using the number of claims versus local population. (Page 39 pdf)
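As a rough picture of that last step, the sketch below splits a labor market area’s unemployment estimate among its parts in proportion to each part’s unemployment insurance claims. It is a simplified assumption for illustration that ignores the population-based portion of the method. The names and numbers are hypothetical.

```python
def allocate_by_claims(lma_unemployed, claims_by_part):
    """Split an LMA unemployment estimate by each part's share of UI claims."""
    total_claims = sum(claims_by_part.values())
    if total_claims <= 0:
        raise ValueError("total claims must be positive")
    return {
        part: lma_unemployed * claims / total_claims
        for part, claims in claims_by_part.items()
    }

# Hypothetical LMA with 1,000 unemployed and two parts
parts = allocate_by_claims(1000.0, {"Central city": 300, "Rest of county": 100})
print(parts)
```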

Keep in mind that not all those in the labor force are estimated in this process. Primarily, two groups not covered are those in agriculture and “all other,” which includes self-employed workers.

Step Four: Year-End Benchmark Correction or Smoothing

Smoothing is a year-end process that collects and distributes any irregularities that are noted throughout the year that were not a part of the original series. Therefore, mid-year data, unlike final smoothed data from prior years, still needs to go through a smoothing process. Trend analysis, when comparing prior year data with current data, is recommended. This will reduce the risk of misinterpreting the variance between the two data sets as a result of computations alone. (Page 39 pdf)

Summary: Steps Three and Four

Generally step three uses local data to determine who is and who is not employed, but is still an estimate. Smoothing in step four is generally a clean-up process to make the data as robust as possible for future use.

Conclusion

The methodological sources I have provided date from April 1997 and are being updated. The basic process is the same as in 1997, with the exception of the monthly benchmarking and year-end smoothing, which were incorporated in 2010. One important note: the results are only as good as the inputs. States that take their UI data collection seriously produce more accurate estimates and thus a better picture.

I want to thank the Southeast BLS Regional Analysis Team for helping me understand and interpret the detailed LAUS statistical documentation.

Statistics and Helmets: Linking Data to Emotion

June 4, 2009 · Posted in statistics

Recently I had coffee with my former professor and friend Chris Nachtsheim (Ph.D., Operations Research) of the University of Minnesota. Chris and I often discuss how the world looks through the eyes of a statistician. Most events that we see or experience can be described numerically. Knowing this is important, since locating available statistical data to help interpret what we experience can remove a degree of speculation and emotion from the everyday “crazy” events we witness. One data group that is particularly useful to me pertains to helmet use. The body of statistics on helmet use is substantial and offers quite a variety of statistical methods and data, which is my real interest.

I often see children (just like me when I was a kid) not using a helmet. In the past, I would jump to the conclusion that a brain injury was inevitable for anyone who does not wear a helmet. The statistics, however, do not support that emotional speculation. Having said that, this is one of those areas where the statistics show how a $10 helmet produces significant positive results when an accident does happen. The old saying is “the natural state of any two-wheeled vehicle is on its side”! Unfortunately, all my helmets have shown that to be true.

I have included a link here to the Bicycle Helmet Safety Institute (BHSI). These folks have done a great job (on all sides of the issue) of gathering a number of third party statistics that are interesting and informative both from a research and safety perspective.

Citadel – Health Science Disabilities

January 14, 2009 · Posted in presentations

During the past few years I have worked with Dr. Dena Garner at The Citadel, speaking to her master’s-level students about physical disabilities. I myself am physically disabled as a result of spinal cancer a number of years ago. As a data analyst, I have shared with the class some of the sites I review to research statistics on persons with disabilities. I thought it would be helpful to post those sites in one place so those interested can easily find them. I have created a Disability Statistics List for anyone wanting to research further. See below.

websites-disabilities
