Gini Coefficient
Issue
The Gini Coefficient, developed by the Italian statistician Corrado Gini, is the most commonly used measure of inequality. The coefficient varies between 0, which reflects complete equality, and 1, which indicates complete inequality (one person has all the income or consumption, all others have none). (The World Bank) We wanted to use this method to look at income distribution throughout South Carolina, but first we had to understand the formula.
At first glance, a fair amount of math is needed to calculate the coefficient. Make no mistake, this can be a very complex formula, utilizing probability sampling, bootstrapping, confidence intervals and other statistical methodology. We, however, tried to keep it applied, and therefore used the most basic variation:
After sorting out the symbolism, we created a sample problem (PDF). The sample problem allowed us to work through the math in a structured process. The value of “doing the math” is that one gains an understanding of how different variables affect the formula. The PDF contains two versions of the sample problem, one showing the formula and the other with the numbers plugged in. Note that, unlike most of the available examples, we show a calculation needed before using the formula: in this case, (dollars per stratum) times (number of persons). That’s because the analyst may need to do a number of calculations before applying the formula.
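The exact variation used in our sample problem is not reproduced here, so treat the following as a hedged sketch that assumes one common formula for grouped data: the trapezoid (Brown) formula on the Lorenz curve, G = 1 − Σ (X_k − X_{k−1})(Y_k + Y_{k−1}), where X is the cumulative share of persons and Y the cumulative share of income. It includes the pre-formula step described above, stratum dollars times number of persons. The function name `grouped_gini` and the example numbers are hypothetical.

```python
def grouped_gini(values, counts):
    """Approximate Gini coefficient for grouped (stratified) income data.

    values: representative dollar value for each stratum (hypothetical inputs)
    counts: number of persons in each stratum
    Assumes the trapezoid (Brown) formula on the Lorenz curve:
        G = 1 - sum_k (X_k - X_{k-1}) * (Y_k + Y_{k-1})
    """
    # Order strata from lowest to highest dollar value so the Lorenz curve is built in order.
    strata = sorted(zip(values, counts))
    # Pre-formula step: stratum dollars TIMES number of persons = total income per stratum.
    totals = [value * persons for value, persons in strata]
    total_persons = sum(persons for _, persons in strata)
    total_income = sum(totals)

    area_sum = 0.0
    prev_x = prev_y = 0.0
    cum_persons = cum_income = 0.0
    for (_, persons), income in zip(strata, totals):
        cum_persons += persons
        cum_income += income
        x = cum_persons / total_persons  # cumulative share of persons
        y = cum_income / total_income    # cumulative share of income
        area_sum += (x - prev_x) * (y + prev_y)
        prev_x, prev_y = x, y
    return 1.0 - area_sum
```

For example, with four hypothetical strata, `grouped_gini([10_000, 30_000, 60_000, 150_000], [40, 30, 20, 10])` returns about 0.48, and any input where every stratum has the same dollar value returns 0.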
The Formula: Results
We applied the formula to the classic income distribution (wealth share) problem, using Census Household and Family Income table B19001 for each county in South Carolina. These data have 16 income strata. We found the formula is particularly sensitive to changes in the top two strata, not so much to the number of persons in them as to their average dollar value. In other words, “the tail wags the dog” in this formula. The other critical piece of information needed is what value to assign to the highest stratum, which is open-ended. The Census Bureau uses approximately $400,000 as the average dollar figure for the top stratum. They calculate this number using volumes of data, so it’s good enough for me.
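To see “the tail wags the dog” in action, a sketch can hold household counts fixed across the 16 B19001 brackets and vary only the dollar value assigned to the open-ended top bracket ($200,000 or more). The bracket midpoints used for the lower 15 brackets and the counts you pass in are assumptions for illustration, not the South Carolina data. The Gini here uses the mean-absolute-difference form, G = ΣΣ n_i n_j |x_i − x_j| / (2N²μ), which for grouped data gives the same value as the Lorenz-curve trapezoid formula. Function names are hypothetical.

```python
from itertools import product

# Representative values for the 15 lower B19001 brackets, taken as bracket
# midpoints (an assumption; bracket means would be a reasonable alternative).
LOWER_BRACKET_VALUES = [
    5_000, 12_500, 17_500, 22_500, 27_500, 32_500, 37_500, 42_500,
    47_500, 55_000, 67_500, 87_500, 112_500, 137_500, 175_000,
]


def gini_mean_difference(values, counts):
    """Grouped Gini: sum_i sum_j n_i n_j |x_i - x_j| / (2 * N^2 * mean)."""
    n_total = sum(counts)
    mean = sum(v * n for v, n in zip(values, counts)) / n_total
    pairs = list(zip(values, counts))
    abs_diff = sum(
        ni * nj * abs(xi - xj)
        for (xi, ni), (xj, nj) in product(pairs, repeat=2)
    )
    return abs_diff / (2 * n_total ** 2 * mean)


def top_stratum_sensitivity(counts, top_values):
    """Gini for each candidate dollar value assigned to the top bracket."""
    if len(counts) != len(LOWER_BRACKET_VALUES) + 1:
        raise ValueError("expected one count per B19001 bracket (16)")
    return {
        top: gini_mean_difference(LOWER_BRACKET_VALUES + [top], counts)
        for top in top_values
    }
```

Feeding in real county counts and sweeping `top_values` around the $400,000 figure is a quick way to see how much the top bracket alone moves the result.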
After making our calculations, we found that the formula revealed a number of interesting trends. First, the impact of the economy on higher wage earners is, in the case of these data, very delayed. In other words, higher-income households continued to make money well into the latest recession. The other revealing attribute is the effect of a rising tide. A rising tide does in fact lift boats, but it lifts some higher than others, and in the process it also sinks a few! In this case, households with higher incomes grew at a proportionally higher rate than those with lower incomes, and in some counties, household income (high and low) was hit particularly hard.
Conclusions
Now that you understand the formula, here is the good news: if you use these data, the Census Bureau has already done the Gini Coefficient income calculations for you! Yes, to my surprise, the Bureau has been doing this calculation since the 1990s. The file is B19083. It may sound like I have given you a shortcut, but now you have to figure out the new American Community Survey graphical interface. Good luck!
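If the interface proves frustrating, the Census Bureau also publishes ACS tables through its Data API. This is a minimal sketch, assuming the ACS 5-year endpoint and the variable `B19083_001E` (the estimate column of B19083); check the API’s variable list for the vintage you need, and note that heavier use requires a free API key. The helper names are hypothetical, and the response is assumed to be the API’s usual list of rows with a header row first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed ACS 5-year endpoint and B19083 estimate variable.
API_BASE = "https://api.census.gov/data/{year}/acs/acs5"
GINI_VARIABLE = "B19083_001E"
SOUTH_CAROLINA_FIPS = "45"


def gini_by_county_url(year, state_fips=SOUTH_CAROLINA_FIPS):
    """Build a query for the Gini index of every county in one state."""
    params = {
        "get": f"NAME,{GINI_VARIABLE}",
        "for": "county:*",
        "in": f"state:{state_fips}",
    }
    # Keep ':', '*' and ',' readable instead of percent-encoding them.
    return API_BASE.format(year=year) + "?" + urlencode(params, safe=":*,")


def parse_gini_rows(rows):
    """Turn list-of-lists JSON (header row first) into {county name: Gini}."""
    header, *data = rows
    name_col = header.index("NAME")
    gini_col = header.index(GINI_VARIABLE)
    return {row[name_col]: float(row[gini_col]) for row in data}


def fetch_county_gini(year, state_fips=SOUTH_CAROLINA_FIPS):
    """Download and parse the county Gini values (requires network access)."""
    with urlopen(gini_by_county_url(year, state_fips)) as response:
        return parse_gini_rows(json.load(response))
```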
Acknowledgement: Thank you to the staff at the US Census Bureau for assisting me in understanding key drivers of the Gini Coefficient.
South Carolina Lazy? I don’t think so!
Lazy: When Noise Interferes with the Signal
Recently, the Post and Courier ran an article highlighting a Business Week analysis that said South Carolina was the eighth laziest state in the union! Subjective words used to describe data typically raise a red flag that warns me of impending data-misuse doom.
The Data Set
The American Time Use Survey (ATUS) measures the time people spend on activities such as work, childcare, housework, watching television, volunteering and socializing. In other words, this is an activity survey, not a laziness survey. The data are collected by the Census Bureau and sponsored by the Bureau of Labor Statistics (BLS). I ran a query to understand the nature of the survey, data availability and error rates. I called in the big guns from Global Pragmatica LLC to help convert the data from an ASCII .dat file into the format used by my JMP statistical software package. These folks are experts in scripting and were a huge help. Thank you!
These data are collected regionally but analyzed nationally. There is about a 90-percent chance, or level of confidence, that an estimate based on a sample will differ by no more than 1.6 standard errors from the “true” population value because of sampling error. No estimates are made for state-level data, and one University of Minnesota analyst stated she was not aware of any state-level error estimates.
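The 1.6 multiplier is a rounded normal-distribution value: a two-sided 90-percent interval leaves 5 percent in each tail, so the multiplier is the 95th percentile of the standard normal distribution, about 1.645. A minimal sketch of turning an estimate and its standard error into that interval follows; the function name is hypothetical, and passing `z=1.6` reproduces the rounded convention.

```python
from statistics import NormalDist

# Two-sided 90% interval: 5% in each tail, so use the 95th percentile of N(0, 1).
Z_90 = NormalDist().inv_cdf(0.95)  # about 1.645; often rounded to 1.6


def confidence_interval_90(estimate, standard_error, z=Z_90):
    """Range expected to contain the true population value about 90% of the time."""
    margin = z * standard_error
    return estimate - margin, estimate + margin
```

For an illustrative estimate of 20 minutes with a standard error of 2 minutes, the interval runs from about 16.7 to 23.3 minutes.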
It is inappropriate to analyze these data at the state level without calculating the error inherent in the data. Without that step, the analysis might be interesting, but it would be useless for comparing one state with another. Why?
Sports Activity Variable Analysis
For a test sample, I chose state-level geography, with sports as the activity variable. This category captures the respondent’s participation in sports, exercise and recreational activities. To extract the data, I used a tool created by the University of Minnesota called ATUS-X, the American Time Use Survey Extract Builder. The extracted data then need to be processed with a statistical package, in this case my JMP program. An analysis of people participating in sports activities indicates that South Carolina would rank 22nd out of 50 states in terms of average minutes spent participating in sports in a 24-hour period, which is not bad. However, on closer inspection of South Carolina’s 2009 detailed weighted data, the state could rank anywhere from 12th to 23rd, based on national error rates! (PDF) Unfortunately, because these are state data, the results are meaningless. The sample is simply too small, which is one of many buried statistical problems. The 2009 sample included a total of 200 people, of whom 166 recorded zero sports activity minutes. (PDF) In fact, the median is zero, which is another red flag for this data set. A review of other states’ data revealed the same issue. This is a fascinating national data set, but unfortunately, analysis of non-national geographies yields unreliable results.
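One simple way to turn error rates into a range of ranks, offered as an assumption rather than the method behind the 12th-to-23rd figure: treat another state as definitely ahead of (or behind) the target only when their 90-percent intervals do not overlap. Overlapping states could fall on either side, which widens the plausible rank range. The function name and the example inputs are hypothetical.

```python
def plausible_rank_range(estimates, target, z=1.645):
    """Best and worst rank (1 = most minutes) the target state could hold.

    estimates maps each state to (mean minutes, standard error). A state counts
    as clearly ahead of or behind the target only when the two z-based
    confidence intervals do not overlap (an interval-overlap heuristic).
    """
    def interval(name):
        mean, se = estimates[name]
        return mean - z * se, mean + z * se

    t_low, t_high = interval(target)
    ahead = behind = 0
    for name in estimates:
        if name == target:
            continue
        low, high = interval(name)
        if low > t_high:
            ahead += 1   # clearly more minutes than the target
        elif high < t_low:
            behind += 1  # clearly fewer minutes than the target
    return 1 + ahead, len(estimates) - behind
```

With hypothetical inputs `{"A": (10, 1), "B": (20, 1), "C": (15, 3), "D": (30, 1)}`, state C comes out as `(2, 4)`: only D is clearly ahead of it, and no state is clearly behind it. A larger standard error, as with a small state sample, widens the range.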
Unemployment and Migration
One issue we note in unemployment levels is the relationship of employment and unemployment to migration and population change. I took the liberty of comparing population change by county over the last eight years using the 2000 Decennial Census and the 2008 American Community Survey (ACS). Unfortunately, this creates a data conflict, since I am using two different sources. Early ACS data (2000 to 2003) cover only a select group of counties in the state, while the Decennial Census, naturally, is conducted only once every 10 years. However, reviewing these data together revealed some startling results.
…
I analyzed population change from 2000 through 2008 and compared that percentage change with December 2009 unemployment data. As one might imagine, the numbers are all over the map (map to be included at a later date), literally, but there are general themes that float to the surface.
If you live in an expanding county, one that has added population since 2000, you are more likely to have a job. Counties with population growth of over 10 percent had the lowest median unemployment rate, 10.5 percent, while counties that experienced a decrease in population had a median rate of 16.9 percent. Counties with insignificant change had a median rate of 13.9 percent, while small counties had a 16.2 percent rate.
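The grouping above can be sketched as three steps: compute each county’s percentage population change, assign it to a change group, and take the median unemployment rate per group. The post does not state its exact group boundaries, so the cutoffs here (an “insignificant” band of plus or minus 1 percent, and 10 percent for strong growth) are hypothetical, as are the function names.

```python
from statistics import median


def percent_change(old, new):
    """Percentage change from old to new (e.g. 2000 to 2008 population)."""
    return (new - old) / old * 100.0


def population_change_group(pct, insignificant_band=1.0):
    """Classify a percentage change; the cutoffs are hypothetical."""
    if pct > 10.0:
        return "grew more than 10%"
    if pct < -insignificant_band:
        return "lost population"
    if abs(pct) <= insignificant_band:
        return "insignificant change"
    return "grew up to 10%"


def median_unemployment_by_group(counties, insignificant_band=1.0):
    """counties: iterable of (pop_2000, pop_2008, unemployment_rate) tuples."""
    groups = {}
    for pop_2000, pop_2008, rate in counties:
        pct = percent_change(pop_2000, pop_2008)
        label = population_change_group(pct, insignificant_band)
        groups.setdefault(label, []).append(rate)
    return {label: median(rates) for label, rates in groups.items()}
```

Medians rather than means keep a single outlier county from dominating a small group, which matches how the rates above are reported.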
December 2009 unemployment rates range from 8.8 to 21.4 percent, and population change ranges from minus 6.8 to plus 27.3 percent. This represents significant variation among counties and suggests a mismatch between population and available work. Can South Carolina match work to where workers live? This may be extremely difficult because of differences in competitiveness in transportation, technology and training. This is not to say that a rural workforce is less skilled, but rather that it has less access to opportunity.
Boeing is an excellent example of this phenomenon. Boeing is locating in a growing Metropolitan Statistical Area (MSA), supported by state-of-the-art technology, a world-class transportation infrastructure, and a primary education system that can adapt to the company’s needs. To capture one of these opportunities, a rural workforce will, in all probability, need to move or commute.
The number of persons who make this tough decision, in some cases leaving family, property, and heritage, may hold in their hands the future of South Carolina’s unemployment rate.

