The Advent of the Algorithm
The Post and Courier recently printed an unusual column discussing “Deep Reading.” It is an excellent article by Laura Casey discussing a lost art.
“Deep Reading, or slow reading, is a sophisticated process in which people can critically think, reflect and understand the words they are looking at. With most this means slowing down – even stopping and rereading a page or paragraph if it doesn’t sink in…”
Deep reading seems out of sync with today’s “modern” communication tools, like Twitter, and it is. But what does this have to do with the algorithm? As a data guy, I live by the algorithm, and the only way to understand what makes it tick is deep reading! My choice for this subject is The Advent of the Algorithm, by David Berlinski.
We experience the algorithm everywhere in our daily lives – in the operation of the toaster, the dentist’s office, our vehicles and in most, if not all, of the communication tools we use. I reviewed definitions of the algorithm online, but none has the elegance of Berlinski’s, from his book, The Advent of the Algorithm:
“In the logician’s voice:
an algorithm is a finite procedure,
written in a fixed symbolic vocabulary,
governed by precise instructions,
moving in discrete steps, 1,2,3,…,
whose execution requires no insight, cleverness,
intuition, intelligence, or perspicuity,
and that sooner or later comes to an end.”
Where did this idea come from? It turns out the algorithm we all know is the brainchild of four people: Kurt Gödel, Alonzo Church, Alan M. Turing and Emil Post. Each contributed, in part, to the concept of the modern-day algorithm, including functions, calculus of conversion and machines capable of manipulating symbols (computers).
Berlinski describes, in depth, the development of the algorithm concept. In one example, the Euler algorithm, he makes clear how important it is – for an analyst, anyway – to know, not just understand, the magic and flaws behind the algorithm. A case in point is the Numerical Solution For Ordinary Differential Equation, page 245.
“From a mathematical point of view, the original differential equation, contingent as it was upon the concept of the limit, has been replaced by a difference equation, one in which the derivative is approximated by a difference quotient, involving no limits whatsoever.”
The Euler algorithm demonstrates this method:
BEGIN Euler
  INPUT x0, y0, xf, h
  x := x0
  y := y0
  WHILE (x < xf) DO
    y := y + h * f(x, y)
    x := x + h
    OUTPUT x, y
  ENDDO
END Euler
This simple algorithm, however, provides critical insight into the weakness of the algorithm:
“…the difference between an analytic and algorithmic solution to an ordinary differential equation is sharp and it is inescapable. An analytic solution completely penetrates the future or the past; an algorithmic solution acts only over a finite interval of time and space. The analytic solution returns a differential equation to a continuous world; an algorithmic solution, to a world that is discrete.”
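To see the discrete world Berlinski describes, here is a minimal Python sketch of the Euler pseudocode above. The test equation y' = y with y(0) = 1 is my own choice, not the book's, picked because its analytic solution, e^x, is known exactly. I also count steps instead of testing x < xf, so floating-point drift in x cannot sneak in an extra step.

```python
import math


def euler(f, x0, y0, xf, h):
    """Advance y' = f(x, y) from x0 toward xf in fixed steps of size h."""
    n_steps = round((xf - x0) / h)  # step count replaces the WHILE (x < xf) test
    x, y = x0, y0
    for _ in range(n_steps):
        y = y + h * f(x, y)  # the derivative is replaced by a difference quotient
        x = x + h
    return x, y


def growth(x, y):
    """Illustrative test equation y' = y (an assumption, not from the book)."""
    return y


exact = math.e  # analytic solution e^x evaluated at x = 1
errors = {}
for h in (0.1, 0.01, 0.001):
    _, approx = euler(growth, 0.0, 1.0, 1.0, h)
    errors[h] = abs(exact - approx)
    print(f"h={h}: Euler={approx:.6f}  exact={exact:.6f}  error={errors[h]:.6f}")
```

The gap between the Euler value and e shrinks as h gets smaller but never reaches zero, which is the weakness the quote points to.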
Now I understand why the brakes failed in my Toyota.
Note: some of the reviews of the book were not very flattering, but this is DEEP READING with the good stuff starting on page 205. Do you think you have what it takes to DEEP READ? Have at it!
Unemployment Definition (BLS)
Unemployment Data
Unemployment numbers are one of the few data sets that are reported and analyzed in the media. Unfortunately, most of the current media analysis is flawed because writers don’t understand the definition of unemployment as reported by the Bureau of Labor Statistics (BLS). Here is a link to that definition (pdf). This post is meant to help you understand the basic definition of unemployment.
Key Points for Everyday Analysis:
Civilian Labor Force (labor force): These are the people who are counted, age 16 and older. It does not include people in institutions such as prisons and nursing homes, or those on active military duty.
Employed: This term applies to anyone who, during the survey reference week (the week that includes the 12th of the month), did any work as a paid employee at a farm or business, worked 15 hours or more in a family business, or had a job but was on vacation, sick, absent due to bad weather, etc. Even those holding more than one job are counted only once.
Unemployed: People who weren’t employed during that reference week, but were available to work and had looked for work over the past four weeks.
Unemployment Rate Calculation: The ratio of the unemployed to the civilian labor force (employed plus unemployed), expressed as a percent.
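The rate calculation in the key points above can be sketched in a few lines of Python. The function name and the sample counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a percent of the civilian labor force."""
    labor_force = employed + unemployed  # labor force = employed + unemployed
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return 100.0 * unemployed / labor_force


# Hypothetical counts: 900 employed and 100 unemployed people.
example_rate = unemployment_rate(employed=900, unemployed=100)
print(f"{example_rate:.1f} percent")
```

Note that people outside the labor force never enter the calculation at all, which is why the definition matters so much.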
Analysis Discussion:
What happens in the labor force makes a difference in the unemployment rate – specifically when people enter and exit. As an example, if more people enter the labor force than can find a job, the unemployment rate goes up.
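A small worked example, using made-up numbers, shows how entries and exits move the rate even when nobody loses a job. This is only a sketch of the arithmetic, not BLS data.

```python
def rate(employed, unemployed):
    """Unemployment rate as a percent of the labor force."""
    return 100.0 * unemployed / (employed + unemployed)


# Hypothetical starting point: 900 employed, 100 unemployed.
employed, unemployed = 900, 100
before = rate(employed, unemployed)

# 50 people enter the labor force; 20 find jobs and 30 are still looking.
after_entry = rate(employed + 20, unemployed + 30)

# Alternatively, 30 unemployed people stop looking and exit the labor force.
after_exit = rate(employed, unemployed - 30)

print(f"before:      {before:.1f} percent")
print(f"after entry: {after_entry:.1f} percent")
print(f"after exit:  {after_exit:.1f} percent")
```

In the entry case employment actually grew, yet the rate went up. In the exit case nobody found work, yet the rate went down.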
Always consider the three classifications in the calculation: labor force, employment and unemployment. Focus on trends, not individual points. Compare trends, not points, from one year to the next. Think about what happens in the labor force during the year, such as a big layoff, teachers being hired in the fall, or hurricanes. Review the Current Employment Statistics (CES) establishment survey data for clues to employment changes by industry.
Don’t confuse the neighborhood unemployment rate with the official BLS unemployment rate. True, if your neighbor is unemployed her unemployment rate is 100 percent, but this number has no correlation to the official unemployment rate.
Think of the unemployment rate as a tide. Thus, a drop of water tells you very little. Only by standing back and looking at the coastline can you discern the effects of water level change. You may not like the BLS definition, but the trend it produces is powerful.
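Here is one way to look at the tide instead of the drop, sketched in Python. The monthly rates are invented for illustration, and the three-month window is an arbitrary choice of mine, not a BLS convention.

```python
def moving_average(values, window):
    """Trailing moving average; returns one value per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def year_over_year(current, prior):
    """Change in the rate for the same month, one year apart."""
    return [round(c - p, 1) for c, p in zip(current, prior)]


# Hypothetical monthly unemployment rates (percent), January through June.
prior_year = [6.0, 6.1, 5.9, 6.2, 6.4, 6.3]
current_year = [7.0, 6.8, 7.1, 7.3, 7.2, 7.5]

changes = year_over_year(current_year, prior_year)
smoothed = moving_average(current_year, window=3)

print("same-month change from last year:", changes)
print("3-month average, current year:", [round(v, 2) for v in smoothed])
```

In these made-up numbers the rate dips in February and May, which would make a tempting headline, but every month is higher than the same month a year earlier and the smoothed series keeps rising.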
How is unemployment calculated? See Unemployment Calculation Method and Documentation
LAUS Unemployment Calculation Method and Documentation
Unemployment Method Description
Each month the Bureau of Labor Statistics (BLS) publishes national, state and local unemployment statistics. The results are reported in the local media, usually with a brief analysis along with a human interest story. Unfortunately, the story often does not match the data. One reason is that users of the data are not familiar with the strict definition of unemployment as defined by the BLS. I would encourage anyone who has doubts about that definition to review it before jumping into this detailed post on the unemployment calculation. See definition of unemployment.
Statistics: Root of the Published Results
Calculating unemployment is a statistical process. You could stop here, but I encourage you to keep reading since this post gives the sources of those calculations and breaks it all down into bite-size (non-math) pieces. We will give a brief explanation of each part of the process with source documents, where available, and links, if necessary, to key terms.
Why is the unemployment calculation process so complex? There are two primary reasons: 1) timing and 2) cost. The series is published every month for a number of labor market regions. Wouldn’t it be great if we could go out and actually count the number of people who are employed or unemployed, and just for fun, determine how many people are in the labor force every month? This process would be labeled a census. In this country, that is done once every 10 years.
Even if we could compile and report the results each month, imagine the expense involved. The next best process then, is to survey (estimate) the population and estimate the number who fall into each category, along with some general demographic information. The process starts with a monthly employment survey, administered by the Census Bureau, named the Current Population Survey or CPS. The data from this survey are used by BLS in statistical models to calculate unemployment rates.
Background
Let’s keep in mind that the unemployment rate published by the general media is the U-3 rate. There are actually six rates, U-1 through U-6, that provide different estimates of unemployment. The U-3 is the middle estimate. In South Carolina, the U-1 rate was 5.6 percent and the U-6 rate was 18.4 percent, averaged between the third quarter of 2008 and the third quarter of 2009. The U-3 rate during this same time period was 10.6 percent. This tells us that any published unemployment rate is specific to the criteria behind it, and is an estimate with a certain level of statistical accuracy. The following statistical process looks at how the rate is developed, regardless of level.
Statistical Process: Four Primary Steps
Step One: CPS
Step one is the CPS. This is a national survey, completed in each state, done on a monthly basis.
Like any statistical survey sample, we know there is truth and error in the data. The question is, what is the true value and what is error, or noise? In a survey we need to model (statistically) the difference, allowing us to calculate the accuracy of our results in a consistent fashion. In this case, it’s for states and special regions. The BLS LAUS program uses the monthly Census data in a signal-plus-noise (SNP) model – actually two models – which when combined, estimate the true labor force for divisions and states. (Page 37 pdf)
The SNP model estimates also incorporate historical CPS auxiliary data. The end result is seasonal-, trend- and irregularity-adjusted employment/unemployment characteristics at the national level. (Page 37 pdf)
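The signal-plus-noise idea can be illustrated with a toy Python example. To be clear, this is not the BLS model, which is a far more elaborate time-series model with seasonal, trend and irregular components. The straight-line "true" series, the alternating error and the three-month centered average are all assumptions I made so the arithmetic can be checked by hand.

```python
def centered_average(values, window=3):
    """Centered moving average; drops the points that lack a full window."""
    half = window // 2
    return [
        sum(values[i - half : i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# Toy "truth": a labor force that grows by 1 (thousand) each month.
signal = [100.0 + t for t in range(12)]
# Toy sampling error: the survey alternately over- and under-shoots by 2.
noise = [2.0 if t % 2 == 0 else -2.0 for t in range(12)]
# What the survey reports each month: truth plus error.
observed = [s + e for s, e in zip(signal, noise)]

estimate = centered_average(observed)

raw_error = max(abs(o - s) for o, s in zip(observed, signal))
smooth_error = max(abs(e - s) for e, s in zip(estimate, signal[1:-1]))

print(f"largest error in the raw survey values: {raw_error:.2f}")
print(f"largest error after smoothing:          {smooth_error:.2f}")
```

Even this crude filter cuts the error, because the signal moves slowly while the noise bounces around. The real models use history and auxiliary data to do the same separation much more carefully.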
Step Two: Monthly Benchmark
In the past, large adjustments in employment/unemployment data were required at year’s end to match the national CPS sample because state monthly totals were not summing to the national CPS totals. That process has now been modified. The monthly data are benchmarked in real time, in two ways. First, census division models are constructed and controlled to the national CPS level, and second, state models are controlled to their appropriate census division estimates. We now have a statistical model of labor force, employment and unemployment for the nation, census regions, states and other special geographies. (Page 38 pdf)
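What "controlled to" means can be sketched with a simple proportional adjustment in Python. I am assuming a pro-rata scaling here for illustration; the state names, the numbers and the function name are hypothetical, and the documentation cited above is the authority on what BLS actually does.

```python
def control_to_total(estimates, control_total):
    """Scale estimates proportionally so they sum to the control total."""
    model_sum = sum(estimates.values())
    if model_sum <= 0:
        raise ValueError("model estimates must sum to a positive number")
    factor = control_total / model_sum
    return {name: value * factor for name, value in estimates.items()}


# Hypothetical state model estimates of unemployment (thousands).
state_models = {"State A": 480.0, "State B": 310.0, "State C": 190.0}
# Hypothetical census division estimate, already controlled to the nation.
division_control = 1000.0

benchmarked = control_to_total(state_models, division_control)

for name, value in benchmarked.items():
    print(f"{name}: model {state_models[name]:.1f} -> benchmarked {value:.1f}")
print(f"sum of benchmarked states: {sum(benchmarked.values()):.1f}")
```

Each state keeps its share of the division, and the states now add up to the division figure, so no large year-end correction is needed just to make the totals agree.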
Summary: Steps One and Two
Clearly there is a fair amount of math within this process. However, in its simplest form, a survey is taken throughout the country by the Census Bureau for a number of different geographies each month. Larger regions are more accurate than smaller ones. Census regions total to the national CPS. The BLS then works with the CPS data to create state data that is controlled to the appropriate census region, providing consistency month to month with the national results. We now have an estimate of labor force, employment and unemployment at the state level that is consistent with the national CPS survey.
Keep in mind each step involves error. So it is important to remember that as good as this process is, variability is not completely eliminated. That is one reason that trend analysis is important when analyzing these data.
Step Three: Estimates for sub-State Labor Market Areas (LMA)
The third step estimates unemployment and employment for areas within a state, such as a metropolitan statistical area (MSA), county or city (sub-state). These typically are the data that the media report. Up until now our estimates have been for states, census regions and the nation as a whole.
With state level controls, local unemployment estimates are derived from local unemployment insurance (UI) statistics, based on two covered employee building blocks: 1) those with benefits and 2) those with exhausted benefits. These data allow for estimates of those unemployed and expected to be unemployed. New entrants and re-entrants cannot be estimated using this process. Instead, those data are estimated from national data based on demographics.
Local employment is estimated using the Current Employment Statistics (CES) and Quarterly Census of Employment and Wages (QCEW), or covered workers. These place-of-work estimates need to be adjusted to place-of-residence. This is accomplished with decennial census data. Data for each labor market area are adjusted to sum to the state total, calculated above. Finally, estimates for parts of Labor Market Areas (LMAs) are primarily computed using the number of claims versus local population. (Page 39 pdf)
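The last point, splitting an LMA estimate among its parts, can be sketched as a share calculation in Python. This assumes unemployment is allocated purely by each part's share of UI claims, which simplifies the documented method. The place names and counts are hypothetical.

```python
def allocate_by_share(area_total, weights):
    """Split an area total among its parts in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: area_total * w / total_weight for name, w in weights.items()}


# Hypothetical LMA unemployment estimate, already adjusted to the state total.
lma_unemployed = 12000.0
# Hypothetical UI claims counted in each part of the LMA.
claims = {"City X": 900, "City Y": 450, "Rest of LMA": 150}

parts = allocate_by_share(lma_unemployed, claims)

for name, value in parts.items():
    print(f"{name}: {value:.0f} unemployed")
```

A part of the LMA with 60 percent of the claims gets 60 percent of the unemployed, which is also why the quality of the local claims data matters so much.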
Keep in mind that not all those in the labor force are estimated in this process. Primarily, two groups not covered are those in agriculture and “all other,” which includes self-employed workers.
Step Four: Year-End Benchmark Correction or Smoothing
Smoothing is a year-end process that collects and distributes any irregularities that are noted throughout the year that were not a part of the original series. Therefore, mid-year data, unlike final smoothed data from prior years, still needs to go through a smoothing process. Trend analysis, when comparing prior year data with current data, is recommended. This will reduce the risk of misinterpreting the variance between the two data sets as a result of computations alone. (Page 39 pdf)
Summary: Steps Three and Four
Generally step three uses local data to determine who is and who is not employed, but is still an estimate. Smoothing in step four is generally a clean-up process to make the data as robust as possible for future use.
Conclusion
The methodological sources I have provided date from April 1997 and are being updated. The basic 1997 process is the same today, with the exception of the monthly benchmarking and year-end smoothing, incorporated in 2010. One important note: the results are only as good as the inputs. States that take their UI data collection seriously produce more accurate estimates and thus provide a better picture.
I want to thank the Southeast BLS Regional Analysis Team for the assistance in helping me understand and interpret the LAUS detailed statistical documentation.
Recession Dating
The National Bureau of Economic Research (NBER) announced on December 1, 2008 that the economy “officially” went into a recession in December of 2007. It is interesting that there was really no mention in the papers. This piece of data is, in fact, very important, especially given the history of recessions. A quick evaluation of the past few recessions notes that the troughs were only 8 months from the peak. This recession is turning out to be quite a bit different. I believe we are nowhere near the bottom and may be in for a long period before recovery happens. This recession is already a year long with no significant signal of an end.
There has been a lot of talk about the credit crisis, but the reality is that many people are not creditworthy and so cannot borrow to spend, and consumer spending is about two-thirds of our economy. The result plays havoc with the economy. In the past it appears that consumers were using home equity as a source of borrowed income. With that source eliminated by the collapse of the housing bubble, we find ourselves, as with the rest of the world, in a recession.

