Saturday, October 31, 2009

Because I felt like not everything should be about the US:

Because I wanted to focus on somewhere outside the US, I decided to do a scatterplot of suicide and homicide in the EU.
I have heard a theory that suicide and homicide are inverse expressions of the same feeling, and that different cultures direct violence inward or outward according to what is considered appropriate. But what does the data say? The bad, bad data. This is the most fragmented data I have looked at so far (taken from Wikipedia tables), even ignoring the fact that, depending on the country, reporting of homicides AND suicides might be a little suspect. But enough explanation:
There doesn't seem to be any inverse correlation between the two. Quite the contrary: with this data, violence against self and violence against others are positively correlated, although much of that has to do with three significant outliers, the Baltic Republics. Without those, there wouldn't be much correlation at all.
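
For anyone who wants to see how a few outliers can manufacture a correlation, here is a minimal sketch. The numbers are made up for illustration (they are NOT the Wikipedia figures), and the names `pearson_r`, `points` and `outliers` are just placeholders I picked. It computes the Pearson correlation once with every point and once with the three outliers removed.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical (suicide rate, homicide rate) pairs; invented for illustration only.
points = {
    "A": (10.0, 1.2),
    "B": (12.0, 1.5),
    "C": (8.0, 1.0),
    "D": (15.0, 1.0),
    "E": (11.0, 1.3),
    "Outlier1": (30.0, 8.0),
    "Outlier2": (25.0, 7.0),
    "Outlier3": (28.0, 9.0),
}
outliers = {"Outlier1", "Outlier2", "Outlier3"}

def r_for(names):
    """Correlation using only the named points."""
    xs = [points[name][0] for name in names]
    ys = [points[name][1] for name in names]
    return pearson_r(xs, ys)

r_all = r_for(list(points))
r_trimmed = r_for([name for name in points if name not in outliers])

print(f"r with outliers:    {r_all:.2f}")
print(f"r without outliers: {r_trimmed:.2f}")
```

With data shaped like this, the full set shows a strong positive correlation while the trimmed set shows almost none, which is the same pattern described above.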

Friday, October 30, 2009

Doctors and dentists, possible explanations:

I was actually planning to go somewhere specific with the doctors and dentists per capita idea, but I got distracted into wondering about the reason for the discrepancy. I had a theory going that it could be related to age. After all, older people might receive more medical care, and thus need more doctors, while regular dentist visits seem to be something that happens to kids more often than adults.
So, I took median age data from the US Census (although, as you know, Bob, median age doesn't totally represent the age distribution in a population).

First, for doctors:
There is a pretty strong correlation here, one that is even stronger than "the formula" would attest to (which is why I like looking at the graphic of a scatterplot, rather than just the equations). Like many scatterplots, this one seems to have something of a crescent shape. There are young states without many doctors, old states without many doctors, and old states with lots of doctors...but no young states with many doctors. There could be several reasons for this, including my original thesis, or it could be that states with high median ages tend to be affluent with smaller family sizes, or anything else you want to think of.

While we are wondering about that, let's look at dentists and median age.
And you can look and look, and you will discover almost NOTHING. Dentists per capita and median age seem to be unrelated. At least my points aren't all bunched up! However, there is still a relative effect, since dentists seem to be scattered around all ages and doctors are concentrated with older people. So that might explain a bit of why the doctor/dentist ratio is not linear.

Thursday, October 29, 2009

Doctors and dentists

My internet was off for a day, which is why you all had to go a day without a scatterplot. I hope you all survived!

This one is going to be quick, but it is a prelude to all sorts of tricky reasoning!
This scatterplot compares doctors and dentists per capita (or per 1,000 people, more accurately).

I was actually expecting this to be a pretty strong correlation, but as you can see:
The only thing that was expected is that some of the usual suspects are in the expected places. Massachusetts has the most doctors and dentists, while Mississippi has the fewest of both. New England and the Middle Atlantic are likewise where I expect them to be.
Alaska, Utah and Idaho are in a somewhat surprising position, although it seems to be somewhat consistent with my scatterplot of high school vs. college rates, where they also ended up in the upper left.
Anyway, this particular plot is just part of a plot...which I will show more of in coming days.
Data from . The data seems fairly accurate.

Tuesday, October 27, 2009

Alcohol and Life Expectancy, Continued:

After yesterday's alcohol and life expectancy post, I was curious about my findings (which, to summarize, were that alcohol use had no correlation with life expectancy).

I could have looked at alcohol consumption versus death through violence or accidents. I could have also looked at PAST alcohol consumption rates, since alcohol that causes chronic diseases wouldn't be having much of an effect on current life expectancy.

But what I did do was break down the life expectancy figures by each of the three types of alcohol mentioned:

First, beer consumption:

This graph is even more inconclusive than the overall alcohol/life expectancy one. So, having little to say about that, let's move on to wine:

This was surprising, even though I guessed that wine would have a positive correlation with life expectancy. That the shape of the scatterplot was so defined did come as a bit of a surprise, as did the fact that some of our traditional outliers were not showing up in their usual places. Utah, which is far behind in other forms of alcohol consumption, is not so far behind in wine consumption. West Virginia and Mississippi seem to have it beat. Also, I wouldn't have guessed that Idaho drank so much wine: more than wine-producing states such as California and Oregon, apparently.
I don't think wine is causative of long life (or at least, I don't think that is what this diagram shows). I think that wine drinking is correlated with SES, and that is what this graph shows. Specifically, notice Vermont, Connecticut and Massachusetts in the upper right-hand corner.

Lastly, we look at consumption of hard liquor:
Once again, we are back in amorphous blob territory. Most of the states were bunched so closely together that there was no reason to label them. The usual outliers are in the usual places, especially the pesky New Hampshire and Delaware (and did anyone guess why they are where they are?)

It should also be noted that wine consumption, even though it does show an actual correlation, is drowned out (so to speak) in the overall alcohol diagram because in almost all states, it makes up such a small amount of the alcohol consumed, ranging from 1/17th in West Virginia to 1/3rd in Idaho.
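
The "share of the total" arithmetic behind that last point can be sketched in a few lines. The consumption figures below are invented (NOT the real state data) and were chosen only so the shares land near the 1/17 and 1/3 extremes mentioned above; the state labels and the `wine_share` helper are placeholders.

```python
# Made-up per-capita consumption figures, NOT real data.
# Chosen only so the wine shares come out near 1/17 and 1/3.
consumption = {
    "Low-wine state": {"beer": 1.2, "wine": 0.1, "liquor": 0.4},
    "High-wine state": {"beer": 1.0, "wine": 0.8, "liquor": 0.6},
}

def wine_share(drinks):
    """Fraction of total alcohol consumption that comes from wine."""
    total = sum(drinks.values())
    return drinks["wine"] / total

shares = {state: wine_share(drinks) for state, drinks in consumption.items()}

for state, share in shares.items():
    print(f"{state}: wine is {share:.1%} of total alcohol")
```

When wine is that small a slice of the total in most states, the total mostly tracks beer and liquor, so whatever pattern wine has on its own gets swamped.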

Monday, October 26, 2009

Now for something totally different: alcohol and life expectancy

I am sure that everyone is tired of reading about politics, so I have decided to do something different today: comparing alcohol consumption and life expectancy, across US states.
My alcohol consumption data came from

which is a pretty official site, although there are lots of questions about methodology when it comes to alcohol consumption! The data there breaks it down into beer, wine, liquor and then the total consumption, and I just used the last.

The life expectancy data came from

this table in businessweek, which refers to the Harvard Center on Public Health. It might not be the best data, but it passes my giggle test.

So, when we put them together, we get: almost nothing!
There are significant outliers in all four directions, some of which make sense at first (why Utah and Nevada have low and high alcohol consumption rates) and some of which might be surprising at first (Delaware and New Hampshire are not, to my mind, such hard drinkin' states, but I only had to think about it for a minute to figure out why they are where they are).
Otherwise, there doesn't seem to be much pattern between alcohol consumption and life expectancy.
There are a lot of different things that could be done with this data, and I already have some ideas. There may be a pattern when slightly different data are looked at, and I may do that soon.

Sunday, October 25, 2009

Much as before: college, high school and politics

This entry is a second set of three graphs, with the same premise as yesterday's. But these graphs look at Oregon and its counties, instead of the US and its states. I do this because I am (more or less) an Oregonian, and Oregon also provides somewhat of a political and social cross section.

Also, bowing to reader pressure, I have decided to "label" my "axes". I guess that is what the cool kids do.

So, starting off, this is a scatterplot of Oregon's high school and college graduation rates:
Much like with the US as a whole, there is a vague trend, with significant outliers on either end, and then a big ball in the middle. Much like with the US on the whole, high high school completion seems to be a necessary, but not sufficient cause for high college attainment.
For those not familiar with Oregon geography, some of the outliers should be explained: Benton County, in the upper right, is a small county that is the home of Oregon State University, which is why it has high education rates. Malheur and Gilliam counties are both small, rural counties (although separated geographically). Multnomah, Clackamas and Washington counties are the counties that make up Metro Portland, and they have about 1/3rd of Oregon's population.

Our next diagram shows Obama's margin and high school attainment rates:

I show the percentage of high school completion in comparison to the state average for clarity's sake...although it might actually do the opposite.
Much as in the US as a whole, there is not a lot of pattern to this scatterplot. The county with the highest percentage of diploma holders voted for Obama, and the county with the lowest voted for McCain, but otherwise it is a pretty vague shape: Multnomah and Grant counties have about the same rate of high school graduation, but had a 100 point difference in their margins in the election.

Our scatterplot of Obama's margin and college graduation rates takes us safely back into the conventional wisdom: there is a fairly obvious relationship between college attainment and being politically liberal. Of the five counties above the 25% mark (which is also about the average for Oregon on the whole), all five voted for Obama. McCain's biggest support seems to come from counties that are at or below the 15% mark of college attainment. Much like with the US on the whole, while a fair number of low-college counties went for Obama, there were no high-attainment counties that voted for McCain.
Also, again for those without knowledge of Oregon geography, the five counties in the top-right quadrant of the graph make up close to half of Oregon's population, which makes the strength of Obama in high-college areas even more significant.

Saturday, October 24, 2009

High school & college attainment, and political leanings.

Our first set of scatterplots shows the relationship between high school and college education attainment, and how both of these do (or do not) correlate with political preferences, as measured by Obama's margin in the 2008 election.

The first scatterplot shows US states, by high school and college graduation rates. Intuitively, these are two variables that could be assumed to be correlated. However, the trend is fairly weak: there is a cluster of states with low attainment in both high school and college (most of which are located in the South or Appalachia), but otherwise the trend isn't very strong. If you look at the "USA" point, all of the states to the right and down of it are states with above average high school graduation rates, but below average rates of college attainment. Likewise, some states have the opposite pattern, California and New York being two of the most important. Also, notice that at the very right of the diagram, Alaska and Wyoming have the two highest high school graduation rates.
Which brings us to our second point. There has been, at some points, conventional wisdom that Democratic candidates are more successful with a better educated electorate. But the presence of Alaska and Wyoming over on the very right of the diagram makes one wonder if this correlation holds up for high school graduation rates.
This chart, which compares high school graduation rate (as measured in difference from the national average, a somewhat confusing trick I used to make the graph look better) to Obama's margin in the 2008 election, does not provide any obvious evidence that states with high high school graduation rates are more politically liberal. In fact, almost the opposite: the upper left-hand corner of the diagram shows that Utah, Alaska and Wyoming, three states with very high high school attainment, are also some of the most conservative. Strangely enough, McCain's support seems to come in two clusters: mountain and prairie states with high high school attainment, and a group of southern and Appalachian states with low high school attainment. And then Oklahoma in the middle. Obama states seem to run the gamut.
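
The "difference from the national average" trick amounts to subtracting one number from every state's rate. A minimal sketch, using invented rates and a made-up national average (none of these are the real figures, and `relative_to_average` is just a placeholder name):

```python
# Hypothetical high school graduation rates in percent; NOT the real figures.
national_average = 85.0
rates = {"State A": 91.0, "State B": 80.5, "State C": 85.0}

def relative_to_average(rates, average):
    """Return each rate as percentage points above (+) or below (-) the average."""
    return {name: rate - average for name, rate in rates.items()}

centered = relative_to_average(rates, national_average)

for name, diff in centered.items():
    print(f"{name}: {diff:+.1f} points relative to the national average")

# The gap between any two states is unchanged by the shift,
# so the shape of the scatterplot stays the same; only the axis labels move.
gap_before = rates["State A"] - rates["State B"]
gap_after = centered["State A"] - centered["State B"]
```

Since every point moves by the same amount, the shift only relabels the axis so that zero sits at the national average.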

This diagram seems to return us to our conventional wisdom: the states with the highest levels of college attainment also were the biggest Obama supporters. However, as with any real world data, this information is not always the stereotype engine it could be. Utah, Kansas and Hawaii all have very close levels of college attainment, and yet have very different politics. Another thing of interest is that there seems to be a number of low-college states that supported Obama, but the inverse is not true: there are not many high-college attainment states that supported McCain.

There are many conclusions and guesses to be made from this data, but I will leave that to the reader to determine. I should also point out that there are many caveats about trying to operationalize educational data, since graduation rates across states may not always mean the same thing.

The data for these scatterplots was taken from, and I tried to be accurate, but there may have been arithmetic or data entry errors. There are many other caveats I could make, but according to my cat I have to go to bed.