Thursday, December 31, 2009

Oregon, unemployment, and college

Once I get on a run, I tend to run with it.
Especially if it is something that I have data on, and can just cut and paste to different documents.
Nationally, there isn't much relation between college and unemployment, so let's look at the same data for Oregon.
(BTW, for those of you who don't know, I am an Oregonian. Mostly. Also, Oregon gives a good cross section of demographics.)
Overall, there isn't a lot of direction in this diagram, although the four urban counties, which have most of the educated people and where much of the work goes on, are all grouped together in the middle.
So, again, another inconclusive result.

Oh, and also: 2010 is going to happen soon. It already did in London.
2010 means a number of things, including a CENSUS. However, the census results won't be out for two years, or so.

Monday, December 28, 2009

The same thing, with Oregon:

So, to make up for my terrible downturn in daily updating, and because I got interested, I decided to look at the last plot I did, but for Oregon and its counties. Luckily, the census has growth figures by county, and the Department of Labor has a handy tool for looking at county-level data.

So the same graph for Oregon gives us:
There is a little bit more shape to this, but admittedly, not much. And again, although it's not very clear, there is the same five-sixths pattern as we had last time: the only area where we don't see any points is high-growth, low-unemployment. Hood River and Benton Counties come close, though!
Another problem with this, as I have said about my Oregon diagrams before, is that not all Oregon counties are equal in population. This is especially noticeable here, with Harney County, population 8,000, just sitting down there in the corner. I am tempted to do this graph with, for example, only the 10 or 15 most populous counties, and see what results I get from that.

Unemployment versus growth: once again, the easy explanation isn't the best one.

One piece of obvious conventional wisdom that I had been carrying around was that unemployment was higher in areas with high growth: that areas with high unemployment were areas with large influxes of population, and they therefore had high frictional unemployment, or had "oversold" themselves to potential workers. It seems like a good argument, and there are certainly a few data points to support it.
But you, my astute readers, know about "seems like a good argument" and "a few data points to support it". The actual scatter plot of the data is, as could be expected, scattered.
And, as is also often the case, there is a "three-quarters" effect in here, although not a distinct one. Actually, it would be more of a "five-sixths" effect. If we divide unemployment into high, medium and low, and growth into high and low, we get six cells, and five of them are occupied; the only empty one is "high-growth, low-unemployment". Utah, Texas and Colorado have high growth and medium unemployment, but there is nothing to the left of them.
Of course, in a normal economy, this graph might look different. So let's hope that we have a normal economy so I can find out! Also, I will have a job, and might not have time to scatterplot.
Incidentally, I found this data several weeks ago, and just didn't bother to make a graph and post it. I actually have lots of stuff like that that I am sitting on!
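A minimal sketch of the "five-sixths" check, in Python: bin each point into a grid using chosen cut points, then list the cells nobody lands in. The post doesn't say exactly where it split "high, medium and low", so the cut points (7% and 10% unemployment, 2% growth) and the six sample points below are hypothetical, not the data behind the plot. Unemployment goes on the x-axis so that "to the left" means lower unemployment. The same function with one cut per axis covers the 2x2 "three-quarters" diagrams.

```python
from bisect import bisect_right
from collections import Counter


def occupancy_grid(points, x_cuts, y_cuts):
    """Count how many (x, y) points land in each cell of a grid.

    x_cuts and y_cuts are sorted thresholds: k cuts make k + 1 bins,
    numbered 0 (lowest) upward. A value exactly on a cut goes into the
    upper bin, because bisect_right places ties to the right.
    """
    counts = Counter(
        (bisect_right(x_cuts, x), bisect_right(y_cuts, y)) for x, y in points
    )
    # Include every cell, even empty ones, so gaps show up as zeros.
    return {
        (i, j): counts[(i, j)]
        for i in range(len(x_cuts) + 1)
        for j in range(len(y_cuts) + 1)
    }


def empty_cells(grid):
    """Cells of the grid that no point landed in."""
    return [cell for cell, n in grid.items() if n == 0]


# Hypothetical (unemployment %, growth %) pairs, NOT the real state data.
sample_points = [(5.0, 0.5), (8.5, 0.8), (11.0, 1.0),
                 (8.0, 2.5), (12.5, 3.0), (9.5, 2.2)]
# Hypothetical cuts: unemployment low < 7 <= medium < 10 <= high;
# growth low < 2 <= high.
grid = occupancy_grid(sample_points, x_cuts=[7.0, 10.0], y_cuts=[2.0])
print(empty_cells(grid))  # [(0, 1)]: low unemployment, high growth
```

Using explicit cut points rather than quantiles keeps the bins easy to explain, at the cost of having to pick them by hand.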

Saturday, December 26, 2009

Content unrelated:

Now I am breaking the sanctity of my scatterplot blog, just so I can post this image and link to it from elsewhere.

But really, isn't it relevant?
Because aren't the data we mine out of the earth, the earth being random census documents, a secret to everyone? Like the money that rebellion-minded spear throwing dogs give to us?

Oregon is not Ohio. Neither is Washington.

After finding out that Ohio does indeed have a long pattern of following the nation's political trends, I decided to look at the same data for Oregon.
One thing to remember is that the electorate has been realigned many times since 1860 (which is as far back as I am going with these). Prohibition? Steel tariffs? Vietnam? Female suffrage? The political and social and demographic issues that divide people have changed quite a bit over the years.
Which only makes it more important when a state does match up so closely with the nation. Whatever the political or social issue that divided the nation...Ohio somehow managed to feel pretty much the same way about it, since 1860.
I can't quite pin down a pattern to Oregon, though. Since 1980, it has been consistently more Democratic than the nation. Before that, it seemed to jump around, in a way that my knowledge of Oregon's demographics doesn't quite explain.
To wit:
Even though Oregon lines up less closely than Ohio does, there are still no major surprises here. While there are clusters of dots in the upper left and lower right quadrants, which represent not voting with the country, those dots are also pretty close to the origin, meaning that even though Oregon swung the other way (giggles), it didn't do so by a lot.
Along with that, I wanted to look at two states that would correlate with each other: Oregon and Washington. As expected, Oregon and Washington, going back to 1892, correlate pretty well. The major differences, I think, come from times when Washington became industrialized, unionized and ethnically diverse before Oregon did, which gave it different demographics.

Thursday, December 24, 2009

Way to go Ohio...

One of the often-quoted maxims of US politics is that the candidate who wins Ohio wins the nation. Of course, using modern technology, we can look at that a bit more closely.
As can be seen, those numbers pretty much add up. They add up not just with the true/false test, but with the amount of the vote, as well. The exceptions are in a few landslide years, when Ohio doesn't always swing quite as wildly as the nation. But on the whole, it is true.
(This also explains the one exception to the rule: 1960, when Nixon won Ohio, but Kennedy won the nation. Nixon's victory in Ohio was small, as was Kennedy's victory overall).
According to the formula that shall not be named, there is quite a bit of correlation, and I bet that Ohio would indeed have a higher correlation than any other state, besides maybe Missouri.
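The post never names its formula; assuming it is Pearson's correlation coefficient (the usual choice for a scatterplot like this), a from-scratch sketch looks like this. Python 3.10+ also ships `statistics.correlation`, which computes the same quantity. The margins below are hypothetical, not real election results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical margins (percentage points), not real election data.
nation = [1.0, 2.0, 3.0, 4.0]
ohio = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(nation, ohio))  # 1.0
```

Note that r comes out as 1.0 even though the hypothetical Ohio margins are double the national ones: r measures how tightly the points sit on a straight line, not whether the two margins are equal. That is why the landslide years, where Ohio swings less wildly, don't hurt the correlation much as long as Ohio moves in step.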

I didn't need to snort all that GABA, anyway

I was e-Mailing with Dr. Stephen Wu, one of the authors of the "happiest states" reports, and I am happy to report that most of the spin put on the reports is due to the media, not to his research, which is much more modest in its claims.

Tuesday, December 22, 2009

Employment and high school: once again, conventional wisdom reinstated

After finding out about the odd diamond shape that links college and unemployment, I decided to look at the same data for high school.

This gives us much more the shape we were thinking! States with lots of high school graduates have low unemployment! Unless they are Oregon and Michigan, of course.

Sunday, December 20, 2009

Education and unemployment:

Without doing the math for it (and you know I hate doing math), I would think that just the combinations of the things I have done so far would last me a good long time. Election margins and U-3! Election margins and U-6! Wine consumption and election margins! Beer consumption and U-6!
Drowning in ideas, here. Which is perhaps why I have been actually posting less...I want to find things that are the most interesting, not just random shapes. (although random shapes are also good)
Anyway, here is something we want to know about, and that is also pretty interesting, shape-wise:
The bad news is: demographically, higher education levels (at the college level) don't seem to lead to lower unemployment. But then, they don't lead to higher unemployment either. In fact, we have an odd diamond shape: the states with the highest and lowest unemployment have a medium range of college completion, while the states with the highest and lowest college completion have...a medium range of unemployment. There is also, somewhat meaningfully, a good amount of geographic/demographic clustering here, especially if you consider Colorado a Mid-Atlantic state (which you really should).

Obviously, like with all of my graphs, this bears more looking into.

Friday, December 18, 2009

Even with a broken monitor and a sprained finger, I can't let that type of foolishness go

You might think I am joking, but the "Happiest States" list annoyed me so much that I had to go and rail a line of GABA before I could deal with it.

I don't know if the original authors are as dumb as the media reports make them seem, but something tells me that they were not as critical of their own research as they should be. As for the media...

One problem with lists like these is that they are ranked ordinally, by order, which is how sports teams are ranked. But, unlike making the playoffs, being ranked statistically is not so clear-cut. When I get my hands on the actual data, I will plot that too, but as it is, I am just plotting the rankings, which might be very deceptive. There is a good chance that the states on this list are separated by very small margins. This also comes up with lifespan measurements: a country may be ranked 30 places behind another country because people there live three months less.
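To see how a ranking can hide the size of the gaps, here is a small Python sketch. The scores are hypothetical (the actual "Happiest States" scores aren't reproduced here): four states nearly tied and one far behind still come out as evenly spaced ranks.

```python
def ordinal_ranks(scores):
    """Map each name to its rank, 1 = highest score.

    Ties keep their input order, since Python's sort is stable.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical happiness scores: A to D are nearly tied, E is far behind.
scores = {"A": 70.4, "B": 70.3, "C": 70.2, "D": 70.1, "E": 50.0}
ranks = ordinal_ranks(scores)

# Every step down the list is exactly one rank...
by_rank = sorted(ranks, key=ranks.get)
# ...but the score gaps behind those steps are wildly different.
gaps = [round(scores[a] - scores[b], 1) for a, b in zip(by_rank, by_rank[1:])]
print(gaps)  # [0.1, 0.1, 0.1, 20.1]
```

Plotting the ranks treats the 0.1-point step from A to B the same as the 20.1-point step from D to E, which is exactly the distortion that plotting the underlying scores would avoid.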

Anyway! I plotted the numbers against the suicide rate. The suicide rate is, of course, a bad way to measure people's general happiness. Suicide numbers are thankfully low, so even in a place where 99.9% of the people are very happy, a small subset might be suicidal. However, since suicide also probably correlates pretty well with suicide attempts, and with major and minor depression, any measure of happiness that didn't somehow relate to it might be flawed.

So, I worked up this diagram:
On this diagram, the further to the left a state is, the more happy it is. The further up, the higher the suicide rate is. With that in mind, something should be noted: first, there is not much general trend at all. Second, what trend there is is towards the "unhappy" states having the lowest suicide rates. In fact, New York, the unhappiest state, has the lowest suicide rate. A big outlier like that should probably be a pretty big hint that there is something wrong with the data.

Of course, there are not many people reading this blog, so I am sure that BIG FANCY EAST COAST LIBERAL STATES=UNHAPPY BUT GOOD RELIGIOUS TRADITIONAL SOUTHERNERS=HAPPY will embed itself in at least some people's popular wisdom.

Oh, and the suicide statistics came from here:

I certainly have been slacking off, haven't I?

I have to admit that I have been slacking off.
One thing is, after doing so much research, I had lots of scatterplots, but I wanted to make them SPECIAL. No reason to just throw stuff up here.
Also, I sprained my finger.
And, my monitor is broken.
But I will be back. Oh yes, I will be back, mostly because of this:

Which annoys me no end. In 2009, people ACTUALLY believe this? My god.
Of course, things like this will get repeated endlessly. And. The original data isn't even available. And. About three people read my blog of scatterplots. But. I must continue to FIGHT.

Saturday, December 12, 2009

U-3 and U-6, about what you would expect

As you might know, the "unemployment" numbers that are usually published are only one of the ways that unemployment is measured. That rate is technically called "U3", and covers people who are conventionally considered "unemployed". However, there are other measures, ranging from U1 (people who have been unemployed for 15 weeks or longer) to U6 (which adds in the underemployed: part-timers who want full-time work, and people only marginally attached to the labor force).
I wondered how the different rates would match up, and thanks to the labor department I was able to find out:
As could be expected, the U-3 and U-6 rates track each other very closely, which makes sense, since U-6 by definition includes U-3. Since this diagram doesn't immediately tell us a lot, I took the U3/U6 ratio and plotted it against U3.
This diagram has some minimal good news: as a vague, general trend, as U3 goes up, the relative increase in U6 goes down. If the trend were flat, or even upwards, the underemployment ratio in Michigan could be over one-quarter.
But the trend isn't super important: as it is, it seems that within the limits we usually have, U3 and U6 go up pretty much in tandem, regardless of what the number is.
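A minimal sketch of the ratio-versus-U3 check, assuming an ordinary least-squares slope stands in for the "vague, general trend". The four (U-3, U-6) pairs are hypothetical, not Department of Labor figures.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical (U-3 %, U-6 %) pairs for four states, NOT BLS figures.
rates = [(5.0, 10.0), (7.0, 13.3), (9.0, 16.4), (11.0, 19.6)]
u3 = [a for a, _ in rates]
# U-6 includes everyone counted in U-3, so this ratio can never exceed 1.
ratio = [a / b for a, b in rates]
slope = ols_slope(u3, ratio)
# A positive slope means U-3 makes up a bigger share of U-6 as U-3 rises,
# i.e. U-6 grows proportionally less.
print(slope > 0)  # True
```

A slope near zero would mean U-6 scales in fixed proportion to U-3, which is roughly the "in tandem" behavior described above.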

Wednesday, December 9, 2009

Slight detour, Part II: the same thing, but different

So after yesterday's look at the urban/rural divide and college education, I did the obvious and did the same scatterplot for high school.
That is kind of a messy scatterplot! And not just because I made it at 2 AM! Within the expected confines, there seems to be a lot of variation in these numbers, more so than there is with college graduation rates. There is also some regional clumping, as could also be expected.
But, perhaps due to the lateness of the hour...I am not coming up with any magic bullets for this data.

Monday, December 7, 2009

A slight detour: rural and urban education

I became so besotted with the ERS and their gigantic stream of data, that I had to go slightly off the topic of farms, to find out about educational statistics, as they pertain to urban and rural America.
Did you know? "Urban" and "rural" are hard statistics to operationalize. The ERS admits as much: it has an entire, complex county-coding system. I can say, having lived in Montana and Vermont, both states that are considered "rural", that the word can mean very different things in different places.
But with those caveats aside, let's look at our scatterplot:
As expected, urban areas have a higher percentage of people with bachelor's degrees (except in Massachusetts, where the rural population is almost non-existent, probably being the population of one resort community or something).
Also, Wyoming has perfect parity. The differences vary, from Virginia, with 5 urban degree holders for every 2 rural, up to states where the difference is almost unnoticeable.
As could be expected, there seem to be some regional differences. New England and a chunk of Mountain/Plains states (Montana, Wyoming, Idaho, South Dakota) seem to have the smallest gaps, while the biggest gaps seem to be in Appalachia.

Sunday, December 6, 2009

Part II- hobby farms versus hobby farmers

I was thinking of a different way to phrase the question of how and why people farm...and a way that would work with the data presented by the USDA ERS. As I was drifting off to sleep, it occurred to me that it was pretty easy to operationalize "hobby farms" and "hobby farmers" with the data presented. One of the statistics is farms with under $10,000 a year in SALES (that is gross, not net; $10K a year gross isn't a lot of money), and another is farmers who, as the saying goes, "have kept their day job". The number of farms that produce less than a living amount, and the number of people who have to seek their living elsewhere, should more or less add up.
But you have been reading for a while, so let's see what really happens...

The basic view is, as is usually the case, more or less correct, but the trend doesn't jump out at me. Also, many of the states add up to much more than 100, meaning that there are lots of people whose primary income comes from a farm making less than $10K a year gross. I guess that would make these people more subsistence farmers than hobby farmers, at least in some cases, although it is hard to tell from the data presented.
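The "more than 100" reasoning is inclusion-exclusion. The post doesn't spell out which two shares are summed; this sketch assumes they are (a) the share of farms grossing under $10,000 and (b) the share of operators whose primary occupation is farming, with one operator per farm, which is the reading under which the conclusion follows. The example percentages are hypothetical.

```python
def min_overlap_pct(share_a, share_b):
    """Smallest possible overlap, in percent, of two groups in one population.

    Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|, and |A or B| can be
    at most 100%, so |A and B| >= |A| + |B| - 100 (and never below zero).
    """
    return max(0.0, share_a + share_b - 100.0)


# Hypothetical state: 65% of farms gross under $10,000 a year, and 55% of
# operators list farming as their primary occupation.
print(min_overlap_pct(65.0, 55.0))  # 20.0
```

Under those assumptions, at least 20% of operators in this hypothetical state would be full-time farmers on farms grossing under $10,000. It is a lower bound only; the true overlap could be larger.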
Arizona is especially curious: I at first assumed that it was probably due to some loophole in zoning or tax laws in that state, but I later realized it might have to do with Native American subsistence farmers. Or, the data could have been a typo! Who knows!
It is also interesting to note that in very few states are most farmers primarily farmers, and in most states, most of the farms don't make much money.
"Further research is needed"

Saturday, December 5, 2009

Welcome back! For Laurel- Farm size versus ownership

I was gone trotting around Portland for two weeks, which meant that I had to leave you all scatter plot-less.
I hope this wasn't too sad.
So, today, I present the first of a series of scatterplots, requested by Laurel, centered around farming and the like. Specifically, she asked me about the link between non-corporate farming and farm output. I think. More or less.
So, thank you to , I have been able to start digging into this question. There will be more digging!
The first thing I wanted to look at was farm size versus private ownership. I would think that in the states where agriculture is a big business, farms would be larger and less privately owned.
I was, it seems, wrong. Farm size seems to be a lot more related to population density than anything else. Also, farm ownership seems to be pretty uniformly in the range of 80-90% private, across the board. Of course, some of those might be very small hobby farms. A median farm size would be an interesting thing to know.
Anyway, since this research wasn't very conclusive, I will play with more of the numbers in the coming days.

Saturday, November 21, 2009

More obscure stuff: poverty, median income and Hispanic population in Marion County

Since I am going on vacation soon, I have to make a good long, confusing post about demographic data in an area that most people wouldn't find too interesting.

After doing the plots of poverty rates and median incomes in suburban Portland, I decided to do the same for all incorporated communities in Marion County. I did this so I could be complete. Marion County has 19 incorporated communities, which is a good number to plot. These communities range from rural to urban, and some of them are heavily Hispanic. One of the problems with these data points is that some of these towns are much smaller than others. Salem, the capital of Oregon, has a population of over 100,000, but a half dozen of these towns have 1,000 people or fewer. I could actually do charts based on population, but we will save those for later.

Unlike the Portland-area suburb chart, this chart doesn't seem to have an obvious angle in it. It has a clear progression downwards, although with significant outliers. One of the differences between this chart and the Portland-area one is that there are no really wealthy towns or suburbs in Marion County. There is only one town on this diagram that has over $50,000 a year in median income, or under 5% poverty. So perhaps this entire chart just resembles the "flat" section of the Portland-area one.

Marion County has a high percentage of Hispanic residents, who tend to cluster in certain communities. I wondered if these Hispanic residents would have a correlation with poverty:

And once again, we have a three-quarters diagram! There are high poverty towns with both a high percentage of Hispanics and a low percentage, and there are low poverty towns with low Hispanic population, but there are no low poverty towns with high Hispanic populations.
However, since there are very few towns even in the "low poverty rate" category, it hardly makes for convincing data. And, as discussed above, some of the towns listed here have only a few hundred people. So, so far: nothing conclusive.

Friday, November 20, 2009

If you aren't cheating, you aren't trying: the importance of Cherry Picking.

Cherry picking is the often-derided term for picking out a limited supply of points, and then trying to prove a point from them.
But cherry picking isn't always a bad thing, as long as you remember that its main use is for DISPROOF, not PROOF.
If I pick out two points that have a counter-intuitive result, it means that the intuitive result cannot be totally true!
And, to illustrate, an example:
This diagram shows the connection between a city's size and the percentage of its population that is African-American. African-Americans typically do live in larger cities, but as this diagram shows, there are at least some exceptions to this rule. Fairbanks, Alaska, a town of around 30,000 people, has the same percentage of African-Americans as Los Angeles, a city 100 times its size, and a higher percentage than some much larger cities.
Now, of course if I put more data points into this, it would probably have a line closer to what we expect. But as long as Fairbanks is there, the plot will never be perfect!
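Using cherry-picked points for disproof amounts to a search for counterexamples. A minimal Python sketch, assuming the claim being tested is the strict one ("a bigger city always has a bigger share"); the city names and figures are hypothetical, not the Census numbers behind the diagram.

```python
from itertools import combinations


def counterexamples(cities):
    """Pairs where the larger city does NOT have a strictly larger share.

    cities maps name -> (population, share_percent). Any returned pair is
    enough to refute the strict claim "bigger city, bigger share".
    """
    bad = []
    for (n1, (p1, s1)), (n2, (p2, s2)) in combinations(cities.items(), 2):
        if p1 == p2:
            continue  # the claim says nothing about equal-sized cities
        if p1 > p2:
            big, small = (n1, s1), (n2, s2)
        else:
            big, small = (n2, s2), (n1, s1)
        if big[1] <= small[1]:
            bad.append((big[0], small[0]))
    return bad


# Hypothetical (population, share %) figures, NOT real Census data.
cities = {
    "Big City": (3_000_000, 9.0),
    "Mid City": (500_000, 6.0),
    "Small Town": (30_000, 9.0),
}
print(counterexamples(cities))
# [('Big City', 'Small Town'), ('Mid City', 'Small Town')]
```

Adding more cities can make the overall trend look stronger, but it can never remove a pair this function finds, which is the point about Fairbanks.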

Part II, after several days:

So I seem to have missed a few days! I hope you all didn't miss me too much!
I am also going to be gone for a few weeks on vacation, so there may not be DAILY SCATTERPLOTS. However, you can carry on some type of cult following, commenting endlessly on the intricacies of the material I have presented thus far.

Anyway, we took a look at Maryland and election trends. So, let's look at the same graphs for Colorado.
This doesn't have a very high correlation in any direction, but the shape is somewhat something (that is a technical term). Obama seems to be missing some of the middle ground here. Which is kind of the opposite of what we saw in the national diagram, where it was the states with the highest and lowest high school numbers that voted for McCain.
Luckily, the college scatterplot gives us the warm hug of having our common knowledge reinforced. I haven't quite figured out why Douglas County is the outlier that it is, but otherwise everything is where it should be. This plot has something of a three-quarters shape: Obama has both low- and high-college counties, while McCain has mostly low-college counties. And Douglas County, which exists just to make my scatterplots more interesting. Another thing about these Colorado college numbers is that in other places, the college numbers can merely be ways to operationalize general cultural attitudes. But here, where many of those numbers are over 40% and some over 50%, college graduates are an actual electoral bloc that can't be ignored.

And finally, and mostly for the sake of completion:

Much as with the national diagram of African-American population and election results, the Colorado diagram seems to have no correlation. One part of this is that Colorado doesn't really have a high percentage of African American voters, even in urbanized counties. But even if we were to ignore the counties at the bottom, there would be little pattern in this diagram.

However, Colorado does have ethnic minorities, mostly Hispanic or Native American. I think these counties are probably the basis of Obama's support in lower-education counties. It is part of Obama's 2008 success that he could capture counties like Costilla, a rural, heavily Hispanic county, as well as Pitkin County, home of Aspen, Colorado, which (I have read) has the 4th-highest income of any US county.

But then, I probably didn't need scatterplots to know that part!

Monday, November 16, 2009

Exhaustive exploration of election trends that we probably already know:

I think I've already made my explanations about statistics, politics and my overwhelming drive to make pretty pictures. Also, to do endless data entry. Seriously, looking through census data and then entering it into a spreadsheet is my idea of a fun time.

So the fruit of all of this is a look at three statistics in two states, and how they correlated with the outcome of the 2008 election. The two states are Maryland and Colorado, which are alike (and different) in several ways. Colorado and Maryland are both very well educated, but have pockets of rural areas that are less well educated. One of the major differences between them is that Maryland has a large African-American population, while Colorado is more ethnically homogeneous.

First, let's look at Maryland:
This is an interesting graph (made more confusing by the fact that I didn't properly label it: that is Baltimore City, not Baltimore County). Unlike some of the Western states I looked at (such as Oregon), there is a trend line between high school graduation rates and Obama's margin. Not a very strong trend line, and even weaker because of Baltimore City.

But much as with other states, and with the country as a whole, college rates seem to be a much better guide to election outcome. However, as with many other trends we have seen, there seem to be several things going on here.
Although previously I have plotted this same thing nationally, and found no correlation, in Maryland it seems to have a much bigger effect. However, much as with the above graph, I think I am looking at a combination of different things. The "True" trend line could go through Prince George and Baltimore City, with Montgomery County an outlier, or it could go through Montgomery, with the two on the top right outliers.

And to avoid hitting you with too much all at once... I will do Colorado tomorrow.

Sunday, November 15, 2009

More counter-intuitive findings: poverty versus median income

Another thing that I have noticed during data-glancing in the past is that income is not highly correlated with poverty rates. One of the problems with this is that there are many different ways to compute income. Using one criterion (income tax returns), the richest zip code in Oregon also has the 2nd-highest poverty rate.

Median household income is usually a pretty good meter. (In fact, maybe I should do a scatterplot of mean household income versus median household income...hmmm) For the same Oregon communities I have been reporting on lately (and which I should probably move on from), I did a plot of median income versus poverty rates, and:

Unfortunately, I can't lay out some mind-bending statement like "the richer the town, the more poor people." But we do see that, as with most social science statistics, the results might not be as obvious as they first seem.
There are actually two different trends. Starting from the richer suburbs (although for the three richest, poverty increases as income goes up), until about halfway through the graph, the trend is obvious and sharp. And then, from the middle of the graph to the right edge, there is a large increase in poverty level among communities without much difference in median income. The differences in income between these cities are probably within the standard error, or a methodological error.
I could have included more data points in this diagram, and reached different conclusions. Portland has a number of "micro-suburbs", whereas these are (mostly) the major suburbs and surrounding communities.
So what to make of this odd curve?

Saturday, November 14, 2009

This is like Lord Voldemort, going further down the path of Scatterplotting than anyone has ever gone.

So, some questions were raised about whether the last diagram, if corrected for population density, might produce different results.


I went out and did some VOLDEMORT-GRADE mucking around to answer a question I already knew the answer to, which is no scatterplot in the social sciences ever gives you a clear answer.

What I did was to correct the population down to 3000 people per square mile, and then figure out what the SFDH rate would be if corrected. Of course, the correction is mostly a mathematical trick, but it does show a few things. Like, Portland has a really, really high SFDH rate compared to its high population density.


if you can pull meaning out of this, you are a victim of the apophenia juice cookie shield

Friday, November 13, 2009

I missed a day, but only to blow your mind more:

I missed a day, but it wasn't because I was lazy: it was because I was trying to avoid the same old, same old.

I came up with something to look at.

In research, one of the things that they teach you to do is "operationalize your variables". For example, if you wanted to know whether a community was "wealthy", you would have to operationalize that into...median income, mean income, median household income, mean household income, net worth per individual, net worth per household, etc.
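As a quick illustration of why the choice of operationalization matters, here is a Python sketch comparing the mean and the median of the same incomes. The ten household incomes are hypothetical; one very rich household is enough to pull the mean far away from the typical household.

```python
from statistics import mean, median

# Hypothetical household incomes (dollars) for a ten-household town.
incomes = [28_000, 31_000, 35_000, 38_000, 42_000,
           45_000, 47_000, 52_000, 60_000, 1_200_000]

avg = mean(incomes)    # pulled far up by the one very rich household
mid = median(incomes)  # average of the 5th and 6th values once sorted
print(avg, mid)
```

Here the mean (157,800) is more than three times the median (43,500), so "wealthy by mean income" and "wealthy by median income" could rank this town very differently.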

But it's also important to DEOPERATIONALIZE variables. Some statistics come up often, and they are taken to "mean something", but what do they mean? For example, housing statistics can be used as a stand-in for income or, more broadly, for (as we say on THE STREETS) "SES", Socio-Economic Status.

So I decided to take a look at one of those housing statistics, and compare it to a more immediate statistic. The statistic was "Single family detached homes", meaning homes, (owned or rented), unattached to another home and occupied by a single family. Picket fences and suburban smiles, so to speak.

Before I get to this, there are some methodological problems: I selected 23 communities around the Portland area, including Portland itself. I did not select every community, so there is a chance that with more data points, the trend would become more clear. Honestly, though, I think this captured most of what we need to know, and the fact that I left Cornelius out of my scatterplotting probably is not going to throw off my results much.
What this plot shows is that there is a correlation between poverty level and single-family homes, but that it is so vague, with so many outliers, that the correlation probably has to do with something else.
Another interesting thing is that Portland is more "suburban" than some of its suburbs. Beaverton and Hillsboro, once fairly expensive suburbs, have smaller percentages of people living in single-family homes. And even Lake Oswego, which has (somewhat unjustly) been painted as an ultra-rich town where the streets are paved with gold, has no significant difference in single-family home percentage from Portland.

Wednesday, November 11, 2009


So, that last post was about ASTRONOMY. The nine, I said NINE planets, scatterplotted AU vs. eccentricity.
And today:

AU vs. inclination!



It's almost 3 AM!
It's time to throw out a random scatterplot!
Can you guess what this is?
Does it have real data, or is this just a bunch of random dots?!?!

Monday, November 9, 2009

Education and poverty: a great blow-your-mind-paradox sunk by DATA!

So I used to like to surprise people by asking them what they thought the demographic correlation between education and poverty rate was.

Because, I would tell them, HAHA, that as education went up, so did poverty.

Like so many great "blow your mind" things...this one isn't true. But it still might be truer than most people would think.

I compared Oregon counties for high school and college attainment rates, and for poverty rates.

First, high school:
There is a general downward trend, with one very significant outlier: Benton County, home of Oregon State University. Like many diagrams, this has a missing quarter: low graduation, low poverty counties.
Second, college:
And here we see even less pattern, with something of a four-quarter look, although the high-graduation, high-poverty quarter really only has two points: Benton, again, and Multnomah, Oregon's largest county. There are also at least some low-poverty, low-graduation counties. The scale here is also much different from the previous scale. Benton County has 20% more high school graduates (for its population) than Malheur, but it has something like 300% more college graduates (for its population) than Malheur. So the paradox, although not as clear as it seemed when I had only looked at the information in non-graphic form, is still there.
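The 20%-versus-300% comparison is a relative difference: (a - b) / b. A short Python sketch with hypothetical attainment rates (not the actual Census figures for Benton and Malheur) shows how a small base makes the college gap look so much larger in relative terms.

```python
def pct_more(a, b):
    """How many percent larger a is than b: (a - b) / b * 100."""
    return (a - b) * 100 / b


# Hypothetical attainment rates (% of adults), NOT the actual Census figures.
hs_high, hs_low = 90.0, 75.0    # high school: 15-point gap
col_high, col_low = 40.0, 10.0  # college: 30-point gap
print(pct_more(hs_high, hs_low))    # 20.0
print(pct_more(col_high, col_low))  # 300.0
```

The college gap here is only twice as wide in percentage points, but fifteen times larger in relative terms, because it is divided by a much smaller base.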

Sunday, November 8, 2009

Searching for correlation in all the wrong places: African-Americans and post-graduate students.

I try not to turn this into a politics blog, because my main point is to look at PRETTY PICTURES. But nothing occurs in a vacuum, and politics actually makes some pretty pictures.

One of the obvious bases of support in Obama's victory was African-Americans, who tended to back Obama by very large margins. So, following this, it might be assumed that states with a large percentage of black voters would be strong Obama states.
And here we find a graph that doesn't even pretend that there is such a thing as correlation. There are four quarters we could turn this graph into, and each one of the quarters would be filled. Wyoming, Vermont, Maryland, and Mississippi: four states with different outcomes and different demographics. Some of the strongest McCain states were also among the most African American. And Vermont, with no appreciable black voters, was a very strong state for Obama. Hawaii is an outlier for two reasons: it is Obama's home state, and it has a high number of people who aren't classified as either "white" or "black".
(I could make another graph looking at "white" people, and see if the added Hispanic and Asian population in a few states makes this graph make much more sense...but I don't think it would. Also, there is that tricky bit where "white" and "Hispanic" can overlap.)

So, one stereotype was shot down, at least on the statistical level. So how about another stereotype, that Obama is supported by the latte-sipping, Prius-driving, overeducated coastal types?
And after delivering that last shock to people's sensibilities, I have safely established the strength of stereotypes. The connection between Obama's margin and people with advanced degrees is the strongest correlation I have found for the election so far. It has a characteristic three-quarters shape: there are many Obama states that have low numbers of graduate students, but there are no McCain states that have a large number.

One thing about both of these charts is that neither African-Americans nor people with advanced degrees make up a very large part of the electorate. However, as with many things in statistics, I consider them to be a way to "operationalize" underlying social trends.

But more on that...later. After all this serious writing, maybe my next post will be about POMELOS.

High school growth, 1990-2007

After yesterday's scatterplots of college and graduate school growth, I thought that for the sake of completeness, I should look at the same figures for high school. I was assuming I would have a pretty similar scatterplot.

And I was mostly right, although the correlation is less defined here, and there is a significant group of outliers. Also, much as with the college and graduate school graphs, this is a pretty good repudiation of the "Saturday Night Live syndrome" about US education: Americans are more educated than they were in 1990. (Although, of course, someone can always "prove" via an e-mail forward that students in the 1950s all learned calculus and Latin in 8th grade, so our educational system was stronger then.)
The rate of high school graduation increase varied from 4% in Alaska to 24% in Kentucky. That would seem to be bad news for Alaska, except that Kentucky's numbers are still below what Alaska's were in 1990. The greatest growth was in the southern and Appalachian states that were furthest behind, while the slowest growth was in states where the rates were already the highest. There just aren't many people left in Utah or Alaska who could get diplomas and don't have them already. The other slow-increase states are the ones in the lower right of the diagram: all four states that border Mexico, and Nevada. I imagine this is the result of Hispanic immigration, since recent Mexican-American immigrants tend to have low graduation rates.

Friday, November 6, 2009

Graduate school is the new Bachelors: 1990 to 2007

Before we start today's post: I have discovered that The Formula that Shall Not Be Named, along with not working well in general, doesn't work well in OpenOffice in particular, since it only seems to want to give me the ABSOLUTE VALUE. This came up when I was doing a bit of work on South Carolina, but that is going to be like Queen Beruthiel's cats for a while.
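The post doesn't say which spreadsheet function the Formula is, so this is only a guess. If it is Pearson's correlation coefficient r, the sign carries the direction of the relationship, while a function that returns r squared (OpenOffice Calc's RSQ, for example) always comes out non-negative, which looks a lot like an absolute value. A minimal sketch of r computed directly, so the sign survives:

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r, which keeps the sign of the relationship."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations: positive when x and y rise together, negative when one falls.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
falling = [10, 8, 6, 4, 2]  # a perfectly decreasing relationship
r = pearson_r(xs, falling)
print(r)       # -1.0
print(r ** 2)  # 1.0: squaring hides whether the line rises or falls
```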

So, instead, we will look at two diagrams that don't need any formula to be clear. They both deal with education, and with the fact that (at least from my subjective viewpoint) bachelor's and graduate degrees are the new high school diploma and bachelor's degree, respectively. (And while that sentence might be confusing, the situation is even more so.)

But, is the change across the country, or are all these overeducated people just a New England and Pacific Northwest thing?
As we can see, Bachelor's degrees seem to have increased fairly uniformly across all regions of the country, with about the same rate of increase, and with no significant outliers. This is one of the strongest correlations I have found to date.

So how about the more expensive and exclusive graduate degree? Is this, so to speak, not playing in Arkansas?

And it looks like I forgot to label my graduate school chart. Not that it matters: there are, once again, no outliers. Massachusetts is in the top right though! So it looks like the growth in graduate school is also pretty uniform, across the states.

Thursday, November 5, 2009

Pomelos and a happy life: no, seriously

So as a joke between myself and Qousqous (or maybe it wasn't a joke!), I decided to plot pomelo production against the human development index for the world's ten leading pomelo-producing nations.

Well, I bet you can figure out the conclusion yourself.
Tomorrow: maybe something relevant.

Wednesday, November 4, 2009

Doctors and dentists: a return to my sneaky ways

So I got sidetracked about a week ago, after I did the initial doctors versus dentists post.

What I decided to look at here is which has more correlation with life expectancy: doctors per capita or dentists per capita.
Here we have doctors, and as we can see, we have a three-quarters diagram. All of the states with low life expectancies have few doctors, there are states with high life expectancies and few doctors, and there are states with high life expectancies and many doctors. There are not, thankfully enough, many states with many doctors and low life expectancies. But if we look at states with a life expectancy over 76, it seems that more doctors don't do much good. If I were foolish enough to try to find causation in here, I could say that at the 76 mark, adding more doctors reaches the point of diminishing returns.
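One way to check a "three-quarters" shape without eyeballing it is to draw a cut line on each axis and count the points in each quarter; an empty (or nearly empty) quarter is the signature. This sketch cuts at the means by default, or at any threshold passed in, such as a life expectancy of 76. The data are invented for illustration, not the actual state figures.

```python
from statistics import mean


def quadrant_counts(xs, ys, x_cut=None, y_cut=None):
    """Count points in each quarter of a scatterplot.

    Keys are (x side, y side). The plot is divided at x_cut and y_cut (the means
    by default); points lying exactly on a dividing line are left out.
    """
    x_cut = mean(xs) if x_cut is None else x_cut
    y_cut = mean(ys) if y_cut is None else y_cut
    counts = {("low", "low"): 0, ("low", "high"): 0,
              ("high", "low"): 0, ("high", "high"): 0}
    for x, y in zip(xs, ys):
        if x == x_cut or y == y_cut:
            continue
        quarter = ("high" if x > x_cut else "low", "high" if y > y_cut else "low")
        counts[quarter] += 1
    return counts


# Hypothetical (doctors per 1,000, life expectancy) pairs, invented for illustration.
doctors = [1, 2, 3, 4, 5, 6, 7, 8]
life_exp = [70, 71, 80, 81, 80, 81, 82, 83]
print(quadrant_counts(doctors, life_exp))
# {('low', 'low'): 2, ('low', 'high'): 2, ('high', 'low'): 0, ('high', 'high'): 4}
```

Cutting at the medians would not work for this check: when both axes are split into equal halves, the lower-right quarter always holds as many points as the upper-left, so one empty quarter would force another.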
Our dentistry and life expectancy diagram does give us more correlation. For one thing, it shows the correlation between me being tired and being sloppy while making a diagram, which is very strong. Secondly, it shows a much clearer link between dentists and life expectancy than there is for doctors, although even the dentists are not that clear-cut.

One thing to remember is that the doctors who are currently in a state and the people who are currently dying in a state are not that closely related. If someone was born in South Dakota 80 years ago and is currently dying in Florida, the doctors now in Florida don't really have much to do with however many decades that person lived elsewhere. Of course, this should be obvious.

I think the source of this correlation lies elsewhere, though I will leave my guesses for another day.

Tuesday, November 3, 2009

Montana: high school versus college. Fascinating to about three dozen people in the world, none of whom are reading this.

So mostly because I felt like doing lots of data entry after an invigorating bike ride, I decided to enter the names, high school rates and college rates of all of Montana's 56 counties into a spreadsheet, and see what I would come up with:
According to the Formula that Shall Not Be Named, there is almost exactly the same numerical correlation as in the diagram of Oregon counties. Visually, however, the diagrams are quite different. The Montana diagram looks like two separate charts. Up until about the 85% mark, there doesn't seem to be much correlation between high school and college, and after 85% the line is pretty clear and pretty obvious.
A few things about Montana geography have to be explained here. Much like in Oregon, the counties in the upper right have a good percentage of the population. The exceptions are Gallatin and Beaverhead, which both have colleges. This diagram confirms that Gallatin, which is a lot like Oregon's Benton County, really is Montana's version of it.

So based on this, and the national and Oregon data, what do you think the correlations between the 2008 election and high school and college rates are?

Monday, November 2, 2009

In which I finally discover some real correlation

After looking at the data on high school and college a few weeks ago, and finding not much significant correlation, I decided to look at graduate school numbers. Because, as all my poor and confused and unemployed hipster friends know, graduate school is the new college.
One of the things I have discovered many times since beginning this blog is that correlation is often much weaker than intuition would suggest. And this is a good example of that: the correlation between having a high school diploma and having an advanced degree is close to nothing. The only expected thing about this diagram is that many of the usual outliers show up in the expected places.

Now, let's look at the correlation between bachelor's degrees and advanced degrees!

And finally, I find a strong correlation! The strongest one I have found yet in any of my scatterplots. Not only is the overall correlation clear, there are no significant outliers, at all.

Taken together, these two diagrams tell some type of story, and a curious one, at that. High school and advanced degrees are not related, while bachelor's and advanced degrees are very strongly related. What does this all mean?

Sunday, November 1, 2009

Like yesterday, but with the G20

I suspected yesterday's plot was less than successful because the EU, as a group, is...quite a group. Homogeneity is a double-edged sword in doing comparisons!
So I decided to do the same plot, but with the G20 countries instead of the EU (some of which are the same countries). Even so, I only had 17 data points, since I couldn't find data for Saudi Arabia and Indonesia, and one of the 20 members is the EU as a whole.
So, after that bit of introduction:

And again, we find almost nothing. There are some countries with high suicide but low homicide (South Korea, Japan), some countries with lots of both (Russia, South Africa), and one country with low suicide but high homicide (Brazil), and then everyone else. There does not seem to be any particular pattern to this data.

Maybe my next post should be something I should be SURE to find a pattern in. Hmmm...

Saturday, October 31, 2009

Because I felt like not everything should be about the US:

Because I wanted to focus on somewhere outside the US, I decided to do a scatterplot of suicide and homicide in the EU.
I have heard a theory that suicide and homicide are inverse expressions of the same feeling, and that in different cultures, violence is taken out differently, as is considered appropriate. But what does the data say? The bad, bad data. This is the most fragmented data I have looked at so far (taken from Wikipedia tables), even ignoring the fact that, depending on the country, reporting of homicides AND suicides might be a little suspect. But enough explanation:
There doesn't seem to be any inverse correlation between the two. Quite the contrary: with this data, violence against self and violence against others are positively correlated, although much of that comes from three significant outliers, the Baltic republics. Without those, there wouldn't be much correlation at all.

Friday, October 30, 2009

Doctors and dentists, possible explanations:

I was actually planning to go somewhere specific with the doctors and dentists per capita idea, but I got distracted into wondering what was the reason for the discrepancy. I had a theory going that it could be related to age. After all, older people might receive more medical care, and thus need more doctors, while regular dentist visits seem to be something that happen to kids more often than adults.
So, I took median age data from the US Census (although, as you know, Bob, median age doesn't fully represent the age distribution of a population).

First, for doctors:
There is a pretty strong correlation here, even stronger than "the formula" would suggest (which is why I like looking at the graphic of a scatterplot, rather than just the equations). Like many scatterplots, this one has something of a crescent shape. There are young states without many doctors, old states without many doctors, and old states with lots of doctors...but no young states with many doctors. There could be several reasons for this, including my original thesis, or it could be that states with older populations tend to be affluent, with smaller family sizes, or anything else you want to think of.

While we are wondering about that, let's look at dentists and median age.
And you can look and look, and you will discover almost NOTHING. Dentists per capita and median age seem to be unrelated. At least my points aren't all bunched up! However, there is still a relative effect, since dentists seem to be scattered around all ages and doctors are concentrated with older people. So that might explain a bit of why the doctor/dentist ratio is not linear.

Thursday, October 29, 2009

Doctors and dentists

My internet was off for a day, which is why you all had to go a day without a scatterplot. I hope you all survived!

This one is going to be quick, but it is a prelude to all sorts of tricky reasoning!
This scatterplot compares doctors and dentists per capita (or per 1000 capita, more accurately).

I was actually expecting this to be a pretty strong correlation, but as you can see:
The only thing that was expected is that some of the usual suspects are in the expected place. Massachusetts has the most doctors and dentists, while Mississippi has the fewest of both. New England and the Middle Atlantic are likewise where I expect them to be.
Alaska, Utah and Idaho are in a somewhat surprising position, although it seems to be somewhat consistent with my scatterplot of high school vs. college rates, where they also ended up in the upper left.
Anyway, this particular plot is just part of a plot...which I will show more of in coming days.
Data from . The data seems fairly accurate.

Tuesday, October 27, 2009

Alcohol and Life Expectancy, Continued:

After yesterday's alcohol and life expectancy post, I was curious about my findings (which, to summarize, were that alcohol use had no correlation with life expectancy).

I could have looked at alcohol consumption versus death through violence or accidents. I could have also looked at PAST alcohol consumption rates, since alcohol that causes chronic diseases wouldn't be having much of an effect on current life expectancy.

But what I did do was break the alcohol consumption figures down into the three types mentioned (beer, wine and hard liquor) and compare each one with life expectancy:

First, beer consumption:

This graph is even more inconclusive than the overall alcohol/life expectancy one. So, having little to say about that, let's move on to wine:

This was interesting. I had guessed that wine would have a positive correlation with life expectancy, but the fact that the shape of the scatterplot was so defined did come as a bit of a surprise, as did the fact that some of our traditional outliers were not showing up in their usual places. Utah, which is far behind in other forms of alcohol consumption, is not so far behind in wine consumption. West Virginia and Mississippi seem to have it beat. Also, I wouldn't have guessed that Idaho drank so much wine: more than wine-producing states such as California and Oregon, apparently.
I don't think wine causes long life (or at least, I don't think that is what this diagram shows). I think that wine drinking is correlated with SES (socioeconomic status), and that is what this graph shows. Specifically, notice Vermont, Connecticut and Massachusetts in the upper right-hand corner.

Lastly, we look at consumption of hard liquor:
Once again, we are back in amorphous blob territory. Most of the states were bunched so closely together that there was no reason to label them. The usual outliers are in the usual places, especially the pesky New Hampshire and Delaware (and did anyone guess why they are where they are?)

It should also be noted that wine consumption, even though it does show an actual correlation, is drowned out (so to speak) in the overall alcohol diagram because in almost all states, it makes up such a small amount of the alcohol consumed, ranging from 1/17th in West Virginia to 1/3rd in Idaho.