Saturday, November 21, 2009

More obscure stuff: poverty, median income and Hispanic population in Marion County

Since I am going on vacation soon, I have to make a good long, confusing post about demographic data in an area that most people wouldn't find too interesting.

After doing the plots of poverty rates and median incomes in suburban Portland, I decided to do the same for all incorporated communities in Marion County. I did this so I could be complete. Marion County has 19 incorporated communities, which is a good number to plot. These communities range from rural to urban, and some of them are heavily Hispanic. One of the problems with these data points is that some of these towns are much smaller than others. Salem, the capital of Oregon, has a population of over 100,000, but a half dozen of these towns have 1,000 people or fewer. I could actually do charts based on population, but we will save those for later.

Unlike the Portland-area suburb chart, this chart doesn't seem to have an obvious angle in it. It has a clear progression downwards, although with significant outliers. One of the differences between this chart and the Portland-area one is that there are not really any wealthy towns/suburbs in Marion County. There is only one town on this diagram that has over $50,000 a year in median income, or under 5% poverty. So perhaps this entire chart just resembles the "flat" section of the Portland-area one.

Marion County has a high percentage of Hispanic residents, who tend to cluster in certain communities. I wondered if these Hispanic residents would have a correlation with poverty:

And once again, we have a three-quarters diagram! There are high poverty towns with both a high percentage of Hispanics and a low percentage, and there are low poverty towns with low Hispanic population, but there are no low poverty towns with high Hispanic populations.
However, since there are very few towns even in the category of "low poverty rate", it hardly makes for convincing data. And, as also discussed, some of the towns listed here have only a few hundred people. So, so far: nothing conclusive.
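
The "three-quarters diagram" idea can be checked mechanically: cut the plot into four quadrants and see which ones are empty. The following is only a rough sketch. The cut-off values, the function names and the four sample towns are made-up placeholders for illustration, not the Marion County figures.

```python
from collections import Counter

def quadrant_counts(points, x_cut, y_cut):
    """Count points in each quadrant of a scatterplot split at x_cut and y_cut.

    Keys are (x_side, y_side) tuples such as ("high", "low").
    """
    counts = Counter({
        ("low", "low"): 0,
        ("low", "high"): 0,
        ("high", "low"): 0,
        ("high", "high"): 0,
    })
    for x, y in points:
        x_side = "high" if x >= x_cut else "low"
        y_side = "high" if y >= y_cut else "low"
        counts[(x_side, y_side)] += 1
    return dict(counts)

def empty_quadrants(points, x_cut, y_cut):
    """Return the quadrants that contain no points at all."""
    counts = quadrant_counts(points, x_cut, y_cut)
    return sorted(key for key, n in counts.items() if n == 0)

# Hypothetical towns: (percent Hispanic, poverty rate). NOT real data.
towns = [(5.0, 6.0), (8.0, 18.0), (45.0, 20.0), (60.0, 25.0)]

# The cut-offs (25% Hispanic, 10% poverty) are arbitrary assumptions.
print(empty_quadrants(towns, x_cut=25.0, y_cut=10.0))
```

With these placeholder numbers the only empty quadrant is high Hispanic population with low poverty, which is the shape described above. Where the cut-offs go is a judgment call, and with only a handful of points an empty quadrant is weak evidence.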

Friday, November 20, 2009

If you aren't cheating, you aren't trying: the importance of Cherry Picking.

Cherry picking is the often-derided term for picking out a limited supply of points, and then trying to prove a point from them.
But Cherry picking isn't always a bad thing, as long as you remember that its main use is for DISPROOF, not PROOF.
If I pick out two points that have a counter-intuitive result, it means that the intuitive result cannot be totally true!
And, to illustrate, an example: This diagram shows the connection between a city's size and the percentage of its population that is African-American. African-Americans typically do live in larger cities, but as this diagram shows, there are at least some exceptions to this rule. Fairbanks, Alaska, a town of around 30,000 people, has the same percentage of African-Americans as Los Angeles, a town 100 times its size. And a higher percentage than some much larger towns.
Now, of course if I put more data points into this, it would probably have a line closer to what we expect. But as long as Fairbanks is there, the plot will never be perfect!
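
Cherry picking as disproof amounts to a search for counterexamples: a single pair of points that breaks the rule is enough to show the rule is not universal. Here is a minimal sketch of that search. The city names, populations and percentages are invented placeholders, and the `counterexamples` function is hypothetical, not something used to make the diagram.

```python
from itertools import combinations

def counterexamples(cities):
    """Find pairs that break the rule "a bigger city has a higher share".

    Returns (smaller_city, larger_city) name pairs where the smaller city's
    share is equal to or higher than the larger city's share.
    """
    found = []
    for a, b in combinations(cities, 2):
        small, large = sorted((a, b), key=lambda c: c["population"])
        if small["population"] < large["population"] and small["share"] >= large["share"]:
            found.append((small["name"], large["name"]))
    return found

# Invented placeholder data, NOT census figures.
cities = [
    {"name": "Town A", "population": 30_000, "share": 10.0},
    {"name": "City B", "population": 3_000_000, "share": 10.0},
    {"name": "City C", "population": 500_000, "share": 25.0},
]

print(counterexamples(cities))
```

Finding even one pair shows that the rule has exceptions. It says nothing about how strong the overall trend is, which is exactly the DISPROOF, not PROOF distinction.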

Part II, after several days:

So I seem to have missed a few days! I hope you all didn't miss me too much!
I am also going to be gone for a few weeks on vacation, so there may not be DAILY SCATTERPLOTS. However, you can carry on some type of cult following, commenting endlessly on the intricacies of the material I have presented thus far.

Anyway, we took a look at Maryland and election trends. So, let's look at the same graphs for Colorado.
This doesn't have a very high correlation in any direction, but the shape is somewhat something (that is a technical term). Obama seems to be missing some of the middle ground here. Which is kind of the opposite of what we saw in the national diagram, where it was the states with the highest and lowest high school numbers that voted for McCain.
Luckily, the college scatterplot gives us the warm hug of having our common knowledge reinforced. I haven't quite figured out why Douglas County is the outlier that it is, but otherwise everything is where it should be. This map has somewhat of a three-quarters shape: Obama has both low- and high-college counties, while McCain has mostly low-college counties. And, Douglas County. Which exists just to make my scatterplots more interesting. Another thing about these Colorado college numbers is that in other places, the college numbers can merely be ways to operationalize general cultural attitudes. But here, where many of those numbers are over 40% and some over 50%, those are an actual electoral bloc that can't be ignored.

And finally, and mostly for the sake of completion:

Much as with the national diagram of African-American population and election results, the Colorado diagram seems to have no correlation. One part of this is that Colorado doesn't really have a high percentage of African American voters, even in urbanized counties. But even if we were to ignore the counties at the bottom, there would be little pattern in this diagram.

However, Colorado does have ethnic minorities, mostly Hispanic or Native American. I think these counties are probably the basis of Obama's support in lower-education counties. It is part of Obama's 2008 success that he could capture counties like Costilla, a rural, heavily Hispanic county, as well as Pitkin County, home of Aspen, Colorado, which (I have read) has the 4th-highest income of any US county.

But then, I probably didn't need scatterplots to know that part!

Monday, November 16, 2009

Exhaustive exploration of election trends that we probably already know:

I think I've already made my explanations about statistics, politics and my overwhelming drive to make pretty pictures. Also, to do endless data entry. Seriously, looking through census data and then entering it into a spreadsheet is my idea of a fun time.

So the fruit of all of this is a look at three statistics in two states, and how they correlated with the outcome of the 2008 election. The two states are Maryland and Colorado, which are alike (and different) in several ways. Colorado and Maryland are both very well educated, but have pockets of rural areas that are less well educated. One of the major differences between them is that Maryland has a large number of African-Americans, while Colorado is more ethnically homogeneous.

First, let's look at Maryland:
This is an interesting graph (made more so by the fact that I didn't properly label it: that is Baltimore City, not Baltimore County). Unlike some of the Western states I looked at (such as Oregon), there is a trend line between high school graduation rates and Obama's margin. Not a very strong trend line, and even weaker because of Baltimore City.

But much as with other states, and with the country as a whole, college rates seem to be a much better guide to election outcome. However, as with many other trends we have seen, there seem to be several things going on here.
Although previously I have plotted this same thing nationally, and found no correlation, in Maryland it seems to have a much bigger effect. However, much as with the above graph, I think I am looking at a combination of different things. The "True" trend line could go through Prince George's and Baltimore City, with Montgomery County an outlier, or it could go through Montgomery, with the two on the top right outliers.

And to avoid hitting you with too much all at once... I will do Colorado tomorrow.

Sunday, November 15, 2009

More counter-intuitive findings: poverty versus median income

Another thing that I have noticed during data-glancing in the past is that income is not highly correlated with poverty rates. One of the problems with this is that there are many different ways to compute income. Using one criterion (income tax returns), the richest zip code in Oregon also has the 2nd-highest poverty rate.

Median household income is usually a pretty good meter. (In fact, maybe I should do a scatterplot of mean household income versus median household income...hmmm) For the same Oregon communities I have been reporting on lately (and which I should probably move on from), I did a plot of median income versus poverty rates, and:

Unfortunately, I can't lay out some mind-bending statement like "the richer the town, the more poor people". But we do see that, as with most social science statistics, the results might not be as obvious as at first guess.
There are actually two different trends. Starting from the richer suburbs (although for the three richest, poverty increases as income goes up), until about halfway through the graph, the trend is obvious and sharp. And then, from the middle of the graph to the right edge, there is a large increase in poverty level among communities without much difference in median income. The differences in income between these cities are probably within the standard error, or a methodological error.
I could have included more data points in this diagram, and reached different conclusions. Portland has a number of "micro-suburbs", whereas these are (mostly) the major suburbs and surrounding communities.
So what to make of this odd curve?

Saturday, November 14, 2009

This is like Lord Voldemort, going further down the path of Scatterplotting than anyone has ever gone.

So, some questions were raised about whether the last diagram, if corrected for population density, might produce different results.


I went out and did some VOLDEMORT-GRADE mucking around to answer a question I already knew the answer to, which is no scatterplot in the social sciences ever gives you a clear answer.

What I did was to correct the population down to 3000 people per square mile, and then figure out what the SFDH rate would be if corrected. Of course, the correction is mostly a mathematical trick, but it does show a few things. Like, Portland has a really, really high SFDH rate compared to its high population density.


if you can pull meaning out of this, you are a victim of the apophenia juice cookie shield

Friday, November 13, 2009

I missed a day, but only to blow your mind more:

I missed a day, but it wasn't because I was lazy: it was because I was trying to avoid the same old, same old.

I came up with something to look at.

In research, one of the things that they teach you to do is "operationalize your variables". For example, if you wanted to know whether a community was "wealthy", you would have to operationalize that into...median income, mean income, median household income, mean household income, net worth per individual, net worth per household, etc.

But it's also important to DEOPERATIONALIZE variables. Some questions come up often, and they are taken to "mean something", but what do they mean? For example, housing statistics can be used as a stand-in for income or more broadly for (as we say on THE STREETS) "SES", Socio-Economic Status.

So I decided to take a look at one of those housing statistics, and compare it to a more immediate statistic. The statistic was "Single family detached homes", meaning homes (owned or rented) unattached to another home and occupied by a single family. Picket fences and suburban smiles, so to speak.

Before I get to this, there are some methodological problems: I selected 23 communities around the Portland area, including Portland itself. I did not select every community, so there is a chance that with more data points, the trend would become more clear. Honestly, though, I think this captured most of what we need to know, and the fact that I left Cornelius out of my scatterplotting probably is not going to throw off my results much.
What this plot shows is that there is a correlation between poverty level and single-family homes, but that it is so vague, with so many outliers, that the correlation probably has to do with something else.
Another interesting thing is that Portland is more "suburban" than some of its suburbs. Beaverton and Hillsboro, once fairly expensive suburbs, have smaller percentages of people living in single-family homes. And even Lake Oswego, which has (somewhat unjustly) been painted as an ultra-rich town where the streets are paved with gold, has no significant difference in single-family home percentage from Portland.

Wednesday, November 11, 2009


So, that last post was about ASTRONOMY. The nine, I said NINE planets, scatterplotted AU vs. eccentricity.
And today:

AU vs. inclination!



It's almost 3 AM!
It's time to throw out a random scatterplot!
Can you guess what this is?
Does it have real data, or is this just a bunch of random dots?!?!

Monday, November 9, 2009

Education and poverty: a great blow-your-mind-paradox sunk by DATA!

So I used to like to surprise people by asking them what they thought the demographic correlation between education and poverty rate was.

Because, I would tell them, HAHA, that as education went up, so did poverty.

Like so many great "blow your mind" things...this one isn't true. But it still might be truer than most people would think.

I compared Oregon counties for high school and college attainment rates, and for poverty rates.

First, high school:
There is a general downward trend, with one very significant outlier: Benton County, home of Oregon State University. Like many diagrams, this has a missing quarter: low graduation, low poverty counties.
Second, college:
And here we see even less pattern, with something of a four-quarter look. Although the high-graduation, high-poverty quarter really only has two points: Benton, again, and Multnomah, Oregon's largest county. There are also at least some low-poverty, low-graduation counties. The scale here is also much different than the previous scale. Benton County has 20% more high school graduates (for its population) than Malheur, but it has something like 300% more college graduates (for its population) than Malheur. So the paradox, although not as clear when I actually looked at the information in non-graphic form, is still there.

Sunday, November 8, 2009

Searching for correlation in all the wrong places: African-Americans and post-graduate students.

I try not to turn this into a politics blog, because my main point is to look at PRETTY PICTURES. But nothing occurs in a vacuum, and politics actually makes some pretty pictures.

One of the obvious bases of support in Obama's victory was African-Americans, who tended to back Obama by very large margins. So, following this, it might be assumed that states with a large percentage of black voters would be strong Obama states.

And here we find a graph that doesn't even pretend that there is such a thing as correlation. There are four quarters we could turn this graph into, and each one of the quarters would be filled. Wyoming, Vermont, Maryland, and Mississippi: four states with different outcomes and different demographics. The strongest McCain states were also the most African American. And Vermont, with no appreciable black voters, was a very strong state for Obama. Hawaii is an outlier for two reasons: it is Obama's home state, and it has a high number of people who aren't classified as either "white" or "black".
(I could make another graph looking at "white" people, and see if the added Hispanic and Asian population in a few states makes this graph make much more sense...but I don't think it would. Also, there is that tricky bit where "white" and "Hispanic" can be overlapping.)

So, one stereotype was shot down, at least on the statistical level. So how about another stereotype, that Obama is supported by the Latte-sipping, Prius-driving overeducated coastal types?
And after delivering that last shock to people's sensibilities, I have safely established the strength of stereotypes. The connection between Obama's margin and people with advanced degrees is the strongest correlation I have found for the election so far. It has a characteristic three-quarters approach: there are many Obama states that have low numbers of graduate students, but there are no McCain states that have a large number.

One thing about both of these charts is that neither African-Americans nor people with advanced degrees make up a very large part of the electorate. However, as with many things in statistics, I consider them to be a way to "operationalize" underlying social trends.

But more on that...later. After all this serious writing, maybe my next post will be about POMELOS.

High school growth, 1990-2007

After yesterday's scatterplots of college and graduate school growth, I thought that for the sake of completeness, I should look at the same figures for high school. I was assuming I would have a pretty similar scatterplot.

And I was mostly right, although the correlation is less defined here, and there is a significant group of outliers. Also, much as with the college and graduate school graphs, this is a pretty good repudiation of the "Saturday Night Live syndrome" about US education---Americans are more educated than they were in 1990. (Although, of course, someone can always "prove" via an e-Mail forward that students in the 1950s all learned calculus and Latin in 8th grade, so our educational system was stronger then).
The rate of high school graduation increase varied from 4% in Alaska to 24% in Kentucky. Which would seem to be bad news for Alaska, except that Kentucky's numbers are still below what Alaska's were in 1990. The greatest growth was in the southern and Appalachian states that were the furthest behind, while the slowest growth was in states where the rates were already the highest. There just aren't many people left in Utah or Alaska who could get diplomas and don't have them already. The other slow-increase states are the states in the lower-right of the diagram: all four states that border Mexico, and Nevada. I imagine this is the result of Hispanic immigration, since recent Mexican-American immigrants tend to have low graduation rates.

Friday, November 6, 2009

Graduate school is the new Bachelors: 1990 to 2007

Before we start today's post, I have discovered that The Formula that Shall Not Be Named, along with not working well in general, doesn't work well in specific in OpenOffice, since it seems to only want to give me the ABSOLUTE VALUE. This came up when I was doing a bit of work on South Carolina, but that is going to be like Queen Beruthiel's cats for a while.
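
The post never says which spreadsheet function is meant, so the following is only a guess at the kind of thing that could be going on. The sketch computes the Pearson correlation coefficient r, which keeps its sign, next to its square, which does not. The `pearson_r` helper and the sample numbers are assumptions for illustration, not the South Carolina data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up sample values, chosen so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(xs, rising)     # 1.0
r_down = pearson_r(xs, falling)  # -1.0

# Squaring throws the direction away: both trends give the same value.
print(r_up, r_down, r_up ** 2, r_down ** 2)
```

If a formula only ever returns non-negative values, a squared quantity like this is one possible reason, and the direction of the trend then has to be read off the plot itself.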

So, instead, we will look at two diagrams that don't need any formula to be clear. They both deal with education, and the fact that (at least from my subjective viewpoint) bachelor's and graduate degrees are the new high school diploma and bachelor's degrees, respectively. (And while that sentence might be confusing, the situation is even more so.)

But, is the change across the country, or are all these overeducated people just a New England and Pacific Northwest thing?
As we can see, Bachelor's degrees seem to have increased fairly uniformly across all regions of the country, with about the same rate of increase, and with no significant outliers. This is one of the strongest correlations I have found to date.

So how about the more expensive and exclusive graduate degree? Is this, so to speak, not playing in Arkansas?

And it looks like I forgot to label my graduate school chart. Not that it matters: there are, once again, no outliers. Massachusetts is in the top right though! So it looks like the growth in graduate school is also pretty uniform, across the states.

Thursday, November 5, 2009

Pomelos and a happy life: no, seriously

So as a joke between myself and Qousqous (or maybe it wasn't a joke!), I decided to plot production of Pomelos and the human development index in the world's ten leading Pomelo producing nations.

Well, I bet you can figure out the conclusion yourself.
Tomorrow: maybe something relevant.

Wednesday, November 4, 2009

Doctors and dentists: a return to my sneaky ways

So I got sidetracked about a week ago, after I did the initial doctors versus dentists post.

What I decided to look at here is which has more correlation with life expectancy: doctors per capita or dentists per capita.
Here we have doctors, and as we can see, we have a three-quarters diagram. All of the states with low life expectancies have few doctors, and there are states with high life expectancies and few doctors, and there are states with high life expectancies and many doctors. There are not, thankfully enough, many states with many doctors and low life expectancies. But if we look at states with a life expectancy over 76, it seems that more doctors doesn't do much good. If we were to be foolish enough to try to find a causation in here, we could say that the 76 mark is the point of diminishing returns for adding more doctors.
Our dentistry and life expectancy diagram does give us more correlation. For one thing, it shows the correlation between me being tired and being sloppy while making a diagram, which is very strong. Secondly, it shows that there is a much clearer link between dentists and life expectancy than there is for doctors. Although even the dentists are not that clear-cut.

One thing to remember is that the doctors that are currently in a state and the people who are currently dying in a state are not that closely related. If someone was born in South Dakota 80 years ago and is currently dying in Florida, the doctors now in Florida don't really have much to do with however many decades of life that man was living elsewhere. Of course, this should be obvious.

I think the source of this correlation is elsewhere though, although I will leave my guesses for another day.

Tuesday, November 3, 2009

Montana: high school versus college. Fascinating to about three dozen people in the world, none of whom are reading this.

So mostly because I felt like doing lots of data entry after an invigorating bike ride, I decided to enter the names, high school rates and college rates of all of Montana's 56 counties into a spreadsheet, and see what I would come up with:
According to the Formula that Shall Not Be Named, there is almost the exact same numerical correlation as in the diagram of Oregon counties. However, visually the diagrams are quite different. The Montana diagram looks like two separate charts. Up until about the 85% mark, there doesn't seem to be much correlation between high school and college. And then after 85, the line is pretty clear and pretty obvious.
A few things about Montana geography have to be explained here. Much like in Oregon, the counties in the upper right have a good percentage of the population. The exceptions are Gallatin and Beaverhead, which both have colleges. This diagram gives me proof that Gallatin, which is a lot like Oregon's Benton County, actually is Montana's version of said county.
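
One way to test the "two separate charts" impression is to compute the correlation separately on each side of the 85% mark. This is only a sketch: the `split_correlation` helper is hypothetical and the eight points are invented to mimic the shape described, not taken from the Montana spreadsheet.

```python
import math

def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 points")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

def split_correlation(pairs, threshold):
    """Correlate the points below and at-or-above an x threshold separately."""
    below = [(x, y) for x, y in pairs if x < threshold]
    above = [(x, y) for x, y in pairs if x >= threshold]
    return pearson_r(below), pearson_r(above)

# Invented (high school %, college %) pairs, NOT the real county data.
counties = [
    (78, 15), (80, 12), (82, 16), (84, 13),  # scattered, no clear trend
    (86, 18), (88, 22), (90, 26), (92, 30),  # steadily rising
]

r_below, r_above = split_correlation(counties, threshold=85)
print(f"below 85%: {r_below:.2f}, at or above 85%: {r_above:.2f}")
```

A single overall number can hide this kind of split, which is one way two diagrams can share almost the same correlation and still look quite different.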

So based on this, and the national and Oregon data, what do you think the correlations between the 2008 election and high school and college rates are?

Monday, November 2, 2009

In which I finally discover some real correlation

After looking at the data on high school and college a few weeks ago, and finding not much significant correlation, I decided to look at graduate school numbers. Because, as all my poor and confused and unemployed hipster friends know, graduate school is the new college.

One of the things I have discovered many times since beginning this blog is that correlation is much less than what intuition would tell us. And this is a good example of that: the correlation between having a high school diploma and having an advanced degree is close to nothing. The only thing about this diagram that is expected is that many of the expected outliers show up in the expected places.

Now, let's look at the correlation between Bachelor's Degrees and advanced degrees!

And finally, I find a strong correlation! The strongest one I have found yet in any of my scatterplots. Not only is the overall correlation clear, there are no significant outliers, at all.

Taken together, these two diagrams tell some type of story, and a curious one, at that. High school and advanced degrees are not related, while bachelor's and advanced degrees are very strongly related. What does this all mean?

Sunday, November 1, 2009

Like yesterday, but with the G20

I suspected yesterday's plot was less than successful because the EU, as a group, is...quite a group. Homogeneity is a double-edged sword in doing comparisons!
So I decided to do the same plot, but with the G20 countries instead of the EU (some of which are the same countries). In the end, I only had 17 data points, since I couldn't find data for Saudi Arabia and Indonesia, and one of the 20 is the EU as a whole.
So, after that bit of introduction:

And again, we find almost nothing. There are some countries with high suicide but low homicide (South Korea, Japan), some countries with lots of both (Russia, South Africa), and one country with low suicide but high homicide (Brazil), and then everyone else. There does not seem to be any particular pattern to this data.

Maybe my next post should be something I should be SURE to find a pattern in. Hmmm...