The daily scatter plot

Saturday, November 21, 2009

More obscure stuff: poverty, median income and Hispanic population in Marion County

Since I am going on vacation soon, I have to make a good long, confusing post about demographic data in an area that most people wouldn't find too interesting.

After doing the plots of poverty rates and median incomes in suburban Portland, I decided to do the same for all incorporated communities in Marion County, for completeness. Marion County has 19 incorporated communities, which is a good number to plot. These communities range from rural to urban, and some of them are heavily Hispanic. One problem with these data points is that some of the towns are very small. Salem, the capital of Oregon, has a population of over 100,000, but half a dozen of these towns have 1,000 people or fewer. I could actually do charts based on population, but we will save those for later.

Unlike the Portland-area suburb chart, this chart doesn't seem to have an obvious bend in it. It has a clear downward progression, although with significant outliers. One of the differences between this chart and the Portland-area one is that there aren't really any wealthy towns or suburbs in Marion County. Only one town on this diagram has a median income over $50,000 a year, or a poverty rate under 5%. So perhaps this entire chart just resembles the "flat" section of the Portland-area one.

Marion County has a high percentage of Hispanic residents, who tend to cluster in certain communities. I wondered if these Hispanic residents would have a correlation with poverty:

And once again, we have a three-quarters diagram! There are high-poverty towns with both high and low percentages of Hispanic residents, and there are low-poverty towns with low Hispanic populations, but there are no low-poverty towns with high Hispanic populations.
However, since there are very few towns even in the "low poverty rate" category, it hardly makes for convincing data. And, as discussed above, some of the towns listed here have only a few hundred people. So, so far: nothing conclusive.
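For anyone who wants to check a "three-quarters" pattern without eyeballing it, here is a minimal Python sketch. It splits the plot into four quadrants at cut points you choose and lists the quadrants that have no towns in them. The cut points and the town numbers are hypothetical, not the Marion County data.

```python
def quadrant_counts(points, x_cut, y_cut):
    """Count points in each quadrant; a point on a cut line counts as "high"."""
    counts = {("low", "low"): 0, ("low", "high"): 0,
              ("high", "low"): 0, ("high", "high"): 0}
    for x, y in points:
        key = ("high" if x >= x_cut else "low", "high" if y >= y_cut else "low")
        counts[key] += 1
    return counts


def empty_quadrants(points, x_cut, y_cut):
    """Return the (x side, y side) quadrants that contain no points."""
    counts = quadrant_counts(points, x_cut, y_cut)
    return [quadrant for quadrant, n in counts.items() if n == 0]


# Hypothetical towns: (percent Hispanic, percent below poverty line).
towns = {
    "Town A": (5.0, 6.0),
    "Town B": (8.0, 18.0),
    "Town C": (45.0, 22.0),
    "Town D": (60.0, 25.0),
    "Town E": (12.0, 15.0),
}

# Cut points are arbitrary choices: 25% Hispanic, 10% poverty.
print(empty_quadrants(towns.values(), x_cut=25.0, y_cut=10.0))
# [('high', 'low')]  -> no high-Hispanic, low-poverty towns
```

With only five points, an empty quadrant can easily be luck, which is the same caution as above: a handful of small towns can't carry much weight.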

Friday, November 20, 2009

If you aren't cheating, you aren't trying: the importance of Cherry Picking.

Cherry picking is the often-derided term for selecting a limited set of data points and then trying to prove something from them.
But cherry picking isn't always a bad thing, as long as you remember that its main use is for DISPROOF, not PROOF.
If I pick out two points that show a counter-intuitive result, it means that the intuitive result cannot be totally true!
And, to illustrate, an example:

This diagram shows the connection between a city's size and the percentage of its population that is African-American. African-Americans do typically live in larger cities, but as this diagram shows, there are at least some exceptions to this rule. Fairbanks, Alaska, a town of around 30,000 people, has the same percentage of African-Americans as Los Angeles, a city about 100 times its size, and a higher percentage than some much larger cities.
Now, of course if I put more data points into this, it would probably have a line closer to what we expect. But as long as Fairbanks is there, the plot will never be perfect!
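The disproof logic fits in a few lines of Python: a rule like "the larger city always has the higher percentage" is refuted by any single pair where the larger city's percentage is equal or lower. The city names and numbers below are made up for illustration; the 30,000 vs. 3,000,000 sizes only echo the Fairbanks/Los Angeles ratio.

```python
from itertools import combinations


def counterexamples(cities):
    """Yield (smaller, larger) city pairs that break the rule
    "the larger city has the higher percentage"."""
    for (name_a, pop_a, pct_a), (name_b, pop_b, pct_b) in combinations(cities, 2):
        if pop_a == pop_b:
            continue  # the rule says nothing about equal-sized cities
        # Put the smaller city first in each pair.
        if pop_a < pop_b:
            small, large = (name_a, pct_a), (name_b, pct_b)
        else:
            small, large = (name_b, pct_b), (name_a, pct_a)
        if large[1] <= small[1]:
            yield small[0], large[0]


# Hypothetical (name, population, percent) rows, not real census figures.
cities = [
    ("Small City", 30_000, 9.0),
    ("Mid City", 600_000, 7.0),
    ("Big City", 3_000_000, 9.0),
]

print(list(counterexamples(cities)))
# [('Small City', 'Mid City'), ('Small City', 'Big City')]
```

One counterexample is enough to sink the rule, and adding more cities can't rescue it.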

Part II, after several days:

So I seem to have missed a few days! I hope you all didn't miss me too much!
I am also going to be gone for a few weeks on vacation, so there may not be DAILY SCATTERPLOTS. However, you can carry on some type of cult following, commenting endlessly on the intricacies of the material I have presented thus far.

Anyway, we took a look at Maryland and election trends. So, let's look at the same graphs for Colorado.

This doesn't have a very high correlation in any direction, but the shape is somewhat something (that is a technical term). Obama seems to be missing some of the middle ground here. Which is kind of the opposite of what we saw in the national diagram, where it was the states with the highest and lowest high school numbers that voted for McCain.
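A quick way to put a number on "not a very high correlation" is the Pearson correlation coefficient. Here is a minimal Python sketch. The county figures are hypothetical, shaped so that one candidate wins both ends of the high school axis and loses the middle. That U shape is "somewhat something" to the eye, but its linear correlation comes out to exactly zero, which is why r alone can miss a pattern like this.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counties: high school graduation rate (%) and Obama margin
# (percentage points, positive = Obama won). Obama wins both ends, loses the middle.
hs_rate = [82, 85, 88, 91, 94]
margin = [20, -10, -25, -10, 20]

print(pearson_r(hs_rate, margin))  # 0.0
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0, a perfect straight line
```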

Luckily, the college scatterplot gives us the warm hug of having our common knowledge reinforced. I haven't quite figured out why Douglas County is the outlier that it is, but otherwise everything is where it should be. This plot has something of a three-quarters shape: Obama won both low-college and high-college counties, while McCain won mostly low-college counties. And Douglas County, which exists just to make my scatterplots more interesting. Another thing about these Colorado college numbers: in other places, college numbers may merely be a way to operationalize general cultural attitudes. But here, where many of those numbers are over 40% and some are over 50%, college graduates are an actual electoral bloc that can't be ignored.

And finally, and mostly for the sake of completion:

Much as with the national diagram of African-American population and election results, the Colorado diagram seems to have no correlation. One part of this is that Colorado doesn't really have a high percentage of African American voters, even in urbanized counties. But even if we were to ignore the counties at the bottom, there would be little pattern in this diagram.

However, Colorado does have ethnic minorities, mostly Hispanic or Native American. I think these counties are probably the basis of Obama's support in lower-education counties. It is part of Obama's 2008 success that he could capture counties like Costilla, a rural, heavily Hispanic county, as well as Pitkin County, home of Aspen, Colorado, which (I have read) has the 4th-highest income of any US county.

But then, I probably didn't need scatterplots to know that part!

Monday, November 16, 2009

Exhaustive exploration of election trends that we probably already know:

I think I've already made my explanations about statistics, politics and my overwhelming drive to make pretty pictures. Also, to do endless data entry. Seriously, looking through census data and then entering it into a spreadsheet is my idea of a fun time.

So the fruits of all of this are a look at three statistics in two states, and how they correlated with the outcome of the 2008 election. The two states are Maryland and Colorado, which are alike (and different) in several ways. Colorado and Maryland both have very well-educated populations, but both also have pockets of rural areas that are less well educated. One of the major differences between them is that Maryland has a large African-American population, while Colorado is more ethnically homogeneous.

First, let's look at Maryland:

This is an interesting graph (made more interesting by the fact that I didn't label it properly: that point is Baltimore City, not Baltimore County). Unlike some of the Western states I looked at (such as Oregon), there is a trend linking high school graduation rates and Obama's margin. Not a very strong trend, and even weaker because of Baltimore City.

But much as with other states, and with the country as a whole, college rates seem to be a much better guide to election outcome. However, as with many other trends we have seen, there seem to be several things going on here.

Although I have previously plotted this same thing nationally and found no correlation, in Maryland it seems to have a much bigger effect. However, much as with the above graph, I think I am looking at a combination of different things. The "true" trend line could go through Prince George's County and Baltimore City, with Montgomery County as an outlier, or it could go through Montgomery, with the two on the top right as outliers.

And to avoid hitting you with too much all at once... I will do Colorado tomorrow.

Sunday, November 15, 2009

More counter-intuitive findings: poverty versus median income

Another thing that I have noticed during data-glancing in the past is that income is not highly correlated with poverty rates. One of the problems with this is that there are many different ways to compute income. Using one criterion (income tax returns), the richest zip code in Oregon also has the 2nd-highest poverty rate.
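Part of why income and poverty can disagree is that an average and a median answer different questions. Below is a minimal Python sketch with hypothetical household incomes: two very high earners pull the mean far above the median, while the share of households under an (also hypothetical) poverty line stays high. It does not claim to reproduce how the tax-return figure for that zip code is computed. It only shows how an average-style measure can look "rich" while many households are poor.

```python
from statistics import mean, median

# Hypothetical household incomes for one small area (not real Oregon data):
# most households are modest, two are very high earners.
incomes = [18_000, 22_000, 25_000, 30_000, 34_000, 40_000, 900_000, 1_500_000]

POVERTY_LINE = 25_000  # hypothetical cutoff, for illustration only

average = mean(incomes)   # pulled up by the two large incomes
middle = median(incomes)  # average of the two middle values
share_poor = sum(i < POVERTY_LINE for i in incomes) / len(incomes)

print(f"mean:   {average:,.0f}")   # 321,125
print(f"median: {middle:,.0f}")    # 32,000
print(f"share below poverty line: {share_poor:.0%}")  # 25%
```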

Median household income is usually a pretty good gauge. (In fact, maybe I should do a scatterplot of mean household income versus median household income...hmmm) For the same Oregon communities I have been reporting on lately (and which I should probably move on from), I did a plot of median income versus poverty rates, and:

Unfortunately, I can't lay out some mind-bending statement like "the richer the town, the more poor people." But we do see that, as with most social science statistics, the results might not be as obvious as a first guess suggests.
There are actually two different trends. From the richer suburbs to about halfway through the graph, the trend is obvious and sharp (although for the three richest, poverty increases as income goes up). Then, from the middle of the graph to the right edge, there is a large increase in poverty among communities without much difference in median income. The income differences between these cities are probably within the margin of error, or due to methodological error.
I could have included more data points in this diagram, and reached different conclusions. Portland has a number of "micro-suburbs", whereas these are (mostly) the major suburbs and surrounding communities.
So what to make of this odd curve?

Saturday, November 14, 2009

This is like Lord Voldemort, going further down the path of Scatterplotting than anyone has ever gone.

So, some questions were raised about whether the last diagram, if corrected for population density, might produce different results.

SO!

I went out and did some VOLDEMORT-GRADE mucking around to answer a question I already knew the answer to, which is that no scatterplot in the social sciences ever gives you a clear answer.

What I did was adjust every community to a population density of 3,000 people per square mile, and then figure out what its SFDH (single-family detached home) rate would be after that correction. Of course, the correction is mostly a mathematical trick, but it does show a few things. For example, Portland has a really, really high SFDH rate given its high population density.

anyway,

if you can pull meaning out of this, you are a victim of the apophenia juice cookie shield

Friday, November 13, 2009

I missed a day, but only to blow your mind more:

I missed a day, but it wasn't because I was lazy: it was because I was trying to avoid the same old, same old.

I came up with something to look at.

In research, one of the things that they teach you to do is "operationalize your variables". For example, if you wanted to know whether a community was "wealthy", you would have to operationalize that into median income, mean income, median household income, mean household income, net worth per individual, net worth per household, etc.

But it's also important to DEOPERATIONALIZE variables. Some questions come up often, and they are taken to "mean something", but what do they mean? For example, housing statistics can be used as a stand-in for income or, more broadly, for (as we say on THE STREETS) "SES", Socio-Economic Status.

So I decided to take a look at one of those housing statistics and compare it to a more immediate statistic. The statistic was "single-family detached homes", meaning homes (owned or rented) that are not attached to another home and are occupied by a single family. Picket fences and suburban smiles, so to speak.

Before I get to this, there are some methodological problems: I selected 23 communities around the Portland area, including Portland itself. I did not select every community, so there is a chance that with more data points, the trend would become more clear. Honestly, though, I think this captured most of what we need to know, and the fact that I left Cornelius out of my scatterplotting probably is not going to throw off my results much.
What this plot shows is that there is a correlation between poverty level and single-family homes, but it is so vague, with so many outliers, that the correlation probably has to do with something else.
Another interesting thing is that Portland is more "suburban" than some of its suburbs. Beaverton and Hillsboro, once fairly expensive suburbs, have smaller percentages of people living in single-family homes. And even Lake Oswego, which has (somewhat unjustly) been painted as an ultra-rich town where the streets are paved with gold, has no significant difference in single-family home percentage from Portland.