This blog is becoming increasingly about two numbers: the high school graduation rate and the college graduation rate of counties in the US. This could just be my peculiar obsession, but these two numbers together do tell a lot about a county.
It is harder to add them up across states, since "counties" have many different meanings in different states. In the eastern part of the US, counties are much smaller in area, and often much smaller in population.
After I did the Western states, I went through and entered the information for the New England/Middle Atlantic states, which, for my purposes, are: Maine, Massachusetts, New Hampshire, Vermont, New York, Connecticut, New Jersey, Pennsylvania, Delaware, and Maryland. These states all have relatively small numbers of counties, and they also share common demographics, which makes them a good set to compare with each other.
The first thing of interest is that this is not a crescent: the Northeast doesn't seem to have many areas that have high high school rates and low college rates. It seems to have more of a traditional X=Y relationship. I am also wondering whether I chose the best grouping of states: it seems that what does occur in the lower right might be based heavily around rural Pennsylvania.
Another obvious thing is that there is one major grouping of counties, and then a bunch of points to the left. The points to the left make up some really significant outliers. They consist of Suffolk County (Boston), three of New York City's boroughs, Philadelphia County and Baltimore City. As could be expected, big metropolitan areas pull in both college-educated people and people who did not finish high school.
Also, notice the trio at the top right: two counties in Maryland, just outside of Washington, DC, and Tompkins County, home of Cornell University.
Sunday, February 28, 2010
Thursday, February 25, 2010
Hispanic education in California's counties:
As I mentioned in the New Mexico post, sometimes while scanning through data, I notice certain patterns, and then later on I get curious enough to do the data entry and see if they are true.
One pattern I have noticed is that counties with high Hispanic populations also have low high school graduation rates. So I decided to actually enter the data and see if it proved true:
The data lines up surprisingly well, which data very rarely does. There are different ways to interpret this data, and some of them are pretty politically and socially charged. Another is that this is often an artifact: heavily Hispanic counties tend to be big, economically active counties that attract workers of every stripe.
And while you are thinking about that, we can also think about this:
Here, there is not a lot of real correlation. The biggest story here is that the cluster of counties that was at the bottom right is now at the bottom left. Much of California is like the rest of the Western states, where you have a lot of counties with high high school rates and low college rates. So again, as with so much, we have three quadrants filled up in the college diagram.
Anyway, while this is marinating, I am still working on my MASTER PLAN.
Wednesday, February 24, 2010
Making these things is hard: a special treat for my viewers. Both of you.
One of the problems with scatterplots is that for the most part, you can only plot two variables.
But, to quote RZA: "The fourth dimension is time, it comes alive, when the chakras energize up the back of your spine"
Or, in this case, the third dimension comes alive with a BADLY MADE ANIMATED GIF.
I am so proud of my animated gif, or rather proud of the idea and ashamed of the execution, that I don't have tooooo much to say about this, besides WTF California?
Tuesday, February 23, 2010
New Mexico, and the ALMOST joy of having a wildly counterintuitive result
While entering some other data, I had a chance to see that on paper, much as in real life, New Mexico is perhaps one of the most nuanced and diverse of states. Specifically, New Mexico has a high percentage of people who do not speak English at home. And in many places, a low percentage of people who are foreign born. I wanted to look at these numbers more closely, and I got:
It would be awesome if I could say there is a negative correlation between English-speaking and native-born people, but actually there is...no correlation. Don't let the blue line fool you: its very gradual slope is close enough to flat.
If I knew more about New Mexico, I could probably make more sense out of this.
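For anyone who wants to check a "close enough to flat" line with numbers instead of eyeballs, here is a minimal sketch of the two usual measures: the Pearson correlation and the slope of the least-squares line. The function names and the sample percentages are made up for illustration; they are not the New Mexico figures and not necessarily how the blue line in the plot was fitted.

```python
from statistics import mean


def _centered_sums(xs, ys):
    """Return (sxy, sxx, syy): sums of products of deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation: near 0 means no linear relationship."""
    sxy, sxx, syy = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / (sxx * syy) ** 0.5


def ols_slope(xs, ys):
    """Slope of the least-squares trend line of ys against xs."""
    sxy, sxx, _ = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when xs are all the same")
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical county percentages, invented for this example only.
    pct_english_at_home = [10, 20, 30, 40]
    pct_native_born = [50, 30, 50, 30]
    print(pearson_r(pct_english_at_home, pct_native_born))
    print(ols_slope(pct_english_at_home, pct_native_born))
```

A trend line always has some slope, so the correlation is the more honest number to look at when deciding whether the line means anything.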
Sunday, February 21, 2010
Sneak preview!
Friday, February 19, 2010
Not a crescent
In the Grand Crescent post, I postulated that Denver, as an outlier, was where it was because Denver County (which is also Denver City) would follow the pattern of a large metropolis more than the pattern of a county in a western state.
I just made that up when I typed it, but it sounded good.
But of course, then I started wondering, so I wanted to plot the high school versus college rates of some of the US's biggest cities. I chose 30 as my number (mostly so Portland could be on there), and started digging for data.
Cities, as demographic units, are not very good. There tend to be lots of artifacts in the data, depending on how the city borders are drawn. Two metro areas might have similar demographics, but the largest city in each of them might exclude or include suburbs. For example, Detroit metro and Portland metro might be more similar than someone would guess, but whereas many of Portland's wealthy areas (the West Hills, for example) are included in the city, in Detroit those areas are, I believe, separate suburbs. So this data has problems. All data has problems.
The first thing to notice about this data:
NOT A CRESCENT.
It has a more predictable X=Y shape, although one that is spread out irregularly.
There seems to be a little bit of evidence that parts of the Western States are both well educated, and egalitarian about it. Not a lot, though. Although the five cities in the upper right do have two things in common: they are smaller, and they are outside the most traditional urbanized areas of the United States.
Actually, the most obvious thing that jumps out at me is size, which I should probably do another plot for. NYC, LA, Chicago and Houston, the 4 biggest cities, are all clustered pretty close together. They all have low education levels, and within what they do have, are more "elitist" in the sense of having lots of college graduates for the number of high school graduates.
I actually am probably going to run plots on these numbers for a number of different factors. Sometime soon!
Sunday, February 14, 2010
Driver's licenses and rurality.
An intuitive conclusion to draw is that people are more dependent on cars in rural areas. According to anecdotes, New York City is one of the few places where it is normal for an adult to not drive.
Ah, but what does the data say about this intuitive idea?
As usual, it says that this idea is not quite that absolute. There is a trend in that direction, but it is quite outdone by other things. New York and Connecticut are a good example: both are urbanized states, with much of Connecticut lying inside the NYC metro area. And yet their numbers of licensed drivers seem quite high.
(For whatever reason, Vermont and Alabama have more licensed drivers than they have driving-age population, which could be an artifact of how they keep their records updated, or could mean there are people who have fraudulent licenses.)
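The number behind that parenthetical is just licensed drivers divided by driving-age population, which ought to top out at 1. A minimal sketch of the check, with hypothetical function names and placeholder states and counts rather than the real figures:

```python
def licenses_per_driving_age_resident(licensed_drivers, driving_age_population):
    """Ratio of licensed drivers to people old enough to hold a license."""
    if driving_age_population <= 0:
        raise ValueError("driving-age population must be positive")
    return licensed_drivers / driving_age_population


def over_licensed(states):
    """Names of states whose ratio is above 1, i.e. more licenses than eligible people.

    `states` maps a state name to (licensed_drivers, driving_age_population).
    """
    return [
        name
        for name, (drivers, population) in states.items()
        if licenses_per_driving_age_resident(drivers, population) > 1.0
    ]


if __name__ == "__main__":
    # Placeholder numbers, invented for illustration only.
    sample = {
        "State A": (950_000, 1_000_000),
        "State B": (1_020_000, 1_000_000),
    }
    print(over_licensed(sample))
```

Anything this flags is worth a second look at how the two counts were collected before reading much into it.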
Saturday, February 13, 2010
Because Mouse is mad:
So I needed to update this, because I was informed of what "Daily" meant. I was confusing it with "Daly", as in "At the Daly Mansion, you can visit the world of yesterday today".
So I just clicked around on Statemaster until I could find some data to debunk one of those old stereotypes, about taxes and liberalism.
As people have pointed out, "Taxachusetts" is a stereotype. Massachusetts has a tax burden a penny less, per $10, than average. States' tax burdens don't seem to correlate with their national politics, at least from this data.
What is interesting about this data is that it doesn't show as much regionalism as could be expected. On most scatterplots, North Dakota and South Dakota show up pretty close to each other, for example. But not here! And although it isn't clear because I haven't labeled enough data points, Appalachia and the Prairie/Mountains are all mixed and matched up, instead of being separate and unequal like they usually are.
Thursday, February 11, 2010
It has been a while, but only because I have more exhaustive detail than ever before:
So when I did the political correlation for all of those Western states, I also did the high school/college numbers. And then, having those numbers, I also had the Colorado numbers, from previously. I added the Utah numbers in, and ended up with a bunch of data points. All the data points together showed something to me: that data entry is a lovely and fun hobby. They also showed me this:
There are 297 data points there. I didn't bother labeling many of them. I did label Denver, because it is a major metropolitan area, and also because its unusual place shows that it has a pattern different from most of what you would find in the Western States. This type of (relatively) high-college, low-high-school pattern speaks of an urban area that attracts less skilled workers, and is more common in the urbanized east than in the Western states.
What is most interesting about this diagram for me is that there aren't a lot of outliers. And that it has a specific shape. For some reason, in my mind, not a lot of outliers would make more sense on an X=Y curve. In this case, we have this complicated crescent pattern that seems to hold true across 297 counties in seven states.
One of the things I wondered is if this was actually several different graphs laid on top of each other. Did the three parts of the crescent represent three different types of counties?
So what I did was sort these counties by "Rural Urban Continuum" code. This is a set of codes put out by the Economic Research Service of the USDA that sorts counties by how urban and rural they are. As with any demographic measure, they are not perfect, but they are a useful tool.
So here is the plot for counties in metropolitan areas of some sort, defined as RUC codes 1-3:
There is a lot of diversity in these counties, since some "metropolitan" counties can be fairly small in population. After all, the summit of Mt. Hood is a "metropolitan" area by this reckoning. In other words, Owyhee County and King County maybe shouldn't belong in the same plot.
But! Despite the fact that this diagram is more spread out, the shape remains. The four counties in the upper right are also not the most urbanized counties. By contrast, the three counties that are the most urbanized (Denver, King, Multnomah) all hang to the left, because like most urban counties, they have lots of college graduates, but also attract less educated workers.
So next we will look at counties with codes from 4 to 7: counties that are not metropolitan, but have some urban population. As above, "some urban population" can mean many different things.
Two things here: although once again the picture is somewhat blurred, it is also, again, vaguely crescent shaped. Secondly, there is a pretty big gap in this diagram. Most of the counties seem to be bunched up right below the 20% mark for college graduation, with a few over 20%. Then, between 30 and 40%: only two counties. Above 40%, there are a lot of counties showing up. From what I know about those counties over 40%, they seem to be mostly resort communities. Gallatin, Montana, for example, is the county adjacent to Yellowstone Park, and so has had a big influx of wealthy residents in the past few decades.
Finally, let's look at the truly rural counties, those considered to have no urban population whatsoever: codes 8 and 9.
And once again...crescent. In fact, if I do say so myself, this is the prettiest of all the crescents we have seen so far. I can't think of anything particularly interesting to say about this crescent, besides it's pretty, and how about those San Juans?
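For the curious, the three-way split used in this post (codes 1-3, 4-7, and 8-9) can be sketched in a few lines. This is only an illustration of the bucketing: the group labels, the field names, and the placeholder counties are all made up here, and the real sorting could just as easily be done by hand in a spreadsheet.

```python
def ruc_group(code):
    """Map a Rural-Urban Continuum code (1-9) to the three groups used in this post."""
    if 1 <= code <= 3:
        return "metropolitan"
    if 4 <= code <= 7:
        return "nonmetro, some urban population"
    if 8 <= code <= 9:
        return "rural"
    raise ValueError(f"RUC codes run from 1 to 9, got {code!r}")


def split_by_group(counties):
    """Group county records by RUC bucket so each bucket can be plotted separately.

    Each record is a dict with hypothetical keys "name", "ruc",
    "hs_rate" and "college_rate".
    """
    groups = {}
    for county in counties:
        groups.setdefault(ruc_group(county["ruc"]), []).append(county)
    return groups


if __name__ == "__main__":
    # Placeholder counties with invented values, for illustration only.
    sample = [
        {"name": "County A", "ruc": 1, "hs_rate": 85.0, "college_rate": 35.0},
        {"name": "County B", "ruc": 6, "hs_rate": 88.0, "college_rate": 18.0},
        {"name": "County C", "ruc": 9, "hs_rate": 90.0, "college_rate": 22.0},
    ]
    for group, members in split_by_group(sample).items():
        print(group, [c["name"] for c in members])
```

Each bucket then gets its own high school versus college scatterplot, which is what the three diagrams above are.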