Someone sent around a link this morning to data on grade inflation at Duke, which includes a table of average undergraduate GPAs from 1932 on. Looking at the table you can roughly see when GPAs really started increasing (the ’60s), but it would be nicer to just plot them:
Or to plot the year-over-year change in average GPA, with some missing values interpolated:
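Filling the gaps before differencing is the only fiddly part. Here is a minimal sketch in Python (the post itself used R). It assumes the averages are stored one per consecutive year, with `None` for missing years, and that straight-line interpolation between neighbouring years is acceptable. The function names are hypothetical:

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by straight-line interpolation between known neighbours.

    Leading or trailing gaps stay None, since there is nothing on one side
    to interpolate from.
    """
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out


def year_over_year(values: List[Optional[float]]) -> List[Optional[float]]:
    """Change from the previous year; None where either year is missing."""
    if not values:
        return []
    changes: List[Optional[float]] = [None]  # the first year has no predecessor
    for prev, cur in zip(values, values[1:]):
        changes.append(None if prev is None or cur is None else cur - prev)
    return changes


# Toy numbers for illustration only, not the Duke figures.
gpas = [2.5, None, 2.7, 2.8]
filled = interpolate_missing(gpas)
deltas = year_over_year(filled)
```

Interpolating first and differencing second means the filled-in years show up as an even split of the change across the gap, which is worth keeping in mind when reading the plot.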
I’ve never tried to scrape a website with R before, but it turns out for this it was pretty easy (with some help).
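For anyone doing the same outside R, the core of scraping a page like this is just pulling the cell text out of the HTML table. A rough sketch using Python’s standard-library `html.parser`, assuming reasonably well-formed markup and no nested tables (the class and function names are hypothetical):

```python
from html.parser import HTMLParser
from typing import List


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped into <tr> rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()  # tolerate a missing </tr> on the previous row
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._in_cell:  # ignore text outside table cells
            self._cell.append(data)

    def _flush_row(self):
        if self._row:
            self.rows.append(self._row)
        self._row = []


def parse_table(html: str) -> List[List[str]]:
    """Return the rows of every table in `html` as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The page itself can be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`; after that, the first row is the header and the remaining cells still need converting from strings to numbers.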
A few weeks ago we were asked to teach the basics of (interpreting) duration models to a group of consumers without using any math. When I learned about duration models, it involved a lot of math and Stata, and when you look around the web they’re usually presented the same way. So this was a bit of a challenge.
A nice thing about duration analysis, though, is that a lot of the key concepts are already explicitly graphical, like survival curves (Wikipedia) and hazard rates. Below, for example, is a survival curve for patients in the US diagnosed with acute lymphoblastic leukemia between 1988 and 2008, from SEER Fast Stats:
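The link between the two pictures is that a survival curve is just the running product of one minus the hazard. A minimal Python sketch of the Kaplan-Meier estimator, the standard way to get a survival curve from individual follow-up times with censoring, makes that concrete. This is not necessarily how SEER computed the chart above (their published curves may use other methods, such as relative survival), and the function name is hypothetical:

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float, float]]:
    """Kaplan-Meier estimate as (time, hazard, survival) at each distinct event time.

    times    -- follow-up time for each subject
    events   -- True if the event (e.g. death) was observed, False if censored
    hazard   -- share of those still at risk who had the event at that time
    survival -- running product of (1 - hazard)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone with this time (events and censorings) counts as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            hazard = deaths / at_risk
            survival *= 1 - hazard
            curve.append((t, hazard, survival))
        at_risk -= removed
    return curve
```

With five subjects followed for 1, 2, 2, 3 and 4 time units, where the second subject at time 2 and the subject at time 4 are censored, the hazards are 1/5, 1/4 and 1/2, and survival steps down from 0.8 to 0.6 to 0.3.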
Will Moore, Kentaro Fukumoto, and I have been working on a random walk negative binomial model for time series of counts, based on earlier work by Kentaro on a negative binomial integrated (NB I(1)) model. We just presented a related poster, in which we look at monthly civilian deaths in Iraq, at Peace Science in Savannah, Georgia. Here is the actual PDF poster (it’s a big file, be warned). The basic point is that ARIMA or classical count models are not a good way to deal with time series of counts, like monthly deaths in a conflict, and that we have a tested model for non-stationary counts that has some attractive features.
We are working on a draft paper, so I don’t want to go through the whole story, but if you’d like to try it out yourself and know how to use JAGS, all the R and JAGS code is available on github.
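To get a feel for what non-stationary counts look like, here is a rough Python simulation of the general idea. It is not the model in our JAGS code: it simply assumes the log of the expected count follows a Gaussian random walk and draws each month’s count from a negative binomial, built as a gamma-Poisson mixture. The exact specification in the paper may differ, and all names and parameters here are hypothetical:

```python
import math
import random
from typing import List


def poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) count by counting unit-rate exponential arrivals in [0, lam]."""
    count, total = 0, rng.expovariate(1.0)
    while total <= lam:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_rw_negbin(n: int, log_mu0: float, sigma: float, size: float,
                       seed: int = 0) -> List[int]:
    """Simulate n counts whose log-mean follows a Gaussian random walk.

    log_mu[t] = log_mu[t-1] + Normal(0, sigma)
    y[t] ~ NegBin(mean = mu[t], dispersion = size), as a gamma-Poisson mixture
    """
    rng = random.Random(seed)
    log_mu = log_mu0
    counts = []
    for _ in range(n):
        log_mu += rng.gauss(0.0, sigma)  # random walk: no pull back toward a fixed mean
        mu = math.exp(log_mu)
        # Gamma(shape=size, scale=mu/size) has mean mu; a Poisson mixed over it is NegBin.
        lam = rng.gammavariate(size, mu / size)
        counts.append(poisson(lam, rng))
    return counts
```

Plotting a few runs of `simulate_rw_negbin(120, math.log(20), 0.1, 5.0, seed=s)` for different seeds shows series that wander off and never return to a common level, which is the kind of behaviour a stationary count model has trouble with.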
A few months ago I produced some thematic maps of Bosnia (paper) using maptools and other packages in R, but I didn’t include scales or a north arrow. It sounds simple, and sp has functions for doing those things, but I couldn’t get them to work well with my maps. Here is a basic map of Bosnia’s pre-war municipalities:
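The arithmetic behind a scale bar is simple enough to do by hand. A minimal Python sketch, assuming the map is drawn in unprojected longitude/latitude on a spherical Earth, so that the bar is only accurate near the latitude you pick (these helper names are hypothetical, not `sp` functions):

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate km per degree of latitude on a spherical Earth


def km_per_degree_lon(lat_deg: float) -> float:
    """Approximate km per degree of longitude at the given latitude."""
    return KM_PER_DEG_LAT * math.cos(math.radians(lat_deg))


def nice_scale_length(width_km: float, fraction: float = 0.2) -> float:
    """Largest 1, 2 or 5 x 10^k km that is at most `fraction` of the map width."""
    target = width_km * fraction
    base = 10 ** math.floor(math.log10(target))
    for step in (5, 2, 1):
        if step * base <= target:
            return step * base
    return base


def scale_bar_degrees(length_km: float, lat_deg: float) -> float:
    """Width of the scale bar in degrees of longitude, for an unprojected lon/lat map."""
    return length_km / km_per_degree_lon(lat_deg)
```

For a map roughly 312 km wide, `nice_scale_length` picks a 50 km bar; dividing by `km_per_degree_lon` at the map’s central latitude gives the bar’s width in map units, which can then be drawn as a plain rectangle or segment.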
The Iraq Body Count project collects reports of civilian deaths, and makes their event data publicly available. Each event gives the date, location, description and civilian deaths associated with an incident. Looking at a few examples [1, 2, 3], you can see that while the data values for the date and deaths are straightforward, the place values get a little bit complicated. I’m looking for the province in which incidents occurred, so the challenge is to associate each place value with a province.
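One simple way to attack the matching is a gazetteer lookup: search each place string for known place names and keep the result only when they all point to a single province. A minimal Python sketch, with a small hypothetical gazetteer for illustration (a real one would need many more towns, neighbourhoods and spelling variants); this is an assumed approach, not the script described below:

```python
import re
from typing import Dict, Optional

# Hypothetical, illustrative gazetteer: place name (lower case) -> province.
GAZETTEER: Dict[str, str] = {
    "baghdad": "Baghdad",
    "mosul": "Nineveh",
    "basra": "Basra",
    "fallujah": "Anbar",
    "ramadi": "Anbar",
    "baquba": "Diyala",
    "tikrit": "Salah ad-Din",
    "kirkuk": "Kirkuk",
}


def assign_province(place: str, gazetteer: Dict[str, str] = GAZETTEER) -> Optional[str]:
    """Return the single province whose known place names appear in `place`.

    Returns None when nothing matches or when names from more than one
    province appear, so those records can be checked by hand.
    """
    text = place.lower()
    provinces = {
        province
        for name, province in gazetteer.items()
        # whole-word match, so "basra" does not match inside another word
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    }
    return provinces.pop() if len(provinces) == 1 else None
```

Returning `None` for ambiguous records, rather than guessing, keeps errors out of the province counts at the cost of leaving some records unassigned.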
Using the incident data from 2003 to February 2012, about 27,500 records, I’ve written an R script that assigns provinces to about 95 percent of the records (roughly 26,000).
Here’s a basic overview of how it works:
Almost all states, at least at some point between 1995 and 2005.
The Ill-Treatment and Torture (ITT) project by Courtenay Conrad and Will Moore codes Amnesty International (AI) allegations of government torture, including the perpetrator, motive, and judicial response. The aggregated, country-year version of their data shows whether AI made allegations against a country in a given year and if so, what the extent of alleged torture or ill-treatment was, on a 5-point scale from “infrequent” to “systematic”.
Here is a video showing the AI torture allegations from 1995 to 2005, using their country-year data and shapefiles for world borders from Thematic Mapping.
The initial impression I had from this is the sheer extent of (alleged) torture and ill-treatment. It looks like pretty much all major states were alleged to have engaged in torture at some point between 1995 and 2005. Only 8 out of 151 states had no allegations of torture at all (Costa Rica, Uruguay, Finland, Benin, Gabon, Qatar, Singapore, and New Zealand), and in the remaining states, on average there were allegations in 7 out of 10 years. More than a quarter of states were accused of torture or ill-treatment in all 10 years covered by the data.
That doesn’t necessarily mean that a lot of torture or ill-treatment is going on in any specific country, nor that it is systematic. It doesn’t reflect what the specific acts of torture or ill-treatment were, e.g., whether someone was tortured to death or waterboarded (which may not be different). But, nevertheless, unpleasant stuff happens.
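Summaries like these take only a few lines once the country-year data is reshaped into one yes/no flag per country and year. A minimal Python sketch, assuming a hypothetical dictionary layout rather than the actual ITT column names:

```python
from typing import Dict, List


def allegation_summary(flags_by_country: Dict[str, List[bool]]) -> dict:
    """Summarize country-year allegation flags.

    flags_by_country maps each country to one flag per year
    (True = AI made at least one allegation that year).
    """
    n = len(flags_by_country)
    no_allegations = sorted(c for c, flags in flags_by_country.items() if not any(flags))
    # Number of years with allegations, only for countries with at least one.
    years_if_any = [sum(flags) for flags in flags_by_country.values() if any(flags)]
    every_year = sum(1 for flags in flags_by_country.values() if flags and all(flags))
    return {
        "n_states": n,
        "no_allegations": no_allegations,
        "mean_years_if_any": sum(years_if_any) / len(years_if_any) if years_if_any else 0.0,
        "share_all_years": every_year / n if n else 0.0,
    }
```

Note that the average number of years is taken only over states with at least one allegation, while the share accused every year is taken over all states, matching how the figures are described above.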
R code and source. This produces images for each year that I strung together in iMovie.
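When images are strung together like this, the colors need to mean the same thing in every frame, so it helps to fix the color scale once and reuse it for each year. A minimal Python sketch of that idea, assuming the extent is coded as integers 1 to 5 and “no allegation” as 0 or missing; the coding, the two endpoint colors, and the function names are all hypothetical:

```python
from typing import Dict, Optional, Tuple


def hex_to_rgb(h: str) -> Tuple[int, int, int]:
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb) -> str:
    return "#" + "".join(f"{v:02x}" for v in rgb)


def level_color(level: Optional[int], n_levels: int = 5, low: str = "#fee5d9",
                high: str = "#a50f15", none: str = "#ffffff") -> str:
    """Fill color for an ordinal level 1..n_levels; 0 or None means no allegation."""
    if not level:
        return none
    t = (level - 1) / (n_levels - 1)  # position on a fixed linear ramp
    lo, hi = hex_to_rgb(low), hex_to_rgb(high)
    return rgb_to_hex(tuple(round(a + t * (b - a)) for a, b in zip(lo, hi)))


def frame_colors(levels: Dict[str, Dict[int, int]], year: int) -> Dict[str, str]:
    """Map each country to its fill color for one year's frame."""
    return {country: level_color(by_year.get(year)) for country, by_year in levels.items()}
```

Because the ramp depends only on the level and not on which levels happen to appear in a given year, a country that keeps the same level looks the same from frame to frame.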
In 1991 a census was conducted in Bosnia and Herzegovina, which was then still part of the disintegrating federal state of Yugoslavia. Bosnia was the most diverse republic in the former Yugoslavia, with significant populations of Bosnian Muslims (or Bosniaks, 43 percent), Serbs (31 percent), Croats (17 percent), and others. Bosniaks, Serbs, and Croats were more or less well-established identities with historical roots. Unlike in most multiethnic countries, however, census respondents also had the option to identify themselves as Yugoslavs rather than as members of a particular ethnic or national group. It turns out that this played an interesting role in the way violence occurred in the Bosnian War from 1992 to 1995.