The Iraq Body Count project collects reports of civilian deaths and makes its event data publicly available. Each event gives the date, location, description and number of civilian deaths associated with an incident. Looking at a few examples [1, 2, 3], you can see that while the date and death values are straightforward, the place values get a bit complicated. I’m looking for the province in which each incident occurred, so the challenge is to associate each place value with a province.
Using the incident data from 2003 to February 2012, about 27,500 records, I’ve written an R script that assigns provinces to about 26,000 of them, or ~95 percent.
Here’s a basic overview of how it works: