Associating points with polygons in R

Some time ago I posted on how to find geographic coordinates given a list of village or city names in R. Somebody emailed me about how to do the reverse: the person had a list of villages in France along with their populations in 2010, and wanted to find which administrative unit each village was located in. The problem boils down to associating points (the village coordinates) with polygons (the administrative divisions that contain them).

The village data look like this:


library(gdata)  # read.xls() comes from the gdata package
munic <- read.xls("France-Population.xlsx")
head(munic)
                  Name       long      lat pop_2010
1                 Aast -0.0887339 43.28919 182.5416
2           Abainville  5.4947440 48.53057 327.2407
3            Abancourt  1.7649060 49.69672 687.2479
4            Abancourt  3.2127010 50.23528 448.1252
5            Abaucourt  6.2579230 48.89637 285.9438
6 Abaucourt-Hautecourt  5.5405000 49.19700  93.0353

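To make the point-to-polygon idea concrete, here is a minimal base R sketch using a ray-casting test. It is not the method from the full post: in practice you would read real administrative boundaries and use a spatial package such as sp or sf. The two rectangular "regions" and the helper names `point_in_polygon` and `find_region` are made up for illustration; only the six villages come from the data above.

```r
# The six villages shown above (population column omitted).
munic <- data.frame(
  Name = c("Aast", "Abainville", "Abancourt", "Abancourt",
           "Abaucourt", "Abaucourt-Hautecourt"),
  long = c(-0.0887339, 5.4947440, 1.7649060, 3.2127010, 6.2579230, 5.5405000),
  lat  = c(43.28919, 48.53057, 49.69672, 50.23528, 48.89637, 49.19700)
)

# Hypothetical polygons: two rectangles, NOT real administrative units.
# Each polygon is a list of vertices; the first vertex is not repeated.
regions <- list(
  west = data.frame(x = c(-5, 3, 3, -5), y = c(42, 42, 51, 51)),
  east = data.frame(x = c(3, 9, 9, 3),   y = c(42, 42, 51, 51))
)

# Ray casting: count how many polygon edges a horizontal ray starting at
# the point crosses; an odd count means the point is inside.
point_in_polygon <- function(x, y, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Does the edge from vertex j to vertex i straddle the ray's height?
    if ((poly_y[i] > y) != (poly_y[j] > y)) {
      x_cross <- poly_x[i] +
        (y - poly_y[i]) * (poly_x[j] - poly_x[i]) / (poly_y[j] - poly_y[i])
      if (x < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Return the name of the first polygon containing the point, or NA.
find_region <- function(x, y, regions) {
  for (name in names(regions)) {
    p <- regions[[name]]
    if (point_in_polygon(x, y, p$x, p$y)) return(name)
  }
  NA_character_
}

munic$region <- mapply(find_region, munic$long, munic$lat,
                       MoreArgs = list(regions = regions))
print(munic)
```

The test treats longitude and latitude as flat x/y coordinates, which is a simplification. It is fine for a sketch but not a substitute for proper spatial tools.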

Quick lookup for country codes

After more than half a decade at this, it has finally dawned upon me that instead of downloading the Correlates of War state system membership table, or the Gleditsch and Ward refinement of it, every time I wonder what country “338” is, it might be easier to upload them to Google:

COW codes and state system membership

G&W codes and state system membership

And, for the sake of self-promoting completeness, code to produce panel data reflecting COW or G&W state system membership, and old Stata code to change country names to COW codes.

Defense doublethink

If you took a look at the chart below, what would you say about the overall trend in US defense spending? There’s a bump fairly early on for World War 2, but otherwise spending seems to generally increase over time. I’m actually surprised to see that we spend more today, in constant US dollars, than we did at the height of the Korean War, and in fact more than at any point in US history save World War 2.


Code for blank state panel data

The short version:

Here is some R code to create arbitrary country-year or country-month data sets reflecting the Gleditsch and Ward or COW state system membership lists, using the cshapes R package.

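As a rough illustration of what a "blank" country-year panel looks like, here is a base R sketch. It does not use cshapes or the real G&W or COW lists: the membership table `states`, with its codes, names, and years, is entirely made up, and the real code works from the actual membership dates.

```r
# Hypothetical state system membership table (NOT real COW or G&W data).
states <- data.frame(
  code  = c(1, 2, 3),
  name  = c("State A", "State B", "State C"),
  start = c(1990, 1992, 1990),  # first year of membership
  end   = c(1995, 1995, 1993)   # last year of membership
)

# All code-year combinations over the period of interest.
years <- 1990:1995
panel <- expand.grid(code = states$code, year = years)

# Attach membership dates and keep only years in which the state existed.
panel <- merge(panel, states, by = "code")
panel <- panel[panel$year >= panel$start & panel$year <= panel$end,
               c("code", "name", "year")]

# Sort into the usual panel order and reset the row names.
panel <- panel[order(panel$code, panel$year), ]
rownames(panel) <- NULL
print(panel)
```

The result has one row per state and year of membership, ready for merging in other variables by `code` and `year`.
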
The longer version:

Read the rest of this entry »

Time to learn Python

Apparently Python is taking over the world (from a post by Tal Yarkoni):

In 2013, my toolbox looks like this:

  • Python for text processing and miscellaneous scripting;
  • Ruby on Rails/JavaScript for web development, except for an occasional date with Django or Flask (Python frameworks);
  • Python (NumPy/SciPy) for numerical computing;
  • Python (Neurosynth, NiPy etc.) for neuroimaging data analysis;
  • Python (NumPy/SciPy/pandas/statsmodels) for statistical analysis;
  • Python (MatPlotLib) for plotting and visualization, except for web-based visualizations (JavaScript/d3.js);
  • Python (scikit-learn) for machine learning;
  • Excursions into other languages have dropped markedly.

I can’t speak to the relative merits of Python over R, other than a general impression that R has stronger stats but some quirks as a language (pdf), while Python is the more powerful general-purpose language but offers less beyond basic statistical tools. I did spend some time trying to learn Python during my last year in graduate school, but I was still becoming comfortable with R at the time and so I didn’t put much effort into it. Seems like it’s time to head back in that direction again.

Taking over the world

I work as a Postdoctoral Fellow in the Ward Lab here at Duke University. The Lab currently consists of Mike Ward, me, and a group of very smart graduate students. There are a lot of exciting projects within the lab, like ICEWS and other work for the US government, but also a broader set of projects by our lab members. One of the things we wanted to do this semester is to publicize this work a little bit more, and to this end we’re taking a new blog live today: Predictive Heuristics.

Map, now!

Sometimes, for whatever reason, you want to plot something fast. Last week I had some coordinates associated with event data that I was hoping were all from Egypt. But the coordinates were for locations that are only indirectly associated with the events I had, so I wanted to do a quick plot to check. The ggmap package in R makes that pretty easy.

