In 2013, my toolbox looks like this:
- Python for text processing and miscellaneous scripting;
- Python (NumPy/SciPy) for numerical computing;
- Python (Neurosynth, NiPy etc.) for neuroimaging data analysis;
- Python (NumPy/SciPy/pandas/statsmodels) for statistical analysis;
- Python (scikit-learn) for machine learning;
Excursions into other languages have dropped markedly.
I can’t speak to the relative merits of Python versus R beyond a general impression: R has the stronger statistics ecosystem but some quirks as a language (pdf), while Python is the more powerful general-purpose language but thinner once you move past basic statistical tools. I did spend some time trying to learn Python during my last year in graduate school, but that was while I was still getting comfortable with R, so I didn’t put much effort into it. It seems like it’s time to head back in that direction.
I work as a Postdoctoral Fellow in the Ward Lab here at Duke University. The lab currently consists of Mike Ward, me, and a group of very smart graduate students. There are a lot of exciting projects in the lab, like ICEWS and other work for the US government, as well as a broader set of projects by individual lab members. One thing we wanted to do this semester is publicize this work a bit more, and to that end we’re taking a new blog live today: Predictive Heuristics.
Sometimes, for whatever reason, you want to plot something fast. Last week I had coordinates associated with event data that I was hoping were all from Egypt. But the coordinates were for locations only indirectly associated with the events, so I wanted to do a quick plot to check. The ggmap package in R makes that pretty easy.
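A minimal sketch of that kind of quick check, assuming a hypothetical data frame events with lon and lat columns in place of the actual event coordinates (newer ggmap versions may require an API key for the underlying map service):

library(ggmap)

# fetch a base map centered on Egypt; the zoom level is a guess
egypt <- get_map(location = "Egypt", zoom = 6)

# overlay the event coordinates to see whether any fall outside Egypt
ggmap(egypt) +
  geom_point(data = events, aes(x = lon, y = lat),
             colour = "red", alpha = 0.5)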
Recently I’ve set up both a PostgreSQL and a MySQL server to host databases related to some of our projects in the Ward Lab. I should note that I have no idea what I’m doing; this is the first time I’ve dealt with databases and with getting them to work. It’s been a very humbling experience, but in the end we now have two databases that can be accessed remotely from a laptop through R or other tools like Quantum GIS:
# set up connection to the PostgreSQL database
library(rgdal)
dsn <- "PG: dbname='db' host='someIP' port='5432' user='me' password='guest'"

# load Afghanistan boundary (source: GADM)
state <- readOGR(dsn, layer = "afg_adm0")
plot(state)
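The MySQL database works much the same way from R. A minimal sketch using DBI and RMySQL, with the host, credentials, and table name as placeholders:

library(DBI)
library(RMySQL)

# open a remote connection (host and credentials are placeholders)
con <- dbConnect(MySQL(), dbname = "db", host = "someIP",
                 user = "me", password = "guest")

# pull a few rows to check that the connection works
check <- dbGetQuery(con, "SELECT * FROM events LIMIT 10;")

dbDisconnect(con)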
Blast from the past, in which I used Stata for things: here are some do-files that convert ISO country codes or country names to COW country codes.
Country names to COW [.do] Do-file that creates COW country codes for a dataset that only has country names in it. The file does not take the current year of an observation into account, and does not adjust for state membership (e.g. there was no unified Germany between 1946 and 1989). It should work fine for any cross-section between 1946 and 2004, or for cross-section time-series datasets if you are not concerned about adjusting for state membership.
ISO country code to COW [.do] Do-file that creates COW country codes for a dataset that only has ISO country codes in it. The file does not take the current year of an observation into account, and does not adjust for state membership (e.g. there was no unified Germany between 1946 and 1989). It should work fine for any cross-section between 1946 and 2004, or for cross-section time-series datasets if you are not concerned about adjusting for state membership.
Time series ISO to COW [.do] Do-file that creates COW country codes for a dataset that only has ISO country codes in it. The file accurately assigns country codes for the time period 1946 to 2004 according to the COW state membership list, but does not drop any country-years that do not conform to COW. If you want a dataset that conforms to COW, this will let you merge your dataset to a blank COW state membership list.
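If you work in R rather than Stata, the countrycode package covers the same cross-sectional conversion (like the first two do-files, it does not adjust for state membership by year). A minimal sketch, with df as a hypothetical data frame holding iso3c and country columns:

library(countrycode)

# map ISO3 character codes to numeric COW codes; unmatched codes become NA
df$ccode <- countrycode(df$iso3c, origin = "iso3c", destination = "cown")

# country names work too, via regex matching on the name strings
df$ccode <- countrycode(df$country, origin = "country.name", destination = "cown")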
The problem: you have data that includes the name of a village or city in which something happened, but not coordinates for that village or city. This seems to be a pretty common problem, or at least I’ve come across it a few times.
The (or a) solution is to look up the place names in the GeoNames database. Its website has a search feature that makes it easy to find the coordinates for a single city or village, but that stops being convenient once you need coordinates for more than a handful of names.
So, can we do this in R? Yes, and it’s not that difficult.
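A minimal sketch of the lookup with the geonames package, assuming a free GeoNames account (the username below is a placeholder):

library(geonames)

# GeoNames requires a (free) account name for API access
options(geonamesUsername = "myuser")

# look up one place name, restricted to Egypt; the result is a data
# frame that includes lat and lng columns
hit <- GNsearch(name = "Aswan", country = "EG", maxRows = 1)
hit[, c("name", "lat", "lng")]

# the same call inside sapply/lapply handles a whole vector of names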