<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>MI Regression</title>
	<atom:link href="http://andybeger.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://andybeger.wordpress.com</link>
	<description>Quantitative Research and Military Intelligence</description>
	<lastBuildDate>Thu, 09 May 2013 13:32:51 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='andybeger.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>MI Regression</title>
		<link>http://andybeger.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://andybeger.wordpress.com/osd.xml" title="MI Regression" />
	<atom:link rel='hub' href='http://andybeger.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Plot of Duke grade inflation</title>
		<link>http://andybeger.wordpress.com/2013/05/08/plot-of-duke-grade-inflation/</link>
		<comments>http://andybeger.wordpress.com/2013/05/08/plot-of-duke-grade-inflation/#comments</comments>
		<pubDate>Tue, 07 May 2013 18:36:07 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[Duke]]></category>
		<category><![CDATA[grade inflation]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=243</guid>
		<description><![CDATA[Someone sent around a link this morning to data on grade inflation at Duke, which shows a table of average GPAs for undergraduates from 1932 on. Looking at the table you can sort of get a sense of when GPAs really started increasing (the &#8217;60s), but it would be nicer to just plot them: Or [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=243&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Someone sent around a link this morning to data on <a href="http://www.gradeinflation.com/Duke.html">grade inflation at Duke</a>, which shows a table of average GPAs for undergraduates from 1932 on. Looking at the table you can sort of get a sense of when GPAs really started increasing (the &#8217;60s), but it would be nicer to just plot them:</p>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2013/05/duke_grades_splines.png"><img class=" wp-image-269 aligncenter" alt="duke_grades_splines" src="http://andybeger.files.wordpress.com/2013/05/duke_grades_splines.png?w=354&#038;h=236" width="354" height="236" /></a></p>
<p style="text-align:left;">Or to plot the year-over-year change in average GPA, with some missing values interpolated:</p>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2013/05/duke_grades_diff1.png"><img class=" wp-image-266 aligncenter" alt="duke_grades_diff" src="http://andybeger.files.wordpress.com/2013/05/duke_grades_diff1.png?w=354&#038;h=236" width="354" height="236" /></a></p>
<p>I&#8217;ve never tried to scrape a website with R before, but for this it turned out to be pretty easy (<a href="http://giventhedata.blogspot.com/2012/08/r-and-web-for-beginners-part-iii.html">with some help</a>).</p>
<p><span id="more-243"></span></p>
<p>Using the <code>XML</code> package and <code>readLines()</code> from the <code>base</code> package, you can read the html file that holds the grade inflation data. The result is not really useful yet, since it still contains all of the original html and xml tags, but another function, <code>readHTMLTable()</code>, pulls out just the table itself. Since R (before version 4.0) by default converts character vectors to factors when creating a data frame, there are a few extra lines in the code below to create a data frame with two numeric vectors for year and GPA:</p>
<pre class="brush: r; title: ; notranslate">
library(XML)
library(plyr)

# Get and format html file
duke.html &lt;- readLines(&quot;http://www.gradeinflation.com/Duke.html&quot;)
duke.doc &lt;- htmlParse(duke.html)

# Get table as data frame
duke &lt;- readHTMLTable(duke.doc, header=F, as.data.frame=F)
duke &lt;- data.frame(duke, stringsAsFactors=F)
colnames(duke) &lt;- c(&quot;year&quot;, &quot;gpa&quot;)

# Format columns
duke$year &lt;- as.numeric(duke$year)
duke$gpa &lt;- as.numeric(ifelse(duke$gpa==&quot;n.d.&quot;, NA, duke$gpa))
</pre>
<p>So now we have a data frame with correct types for year and GPA:</p>
<pre class="brush: plain; title: ; notranslate"> &gt; head(duke)
  year  gpa
1 1932 2.25
2 1933 2.28
3 1934 2.27
4 1935 2.23
5 1936 2.21
6 1937 2.26
</pre>
<p>But if you look at the table on the <a href="http://www.gradeinflation.com/Duke.html">original website</a>, you might notice something funny with the years listed&#8230;there are gaps, or missing years, like the skip between &#8217;47 and &#8217;56. After a few lines to add the missing years (link to the full code is below), we can plot the average undergrad GPA for Duke from 1932 on:</p>
<pre class="brush: r; title: ; notranslate">
plot(duke$year, duke$gpa)
</pre>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2013/05/duke_grades_simple_plot.png"><img class=" wp-image-268 aligncenter" alt="duke_grades_simple_plot" src="http://andybeger.files.wordpress.com/2013/05/duke_grades_simple_plot.png?w=378&#038;h=251" width="378" height="251" /></a></p>
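<p>The missing-years step is not shown in this post, so here is a minimal sketch of one way it could be done, assuming the gaps are filled by merging the scraped table onto a complete sequence of years. The toy GPA values and the names <code>duke_toy</code> and <code>duke_full</code> are made up for illustration; <code>years_covered</code> reuses the name that appears in the axis code later in the post, but its definition here is an assumption.</p>

```r
# Toy stand-in for the scraped table; GPA values are invented for illustration
duke_toy <- data.frame(year = c(1945, 1946, 1947, 1956),
                       gpa  = c(2.40, 2.50, 2.45, 2.60))

# Complete sequence of years from the first to the last observed year
years_covered <- seq(min(duke_toy$year), max(duke_toy$year))

# Left join onto the complete sequence; years without data get NA for gpa
duke_full <- merge(data.frame(year = years_covered), duke_toy,
                   by = "year", all.x = TRUE)
```

<p>Keeping the unobserved years as explicit <code>NA</code> rows means a plot or an interpolation step sees the gap instead of silently connecting 1947 to 1956.</p>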
<p>With a little bit of extra code we can make the plot look a bit nicer: add grid lines and axis labels, mark every decade instead of every 20 years on the x-axis, and set the y-axis limits to round numbers:</p>
<pre class="brush: r; title: ; notranslate">
par(cex=1.2)
plot(duke$year, duke$gpa, ylim=c(2, 4), type=&quot;p&quot;, pch=20, xaxt=&quot;n&quot;, xlab=&quot;year&quot;,
     ylab=&quot;GPA&quot;)
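# years_covered is not defined in this excerpt; presumably it is the full
# sequence of years from the complete code
x_ticks &lt;- seq(round_any(min(years_covered), 10),
               round_any(max(years_covered), 10), 10)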
axis(1, at=x_ticks)
grid(col=&quot;gray50&quot;, lty=3)
</pre>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2013/05/duke_grades_nicer_plot.png"><img class="wp-image-267 aligncenter" alt="duke_grades_nicer_plot" src="http://andybeger.files.wordpress.com/2013/05/duke_grades_nicer_plot.png?w=378&#038;h=251" width="378" height="251" /></a></p>
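<p>For readers without <code>plyr</code>, the decade tick marks can be sketched in base R. This assumes <code>round_any()</code> is used with its default rounding function; the helper name <code>round_to</code> and the 1932 to 2011 year range are assumptions for illustration only.</p>

```r
# Hypothetical range of years; the real range comes from the scraped table
years_covered <- 1932:2011

# Base-R stand-in for plyr::round_any() with its default f = round
round_to <- function(x, accuracy) round(x / accuracy) * accuracy

# One tick per decade, from the rounded first year to the rounded last year
x_ticks <- seq(round_to(min(years_covered), 10),
               round_to(max(years_covered), 10), by = 10)
```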
<p>Coding aside, two things stand out from this plot. First, there are significant gaps in the data, e.g. in the &#8217;50s and &#8217;60s. Second, something crazy happened with grade inflation in the 1960s. We can do something about the missing values by using splines to interpolate them:</p>
<pre class="brush: r; title: ; notranslate">
duke$gpa_inter &lt;- spline(duke$gpa, n=length(duke$year))$y
lines(duke$year, duke$gpa_inter, col=&quot;gray80&quot;)
</pre>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2013/05/duke_grades_splines.png"><img class=" wp-image-269 aligncenter" alt="duke_grades_splines" src="http://andybeger.files.wordpress.com/2013/05/duke_grades_splines.png?w=378&#038;h=251" width="378" height="251" /></a></p>
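<p>The spline call above relies on the row index as the x variable. A slightly more explicit variant, sketched here on toy data, passes only the observed years and GPAs and asks for interpolated values at every year through the <code>xout</code> argument of <code>spline()</code>. The GPA values are invented; this is an alternative formulation, not the code used for the plot.</p>

```r
# Toy data with a gap; GPA values are invented for illustration
year <- 1945:1956
gpa  <- c(2.40, 2.50, 2.45, NA, NA, NA, NA, NA, NA, NA, NA, 2.60)

# Fit the interpolating spline on observed points only, evaluate at all years
ok <- !is.na(gpa)
gpa_inter <- spline(x = year[ok], y = gpa[ok], xout = year)$y
```

<p>The interpolated series passes through every observed value, so only the missing years are filled in.</p>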
<p>Finally, since in the context of grade inflation we might be less interested in the absolute values for the average GPA, it might make more sense to plot the changes in GPA from year to year. It looks like there was a period of grade inflation in the late &#8217;40s (WW2?) and another, bigger period of grade inflation in the 1960s. This actually seems to have happened nationwide in the 1960s, if you look at the writeup at <a href="http://www.gradeinflation.com/">http://www.gradeinflation.com/</a>.</p>
<pre class="brush: r; title: ; notranslate">
plot(duke$year[2:dim(duke)[1]], diff(duke$gpa_inter), col=&quot;gray50&quot;, pch=20,
     xlab=&quot;year&quot;, ylab=&quot;Change from prev. year&quot;, type=&quot;h&quot;, lwd=2, xaxt=&quot;n&quot;)
axis(1, at=x_ticks)
grid(col=&quot;gray50&quot;, lty=3)
abline(h=0, col=&quot;red&quot;)
</pre>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2013/05/duke_grades_diff1.png"><img class=" wp-image-266 aligncenter" alt="duke_grades_diff" src="http://andybeger.files.wordpress.com/2013/05/duke_grades_diff1.png?w=378&#038;h=251" width="378" height="251" /></a></p>
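<p>One detail in the plotting code is worth spelling out: <code>diff()</code> returns one value fewer than its input, which is why the changes are plotted against the years starting from the second one. A small sketch using the first six rows printed by <code>head(duke)</code> earlier (the data frame name <code>change</code> is just for illustration):</p>

```r
# First six rows of the table, as printed by head(duke) earlier in the post
year <- 1932:1937
gpa  <- c(2.25, 2.28, 2.27, 2.23, 2.21, 2.26)

# diff() drops the first element, so pair each change with the later year
change <- data.frame(year = year[-1], change = diff(gpa))
```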
<p>Here is a <a href="https://gist.github.com/andybega/5533454">gist of the complete code</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2013/05/08/plot-of-duke-grade-inflation/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2013/05/duke_grades_splines.png" medium="image">
			<media:title type="html">duke_grades_splines</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2013/05/duke_grades_diff1.png" medium="image">
			<media:title type="html">duke_grades_diff</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2013/05/duke_grades_simple_plot.png" medium="image">
			<media:title type="html">duke_grades_simple_plot</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2013/05/duke_grades_nicer_plot.png" medium="image">
			<media:title type="html">duke_grades_nicer_plot</media:title>
		</media:content>

	</item>
		<item>
		<title>Building a survivor curve from observed data</title>
		<link>http://andybeger.wordpress.com/2012/12/22/building-a-survivor-curve-from-observed-data/</link>
		<comments>http://andybeger.wordpress.com/2012/12/22/building-a-survivor-curve-from-observed-data/#comments</comments>
		<pubDate>Fri, 21 Dec 2012 19:52:38 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[Teaching]]></category>
		<category><![CDATA[duration modeling]]></category>
		<category><![CDATA[survival function]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=211</guid>
		<description><![CDATA[A few weeks ago we were asked to teach the basics of (interpreting) duration models to a group of consumers without using any math. When I learned about this it involved a lot of math and Stata, and when you look around the web it&#8217;s usually presented similarly. So this was a bit of a challenge. [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=211&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A few weeks ago <a title="Ward Lab" href="http://www.mdwardlab.com">we</a> were asked to teach the basics of (interpreting) duration models to a group of consumers without using any math. When I learned <a href="https://files.nyu.edu/mrg217/public/homepage.htm">about this</a> it involved a lot of math and Stata, and when you look around the web it&#8217;s usually presented similarly. So this was a bit of a challenge.</p>
<p>A nice thing about duration analysis though is that a lot of the key concepts are already explicitly graphical, like <a href="http://en.wikipedia.org/wiki/Survival_function">survival curves</a> (wikipedia) and hazard rates. Below, for example, is a survival curve for cancer patients diagnosed with acute lymphoblastic leukemia between 1988 and 2008 in the US, from <a href="http://seer.cancer.gov/faststats/index.php">SEER fast stats</a>:</p>
<p style="text-align:center;"><a href="http://andybeger.wordpress.com/2012/12/22/building-a-survivor-curve-from-observed-data/all_surv/" rel="attachment wp-att-217"><img class="wp-image-217 aligncenter" title="Survival curve for ALL, 1988-2008" alt="all_surv" src="http://andybeger.files.wordpress.com/2012/12/all_surv.png?w=531&#038;h=266" width="531" height="266" /></a></p>
<p><span id="more-211"></span></p>
<p>Starting from the moment at which a patient is diagnosed with cancer (year 0), it shows the proportion of patients who survive without relapse or death any given number of years from diagnosis. Two years from diagnosis, 50 percent of patients are still event free. Five years from diagnosis, about 35 percent are event free (what you might call cured), etc. Alternatively, one could interpret it as the probability that a given ALL patient will be event free 2 or 5 years from diagnosis, with probabilities of 0.5 and 0.35 respectively.</p>
<p>So far so good, except that in practice one has to estimate survival functions on the basis of limited empirical data, e.g. using a <a href="http://en.wikipedia.org/wiki/Kaplan–Meier_estimator">Kaplan-Meier estimate</a> (wikipedia). The resulting estimated survivor curves are not smooth like the curve above, but ragged. Using another example of lightbulbs, with data for the number of days until five bulbs burned out, one might get something like this:</p>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2012/12/first5_s.png"><img class="size-full wp-image-229 aligncenter" alt="first5_S" src="http://andybeger.files.wordpress.com/2012/12/first5_s.png?w=590&#038;h=295" width="590" height="295" /></a></p>
<p>The black line shows the survivor curve estimate, based on data for the five lightbulbs. I added the grey bars to represent the number of days each of the five bulbs lasted to show how the survivor estimate is built up from individual failures. E.g. the top bar is the first bulb, which lasted for ~80 days. Thus on day 25 all 5 bulbs are still burning and our survivor curve is at 1.0. Around day 80, after the first bulb has failed, 4 of 5 or 0.8 of the original bulbs are still &#8220;alive&#8221;, bringing the curve down to 0.8. And so on for each additional failure.</p>
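<p>The step-by-step logic above can be sketched in a few lines of base R. With no censoring, the Kaplan-Meier estimate reduces to the share of bulbs that have outlasted a given day. Only the first lifetime (about 80 days) comes from the figure; the other four lifetimes and the function name <code>surv_at</code> are made up for illustration.</p>

```r
# Days until failure for five bulbs; only the first value (~80) is from the
# figure, the rest are hypothetical
bulb_life <- c(80, 95, 100, 104, 110)

# Empirical survivor function: proportion of bulbs still burning after day t
surv_at <- function(t) mean(bulb_life > t)

surv_at(25)   # all five bulbs still burning
surv_at(80)   # first bulb has failed, 4 of 5 remain
surv_at(110)  # every bulb has failed
```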
<p>Taking a cue from Bayesian Biologist&#8217;s <a href="http://bayesianbiologist.com/2012/08/17/an-update-on-visualizing-bayesian-updating/">video of Bayesian updating</a>, and Drew Conway&#8217;s <a href="http://www.drewconway.com/zia/?p=2741">video of Chicago crime</a>, I thought it would be nice to create a video that shows how an empirical survivor curve like this is built up from observed failure data, and how it changes as you add more data.</p>
<p>First, a simple backstory: Imagine you move into a new apartment and, having an obsession with measurement, start keeping track of how long the lightbulbs in your five light fixtures last (it&#8217;s a small apartment). In the video below, the top half shows the five fixtures, where each bar represents a lightbulb and the number of days it has been in operation so far. On day 0 all 5 fixtures have new bulbs, i.e. there are no failures yet. Thus the survivor curve estimate, shown at the bottom, is not very useful (we disregard bulbs that haven&#8217;t burned out yet, i.e. no censoring).</p>
<div class='embed-vimeo' style='text-align:center;'><iframe src='http://player.vimeo.com/video/56935195' width='640' height='640' frameborder='0'></iframe></div>
<p>Time goes by and after 80 days bulbs are starting to burn out. With data on failures we can now start updating our survivor curve to reflect those failures. The first updates still leave us with a very rough survivor curve estimate, but as more bulbs fail the curve starts getting a nicer shape. Note also that the mean time to failure (MTTF) in the bottom left corner starts getting closer to its theoretical value. The video ends after a year&#8217;s worth of simulation, but the longer we let it run the smoother the KM estimate would get. Eventually the KM estimate should converge to the theoretical survivor curve shown in red at the end of the video.</p>
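<p>To make the theoretical MTTF concrete: the simulation draws bulb lifetimes from a Weibull distribution with shape 10 and scale 100, and the mean of a Weibull distribution is the scale times gamma(1 + 1/shape). The sketch below computes that value and compares it with the average of a large number of simulated lifetimes; the seed and the number of draws are arbitrary choices for illustration.</p>

```r
shape <- 10
scale <- 100

# Theoretical mean time to failure of a Weibull(shape, scale) lifetime
true_mttf <- scale * gamma(1 + 1 / shape)

# Empirical MTTF from many simulated bulbs (seed and sample size are arbitrary)
set.seed(1)
sim_mttf <- mean(rweibull(1e5, shape = shape, scale = scale))

round(true_mttf)  # the rounded value shown as "Theoretical MTTF" in the video
```

<p>With only a handful of failures the estimated MTTF can be well off, which is why the number in the video moves around early on before settling down.</p>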
<p>I created the video using R, with the code below. You&#8217;ll need <a href="http://ffmpeg.org">ffmpeg</a> to combine the individual frames into a video at the end.</p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
library(gridExtra)
library(grid)  # provides unit()

## functions for simulated dgp
createSurvivalFrame &lt;- function(f.survfit) {
  ## Create data frame to pass to ggplot2 using survfit result
  if (!inherits(f.survfit, 'survfit')) stop('Need &quot;survfit&quot; class object.')
  f.surv &lt;- data.frame(time=f.survfit$time, surv=f.survfit$surv)
  # add rows at start (survival is 1 until the first failure)
  f.start &lt;- data.frame(time=c(0, f.surv$time[1]), surv=c(1, 1))
  # add row at end (0 to end of time)
  f.end &lt;- data.frame(time=125, surv=0)
  f.surv &lt;- rbind(f.start, f.surv, f.end)

  return(f.surv)
}

qplot_survival &lt;- function(f.surv) {
  require(ggplot2)
  p &lt;- ggplot(data=f.surv) + geom_step(aes(x=time, y=surv), direction='hv', lwd=1.5)
  p &lt;- p + theme_bw() + ylim(c(0, 1)) +
    scale_x_continuous(breaks=c(0, 25, 50, 75, 100, 125), limits=c(0,125)) +
    labs(y='Survivors', x='Day')
  return(p)
}

## add estimated and theoretical MTTF labels to a survival plot
annotate_mttf &lt;- function(p, mttf, true.mttf) {
  p +
    annotate('text', x=0, y=0.2, hjust=0, label='Estimated MTTF:', size=4) +
    annotate('text', x=35, y=0.2, hjust=0, label=mttf, size=4) +
    annotate('text', x=0, y=0.1, hjust=0, label='Theoretical MTTF:', size=4) +
    annotate('text', x=35, y=0.1, hjust=0, label=true.mttf, size=4)
}

## function to simulate lightbulbs
lights_sim &lt;- function(max.days=100) {
  require(survival)
  require(ggplot2)
  require(gridExtra)

  # Initialize
  obs.bulbs &lt;- NULL
  fixtures &lt;- data.frame(no=1:5, bulb=NA, bulb.life=NA, bulb.spell=0, event=FALSE)
  bulb &lt;- 0
  # Weibull mean: scale * gamma(1 + 1/shape)
  true.mttf &lt;- round(gamma(1+1/10)*100)
  mttf &lt;- 'NA'

  # Initial survival plot
  f.surv &lt;- data.frame(time=c(0, 125), surv=c(1, 1))
  p &lt;- annotate_mttf(qplot_survival(f.surv), mttf, true.mttf)

  # Progress bar (created once, updated at the end of each day)
  pb &lt;- txtProgressBar(min=0, max=max.days, style=3, width=50)

  # Simulate through max.days
  for (day in 1:max.days) {
    # Update spell counters
    fixtures$bulb.spell &lt;- fixtures$bulb.spell + 1
    # Place bulbs in empty fixtures
    while (any(is.na(fixtures$bulb))) {
      bulb &lt;- bulb + 1
      fixtures[match(NA, fixtures$bulb), c('bulb', 'bulb.life')] &lt;-
        c(bulb, round(rweibull(1, shape=10, scale=100)))
    }
    # Check for bulbs to burn out now
    fixtures$event &lt;- with(fixtures, bulb.life == bulb.spell)
    if (any(fixtures$event)) {
      obs.bulbs &lt;- rbind(obs.bulbs, fixtures[fixtures$event, c('bulb', 'bulb.life', 'event')])
      # Empty the fixtures whose bulbs just failed
      fixtures[fixtures$event, c('bulb', 'bulb.life', 'bulb.spell', 'event')] &lt;-
        matrix(rep(c(NA, NA, 0, FALSE), sum(fixtures$event)), ncol=4, byrow=TRUE)
      # Mean time to failure estimate
      mttf &lt;- round(mean(obs.bulbs$bulb.life), digits=1)
      # Kaplan-Meier surv curve
      surv.data &lt;- with(obs.bulbs, Surv(bulb.life, event))
      f.surv &lt;- createSurvivalFrame(survfit(surv.data ~ 1))
      p &lt;- annotate_mttf(qplot_survival(f.surv), mttf, true.mttf)
    }

    # fixture plot for each day
    p.fix &lt;- ggplot(data=fixtures) +
      geom_bar(aes(x=factor(no), y=bulb.spell), stat='identity', fill=rgb(0,0,0.61), width=0.3) +
      scale_y_continuous(limits=c(0, 125), name='', breaks=c(0, 25, 50, 75, 100, 125)) +
      scale_x_discrete(name='Lamp') + coord_flip() + theme_bw() +
      theme(axis.title.y=element_text(vjust=0.1)) +
      theme(plot.margin=unit(c(1,1,0.1,1.45), 'lines')) + ggtitle(paste0('Day: ', day))

    # Plot frames for each day (the directory graphics/frames/ must already exist)
    png(paste0('graphics/frames/', sprintf('%03d', day), '.png'))
    grid.arrange(p.fix, p)
    dev.off()

    setTxtProgressBar(pb, day)
  }
  close(pb)

  ## Add last frame with true survivor curve
  # unobserved data-generating process
  dgp &lt;- data.frame(t=1:125, f=dweibull(1:125, shape=10, scale=100))
  dgp$F &lt;- cumsum(dgp$f)  # discrete approximation of the CDF
  dgp$S &lt;- 1 - dgp$F

  true &lt;- p + geom_line(data=dgp, aes(x=t, y=S), col='red', lwd=1, linetype='dashed') +
    annotate('text', label='Empirical', x=70, y=0.65) +
    annotate('text', label='Theoretical', x=105, y=0.72)

  png(paste0('graphics/frames/', sprintf('%03d', max.days+1), '.png'))
  grid.arrange(p.fix, true)
  dev.off()

  # done, return failures
  return(list(obs = obs.bulbs, current = fixtures, dgp = dgp))
}

### End of functions ###

## Simulate
set.seed(1152359)
sims &lt;- lights_sim(365)

system('ffmpeg -f image2 -r 10 -i ~/path/to/frames/%03d.png ~/path/to/video/lightbulbs.mp4')

</pre>
]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2012/12/22/building-a-survivor-curve-from-observed-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/12/all_surv.png" medium="image">
			<media:title type="html">Survival curve for ALL, 1988-2008</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/12/first5_s.png" medium="image">
			<media:title type="html">first5_S</media:title>
		</media:content>
	</item>
		<item>
		<title>Random walk negative binomial model for persistent count series.</title>
		<link>http://andybeger.wordpress.com/2012/10/29/random-walk-negative-binomial-model-for-persistent-count-series/</link>
		<comments>http://andybeger.wordpress.com/2012/10/29/random-walk-negative-binomial-model-for-persistent-count-series/#comments</comments>
		<pubDate>Mon, 29 Oct 2012 15:07:55 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[Iraq]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[random walk negative binomial]]></category>
		<category><![CDATA[time-series event count]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=202</guid>
		<description><![CDATA[Will Moore, Kentaro Fukumoto, and I have been working on a random walk negative binomial model for time-series of counts, based on earlier work by Kentaro on a negative binomial integrated (NB I(1)) model. We just presented a related poster in which we look at monthly civilian deaths in Iraq at Peace Science in Savannah, [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=202&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a title="Will Moore" href="http://mailer.fsu.edu/~whmoore/garnet-whmoore/">Will Moore</a>, <a title="Kentaro Fukumoto" href="http://www-cc.gakushuin.ac.jp/~e982440/index_e.htm">Kentaro Fukumoto</a>, and I have been working on a random walk negative binomial model for time-series of counts, based on earlier work by Kentaro on a negative binomial integrated (NB I(1)) model. We just presented a related poster in which we look at monthly civilian deaths in Iraq at Peace Science in Savannah, Georgia. Here is the actual <a href="http://andybeger.files.wordpress.com/2012/10/pssi_poster.pdf">pdf poster</a> (it&#8217;s a big file, be warned), but the basic point is that ARIMA or classical count-models are not a good way to deal with time-series of counts, like monthly deaths in a conflict, and that we have a tested model for non-stationary counts that has some attractive features.</p>
<p>We are working on a draft paper, so I don&#8217;t want to go through the whole story, but if you&#8217;d like to try it out yourself and know how to use JAGS, all the R and JAGS code is <a href="https://github.com/andybega/PSSI_2012_TSEC_Iraq">available on github</a>.</p>
<p><span id="more-202"></span></p>
<p>Basically we are using a state-space model which consists of two separate equations: (1) a state transition equation that models a latent system state and describes how that latent state changes over time, and (2) a measurement equation that describes the process by which we observe actual outcomes (y) at any given time point. As a result, this model separates error in the temporal process that creates our data from the measurement error with which we observe it. An implication is that you can have covariates in either equation, where variables in the state transition equation have effects that propagate through time, i.e. short and long-run effects, while variables entering the measurement equation only have an instantaneous impact for that measurement period.</p>
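<p>To give a feel for the two-equation structure, here is a minimal simulation sketch. It is not the model from the poster: it assumes a Gaussian random walk for the log of the latent level, a negative binomial measurement equation, and a single binary covariate that enters the measurement equation only. All parameter values and names (<code>sigma</code>, <code>size</code>, <code>beta</code>, <code>shock</code>) are assumptions chosen for illustration.</p>

```r
set.seed(42)
n     <- 120    # number of months (assumed)
sigma <- 0.1    # sd of the random walk innovations (assumed)
size  <- 5      # negative binomial dispersion parameter (assumed)
beta  <- 0.5    # effect of the covariate in the measurement equation (assumed)

# Binary covariate, e.g. an indicator for months with some event
shock <- rbinom(n, 1, 0.1)

# (1) State transition: latent log-level follows a random walk
state <- log(50) + cumsum(rnorm(n, mean = 0, sd = sigma))

# (2) Measurement: observed counts are negative binomial around the latent
#     level; the covariate shifts the mean in that period only
mu <- exp(state + beta * shock)
y  <- rnbinom(n, size = size, mu = mu)
```

<p>Because <code>shock</code> enters only the measurement equation here, it changes the expected count in its own month and leaves the latent state, and hence later months, untouched. Putting it into the state equation instead would let its effect carry forward.</p>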
<div id="attachment_200" class="wp-caption alignright" style="width: 410px"><a href="http://andybeger.files.wordpress.com/2012/10/deaths_iraq_total.png"><img class=" wp-image-200  " title="deaths_iraq_total" alt="" src="http://andybeger.files.wordpress.com/2012/10/deaths_iraq_total.png?w=400&#038;h=200" height="200" width="400" /></a><p class="wp-caption-text">Monthly Iraqi civilian deaths (black=total, red=Baghdad, blue=rest of Iraq).</p></div>
<p>We&#8217;ve tested this model against simulated data we create, and replication code for these simulations is included as well. For an empirical application, I ran the model against some data on Iraqi civilian deaths from the Iraq Body Count that I was <a href="http://andybeger.wordpress.com/2012/03/21/coding-provinces-for-the-iraq-body-count-data/">working on earlier</a>. The graph on the right shows the monthly civilian deaths in Iraq we are working with. The black line shows totals, and red and blue show deaths in Baghdad and the rest of Iraq respectively. I&#8217;ve highlighted some of the spikes in deaths, like during the invasion or government offensives.</p>
<div id="attachment_201" class="wp-caption alignright" style="width: 410px"><a href="http://andybeger.files.wordpress.com/2012/10/iraq_fitted.png"><img class=" wp-image-201 " title="Random walk negative binomial fitted values for Iraq monthly civilian deaths." alt="" src="http://andybeger.files.wordpress.com/2012/10/iraq_fitted.png?w=400&#038;h=200" height="200" width="400" /></a><p class="wp-caption-text">In-sample fitted values (red) and observed civilian deaths (black). The red band shows 80% interval.</p></div>
<p>We estimated a very basic random walk negative binomial model for total monthly civilian deaths, with four binary indicators for the initial invasion period, elections, government offensives, and Ramadan. The in-sample model fit is in the graph on the right. I think this is a pretty amazing fit considering how simple this model is, and the mean predicted values seem to be a pretty good reflection of local levels at any given time. At this stage I would feel fairly confident in using this for hypothesis testing, but for forecasting it would be nice to have a more explicit temporal structure, e.g. running estimates of trends and changes in trends.</p>
]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2012/10/29/random-walk-negative-binomial-model-for-persistent-count-series/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/10/deaths_iraq_total.png?w=1024" medium="image">
			<media:title type="html">deaths_iraq_total</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/10/iraq_fitted.png" medium="image">
			<media:title type="html">Random walk negative binomial fitted values for Iraq monthly civilian deaths.</media:title>
		</media:content>
	</item>
		<item>
		<title>Scale and north arrow for maps in R</title>
		<link>http://andybeger.wordpress.com/2012/08/25/scale-and-north-arrow-for-maps-in-r/</link>
		<comments>http://andybeger.wordpress.com/2012/08/25/scale-and-north-arrow-for-maps-in-r/#comments</comments>
		<pubDate>Fri, 24 Aug 2012 20:59:14 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[Bosnia]]></category>
		<category><![CDATA[map scale]]></category>
		<category><![CDATA[north arrow]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=180</guid>
		<description><![CDATA[A few months ago I produced some thematic maps of Bosnia (paper) using maptools and other packages in R, but I didn&#8217;t include scales or a north arrow. It sounds simple and sp has functions for doing those things, but I couldn&#8217;t get it to work well with my maps. Here is a basic map [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=180&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>A few months ago I produced some thematic maps of Bosnia (<a href="http://andybeger.wordpress.com/research/" title="Research papers">paper</a>) using <code>maptools</code> and other packages in R, but I didn&#8217;t include scales or a north arrow. It sounds simple and <code>sp</code> has functions for doing those things, but I couldn&#8217;t get it to work well with my maps. Here is a basic map of Bosnia&#8217;s pre-war municipalities:</p>
<pre class="brush: r; title: ; notranslate">
library(maptools)

plot(bosnia)
</pre>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2012/08/bosnia_munic.png"><img class="size-medium wp-image-183 aligncenter" title="bosnia_munic" src="http://andybeger.files.wordpress.com/2012/08/bosnia_munic.png?w=300&#038;h=300" alt="" width="300" height="300" /></a></p>
<p><span id="more-180"></span></p>
<p>The function <code>map.scale()</code> from the <code>maps</code> package adds a scale. The position is in map units, latitude/longitude in this case:</p>
<pre class="brush: r; title: ; notranslate">
library(maps)
map.scale(x=15.5, y=42.75, ratio=FALSE, relwidth=0.2)
</pre>
<p>The <code>north.arrow()</code> function from <code>GISTools</code> adds the north arrow. Its position and length are also given in map units. The package has a map scale function as well, which looks nicer but is a little more complicated to set up.</p>
<pre class="brush: r; title: ; notranslate">
library(GISTools)
north.arrow(xb=15.75, yb=43.25, len=0.05, lab=&quot;N&quot;)
</pre>
<p>This will produce the following map:</p>
<p><a href="http://andybeger.files.wordpress.com/2012/08/bosnia_munic2.png"><img src="http://andybeger.files.wordpress.com/2012/08/bosnia_munic2.png?w=300&#038;h=300" alt="" title="bosnia_munic2" width="300" height="300" class="aligncenter size-medium wp-image-190" /></a></p>
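<p>For anyone without the <code>bosnia</code> shapefile object, here is a rough self-contained sketch of the same idea. It is not the code behind the map above: it assumes a recent version of <code>maps</code> whose world database contains a region starting with &#8220;Bosnia&#8221;, it draws the north arrow with base graphics (<code>arrows()</code> and <code>text()</code>) instead of <code>GISTools</code>, and <code>DrawBosniaSketch()</code> is a made-up name.</p>
<pre class="brush: r; title: ; notranslate">
library(maps)

# Hypothetical helper; the plot limits are rough guesses in longitude/latitude
DrawBosniaSketch &lt;- function() {
  # Country outline from the maps world database (assumes maps 3.x region names)
  map(&quot;world&quot;, regions = &quot;Bosnia&quot;, xlim = c(15, 20), ylim = c(42, 46))
  # Scale bar, positioned in map units
  map.scale(x = 15.5, y = 42.75, ratio = FALSE, relwidth = 0.2)
  # Hand-drawn north arrow: a vertical arrow plus a label above it
  arrows(x0 = 15.75, y0 = 43.25, x1 = 15.75, y1 = 43.65, length = 0.1)
  text(x = 15.75, y = 43.8, labels = &quot;N&quot;)
}

DrawBosniaSketch()
</pre>
<p>The outline is much coarser than the municipality map, but it is enough to experiment with where the scale and arrow should go.</p>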
<p>Adding these to a function I wrote for producing thematic maps of Bosnia produces this pretty nice map (with scale and north arrow!) of documented per capita civil war deaths:</p>
<p><a href="http://andybeger.files.wordpress.com/2012/08/map_dead.png"><img src="http://andybeger.files.wordpress.com/2012/08/map_dead.png?w=300&#038;h=300" alt="" title="map_dead" width="300" height="300" class="aligncenter size-medium wp-image-192" /></a></p>
<p>Here is the function (which is pretty specific to the data I use):</p>
<pre class="brush: r; title: ; notranslate">
ThematicMap&lt;-function(vector, breaks, title, legend) {
  require(maptools)
  require(shape)
  require(RColorBrewer)
  require(GISTools)
  require(maps)
  
  plotvar &lt;- unlist(vector)
  nclr &lt;- 9
  plotclr &lt;- brewer.pal(nclr, &quot;Reds&quot;)
  fillRed &lt;- colorRampPalette(plotclr)
  plotvar[plotvar &gt;= maxy] &lt;- maxy -1
  colcode &lt;- fillRed(maxy)[round(plotvar) + 1]
  plot(bosnia, col = colcode, lty = 0, border = &quot;gray&quot;)
  plot(bosnia.st, add=TRUE, lwd=1, border = &quot;gray30&quot;)
  plot(bosnia.front93, add = TRUE, lty=&quot;solid&quot;, lwd=1.5, col=&quot;darkblue&quot;)
  map.scale(x=15.5, y=42.75, relwidth=0.2, ratio=FALSE)
  north.arrow(xb=15.75, yb=43.25, len=0.05, lab=&quot;N&quot;)
  title(main = title)
  colorlegend(posy = c(0.05,0.9), posx = c(0.9,0.92),
              col = fillRed(maxy),
              zlim=c(0, maxy), zval = breaks,
              main = legend,
              main.cex = 0.9)
  par(bg='white')
}

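# Deaths per 1,000 inhabitants (pop91 is presumably the 1991 census population)
killed.pk &lt;- (killed/pop91*1000)
# ThematicMap() reads maxy from the global environment, so set it before the call
maxy &lt;- 50
breaks &lt;- c(0, 10, 20, 30, 40, Inf)
png(&quot;images/map_dead.png&quot;)
# The function is called for its plotting side effects, so print() is not needed
ThematicMap(killed.pk,breaks,&quot;Documented killings during the Bosnian War&quot;,&quot;per 1,000&quot;)
dev.off()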
</pre>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/andybeger.wordpress.com/180/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/andybeger.wordpress.com/180/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=180&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2012/08/25/scale-and-north-arrow-for-maps-in-r/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/08/bosnia_munic.png?w=300" medium="image">
			<media:title type="html">bosnia_munic</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/08/bosnia_munic2.png?w=300" medium="image">
			<media:title type="html">bosnia_munic2</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/08/map_dead.png?w=300" medium="image">
			<media:title type="html">map_dead</media:title>
		</media:content>
	</item>
		<item>
		<title>Coding provinces for the Iraq Body Count data</title>
		<link>http://andybeger.wordpress.com/2012/03/21/coding-provinces-for-the-iraq-body-count-data/</link>
		<comments>http://andybeger.wordpress.com/2012/03/21/coding-provinces-for-the-iraq-body-count-data/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 18:34:29 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[Iraq]]></category>
		<category><![CDATA[iraq body count]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=163</guid>
		<description><![CDATA[The Iraq Body Count project collects reports of civilian deaths, and makes their event data publicly available. Each event gives the date, location, description and civilian deaths associated with an incident. Looking at a few examples [1, 2, 3], you can see that while the data values for the date and deaths are straightforward, the place [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=163&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>The <a href="http://www.iraqbodycount.org/">Iraq Body Count project</a> collects reports of civilian deaths, and makes their event data publicly available. Each event gives the date, location, description and civilian deaths associated with an incident. Looking at a few examples [<a href="http://www.iraqbodycount.org/database/incidents/k18564">1</a>, <a href="http://www.iraqbodycount.org/database/incidents/k18560">2</a>, <a href="http://www.iraqbodycount.org/database/incidents/k18554">3</a>], you can see that while the data values for the date and deaths are straightforward, the place values get a little bit complicated. I&#8217;m looking for the province in which incidents occurred, so the challenge is to associate each place value with a province.</p>
<p>Using the incident data from 2003 to February 2012, about 27,500 records, I&#8217;ve written an <a href="https://github.com/andybega/Code_IBC_Province/blob/master/code_province.R">R script that assigns provinces</a> to about 95 percent of the records, or roughly 26,000.</p>
<p>Here&#8217;s a basic overview of how it works:</p>
<p><span id="more-163"></span></p>
<p>The process basically consists of five steps:</p>
<ol>
<li>Split the location string into separate words</li>
<li>Identify a candidate word</li>
<li>Strip it of unnecessary parts (&#8220;al-&#8221;, &#8220;&#8216;s&#8221;,&#8230;)</li>
<li>Check against known city-province list</li>
<li>Repeat until a match is found</li>
</ol>
<p>Two functions cover steps 3 and 4. The first strips out the &#8220;al-&#8221; prefix, which is just an article (&#8220;the&#8221;), the possessive suffix &#8220;&#8216;s&#8221;, and a couple of other characters that sometimes show up in words.</p>
<pre class="brush: r; title: ; notranslate">
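DelChar &lt;- function(x) {
    # Drop a leading &quot;al-&quot; (the article), ignoring case; anchored so that
    # an &quot;al-&quot; in the middle of a word is left alone
    word &lt;- sub(&quot;^al-&quot;, &quot;&quot;, x, ignore.case = TRUE)
    # Drop the possessive suffix 's
    word &lt;- gsub(&quot;'s&quot;, &quot;&quot;, word)
    # Drop stray question marks, commas, and apostrophes
    word &lt;- gsub(&quot;[?,']&quot;, &quot;&quot;, word)
    return(word)
}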
</pre>
<p>The other function takes a word and checks it against a list of known city and province pairs (&#8220;city.prov&#8221;) to see whether it matches either, and if so returns that province name:</p>
<pre class="brush: r; title: ; notranslate">
ProvLook &lt;- function(x) {
  # city.prov: column 1 holds city names, column 2 the matching province
  if (x %in% city.prov[, 2]) {
    # The word is itself a province name
    province &lt;- x
  } else if (x %in% city.prov[, 1]) {
    # The word is a known city; look up its province
    province &lt;- city.prov[x == city.prov[, 1], 2]
  } else {
    province &lt;- &quot;no match&quot;
  }
  return(province)
}
</pre>
<p>The rest is a loop over the words in location to evaluate each one. It&#8217;s not very efficient and takes 7 seconds to run over the 27,500 records I&#8217;m using, but it&#8217;s effective enough. Any ideas on how to improve efficiency? I don&#8217;t know enough to have any clue on this.</p>
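<p>To make that concrete, here is a minimal sketch of what such a loop could look like, plus one possible way to speed it up by cleaning and matching all words in one go instead of record by record. None of this is taken from the actual script: <code>city.prov</code> is a three-row toy table, and <code>CleanWord()</code>, <code>CodeProvince()</code> and <code>CodeProvinceFast()</code> are made-up names.</p>
<pre class="brush: r; title: ; notranslate">
# Toy stand-in for the real table (column 1 = city, column 2 = province)
city.prov &lt;- cbind(city     = c(&quot;Baghdad&quot;, &quot;Fallujah&quot;, &quot;Basra&quot;),
                   province = c(&quot;Baghdad&quot;, &quot;Anbar&quot;, &quot;Basra&quot;))

# Cleaning rules in the spirit of DelChar(), repeated so the sketch runs alone
CleanWord &lt;- function(x) {
  x &lt;- sub(&quot;^al-&quot;, &quot;&quot;, x, ignore.case = TRUE)
  gsub(&quot;'s|[?,']&quot;, &quot;&quot;, x)
}

# Loop version: try each word of one location, stop at the first match
CodeProvince &lt;- function(location) {
  words &lt;- CleanWord(strsplit(location, &quot; +&quot;)[[1]])
  for (w in words) {
    if (w %in% city.prov[, 2]) return(w)
    hit &lt;- match(w, city.prov[, 1])
    if (!is.na(hit)) return(unname(city.prov[hit, 2]))
  }
  &quot;no match&quot;
}

# Vectorized version: split, clean and match every word of every record at once
CodeProvinceFast &lt;- function(locations) {
  words &lt;- strsplit(locations, &quot; +&quot;)
  id    &lt;- rep(seq_along(words), lengths(words))  # record each word belongs to
  w     &lt;- CleanWord(unlist(words))
  # A word maps to itself if it is a province, otherwise to its city's province
  prov  &lt;- ifelse(w %in% city.prov[, 2], w,
                  city.prov[match(w, city.prov[, 1]), 2])
  idx   &lt;- which(!is.na(prov))
  idx   &lt;- idx[!duplicated(id[idx])]              # first matching word per record
  out   &lt;- rep(&quot;no match&quot;, length(locations))
  out[id[idx]] &lt;- prov[idx]
  out
}

locations &lt;- c(&quot;near al-Fallujah&quot;, &quot;central Baghdad&quot;,
               &quot;Basra's port&quot;, &quot;unknown village&quot;)
sapply(locations, CodeProvince, USE.NAMES = FALSE)
CodeProvinceFast(locations)
# both return: &quot;Anbar&quot; &quot;Baghdad&quot; &quot;Basra&quot; &quot;no match&quot;
</pre>
<p>The second version avoids calling the lookup once per word, which is usually where R loops lose time. Whether it helps much on the real data is something I would want to measure with <code>system.time()</code> rather than assume.</p>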
<p>I&#8217;m pretty happy for now with getting 95 percent of the records, but it would be nice to get all in the future. Here are some ideas for how to do that:</p>
<ol>
<li>Figure out how to accommodate common misspellings. There are certain patterns in how place names get misspelled. For example, &#8220;h&#8221; at the end of words that end in a vowel (Basra, Basrah), and repeated or substituted vowels (Qadisiya, Qadisiyya; Baiji, Baaji).</li>
<li>Accommodate place names that consist of more than one word, like &#8220;Salman Pak&#8221;.</li>
<li>Extend the list of known city-province pairs. Ultimately this could become a brute force solution, although an unpleasant one considering there are ~1,500 uncoded places.</li>
</ol>
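<p>A rough sketch of how the first two ideas might be approached, under assumptions: normalize the spelling patterns mentioned above, fall back on edit distance (<code>adist()</code> from base R) for substituted vowels, and try adjacent word pairs before single words. The four-row <code>city.prov</code> table and the functions <code>NormalizeName()</code>, <code>FuzzyProvince()</code>, <code>Candidates()</code> and <code>CodeLocation()</code> are made up for illustration and are not part of the script.</p>
<pre class="brush: r; title: ; notranslate">
# Toy table (column 1 = city, column 2 = province)
city.prov &lt;- cbind(city     = c(&quot;Basra&quot;, &quot;Diwaniya&quot;, &quot;Baiji&quot;, &quot;Salman Pak&quot;),
                   province = c(&quot;Basra&quot;, &quot;Qadisiya&quot;, &quot;Salah ad Din&quot;, &quot;Baghdad&quot;))

# Fold common spelling variants into one form
NormalizeName &lt;- function(x) {
  x &lt;- tolower(x)
  x &lt;- sub(&quot;([aeiou])h$&quot;, &quot;\\1&quot;, x)             # Basrah becomes basra
  gsub(&quot;([a-z])\\1+&quot;, &quot;\\1&quot;, x, perl = TRUE)    # Qadisiyya becomes qadisiya
}

# Closest known city or province within max.dist edits, else &quot;no match&quot;
FuzzyProvince &lt;- function(name, max.dist = 1) {
  known &lt;- c(city.prov[, 1], city.prov[, 2])
  prov  &lt;- c(city.prov[, 2], city.prov[, 2])
  d     &lt;- adist(NormalizeName(name), NormalizeName(known))[1, ]
  if (min(d) &lt;= max.dist) unname(prov[which.min(d)]) else &quot;no match&quot;
}

# Candidate names for a location: adjacent word pairs first, then single words
Candidates &lt;- function(location) {
  w &lt;- strsplit(location, &quot; +&quot;)[[1]]
  pairs &lt;- if (length(w) &gt; 1) paste(head(w, -1), tail(w, -1)) else character(0)
  c(pairs, w)
}

CodeLocation &lt;- function(location) {
  for (cand in Candidates(location)) {
    p &lt;- FuzzyProvince(cand)
    if (p != &quot;no match&quot;) return(p)
  }
  &quot;no match&quot;
}

CodeLocation(&quot;Basrah&quot;)           # &quot;Basra&quot;
CodeLocation(&quot;Qadisiyya&quot;)        # &quot;Qadisiya&quot;
CodeLocation(&quot;Baaji&quot;)            # &quot;Salah ad Din&quot;
CodeLocation(&quot;near Salman Pak&quot;)  # &quot;Baghdad&quot;
</pre>
<p>An edit distance of 1 is deliberately strict. Short place names can sit one letter away from an unrelated name, so any fuzzy matches on the real data would need a manual check.</p>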
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/andybeger.wordpress.com/163/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/andybeger.wordpress.com/163/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=163&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2012/03/21/coding-provinces-for-the-iraq-body-count-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>
	</item>
		<item>
		<title>Which governments torture?</title>
		<link>http://andybeger.wordpress.com/2012/01/20/which-governments-torture/</link>
		<comments>http://andybeger.wordpress.com/2012/01/20/which-governments-torture/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 00:45:32 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[human rights]]></category>
		<category><![CDATA[ITT]]></category>
		<category><![CDATA[torture]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=143</guid>
		<description><![CDATA[Almost all states, at least at some point between 1995 and 2005. The Ill-Treatment and Torture (ITT) project by Courtenay Conrad and Will Moore codes Amnesty International (AI) allegations of government torture, including the perpetrator, motive, and judicial response. The aggregated, country-year version of their data shows whether AI made allegations against a country in [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=143&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Almost all states, at least at some point between 1995 and 2005.</p>
<p>The <a title="http://www.politicalscience.uncc.edu/cconra16/UNCC/ITT_Data_Collection.html" href="http://www.politicalscience.uncc.edu/cconra16/UNCC/ITT_Data_Collection.html">Ill-Treatment and Torture</a> (ITT) project by Courtenay Conrad and Will Moore codes Amnesty International (AI) allegations of government torture, including the perpetrator, motive, and judicial response. The aggregated, <a title="http://www.politicalscience.uncc.edu/cconra16/UNCC/Data_files/ConHagMooCY16Oct11.pdf" href="http://www.politicalscience.uncc.edu/cconra16/UNCC/Data_files/ConHagMooCY16Oct11.pdf">country-year version</a> of their data shows whether AI made allegations against a country in a given year and if so, what the extent of alleged torture or ill-treatment was, on a 5-point scale from &#8220;infrequent&#8221; to &#8220;systematic&#8221;.</p>
<p>Here is a video showing the AI torture allegations from 1995 to 2005 using their country-year data and <a title="http://thematicmapping.org/downloads/world_borders.php" href="http://thematicmapping.org/downloads/world_borders.php">shape files for world borders</a> from Thematic Mapping.</p>
<div class='embed-vimeo' style='text-align:center;'><iframe src='http://player.vimeo.com/video/35348578' width='640' height='360' frameborder='0'></iframe></div>
<p><a href="http://vimeo.com/35348578">Government torture or ill-treatment, 1995 to 2005</a> from <a href="http://vimeo.com/user10078439">Andreas Beger</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
<p>The initial impression I had from this is the sheer extent of (alleged) torture and ill-treatment. It looks like pretty much all major states engaged in torture at some point between 1995 and 2005. Only 8 out of 151 states had no allegations of torture at all (Costa Rica, Uruguay, Finland, Benin, Gabon, Qatar, Singapore, and New Zealand), and in those remaining states with AI allegations of torture, on average there were allegations for 7 out of 10 years. More than a quarter of states were accused of torture or ill-treatment in all 10 years covered by the data.</p>
<p>That doesn&#8217;t necessarily mean that a lot of torture or ill-treatment is going on in any specific country, nor that it is systematic. It doesn&#8217;t reflect what the specific acts of torture or ill-treatment were, e.g. whether someone was tortured to death or water-boarded (which may not be different). But, nevertheless, unpleasant stuff happens.</p>
<p><a title="https://github.com/andybega/WorldMaps_ITT" href="https://github.com/andybega/WorldMaps_ITT">R code and source</a>. This produces images for each year that I strung together in iMovie.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/andybeger.wordpress.com/143/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/andybeger.wordpress.com/143/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=143&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2012/01/20/which-governments-torture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>
	</item>
		<item>
		<title>Yugoslavs die less</title>
		<link>http://andybeger.wordpress.com/2012/01/12/yugoslavs-die-less/</link>
		<comments>http://andybeger.wordpress.com/2012/01/12/yugoslavs-die-less/#comments</comments>
		<pubDate>Wed, 11 Jan 2012 17:03:33 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Bayesian]]></category>
		<category><![CDATA[Bosnia]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[spatial]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=137</guid>
		<description><![CDATA[In 1991 a census was conducted in Bosnia and Herzegovina, which then was still part of the disintegrating federal state of Yugoslavia. Bosnia was the most diverse republic in the former Yugoslavia, with significant populations of Bosnian Muslims (or Bosniaks, 43 percent), Serbs (31 percent), Croats (17 percent), and others. Bosniaks, Serbs, and Croats were [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=137&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In 1991 a <a href="http://en.wikipedia.org/wiki/1991_population_census_in_Bosnia_and_Herzegovina">census</a> was conducted in Bosnia and Herzegovina, which then was still part of the disintegrating federal state of Yugoslavia. Bosnia was the most diverse republic in the former Yugoslavia, with significant populations of Bosnian Muslims (or Bosniaks, 43 percent), Serbs (31 percent), Croats (17 percent), and others. Bosniaks, Serbs, and Croats were more or less well-established identities with historical roots. Unlike in most multiethnic countries however, the census respondents also had the option to identify themselves as Yugoslavs, rather than a particular ethnic or national group. It turns out that this played an interesting role in the way violence occurred in the Bosnian War from 1992 to 1995.</p>
<p><span id="more-137"></span></p>
<p>The term Yugoslav came about sometime in the 19th century while the Ottoman Empire still ruled the Balkans, and was associated with attempts to unite all South Slavs in the Balkans (i.e. Croats, Muslims, Serbs, and at some point Bulgarians). The Kingdom of Serbs, Croats, and Slovenes, which had formed in the aftermath of World War 1, was renamed Yugoslavia in 1929 after imposition of a royal dictatorship. The name was explicitly used in opposition to nationalism.</p>
<p>After World War 2 the communists also used the name and technically every citizen of Yugoslavia was a Yugoslav. But six traditional nationalities, Serb, Croat, Slovene, Montenegrin, Macedonian, and Muslim (by nationality), were officially recognized and most people identified with those or other ethnic minorities. The concept of a Yugoslav was kind of a nascent nationalism, similar to the way Bavarians and Saxons eventually came to identify as Germans and the way people from Virginia or Massachusetts eventually came to see themselves as American. Just not quite that accepted.</p>
<p><a href="http://andybeger.files.wordpress.com/2012/01/map_yugo91.png"><img class="alignright  wp-image-133" title="Map of Yugoslavs in Bosnia" src="http://andybeger.files.wordpress.com/2012/01/map_yugo91.png?w=384&#038;h=384" alt="" width="384" height="384" /></a></p>
<p>In the 1991 census 5 percent of people in Bosnia chose to identify themselves as Yugoslav, the highest percentage in any of the former Yugoslav republics. The map on the right shows the percentage of Yugoslavs in each of the 109 pre-war municipalities. It was highest in parts of Sarajevo and Tuzla, upwards of 15 percent, and lower in rural areas in the south and east. The blue line on the map shows the approximate 1993 front lines during the war, by the way. (The R code for this map is originally from <a href="http://blog.diegovalle.net/2010/06/statistical-analysis-and-visualization.html">Diego Valle-Jones</a>.)</p>
<p>For ethnic conflict research this is pretty nice since in a way it is a direct measure of anti-nationalist sentiment: you could either identify as a member of your particular ethnic group (or nationality as it was used there), or as a Yugoslav, someone associated with the federal state encompassing all the nationalities. Most people who did so probably had married across ethnic lines or were children who did not quite fit into the traditional ethnic categories (like one of my parents), or were communists and anti-nationalist idealists. Either way, if extreme nationalism is what leads to violence and &#8220;ethnic conflict&#8221;, then we should expect to see less violence in areas with low nationalist sentiment, i.e. high Yugoslav identification.</p>
<p>This is a pretty particular argument specific to Bosnia and other Yugoslav republics. Conflict research that covers more countries typically uses a summary measure of ethnic diversity that is more generalizable. The most common one is the ethnolinguistic fractionalization index, which ranges from 0 to 1. Higher values indicate more diversity (technically, the chance that two randomly picked individuals will belong to different ethnic groups), and usually are expected to increase the potential for violence.</p>
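<p>In its usual form the index is one minus the sum of squared group shares. As a rough illustration, here it is for Bosnia as a whole, using the census shares mentioned above. Lumping the remaining 4 percent into a single &#8220;Other&#8221; group is my simplification for this example and slightly understates diversity, and <code>ELF()</code> is just a name for this sketch; the municipality-level calculations may use different categories.</p>
<pre class="brush: r; title: ; notranslate">
# Fractionalization: chance that two randomly drawn people are from different groups
ELF &lt;- function(shares) {
  shares &lt;- shares / sum(shares)  # rescale so the shares sum to 1
  1 - sum(shares^2)
}

# Bosnia-wide 1991 shares in percent; &quot;Other&quot; is an assumed residual category
bosnia.1991 &lt;- c(Bosniak = 43, Serb = 31, Croat = 17, Yugoslav = 5, Other = 4)
round(ELF(bosnia.1991), 3)
# 0.686
</pre>
<p>A place with a single group scores 0, and two equally sized groups score 0.5.</p>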
<p><a href="http://andybeger.files.wordpress.com/2012/01/bubbleplot_yugo.png"><img class="alignright  wp-image-134" title="Bubble chart Bosnia" src="http://andybeger.files.wordpress.com/2012/01/bubbleplot_yugo.png?w=384&#038;h=384" alt="" width="384" height="384" /></a></p>
<p>The same index can be calculated for the 109 municipalities in Bosnia. The bubble chart on the right shows the relationship between this ethnic diversity index, Yugoslav self-identification, and violence. Each red circle is one of the municipalities, with its diversity score on the y-axis (ELF), and proportion of self-identified Yugoslavs on the x-axis. Why are there no circles in the bottom right quarter of the plot? Few people identified as Yugoslavs in ethnically homogeneous municipalities. In ethnically mixed municipalities (the top half of the chart), there is more variation in how many Yugoslavs there were. So there were some municipalities that were ethnically diverse but where few people thought of themselves as Yugoslav. We might call these &#8220;nationalist&#8221; municipalities. In others a lot of people identified as Yugoslav.</p>
<p>Now consider that the area of each circle is proportional to the number of deaths per 1,000 in that municipality during the war from 1992 to 1995 (data is from the <a href="http://www.idc.org.ba/">Research and Documentation Center</a>). Few people died in areas where one ethnic group clearly dominated. In ethnically diverse places, the death rate was highest in &#8220;nationalist municipalities&#8221;, and much lower in &#8220;Yugoslav&#8221; municipalities.</p>
<p>One should be careful about inferring anything about causes here, especially since the census was conducted a few months before war broke out. It may be that in places where there was a lot of tension, people chose to respond to the census as non-Yugoslavs. Still, it&#8217;s a pretty interesting pattern and shows that although ethnic diversity can be associated with high violence, it doesn&#8217;t necessarily have to be.</p>
<p>I&#8217;ve been looking into this stuff as part of my dissertation, and it&#8217;s part of a paper you can find <a title="Research" href="http://andybeger.wordpress.com/research/">here</a>.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/andybeger.wordpress.com/137/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/andybeger.wordpress.com/137/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=137&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2012/01/12/yugoslavs-die-less/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/01/map_yugo91.png" medium="image">
			<media:title type="html">Map of Yugoslavs in Bosnia</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2012/01/bubbleplot_yugo.png" medium="image">
			<media:title type="html">Bubble chart Bosnia</media:title>
		</media:content>
	</item>
		<item>
		<title>Running RStudio on Amazon EC2</title>
		<link>http://andybeger.wordpress.com/2011/12/20/running-rstudio-on-amazon-ec2/</link>
		<comments>http://andybeger.wordpress.com/2011/12/20/running-rstudio-on-amazon-ec2/#comments</comments>
		<pubDate>Tue, 20 Dec 2011 01:51:54 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Data and code]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[RStudio]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=121</guid>
		<description><![CDATA[For the most part I don&#8217;t do things that are computationally so intensive that I can&#8217;t run them on my work desktop. There have been a few times however where I ran simulations or bootstrapped models, and now Bayesian models with MCMC, that take a while to run. One solution has been to run things [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=121&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>For the most part I don&#8217;t do things that are computationally so intensive that I can&#8217;t run them on my work desktop. There have been a few times however where I ran simulations or bootstrapped models, and now Bayesian models with MCMC, that take a while to run. One solution has been to run things on <a href="http://www.hpc.fsu.edu/">FSU&#8217;s high performance computing cluster</a>. It takes a little bit of effort, for someone without a background in computer science or programming like me, and it is inconvenient in several ways.</p>
<p>An alternative is to use Amazon&#8217;s <a href="http://aws.amazon.com/ec2/">EC2 cloud computing service</a>. A few weeks ago I started playing around with it, and running basic instances for a limited time is actually free. I use R/RStudio for the most part, but was overwhelmed by the choice of AMIs (Amazon Machine Images) and didn&#8217;t want to install R/RStudio myself. Fortunately someone has created AMIs with RStudio Server, which, once running, lets you use RStudio through your web browser.</p>
<p>To begin, get an Amazon Web Services account and log in to the management console. <a href="http://www.louisaslett.com/">Louis Aslett</a> has created a number of AMIs; <a href="http://www.louisaslett.com/RStudio_AMI/">choose</a> one and start an instance on EC2 with that AMI id.</p>
<p>If you started the instance with the default security group settings like me, you will also have to open port 80 to get access. Go to the security group settings in the Amazon management console, select whichever group your instance runs under (e.g. default), and add a custom TCP rule for port 80 (i.e. port range 80). Add the rule and apply. Find your instance address (under instances, at the bottom; it&#8217;s the string that ends with amazonaws.com, e.g. ec2-184-88-8-888.compute-1.amazonaws.com), paste it into your browser, and you should get to an RStudio login page. The default username and password are both &#8220;rstudio&#8221;. And, there.</p>
<p style="text-align:center;"><a href="http://andybeger.files.wordpress.com/2011/12/screen-shot-2011-12-19-at-20-47-est-e1324346025459.png"><img class="size-full wp-image-124 aligncenter" title="Screen Shot 2011-12-19 at 20.47. EST" src="http://andybeger.files.wordpress.com/2011/12/screen-shot-2011-12-19-at-20-47-est-e1324346025459.png?w=590" alt=""   /></a></p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/andybeger.wordpress.com/121/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/andybeger.wordpress.com/121/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=121&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2011/12/20/running-rstudio-on-amazon-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>

		<media:content url="http://andybeger.files.wordpress.com/2011/12/screen-shot-2011-12-19-at-20-47-est-e1324346025459.png" medium="image">
			<media:title type="html">Screen Shot 2011-12-19 at 20.47. EST</media:title>
		</media:content>
	</item>
		<item>
		<title>The Iraq War, 2003 to 2011</title>
		<link>http://andybeger.wordpress.com/2011/10/22/the-iraq-war-2003-to-2011/</link>
		<comments>http://andybeger.wordpress.com/2011/10/22/the-iraq-war-2003-to-2011/#comments</comments>
		<pubDate>Fri, 21 Oct 2011 19:41:34 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Iraq]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=97</guid>
		<description><![CDATA[President Obama announced today that U.S. troops will leave Iraq by the end of 2011. What has the 9 year long war accomplished? Iraq is a democracy, sort of. Although it has a parliament, prime minister, elections and that sort of stuff, it is also pretty corrupt. There has not yet been a peaceful transition [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=97&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>President Obama announced today that U.S. troops will leave Iraq by the end of 2011. What has the 9 year long war accomplished?</p>
<p><span id="more-97"></span></p>
<p>Iraq is a democracy, sort of. Although it has a parliament, prime minister, elections and that sort of stuff, it is also pretty corrupt. There has not yet been a peaceful transition of power, a key feature of democracy, as prime minister al-Maliki was reelected to a second term in a troubled election in 2010. It has a <a href="http://en.wikipedia.org/wiki/Polity_data_series">polity score</a>  of <a href="http://www.systemicpeace.org/polity/irq2.htm">3</a> on a scale from -10 to 10, which doesn&#8217;t quite qualify it as a democracy in common usage. According to the Economist Intelligence Unit&#8217;s <a href="http://www.eiu.com/Handlers/WhitepaperHandler.ashx?fi=Democracy_Index_2010_Web.pdf&amp;mode=wp">Democracy Index 2010</a> it is a hybrid regime just a tad above authoritarianism and according to Freedom House it is <a href="http://freedomhouse.org/template.cfm?page=22&amp;year=2011&amp;country=8058">not free</a>.</p>
<p>In 9 years of war, 111,000+ civilian deaths (<a href="http://www.iraqbodycount.org/">Iraq Body Count</a>), and taking into account excess mortality maybe even upwards of 1 million (<a href="http://www.opinion.co.uk/Newsroom_details.aspx?NewsId=120">Opinion Research Business survey</a>) out of a population of 31 million. The Iraq Body Count data is based on press reports of violent deaths and actual deaths are most likely higher. But basically around 1 out of every 50 was killed.</p>
<p>Upwards of 15,000 Iraq Security Forces deaths and around 24,000 insurgent deaths (<a href="http://www.iraqbodycount.org/analysis/numbers/warlogs/">Iraq War Logs summary at IBC</a>).</p>
<p>About 4,800 U.S. and coalition deaths (<a href="http://icasualties.org/Iraq/Index.aspx">iCasualties.org</a>) and 32,000 U.S. wounded (<a href="http://www.defense.gov/news/casualty.pdf">DoD</a>). Also, direct cost of <a href="http://costofwar.com/en/">$800 billion or so</a>. But since most of that was financed through debt, the actual cost will probably end up in the <a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/03/07/AR2008030702846.html">low trillions</a>.</p>
<p>Surprisingly, Iraq has surpassed its pre-war <a href="http://www.indexmundi.com/g/g.aspx?c=iz&amp;v=67">GDP per capita</a>, even during the war. But then, oil prices have risen as well and most of Iraq&#8217;s economy is based on oil.</p>
<p>Weapons of mass destruction. Never mind.</p>
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/andybeger.wordpress.com/97/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/andybeger.wordpress.com/97/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=andybeger.wordpress.com&#038;blog=19223690&#038;post=97&#038;subd=andybeger&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2011/10/22/the-iraq-war-2003-to-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>
	</item>
		<item>
		<title>Report on COIN intelligence collection</title>
		<link>http://andybeger.wordpress.com/2011/08/12/report-on-coin-intelligence-collection/</link>
		<comments>http://andybeger.wordpress.com/2011/08/12/report-on-coin-intelligence-collection/#comments</comments>
		<pubDate>Fri, 12 Aug 2011 00:35:49 +0000</pubDate>
		<dc:creator>Andreas Beger</dc:creator>
				<category><![CDATA[Iraq]]></category>
		<category><![CDATA[coin]]></category>

		<guid isPermaLink="false">http://andybeger.wordpress.com/?p=77</guid>
		<description><![CDATA[A few months ago the Defense Science Board Task Force on Defense Intelligence released a report on Counterinsurgency (COIN) Intelligence, Surveillance, and Reconnaissance (ISR) Operations (pdf). Among its recommendations is that: the government generally should increase investment in social science disciplines (anthropology, ethnography, human geography, sociology, social-psychology, political science, and economics) to inform a whole-of-government [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>A few months ago the Defense Science Board Task Force on Defense Intelligence released a report on <a href="http://www.carlisle.army.mil/dime/documents/Defense%20Science%20Board%20-%20Coin%202011-05-COIN%5B1%5D.pdf">Counterinsurgency (COIN) Intelligence, Surveillance, and Reconnaissance (ISR) Operations (pdf)</a>. Among its recommendations is that:</p>
<blockquote><p>the government generally should increase investment in social science disciplines (anthropology, ethnography, human geography, sociology, social-psychology, political science, and economics) to inform a whole-of-government approach to understanding local cultures and customs and to support future COIN campaigns.</p></blockquote>
<p>and the report notes that:</p>
<blockquote><p>the United States government is not investing adequately in the development of social and behavioral information that is critically important for COIN.</p></blockquote>
<p>Great news for someone hoping to have a degree in political science in the near future! <span id="more-77"></span>But beyond that, the report deals with larger issues in modern intelligence and information collection by the military. Much of what people in the military think of first in association with ISR, like unmanned drones, battlefield radars, or ground <a href="http://en.wikipedia.org/wiki/Moving_target_indication">moving target indicators</a> (GMTI), was developed in the context of the Cold War with the purpose of identifying concentrations of troops, tanks, and so on. It might be great to see that a hexagonal array of tracks on the ground is likely a specific type of <a href="http://en.wikipedia.org/wiki/S-75_Dvina">surface-to-air missile site</a>. On the other hand, knowing that there are a lot of moving ground targets traveling on certain paths in Baghdad is probably not so useful. One would maybe rather know about lone targets heading into remote areas (weapons caches?). So a first challenge is figuring out how to appropriately use that type of ISR in a counterinsurgency campaign.</p>
<p>More importantly though, as the report notes, this technical kind of ISR is not well suited to supporting warfare when the population, rather than terrain or equipment, is the focus. Instead, there is more need for the kind of research and analysis that is common in the social sciences, with questions appropriately focused on the local population. In Afghanistan this has been happening under the Human Terrain System program, which among other things collects local surveys to inform military policy.</p>
<p>Most exciting for me is not only the acknowledgement that the people are the key, but a concurrent admission that more rigorous, scientific analytical methods are called for: &#8220;analysts will have to operate with a better balance of regional knowledge and theoretical/methodological competence,&#8221; compared to the current approach based on indication, specific instances of an event and its analogues, and ultimately subjective intuition. Having done some very limited work along the lines this suggests when I was an intelligence officer in Iraq, and having seen how it is usually received, I think it will take a large change in attitude for a shift like this to be accepted. Unfortunately.</p>
<p>Some other comments:</p>
<ul>
<li>&#8220;there are few effective, temporally‐acceptable methodologies for the integration (or fusion) of current levels of data streaming from the many space‐based, airborne, mobile, in situ, and terrestrial remote sensors, let alone real‐time integration&#8230;&#8221; Yes! I would love to have a dashboard with choice apps like live video (e.g. unmanned drone), custom satellite imagery, GMTI, &#8230;</li>
<li>In response to a question about emerging technologies and methodologies, the report directly mentions computational social science, social network analysis, behavior modeling and simulation, and associated programs like <a href="http://www.darpa.mil/Our_Work/I2O/Programs/Integrated_Crisis_Early_Warning_System_(ICEWS).aspx">ICEWS</a>. Most of the programs I am familiar with still operate at a fairly high level, and might for example try to predict which states are under threat of collapse. But there is a lot that can be done at a much more immediate level with the heaps of data collected every day by U.S. military forces in Iraq and Afghanistan, beyond threat-specific efforts like <a href="//www.jieddo.dod.mil/">JIEDDO</a> COIC.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://andybeger.wordpress.com/2011/08/12/report-on-coin-intelligence-collection/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/749f983a6497902acd74f1377866e385?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">andybeger</media:title>
		</media:content>
	</item>
	</channel>
</rss>
