Update from Chicago

Here’s my much-promised, less-delivered update on Chicago: what we’re doing, how it’s going, and whether we’re going to come back.

Taking those in reverse order: yes. Sue and I need to leave the country at the end of August, when my visa (but not hers) expires. But gee, it won’t be without regrets. Escaping an apocalyptic Melbourne winter to come to this great city has been immensely enjoyable (more on that below). So yes, all the rumours are false; we’re not staying on, much as we’d like to.

We made the trip over here so that I could take up the Eric and Wendy Schmidt Data Science for Social Good fellowship, run through the University of Chicago. The fellowship itself grew out of Eric Schmidt’s involvement in the Obama 2012 campaign, whose data-intensive side was run by Rayid Ghani.

So the story goes, Schmidt, who has made a little bit of money running Google, wanted Ghani to put the skills of young data scientists to the public good, and funded the fellowship. In return, Ghani and his team run the fellowship, which brings together some seriously bright folk from around the world (but mainly the US) to Chicago for 12 weeks over summer. These fellows work in groups of four with partner organisations—mainly not-for-profits and government agencies—to lend their skills to help solve some difficult problems.

The team I’ve landed in is fantastic. The members are Diana Palsetia, Pete Lendwehr, and Sam Zhang. Diana is a PhD student at Northwestern who specialises in high-performance parallel computing on large datasets. She’s the sort of person who interjects sparingly, from the back of the room, with extremely useful insights. Pete is a PhD student at Carnegie Mellon University, specialising in ‘advanced computation’, but seems to spend as much time pondering theatre and hunting down excellent coffee beans. He also turns my ideas (good and bad) into deployable Python code. Sam is a brilliant 21-year-old who just finished up at Swarthmore, an elite liberal arts college. He’s an all-round hacker, writer and statistician who makes me reconsider the wisdom of having wasted so much of my 20s.

Our team has been partnered with two organisations. The first is Enroll America, a well-funded not-for-profit tasked with getting as many people as possible signed up for health insurance under the Affordable Care Act (Obamacare); the second is Get Covered Illinois, which is run out of the Illinois Governor’s office and is attempting to do the same, though only in Illinois.

Both of these organisations have limited budgets and the same aim: getting uninsured people covered. The big question for them is whom they should target with their limited resources. There are some people who will never take out insurance no matter how much they’re pestered, and sad as it is, trying to convince them is a waste of money. There are other folk who are far more interested in taking out insurance, but have not done so because, frankly, the system takes a bit of work. And you can always do it tomorrow, right?

There is no shortage of data here. As many of the people working on this problem (mainly on the Enroll America side) also worked on the Obama 2012 campaign, they use similar datasets to those used to find persuadable voters. That means that some of these datasets are quite big—a row for almost every American, with plenty of details (mostly best guesses) on each.

The approach our team is taking is to build several statistical models to help Enroll America and GCI work out whom they should spend money contacting. The first model gives, for each person, the probability that that person is uninsured; there is no point contacting someone who is already insured. What is surprising about this model is how many people you’d not expect to have health insurance actually do, and so it’s difficult to build a good predictive model that sorts the insured from the uninsured.

The second model tells us how persuadable someone is, given their probability of being uninsured. Thankfully, Enroll America ran a randomised controlled trial in March, in which they randomly selected a ‘control group’ whom they did not pester during their telephone and email campaign. After the enrolment period ended, they compared insurance rates in this group with those in a ‘matched treatment’ group of similar people who were pestered. The result was quite profound: people who were pestered by email and phone were about 6% more likely to have taken up health insurance.
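To make the arithmetic concrete, here is a minimal Python sketch of how an average treatment effect falls out of a randomised trial: compare uptake rates in the pestered and unpestered groups. The counts and the helper name `uptake_difference` are hypothetical, and the result is a difference in percentage points, not a reproduction of Enroll America’s analysis.

```python
def uptake_difference(treated_insured, treated_n, control_insured, control_n):
    """Average treatment effect as a difference in insurance uptake rates.

    Because membership of the control group was randomised, the simple
    difference in rates is an unbiased estimate of the average effect.
    """
    treated_rate = treated_insured / treated_n
    control_rate = control_insured / control_n
    return treated_rate - control_rate


# Hypothetical counts, for illustration only.
ate = uptake_difference(treated_insured=2600, treated_n=10000,
                        control_insured=2000, control_n=10000)
print(f"{ate:.2%}")
```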

While the ‘treatment effect’ of being pestered is about 6% on average, the interesting question for our team is working out what the treatment effect is likely to be for each individual person. This is an extremely difficult problem, and it is the one we have devoted most of our time to. Our current solution is here.

There are other problems that we’ve not done as much work on. For instance, what is the best contact language? Where should tabling events be held? How can we best guess someone’s income (which will determine how large a subsidy they will receive)? These are for the coming weeks.

Sue and Emi have also been busy, making friends in our neighbourhood–right next to the University of Chicago–and spending long days at the beach. Emi has learned to run, and Sue has learned to spot enclosed playgrounds.

A few words on Chicago. Picture this: one in three days in this city you can freeze to death without trying hard. Yet almost 10 million people decide to live in and around the city. Why would so many people make such a choice, surely crazy to the outsider?

I’m not 100 per cent sure, but it must have something to do with the way it combines being an extremely large city with a small-town feel. Traffic is no worse than Melbourne’s, and probably better. Public transit certainly isn’t Singapore’s, but it is cheap and effective (especially during the rush). The music, theatre and intellectual scenes are full and exciting. The beaches are fun, the food great, and the people extremely friendly: some combination of northern Midwestern Nice and Southern hospitality, a remnant of the Great Migrations. Finally, and unexpectedly, the summer is delicious: 29°C every day, often cooled by large storms at night. Splendid.

The 10 million people who live in Chicagoland aren’t mad. They could live in the Eternal Spring that is Southern California, but don’t. If Southern California had Chicago’s winter, nobody would live there.


A method of estimating out-of-sample continuous treatment effects using experimental data

The whole point of the below is to prioritise who gets treatment first. Who do we canvass first? Who gets the first dose of medicine? Average Treatment Effects simply don’t help here, and other methods do very poorly if you have only experimental data at a slice in time (so you can’t do difference-in-differences or some other unit-level estimate of treatment effect).

So here’s my algorithm. The high-level objective is to estimate a continuous treatment effect for out-of-sample observations that is essentially a function of individual-level characteristics. For an observation in the test set, the estimated treatment effect is a convex combination of the (unobserved, but estimated) unit-level treatment effects in the training set, where the weights are given by its similarity to the units in the training set.

Here’s a rough outline.

The true causal effect for each observation is given below, where Y is the outcome, the subscript before the pipe is the group (t = treatment, 0 = control), the subscript after the pipe is whether the unit received the treatment, and the index in brackets, i, is the unit.


$$
\begin{aligned}
Y_{t|t}(i) - Y^{*}_{t|0}(i) &= \text{treatment effect on the treated} \\
Y^{*}_{0|t}(i) - Y_{0|0}(i) &= \text{treatment effect on the untreated}
\end{aligned}
\tag{1}
$$

Stars indicate outcomes that we never observe.

Let’s say that the conditional ignorability assumption holds (so that, given X, treatment assignment t is independent of the unobserved determinants of the outcome), as it does in a randomised controlled trial. Sid Chib’s method is to bypass matching completely and instead predict the counterfactual. He does this by building two models, F(X(i)) and G(X(i)), where

$$Y_{t|t}(i) = F(X_t(i)) + e(i)$$ tries to predict the outcome variable for the treatment group, and

$$Y_{0|0}(i) = G(X_0(i)) + u(i)$$ tries to predict the outcome variable for the control group,

and then ‘predict’ the counterfactuals by putting the control group’s Xs into the treatment model, and vice versa. We could use any predictive model for this step, though better models should give better estimates:

$$Y^{*}_{0|t}(i) = F(X_0(i))$$

$$Y^{*}_{t|0}(i) = G(X_t(i))$$

We then substitute these fitted values into the relations given at (1) to arrive at a vector of estimated treatment effects for each unit, T(i)*.
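A minimal Python sketch of this counterfactual step, under stated assumptions: a single covariate, and ordinary least squares standing in for F and G (the method allows any predictive model here). The data and the names `fit_ols` and `unit_effects` are hypothetical. Each treated unit’s counterfactual comes from the control model G, and each control unit’s from the treatment model F.

```python
def fit_ols(xs, ys):
    """Fit y = a + b*x by least squares; return the fitted prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def unit_effects(x_treat, y_treat, x_ctrl, y_ctrl):
    """Estimated unit-level effects T*(i), treated units first (rbind order)."""
    F = fit_ols(x_treat, y_treat)  # outcome model for the treatment group
    G = fit_ols(x_ctrl, y_ctrl)    # outcome model for the control group
    # Treated units: observed outcome minus predicted untreated outcome.
    treated = [y - G(x) for x, y in zip(x_treat, y_treat)]
    # Control units: predicted treated outcome minus observed outcome.
    control = [F(x) - y for x, y in zip(x_ctrl, y_ctrl)]
    return treated + control


# Hypothetical data in which the true effect grows with x (it equals 1 + x).
effects = unit_effects([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0],
                       [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
print(effects)
```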

Here’s where the method gets fun. Define X as the rbind of Xt and X0. Now we can run a randomForest, rF, to predict the treatment effects given the Xs.

$$T^{*}(i) = \mathrm{rF}(X(i)) + v(i)$$

The lovely thing about random forests is that one of their outputs is the fantastic proximity matrix. For a sample of N, this is a symmetric N×N matrix whose (i, j) entry (equivalently, its (j, i) entry) is the proportion of trees in the random forest in which observations i and j share a terminal leaf. Now, what does it mean to share a leaf? In each tree, two observations share a leaf when they fall down the same branches. This only happens when they are similar in several ways. So a high proximity score indicates that two observations are similar in all the ways that help predict the treatment effect.
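For concreteness, here is a small Python sketch that builds a proximity matrix from terminal-leaf assignments. It assumes you already know, for every tree, which leaf each observation lands in; the leaf indices below are invented, and `proximity_matrix` is a hypothetical helper rather than any package’s API.

```python
def proximity_matrix(leaves):
    """Proximity matrix from terminal-leaf assignments.

    leaves[k][i] is the leaf that observation i lands in for tree k.
    Entry (i, j) is the share of trees in which i and j share a leaf.
    """
    n_trees = len(leaves)
    n_obs = len(leaves[0])
    shared = [[0] * n_obs for _ in range(n_obs)]
    for tree in leaves:
        for i in range(n_obs):
            for j in range(n_obs):
                if tree[i] == tree[j]:
                    shared[i][j] += 1
    return [[count / n_trees for count in row] for row in shared]


# Three trees, four observations (hypothetical leaf indices).
leaves = [
    [0, 0, 1, 1],
    [2, 2, 2, 3],
    [4, 5, 5, 5],
]
prox = proximity_matrix(leaves)
print(prox)
```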

Now, let’s say we do a survey and collect some (N1) more folks’ Xs, called X1. The whole point of this is to work out for each of these guys how much our intervention is expected to work on each of them (that is, not on average). In the problem I’m working on now, we want to know who we should pester first to buy health insurance.

Call X2 the rbind of X1 and X. We can then run X2 through the saved rF, asking for proximities, and store the resulting (N1 + N)×(N1 + N) proximity matrix. It is segmented into four submatrices,

$$\mathrm{PROX} = \begin{bmatrix} A & B \\ C & D \end{bmatrix},$$

where A is the N1×N1 proximity matrix of the new observations, B is the N1×N matrix giving the proximity of each new observation to each training observation, C = B′ is its N×N1 transpose, and D is the old N×N proximity matrix.

Now take B and normalise each row so that it sums to 1; call this B*. These are our weights for the convex combination of in-sample estimated treatment effects. The estimated treatment effect for the new observations is

$$T^{*}_{\text{new}} = B^{*}\,T^{*}$$
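The weighting step is only a few lines. This Python sketch assumes B has already been cut out of the combined proximity matrix (rows are new observations, columns are training observations) and that every new observation shares at least one leaf with the training set; the numbers and the name `predict_effects` are hypothetical.

```python
def predict_effects(B, train_effects):
    """Estimate new units' treatment effects as B* T*.

    Each row of B is rescaled to sum to one (giving B*), so every
    prediction is a convex combination of the training-set effects.
    """
    predictions = []
    for row in B:
        total = sum(row)
        if total == 0:
            raise ValueError("new observation shares no leaves with the training set")
        weights = [p / total for p in row]
        predictions.append(sum(w * t for w, t in zip(weights, train_effects)))
    return predictions


B = [
    [0.6, 0.2, 0.0],  # new unit 1: mostly like training unit 1
    [0.0, 0.1, 0.3],  # new unit 2: mostly like training unit 3
]
T_star = [0.10, 0.04, -0.02]
estimates = predict_effects(B, T_star)
print(estimates)
```

Because the weights in each row are non-negative and sum to one, every prediction lies between the smallest and largest training-set estimates.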


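Validation here can be done by k-fold cross-validation. For each fold, we get an unbiased estimate of the test set’s average treatment effect the usual way (the difference in mean outcomes between treated and control units), then estimate it using the method above (the average of the predicted unit-level effects). The difference between the observed test-set average treatment effect and the ATE implied by the method above should then be averaged across folds; if the method works, that average should be close to zero.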
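One way to organise that check, sketched in Python: for each fold, compare the difference in mean outcomes with the average of the method’s predicted unit-level effects, then average the discrepancies. The fold data are invented and `fold_discrepancy` is a hypothetical name; in practice the predicted effects would come from refitting the whole procedure on the other folds.

```python
from statistics import mean


def fold_discrepancy(treated_outcomes, control_outcomes, predicted_effects):
    """Experimental ATE minus the ATE implied by the unit-level predictions."""
    experimental_ate = mean(treated_outcomes) - mean(control_outcomes)
    implied_ate = mean(predicted_effects)  # predictions for all test units
    return experimental_ate - implied_ate


# Each fold: (treated outcomes, control outcomes, predicted effects).
folds = [
    ([1, 0, 1, 1], [0, 1, 0, 0], [0.40, 0.55, 0.50, 0.45, 0.60, 0.50, 0.50, 0.50]),
    ([0, 1, 1, 0], [0, 0, 1, 0], [0.20, 0.30, 0.25, 0.25]),
]
discrepancies = [fold_discrepancy(*fold) for fold in folds]
print(mean(discrepancies))
```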


Updated bond yields gif



Jahangir Hosseini has not eaten in 42 days

A good family friend, Jahangir Hosseini, has been on hunger strike for 42 days now. His story is below.


At the age of 16, I got involved in politics. I was heavily involved in the Iranian revolution and started an anti-Shah group made up of young people. As the leader of this group, I organised secret meetings and led political discussions.

I started working for the National Iranian Oil Company (NIOC) on Kharg Island at the age of 18, and that same year I was elected leader of the NIOC workers’ union on Kharg Island. I resolved workplace disputes, introduced initiatives such as family counselling, advocated for women’s and workers’ rights and drew attention to the demands of the workers I represented.

I fought for the wives of workers to be flown to major cities to access health care and to give birth. I campaigned tirelessly for workers on 12-month contracts to be granted permanent contracts, for their years of service to be recognised and for them to receive superannuation. I also fought for contract workers to have the same leave entitlements as permanent staff, and for workers to work 40-hour weeks rather than 48-hour weeks, in line with international standards.

I organised regular strikes which heavily affected garbage collection, power and other services. After seeing how active and strong our union was and hearing about the victories we achieved, workers from other corporations and businesses on Kharg Island signed petitions electing me as the leader of their respective unions and forwarded these petitions to the board that managed Kharg Island. I effectively became the leader of a large coalition of unions.

At the time, Kharg Island was the largest offshore crude oil terminal. The Iranian regime was worried that the export of oil would cease if the workers continued to go on strike and could not afford any disruptions.

When I was 21, I was arrested by the Revolutionary Guards. The Iranian regime arranged for my dismissal from the workplace and made attempts to silence me. They perceived the strikes as anti-revolutionary and regarded my actions as provoking workers. I was told it was not an appropriate time to focus on workplace issues and workers’ rights, and that our main focus as a nation should be on winning the war against Iraq.

I was barred from working in the private and public sectors, could not pursue higher education, could not access my long service leave or superannuation, was prevented from using health services and could no longer play for the state soccer team. I have all relevant documentation, including a summons to appear at the revolutionary court.

I was imprisoned for 2 years and convicted on 11 counts, which included my involvement with internal and external political organisations (PMOI), provoking workers, organising demonstrations and strikes, and anarchy. I developed high blood pressure and was hospitalised. I fled from hospital to Turkey and maintained a secret involvement with the PMOI.

I was soon joined in Turkey by my wife and 8-month-old daughter. Due to security concerns, we fled to Greece and lived in a refugee camp. The UN recognised us as political refugees.

While in Greece, I was heavily involved in refugee issues and politics. I continued my political activism and organised hunger strikes and demonstrations. I went on hunger strike for 55 days for the purposes of condemning human rights violations in Iran and to draw attention to refugee rights. In 1988, I was on hunger strike for 12 days and called for an end to executions in Iran.

I fought for refugee rights and advocated on behalf of many Iranian refugees, all of whom were relocated to Canada and America. After 18 months, we were the only family relocated to Australia. We arrived in Melbourne on 14 January 1989. We had no contacts in Australia and were taken to a hostel where we had to fight for our rights.

We came into contact with PMOI supporters in Australia and have maintained our political activism until today.

I have organised protests condemning the Iranian government for its human rights violations, advocated for refugee rights and regularly spoken to members of parliament and community organisations about refugee rights, the status of women and human rights violations in Iran. The Age and Channel 10 interviewed me following protests against the deportation of Iranian refugees.

Following the signing of a memorandum of understanding by the Australian Coalition government and the Iranian regime, on 3 June 2003, our home was raided by Australian Federal Police. Six houses were raided in Sydney and five in Brisbane. Ours was the only house raided in Victoria. No one was charged. We were all victims of a dirty deal.

Following the raids in Australia, on 17 June 2003, the French police raided the offices of the National Council of Resistance of Iran (NCRI) and arrested its members, including its president-elect, Mrs Maryam Rajavi. In June 2003, I staged a hunger strike outside the French Embassy for 15 days, supporting international calls for Mrs Rajavi and NCRI members to be released. This hunger strike took place precisely 14 days after the raid on our family home. The raid did not act as a deterrent; instead, it motivated me to increase my political activism.

On 19 September 2013, in opposition to the 1 September 2013 massacre at Camp Ashraf, which led to the murder of 52 unarmed Iranians and the abduction of 7, I decided to commence a hunger strike. I was the first to start a hunger strike, despite being the father of a 3-year-old boy, a 22-year-old daughter and a 27-year-old daughter. I have been on hunger strike for 40 days as of 28 October 2013 and have had to deal with cold weather. I have slept in a van since being told on day 12 that we could no longer camp outside Casselden Place.

I am mentally strong but physically weak. I made the decision to remain on hunger strike until the hostages are released and am willing to risk my own life. This is the bare minimum I can do for human rights.


Treasuries yield curve over the last month



– Updated 20-10-2013


The same chart again, but with TIPS yields also



US Treasury yields over time



WA: in recession?

Yesterday, Matt Cowgill put up an interesting post on the West Australian economy, which appears to be slowing. The ABS doesn’t publish state accounts quarterly, so despite our sincerest desire to know, it’s tough to tell whether a recession—two quarters of negative growth—is going on in any given state.

Cowgill’s solution is to back out an estimate of growth using data that is closely related to growth but released more frequently, like labour-force data. This technique is widely used by applied economists working with time series. For a blog post, the ‘Okun’s rule-of-thumb’ estimate is probably sufficient to get an idea of the ballpark rate of economic growth in WA. But how would someone with more at stake go about forecasting the figure?
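For readers who haven’t met it, one common difference form of Okun’s rule of thumb links the change in the unemployment rate to the gap between actual and trend growth, Δu ≈ −β(g − g*), which can be inverted to back out growth from unemployment. The Python sketch below uses placeholder values for trend growth and β; they are not Cowgill’s estimates or WA-specific figures.

```python
def growth_from_unemployment(delta_u, trend_growth, okun_beta):
    """Back out output growth from the change in unemployment.

    Inverts the difference form of Okun's rule of thumb,
    delta_u = -okun_beta * (growth - trend_growth).
    All rates are in percentage points per year.
    """
    return trend_growth - delta_u / okun_beta


# Placeholder values: 3% trend growth, a coefficient of 0.5,
# and unemployment up 0.5 percentage points over the year.
print(growth_from_unemployment(delta_u=0.5, trend_growth=3.0, okun_beta=0.5))
```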

Firstly, I have a small concern with how unemployment maps onto output. If the stories are correct, Western Australian mining is transitioning from an investment activity to a volume activity. Building mines, railways, and pipelines is labour intensive, while operating them is not. Old relationships between growth and unemployment may not be the best way of forecasting future changes.

One solution is to incorporate more high-frequency series into our estimates of WA’s economic growth. I did this using WA’s unemployment, domestic demand, investment, Perth’s CPI, global iron-ore prices, and Australian mining exports. I used a canned forecasting routine from the R package ‘forecast’ to push the series forward to the end of the financial year, then annualised them, stuck them into a basic error correction model, and used the estimates to forecast WA’s year-on-year GSP growth.
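The post doesn’t spell out how the series were annualised. One common approach, assumed here, is to sum each run of four quarters and compare each total with the one a year earlier; the Python sketch below does exactly that with made-up numbers. The function names are hypothetical, and this is neither the ‘forecast’ package nor the code behind the estimates above.

```python
def rolling_annual(quarterly):
    """Rolling four-quarter totals: one value per quarter from the fourth on."""
    return [sum(quarterly[i - 3:i + 1]) for i in range(3, len(quarterly))]


def year_on_year_growth(annual):
    """Per cent change in each rolling total on the total four quarters earlier."""
    return [100 * (annual[i] / annual[i - 4] - 1) for i in range(4, len(annual))]


# Hypothetical quarterly values: flat for a year, then 5% higher.
quarterly = [100, 100, 100, 100, 105, 105, 105, 105]
annual = rolling_annual(quarterly)
growth = year_on_year_growth(annual)
print(annual, growth)
```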

Relative to a basic model using just unemployment to describe GSP growth, this model does considerably better, with a root mean squared error of 0.74 per cent, well below the 1.1 per cent that using unemployment alone gives. R squared is about 69 per cent, versus 29 per cent for the unemployment model. All in all, it’s not too bad for within-financial-year forecasting.
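For reference, the two fit statistics quoted above can be computed like this in Python; the actual and fitted values are made up, not the WA series.

```python
from math import sqrt


def rmse(actual, fitted):
    """Root mean squared error, in the same units as the series."""
    return sqrt(sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual))


def r_squared(actual, fitted):
    """Share of the variance in `actual` accounted for by `fitted`."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [4.0, 5.5, 3.0, 6.5]  # hypothetical growth rates, per cent
fitted = [4.5, 5.0, 3.5, 6.0]
fit_rmse = rmse(actual, fitted)
fit_r2 = r_squared(actual, fitted)
print(fit_rmse, fit_r2)
```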

My growth estimate for this financial year is quite high, at a little under 4 per cent, with the 95 per cent confidence region going down to two per cent. In the plot below, the black line is the actual history, the green line is the model output, and the yellow bands are the 95% confidence region.



So this tells us something about the annual forecast: growth has been slower but not terrible. How about the quarterly picture? Of course, we don’t have quarterly GSP to use as a dependent variable, so we need to take the parameters estimated in the model above and apply them to annualised quarterly data. This gives my estimate of quarterly year-on-year GSP growth for Western Australia:




So most of the four per cent of economic growth I forecast for this financial year comes from the periods of high growth in the first two quarters of the financial year. With these year-on-year estimates, it’s entirely possible that current growth in WA is zero or negative.

Data and code are here. 



Building synthetic control groups using the proximity matrix of a Random Forest

This post is a short summary of my talk at the Melbourne Users of R Network last week–specifically on the use of synthetic control groups formed by using the proximity matrix of a random forest. Don’t worry if you don’t know what those things are–a plain-English description is below. The slides are available here.

One of the main problems that applied economists work on is estimating the ‘treatment effect’ of some policy variable. For example, we may be interested in whether those who graduate from university (university education being the ‘treatment’) earn more.

A big problem with many of these sorts of questions is that there is selection into treatment; the people who go to university were probably going to make more anyway, and so simply comparing them to those who haven’t graduated is going to be a poor estimate of the true causal effect of university on earnings. What we want to do is compare the treated person to their untreated self.

Randomised controlled trials (RCTs) achieve this, in the statistical sense. Their beauty is that, in large samples, the distribution of personal characteristics (both observed and unobserved) is the same for the treatment and control groups. Those swallowing the sugar pill have the same probability of being 40, being female, or having a pushy mother as the group receiving the real drug. Unfortunately for science, we’re not allowed to run RCTs for many interesting policies. Randomly allocating more education to some people and less to others may be seen as unethical.

Due to this constraint, economists often use natural experiments to achieve the same objective: discontinuities in policy, weather, or geography that randomly assign some people to a treatment group and others to the control group, while leaving the two groups otherwise very similar. However, good natural experiments are rare, normally don’t exist in our datasets, and apply to a relatively restricted set of interesting policy questions. While a good one will get you tenure at a top US university, they don’t seem to be the secret ingredient in building a better evidence base for good policy.

So what if you still have an interesting policy question, good quality data, but no natural experiment? Thankfully, some methods exist that allow us to construct synthetic control groups that look a lot more like the treatment group than the old control group.

One method that has been very popular over the last decade or so is propensity score matching, popularised by Rosenbaum and Rubin (1983), and Dehejia and Wahba (2002). This method works in two stages:

1. You set up a predictive model of the ‘treatment’ (in this case, university completion), using personal characteristics for the independent variables. Normally you’d use a logit/probit style model for this. And

2. For each ‘treated’ observation, get the untreated observation with the closest probability of having gone to university. You then chuck out the unmatched observations–they don’t look much like graduates anyway–and run your regression on the remaining observations. The ‘treatment effect’ in the regression is, hopefully, now closer to the true value.
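A bare-bones Python sketch of these two steps, for intuition only: a logistic regression fitted by plain gradient ascent stands in for the logit/probit model, and step 2 is greedy nearest-neighbour matching without replacement. The learning rate, number of steps, data and function names are arbitrary assumptions, and this is not how the ‘arm’ package implements matching. The final regression on the matched sample is left out.

```python
from math import exp


def fit_logit(X, t, lr=0.5, steps=2000):
    """Logistic regression of treatment status on covariates (gradient ascent)."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept

    def prob(xi):
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        return 1.0 / (1.0 + exp(-z))

    for _ in range(steps):
        residuals = [ti - prob(xi) for xi, ti in zip(X, t)]
        w[0] += lr * sum(residuals) / len(X)
        for j in range(1, len(w)):
            w[j] += lr * sum(r * xi[j - 1] for r, xi in zip(residuals, X)) / len(X)
    return prob


def match_on_propensity(X, t):
    """Greedy nearest-neighbour matching on the propensity score, without reuse."""
    score = fit_logit(X, t)  # step 1: predicted probability of treatment
    p = [score(xi) for xi in X]
    treated = [i for i, ti in enumerate(t) if ti == 1]
    available = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i in treated:  # step 2: closest untreated unit by propensity score
        if not available:
            break  # more treated than untreated units: the rest go unmatched
        j = min(available, key=lambda c: abs(p[c] - p[i]))
        available.remove(j)
        pairs.append((i, j))
    return pairs


# Hypothetical data: one covariate, first three units treated.
X = [[1.0], [2.0], [3.0], [2.5], [0.5], [1.5], [4.0], [0.0]]
t = [1, 1, 1, 0, 0, 0, 0, 0]
pairs = match_on_propensity(X, t)
print(pairs)
```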

This is very easily done in R using the ‘arm’ package. Some code is in the presentation linked at the top of this post.

There are still some big problems with propensity score matching. Smith and Todd (2005) found that the results are not very robust to changes in the propensity model in step 1. While we have matched on the observed variables (those in the data), we still have no idea whether the treated observation is more likely to have a pushy mother or not. We also have no idea about the direction or scale of any remaining bias. The method is not magical.

My improvement on this method is to use a more robust measure of similarity to help get over the Smith and Todd critique, borrowing from the Random Forest–a tool widely used in predictive analytics. For a deeper discussion of how these work, see here.

A Random Forest is basically a collection of models, called trees; in this case, they are models to predict whether someone went to university. Each of these trees is estimated on only a random subset of the data, ensuring that no individual survey respondent or survey question makes much of a difference to the outcome. For every respondent, we ask all of the trees (there are sometimes thousands) whether they think the respondent went to university, based on their personal characteristics. The winning vote is the random forest’s ‘prediction’ for that survey respondent.

Every one of the trees is constructed so that its branches divide people according to some characteristic: in this case, gender, age, mother’s education level, and so on. A branch will grow off the tree only if dividing people according to one of these characteristics results in a ‘purer’ division of graduates and non-graduates. That is, each tree is constructed with the aim of having some ‘leaves’ contain only graduates, and others no graduates.

When two people wind up in the same leaf, we know they are similar in several ways. Importantly, they are similar in the ways that matter to whether they will have gone to university. They are said to be proximate. The proximity score is the proportion of trees in which two observations share a terminal leaf. My proximity score matching routine works like so:

1. Run a Random Forest with the treatment as the dependent variable, and include the pre-treatment independent variables. As random forests are built on random subsets of the data, you should set the random-number generator seed if you want your results to be replicable. Also, make sure you save the proximity matrices!

2. Match on the proximity scores, and also save the proximities of the untreated observations. I find that using these as weights improves my estimates (in terms of a smaller deviation from experimental estimates).

3. Discard unmatched observations. Make sure you don’t duplicate matches!

4. Run your regressions on the remaining data.
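Steps 2 and 3 might look something like the following Python sketch. It assumes the proximity matrix has already been taken from a forest fitted with the treatment as the dependent variable; the greedy matching order, dropping treated units with zero proximity to every remaining control, and the name `proximity_match` are illustrative assumptions, not necessarily what the R code in the presentation does.

```python
def proximity_match(prox, treated):
    """Greedy one-to-one matching on a random forest proximity matrix.

    prox[i][j] is the proximity of observations i and j; treated[i] is 1
    for treated units. Returns (treated, control, proximity) triples; the
    proximity is kept so it can be used as a regression weight later.
    """
    controls = [j for j, d in enumerate(treated) if d == 0]
    matches = []
    for i, d in enumerate(treated):
        if d != 1 or not controls:
            continue
        j = max(controls, key=lambda c: prox[i][c])  # most proximate control
        if prox[i][j] == 0:
            continue  # no shared leaves with any remaining control: discard
        controls.remove(j)  # each control is used at most once
        matches.append((i, j, prox[i][j]))
    return matches


prox = [
    [1.0, 0.1, 0.8, 0.2],
    [0.1, 1.0, 0.3, 0.6],
    [0.8, 0.3, 1.0, 0.0],
    [0.2, 0.6, 0.0, 1.0],
]
matches = proximity_match(prox, treated=[1, 1, 0, 0])
print(matches)
```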

Example R code is included in the presentation linked above.


Rather than matching on the propensity score, I believe matching on the proximity score produces a control group that is more similar to the treatment group than any other existing method does, and consequently allows us to produce less biased estimates of the causal effects of policy. In my experiments with this method so far, I have found that:

– Matching on the proximity score is more robust to the inclusion of extra independent variables than matching with probit/logit methods; and

– Benchmarking against the famous LaLonde dataset, I find my estimates of the causal effect are closer to the experimental estimate than those from propensity score matching.

However, I should emphasise that if you have a small number of trees, you will have fairly unstable matches. With current memory constraints, it is not feasible to build proximity matrices on large datasets for lots of trees. So the method is not well suited to large datasets without rewriting the Random Forest algorithm to update the proximity matrix iteratively.
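To see why memory bites, note that a dense proximity matrix stores one entry for every pair of observations, so its size grows with the square of the sample size. This back-of-the-envelope Python sketch assumes 8-byte double-precision entries and no sparse storage.

```python
def proximity_matrix_gib(n_obs, bytes_per_entry=8):
    """Memory for a dense n_obs x n_obs proximity matrix, in GiB."""
    return n_obs * n_obs * bytes_per_entry / 2 ** 30


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} observations: {proximity_matrix_gib(n):,.1f} GiB")
```

Storing only one triangle of the symmetric matrix halves these figures but doesn’t change the quadratic growth.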

If you have any experimental or quasi-experimental data you would like to share with me, I’d love to do some more benchmarking of this routine. As it is, I’m 85% sure it’s an improvement on what we have; I’d like to be more sure!


Making cool bubble-charts in R: structural unemployment edition

A problem with highly aggregated unemployment statistics is that they mask big differences in the work fortunes of different groups of people. In an ideal world, people made redundant in shrinking sectors could find work quite easily in growing sectors. Unfortunately, that doesn’t appear to be the case: the skills of a worker in a bike shop or shoe factory are different to the skills required to work in a growing sector, like mining or hospitals.

To visualise this, I pulled the ABS’s unemployment data by industry, employment by industry, and the 2006 Census’s education levels by industry, to make this pretty plot. On the horizontal axis we have median quarterly growth in employment from 2001 through 2012 (changing the metric here doesn’t greatly affect the chart). The vertical axis has the median unemployment rate for each industry over the period—again, this is pretty robust to changes in definition. The area of the bubbles represents the amount of employment in 2001. Finally, the colours are darker for those industries with a greater share of workers with a bachelor education or higher.
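Two of the chart’s ingredients are worth spelling out, so here is a small Python sketch of them: the median of quarter-on-quarter employment growth, and a bubble radius chosen so that area (not radius) is proportional to employment, which means taking a square root. The employment figures are hypothetical, and this is not the R script linked below.

```python
from math import pi, sqrt
from statistics import median


def median_quarterly_growth(employment):
    """Median quarter-on-quarter growth, in per cent, of an employment series."""
    growth = [100 * (later / earlier - 1)
              for earlier, later in zip(employment, employment[1:])]
    return median(growth)


def bubble_radius(employment, scale=1.0):
    """Radius giving a circle whose area is scale**2 * employment."""
    return scale * sqrt(employment / pi)


# Hypothetical quarterly employment (thousands) for one industry.
mining = [150.0, 156.0, 162.0, 171.0, 180.0]
print(median_quarterly_growth(mining), bubble_radius(mining[0]))
```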

The code and data to make these plots are here (if you want to make them, you’ll need to change the working directory in the R script).

As I posted the other day, we know there are big differences in unemployment rates across sectors, so it’s not really a surprise to see that unemployment rates tend to be higher in slow-growing industries. Admittedly, the relationship could be spurious: most unemployment observed at a point in time is short-term, though most unemployment (in terms of man-days not working over a period) is long-term. So it could be that we’re repeatedly measuring people just laid off from declining sectors. I’d not bet on that, though. People in the lagging sectors are less trained than people in low-unemployment sectors, and can’t easily shift industries.

All of this points to something quite sad: while we’ve all heard stories of Cashed Up Bogans in construction and mining making a motza with little formal education, there are other people with fairly low levels of education who haven’t done so well out of the boom. While their unemployment rates have been quite low over the last decade (especially compared with unemployment rates in Europe or the US), any future slump would shift all the circles up, especially the circles with less education. Then it’s far from clear that displaced aluminium smelter workers will be able to find work in professional services or education.

