Building synthetic control groups using the proximity matrix of a Random Forest

This post is a short summary of my talk at the Melbourne Users of R Network last week–specifically on the use of synthetic control groups formed by using the proximity matrix of a random forest. Don’t worry if you don’t know what those things are–a plain-English description is below. The slides are available here.

One of the main problems that applied economists work on is working out a ‘treatment effect’ of some policy variable. So we may be interested in whether those who graduate from university (university education being the ‘treatment’) earn more.

A big problem with many of these sorts of questions is that there is selection into treatment; the people who go to university were probably going to make more anyway, and so simply comparing them to those who haven’t graduated is going to be a poor estimate of the true causal effect of university on earnings. What we want to do is compare the treated person to their untreated self.

Randomised control trials (RCTs) achieve this, in the statistical sense. Their beauty is that the (large sample) distribution of personal characteristics (both observed and unobserved) is the same for the treatment and control groups. Those swallowing the sugar pill have the same probability of being 40, being female, or having a pushy mother, as the group receiving the real drug. Unfortunately for science, we’re not allowed to run RCTs for many interesting policies. Randomly allocating some people to more education and others to less may be seen as unethical.

Due to this constraint, economists often use natural experiments to achieve the same objective: discontinuities in policy, weather, or geography that randomly assign some people to a treatment group and others to the control, while leaving the two groups otherwise very similar. However, good natural experiments are rare, normally don’t exist in our data-sets, and apply to a relatively restricted set of interesting policy questions. While a good one will get you tenure at a top US university, they don’t seem to be the secret ingredient in building a better evidence base for good policy.

So what if you still have an interesting policy question, good quality data, but no natural experiment? Thankfully, some methods exist that allow us to construct synthetic control groups that look a lot more like the treatment group than the old control group.

One method that has been very popular over the last decade or so is propensity score matching, popularised by Rosenbaum and Rubin (1983), and Dehejia and Wahba (2002). This method works in two stages:

1. You set up a predictive model of the ‘treatment’ (in this case, university completion), using personal characteristics for the independent variables. Normally you’d use a logit/probit style model for this; and

2. For each ‘treated’ observation, get the untreated observation with the closest probability of having gone to university. You then chuck out the unmatched observations–they don’t look much like graduates anyway–and run your regression on the remaining observations. The ‘treatment effect’ in the regression is, hopefully, now closer to the true value.

This is very easily done in R using the ‘arm’ package. Some code is in the presentation linked at the top of this post.
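The two stages above can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the data, the variable names (`age`, `female`, `uni`, `earnings`) and the simple nearest-neighbour rule are all made up for the example, and it does not use the ‘arm’ package.

```r
set.seed(42)
n <- 500

# Simulated survey: treatment (university) depends on observed characteristics
age      <- rnorm(n, 40, 10)
female   <- rbinom(n, 1, 0.5)
uni      <- rbinom(n, 1, plogis(-0.5 + 0.03 * (age - 40) + 0.4 * female))
earnings <- 30 + 0.5 * age + 5 * uni + rnorm(n, 0, 5)
d <- data.frame(age, female, uni, earnings)

# Stage 1: logit model of treatment on personal characteristics
ps_model <- glm(uni ~ age + female, family = binomial, data = d)
d$pscore <- fitted(ps_model)

# Stage 2: for each treated observation, find the untreated observation
# with the closest propensity score (matching with replacement)
treated  <- which(d$uni == 1)
controls <- which(d$uni == 0)
nearest  <- sapply(treated, function(i) {
  controls[which.min(abs(d$pscore[controls] - d$pscore[i]))]
})

# Discard unmatched controls and run the regression on what remains
matched <- d[c(treated, unique(nearest)), ]
fit <- lm(earnings ~ uni + age + female, data = matched)
coef(fit)["uni"]  # estimated 'treatment effect' on the matched sample
```

Matching with replacement and then keeping each control once is one of several possible conventions; other choices (calipers, one-to-many matching) would change the matched sample.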

There are still some big problems with propensity score matching. Smith and Todd (2005) found that the results are not very robust to changes in the propensity model in stage 1. And while we have constructed a match on the observed variables (those in the data), we still have no idea whether the treated observation is more likely to have a pushy mother or not. We also have no idea about the direction or scale of the remaining bias. The method is not magical.

My improvement on this method is to use a more robust measure of similarity to help get over the Smith and Todd critique, borrowing from the Random Forest–a tool widely used in predictive analytics. For a deeper discussion of how these work, see here.

A Random Forest is basically a collection of models, called trees–in this case, they are models to predict whether someone went to university. Each of these trees is estimated on only a subset of the data–ensuring no individual survey respondent or survey question makes much of a difference to the outcome. For every respondent, we ask all of the trees (there are sometimes thousands) whether they think the respondent went to university or not, based on their personal characteristics. The winning vote is the random forest’s ‘prediction’ for that survey respondent.

Every one of the trees is constructed so that the branches divide people according to some characteristic–in this case, gender, or age, or mother’s education level, etc. A branch will grow off the tree only if dividing people according to one of these characteristics results in a ‘purer’ division of graduates and non-graduates. That is, each tree is constructed with the aim of some ‘leaves’ containing only graduates, and others no graduates.

When two people wind up in the same leaf, then we know they are similar in several ways. Importantly, they are similar in the ways that matter to whether they will have gone to university. They are said to be proximate. The proximity score is the proportion of trees in which two observations end up in the same terminal leaf. My proximity score matching routine works like so:

1. Run a Random Forest with the treatment as the dependent variable, and include the pre-treatment independent variables. As random forests are built on randomly subset data, you should set the random-number generation seed if you want your results to be replicable. Also, make sure you save the proximity matrices!

2. Match on the proximity scores, and save the proximities of the matched untreated observations. I find that using these as weights improves my estimates (in terms of decrease in deviation from experimental estimates).

3. Discard unmatched observations. Make sure you don’t duplicate matches!

4. Run your regressions on the remaining data.

Example R code is included in the presentation linked above.
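For readers without the slides, here is a minimal sketch of the four steps on simulated data. It assumes the `randomForest` package is installed and uses its `proximity = TRUE` option. The data, the variable names and the rule of keeping each matched control only once are assumptions made for illustration, not necessarily what the presentation code does.

```r
library(randomForest)  # assumed to be installed

set.seed(123)  # forests are random, so fix the seed for replicability
n <- 300
d <- data.frame(age = rnorm(n, 40, 10), female = rbinom(n, 1, 0.5))
d$uni <- factor(rbinom(n, 1, plogis(0.03 * (d$age - 40) + 0.4 * d$female)))
d$earnings <- 30 + 0.5 * d$age + 5 * (d$uni == "1") + rnorm(n, 0, 5)

# Step 1: forest predicting treatment from pre-treatment variables,
# keeping the n x n proximity matrix
rf   <- randomForest(uni ~ age + female, data = d, ntree = 500, proximity = TRUE)
prox <- rf$proximity

# Step 2: for each treated observation, pick the most proximate untreated one
treated  <- which(d$uni == "1")
controls <- which(d$uni == "0")
sub        <- prox[treated, controls, drop = FALSE]
best       <- apply(sub, 1, which.max)
match_id   <- controls[best]
match_prox <- sub[cbind(seq_along(treated), best)]  # proximity of each match

# Step 3: discard unmatched observations, keeping each control only once
keep    <- !duplicated(match_id)
matched <- d[c(treated, match_id[keep]), ]
matched$w <- c(rep(1, length(treated)), match_prox[keep])  # proximities as weights

# Step 4: run the regression on the remaining data
fit <- lm(earnings ~ uni + age + female, data = matched, weights = w)
```

Treated observations get a weight of 1 here, which is a choice made for the sketch; the text only says that the proximities of the untreated observations are used as weights.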

[Figure: Prox_score_match]

I believe that matching on the proximity score, rather than the propensity score, produces a control group that is more similar to the treatment group than any other existing method, and consequently allows us to produce less biased estimates of the causal effects of policy. In my experiments with this method so far, I have found that:

- Matching on the proximity score results in a more robust matching with the inclusion of extra independent variables than probit/logit methods; and

- Benchmarking against the famous LaLonde dataset, I find my estimates of the causal effect are closer to the experimental estimate than when using propensity score matching.

However, I should emphasise that if you have a small number of trees, you will have fairly unstable matches. With current memory constraints, it is not feasible to build proximity matrices on large datasets for lots of trees. So the method is not well suited to large data-sets without re-writing the Random Forest algorithm to iteratively update the proximity matrix.

If you have any experimental/quasiexperimental data you would like to share with me, I’d love to do some more benchmarking of this routine. As it is, I’m 85% sure it’s an improvement on what we have; I’d like to be more sure!


Making cool bubble-charts in R: structural unemployment edition

A problem with highly aggregated unemployment statistics is that they mask big differences in the work fortunes of different groups of people. In the ideal world, people made redundant in shrinking sectors can find work quite easily in growing sectors. Unfortunately, that doesn’t appear to be the case—the skills of a worker in a bike shop or shoe factory are different to the skills required to work in a growing sector, like mining or hospitals.

To visualise this, I pulled the ABS’s unemployment data by industry, employment by industry, and the 2006 Census’s education levels by industry, to make this pretty plot. On the horizontal axis we have median quarterly growth in employment from 2001 through 2012 (changing the metric here doesn’t greatly affect the chart). The vertical axis has the median unemployment rate for each industry over the period—again, this is pretty robust to changes in definition. The area of the bubbles represents the amount of employment in 2001. Finally, the colours are darker for those industries with a greater share of workers with a bachelor education or higher.

The code and data to make these plots are here (if you want to make them, you’ll need to change the working directory in the R script).
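As a rough idea of how such a chart can be drawn, here is a sketch using base R’s `symbols()`. The four industries and all of their numbers are invented for the example; they are not the ABS or Census figures, and the linked script may well build the plot differently.

```r
# Made-up numbers for four hypothetical industries (not the ABS data)
ind <- data.frame(
  industry     = c("Mining", "Health", "Manufacturing", "Retail"),
  growth       = c(1.8, 1.0, -0.3, 0.4),    # median quarterly employment growth, %
  unemployment = c(2.0, 1.5, 5.0, 4.5),     # median unemployment rate, %
  employment   = c(80, 900, 1000, 1200),    # employment in 2001, thousands
  degree_share = c(0.15, 0.45, 0.10, 0.08)  # share with a bachelor degree or higher
)

# symbols() sizes circles by radius, so convert the area we want into a radius
radius <- sqrt(ind$employment / pi)

symbols(ind$growth, ind$unemployment, circles = radius, inches = 0.4,
        bg = grey(1 - ind$degree_share),    # darker fill = more educated
        xlab = "Median quarterly employment growth (%)",
        ylab = "Median unemployment rate (%)")
text(ind$growth, ind$unemployment, ind$industry, cex = 0.7)
```

Scaling by radius rather than area is the design point worth noting: passing employment directly as `circles` would make large industries look far bigger than they are.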


As I posted the other day, we know there are big differences in the unemployment rates in different sectors, and so it’s not really a surprise to see that unemployment rates tend to be higher in slowly-growing industries. Indeed, the relationship could be spurious: most unemployment observed at a point in time is short-term, though most unemployment (in terms of man-days not working over a period) is long-term. So it could be that we’re repeatedly measuring people just laid off from declining sectors. I’d not bet on that. People in the lagging sectors are less trained than people in low-unemployment sectors, and can’t easily shift industries.

All of this points to something quite sad: while we’ve all heard stories of Cashed Up Bogans in construction and mining making a motza with little formal education, there are other people with a fairly low level of education who haven’t done so well out of the boom. While their unemployment rates have been quite low over the last decade (especially when we compare them to unemployment rates in Europe or the US), any slump in the future would shift all the circles up, especially the circles with less education. Then, it’s far from clear that displaced aluminium smelter workers will be able to find work in professional services or education.


Is Delhi Belly the most influential film of the next 20 years?

Around the end of my first year of university–about the same time I realised that dhal was the most cost-effective food–I started shopping fairly frequently at Aurora Spices, an Indian grocery store in North Carlton. It’s a pretty standard spice store, with the standard collection of spicy chips, dry-tasting sweets, spices, rice, mortars-and-pestles, beedis, and two-for-one Bollywood films.

Once while picking up the weekly supply of fresh curry leaves, I bumped into a family friend, a south-Indian tablist. He asked me if I’d seen any of these films, and spoke pretty highly of some of the ones on display. Politely, I accepted his suggestion that I should watch one at his recommendation. He pressed a case into my hands: Devdas.

I went home, and watched it. And again within the week. This was a wonderful film: the music was tight, the women beautiful, the story epic, and even the normally terrible Shahrukh Khan was bearable. By the stage I saw it, it was also a few years old. If this movie could exist without me finding out, what else could be out there? I thought.

I was hooked. Within a few months, I’d watched hundreds of Bollywood movies. All I wanted was to replicate the sensation of seeing Devdas the first time. Most were terrible. There were a few exceptions: the cheesy but delightful factory-jobs put together by the Yash Raj company, or the slightly-edgier films starring Abhishek Bachchan. But on the whole, I was serially disappointed. The phase tempered, and I (almost) moved on.

Since then, I’ve perhaps seen five or ten Indian films per year. I have no particular interest in Indian culture, language, or history, but find keeping my knowledge of Bollywood reasonably fresh opens more doors than it closes. It’s also a cheap way to keep entertained.

Imagine my surprise, then, when I finally got around to seeing Delhi Belly a couple of months ago. I’d seen the advertising beforehand, but knew not to get too hyped up by any poster in a spice store (especially after the horror of being subjected to the hyped-up Ra.One last year). The assistant at the subcontinental DVD store on Sydney Road made me buy it. “Don’t worry, this one is very funny”, he told me. He has burned me before, having given similar praise to the horrid Jodhaa Akbar, but he seemed more genuine this time.

Delhi Belly is about three mid-late-20s flatmates in central Delhi. One is asked to deliver a parcel for his girlfriend, an air-hostess, on behalf of a (presumably) Russian gangster. A second flatmate has Delhi Belly, and so asks the third flatmate to deliver a stool-sample to the doctor. Predictably, the parcels are switched, and the stool sample is sent to a local crime boss.

To say the film is “not Bollywood” is an understatement. It runs for 90 minutes, has a single tune (think Alice Cooper with blazing sitars and pelvic thrusts), is mainly in English, and is very, very vulgar. It has much more in common with Snatch than Dhoom. Above all, though, it’s funny in the Hollywood block-buster sense. Pant-wettingly funny.

Then there’s all the muck. Most Indian films are shot through a rosy lens; the characters of Devdas live in stained-glass palaces; even the Delhi 6 of Delhi 6 looks habitable (oh! the Culture!). Delhi Belly doesn’t tease in this sense. The inner suburbs of developing mega-cities are full of terrible apartments, plumbing problems, and crappy stores selling crappy things. Young professionals there don’t get to live in luxury condos; they share smelly apartments. Delhi Belly makes sure that you, the viewer, are left with no doubt how grubby it all is. It’s so refreshingly honest.

So why do I think this could be an incredibly influential film? Two main reasons.

First, it shows that there is sufficient talent in India to make a first-rate Western-style blockbuster comedy for $5M. With a better foreign release strategy I have no doubt it would have been a box-office hit abroad. Foreign film producers in search of high potential returns should be looking at how to replicate Delhi Belly, but this time market it better abroad.

Second, let’s be honest: when was the last time you cried laughing at a non-US/UK-made film? Of course there are exceptions, but in general, foreign comedies aren’t very funny. This could be a volume-effect. Most US comedies aren’t funny either, and it’s their number that results in a couple of good ones floating to the top. But now that Delhi Belly has broken a few barriers (and made quite a bit of money doing so; its box-office return was around 300%), I expect that at the very least we will see plenty of Indian copycat productions. Indeed, just as Bollywood has inspired filmmakers all over the (rest of the) world, I expect this film and the coming rip-offs to similarly spawn good old-fashioned toilet-humour comedies in foreign cinema. That’s exactly what we need.

A solid four stars.


Changing survey factors to names in R

Lately, I’ve been doing a fair bit of work on some large survey data-sets, and have had a recurrent issue: The survey reports factors (Are you married? Single? Divorced? etc.) as numbers. R then automatically treats these as numbers, which has little meaning.

The basic transformation you can make is simply

df$variable <- as.factor(df$variable)

though then you get ugly regression output. It will say “marital=1; marital=2; etc”, which makes it difficult for people who don’t have access to the survey documentation to interpret. As I’m running hundreds of regressions on many surveys, I simply don’t have time to TeX them all up and change these into words.

My work-around, then, is fairly simple. First I define a list (basically a key-value dictionary)

marital.factors <- list(single = 1, married = 2, divorced = 3) # and so on for the remaining codes

then simply match the content.

dataframe$marital = names(marital.factors[match(dataframe$marital, marital.factors)])

My regression then automatically treats the variables as factors, and gives me regression output that has some meaning.
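Base R’s `factor()` can also attach the labels in one step, which may be worth considering as an alternative. A small sketch, with a made-up `df` and made-up codes:

```r
# Made-up survey column with numeric codes
df <- data.frame(marital = c(1, 2, 3, 2, 1))

# levels = the codes as they appear in the data; labels = the names to show
df$marital <- factor(df$marital,
                     levels = c(1, 2, 3),
                     labels = c("single", "married", "divorced"))

levels(df$marital)
```

Regression output then reports terms such as `maritalmarried` rather than `marital2`. Unlike the name-matching approach, this stores the variable as a factor rather than a character vector, so the level order (and hence the reference category) is set by `levels`.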


Naming in loops in R like in Eviews

While Eviews may be a more limited system than R overall, it has a couple of really cool features. One, which I have found very useful, is the ability to put strings on the left hand side of an assignment. This allows the user to generate objects with names obtained from a string list. This is useful when you want to do the same thing for data in lots of different industries (treating them separately).

For example, you could define a list of industries:

%industries = "Ag Manu Services"

then run a loop, generating, say, nominal values

for %i %industries

series nominal_output_{%i} = real_output_{%i} * price_{%i}

next

which generates appropriately named series for each industry. This is considerably more difficult in R, at least to my knowledge. Here’s my workaround:

industries <- c("Ag", "Manu", "Services")

for (i in seq_along(industries)) {

  a <- industries[i] # a becomes a string containing, in turn, each industry name

  nominal <- paste("nominal_output_", a, sep = "") # nominal becomes the string "nominal_output_" followed by the industry name

  nominal_series <- eval(parse(text = paste("price_", a, sep = ""))) * eval(parse(text = paste("real_output_", a, sep = ""))) # eval(parse(text = ...)) evaluates a string as the name of an object, so this computes price times real output for the current industry

  assign(nominal, nominal_series, inherits = TRUE) # and here we assign the series to the constructed name

}

 

It’s a pretty ugly workaround (and what I’ve found in my short stint of being an R coder is that when something’s not elegant, there already exists a better way, or you’re trying to do something that you shouldn’t be trying to do), but it works. If you do know of a better way, please let me know!
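One commonly used alternative is to keep the series in named lists, indexed by industry, instead of building object names from strings. The sketch below assumes the data can be arranged that way; the numbers are invented.

```r
industries <- c("Ag", "Manu", "Services")

# Made-up series, stored in lists keyed by industry name
real_output <- list(Ag = c(10, 11), Manu = c(20, 22), Services = c(30, 33))
price       <- list(Ag = c(1.0, 1.1), Manu = c(2.0, 2.0), Services = c(1.5, 1.6))

# One nominal series per industry; setNames() makes lapply() return a named list
nominal_output <- lapply(setNames(industries, industries), function(i) {
  real_output[[i]] * price[[i]]
})

nominal_output$Manu
```

This avoids `eval(parse(...))` and `assign()` entirely: `nominal_output[["Ag"]]` plays the role of `nominal_output_Ag`, and the whole set can be passed around as a single object.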


Why aren’t Mexican managers as good as managers in the States?

Mexico City’s Buena Vista Walmart is the first super-market I’ve ever seen paralysed by a trolley-jam. It is not busier than any other large city supermarket, and not understaffed. No: the reason it takes ten minutes to walk from one side of the store to the other every single night is that they have no baskets, only large trolleys. In my sample of Mexican Walmarts, this is unique; most Mexican Walmarts have trolleys and baskets. And so the error must have been made by the managers of that particular outlet. Of course, this is only a small example of inept management in Mexico, but it’s sadly representative: labour productivity in Mexico is about a third of that in the US. If Mexico were as well managed as the US, it would be as rich.

This all raises a few questions: Why are managers in Mexico (and the developing world in general) so terrible? Why are those in the US so good? And how much of the difference can be affected by policy?

Management in Mexico

There are many well managed businesses in Mexico, and they do very well. This suggests there are large returns to good management, and we should expect, over time, well managed businesses to displace poorly managed businesses. To an extent, this has occurred already; Walmart and Applebee’s have slowly usurped street markets and cantinas. But the pace of this transition is slow, and still most of the businesses most Mexicans deal with are run terribly, exacerbating poverty. I attribute this poor management to three causes: Mexico’s caste system, nepotistic/senioric hiring, and its abysmal school and university system.

First, Mexico actually has a caste of idle rich, which surprised me. In Australia, it’s typical that small business owners (and most large business owners) work harder than any of their employees. In Mexico however, it seems the entire point of owning a business is to avoid any painful task.

To characterise, there are some jobs which appear to be below Mexico’s entrepreneurs, and so potentially labour-saving or product-improving changes to businesses are either not realised or not acted on. In the restaurant I worked at for 3 weeks, the owner never looked inside the fridge. Had she, she’d have discovered mozzarella, stored on top of raw pork sausages, which were placed directly on top of (not above: on) raw lettuce. This didn’t bother the kitchen workers, who were generally poor and uneducated, and so nothing changed. I lost 3kg in a day after eating from that kitchen.

So if the business owners don’t want to work, why don’t they just hire a good manager, and incentivise them with a suitably-worded contract? This problem confused me, until I heard an anecdote from a former classmate who now works in a Mexican central government department.

All public-sector job openings in Mexico have a selection process that involves, as the second-last hurdle, completing an exam. Of course, preferential hiring always finds a way: on helping to fill a senior position, my former classmate, who had worked with the preferred candidate, was instructed to write an exam that only the pre-chosen candidate could pass.

We sadly expect corruption like this in the public service of poor countries, but not in private enterprise. However, we still see well-run businesses (like Walmart) hiring incompetent managers, and then not firing them. My suspicion is that much of the cause for this is the low level of security, which has begot Mexicans’ low levels of trust for those with whom they’re not acquainted. They hire an incompetent manager not because they know the manager is good, but because they know he’s not a thief.

Of course, the first two reasons can be remedied by educating people. Caste can be dissolved when there’s some prospect for dish-pigs to become hoteliers, and hiring a stranger can be easier when you know that their education probably indicates competence. But on almost any metric, Mexico’s education fails. In the OECD league tables, only Brazil and Indonesia do worse. It’s not poverty: in the 90s, when Korea was as well off as Mexico (it’s now twice as well off, which is telling), it still did better than the US. Of course, if children aren’t being taught to do difficult things well, universities have to pick up the slack, but in this respect, Mexico still lags: only a handful of universities educate their students to anything near a world standard. So there’s probably little reason to expect a new generation of world-class managers.

Management in the US

On returning from holidays in the US, Australians usually rave on about two things: terrible coffee served by excellent waiters. These two things are really a symptom of the same phenomenon: Americans’ desire for well-priced consistency. This is centuries old: economic historian Nathan Rosenberg points out that the great immigration westward was facilitated by cheap (not fancy) balloon-frame houses; that the conveyor-belts of early factories were more standardised than abroad, lowering down-time when a piece of machinery broke; and that while on the Continent skilled craftsmen made excellent, expensive things, the uneducated of America produced, by the million, less-good, less-expensive things. There is no way of producing many good, cheap things without deliberate management.

While much has been said about the Taylorism of American management (the reduction of food making to a production-line job requiring little training, for instance), I’ve seen little written about the accompanying phenomenon: the desire of American managers to turn their business into an institution. It may be gimmicky, but it seems to work. In Seattle WA, fishmongers throw fish at each other in front of tourists’ cameras; in Bakersfield CA, the town’s most popular Italian restaurant has hundreds, no, thousands of photos of footballers and old newspapers pinned up on the wall; in Austin TX the best tacos are the Korean ones; LA’s favourite fast-food burger joint has an off-menu menu; and all through the country, evening road-trippers pull in at Denny’s diner to be greeted “Good morning, and welcome to Denny’s”.

The fact is that managers of businesses in the US take their role as managers far more seriously than anywhere else I’ve spent any time.

The Role of Public Policy

If I am attributing the differences in productivity between Mexico and the US to private-sector management, then what role policy? We surely can’t force businesses to be better managed. But there are some public policies (or lack of) in Mexico today which, if continued, are likely to perpetuate poverty.

- Mexico has an abysmal education system, particularly in fostering critical thinking and mathematics. It also has a culture of ‘rule-of-thumb’ (as do many relatively uneducated countries), and so getting people to accept new ideas is incredibly difficult relative to the US. Unsurprisingly, the Mexican education system is terribly corrupt, with tenured teaching jobs effectively sold or bequeathed, and La Maestra, the head of the union, being one of the most powerful political players in Mexico. Short of a very costly industrial war, there does not seem to be an easy way out of this equilibrium.

- There is no Productivity Commission in Mexico. There are plenty of great economists, especially in Mexico City and Guanajuato, but neither Federal nor (more powerful) state governments give them toothy roles in government. In Australia, where we have a very well respected Productivity Commission, commissioned reports serve as talking point documents for both sides of parliament. The result ends up being that on both sides of parliament there is nominal acquiescence to themes of productivity, especially over the long-run. Even union bosses aren’t arguing for a return to the bad old days.

- Mexico’s informal economy and poorly-developed banking system go hand in hand. Because so many businesses work solely in cash, they have no financial history able to be verified by banks: a prerequisite for any serious lending.

This leads to incredibly low deposit and loan rates. According to MarketWatch, in 2010 total bank deposits were about 15 per cent of GDP, compared with over 100 per cent of GDP in Australia. There is little surprise that small businesses–even the ones that are more productive–fail to grow if credit market access is off the table.

This is partly the fault of Mexican banks, which, despite being private institutions, are ridiculously poorly managed. Electronic banking is hardly used (for payments, salaries, anything!) and queues at major banks can be several hours long. Forcing businesses to formalise could only work if banks come up to speed, and if bad policies like deposit taxes (!!!) are removed.

While Mexico’s poverty is almost solely a result of poor management (even within the private sector), there are some policies which affect productivity in the long-term. I’d be interested to hear your ideas.


How to better do performance management?

Many large organisations use internal “performance management” “systems” to check up on and rate their staff. Unfortunately, this task isn’t easy: managers typically observe only a part of an employee’s work, and so can only give a biased appraisal. Most managers know this, and so ask colleagues for their thoughts on other colleagues. This process has, in the management textbooks (and probably on the Weasel Words website), become known as 360 Degree Feedback.

To some, this process causes undue stress, and in particular teams or organisations, may be unproductive; people generally don’t like being made to bitch about their colleagues (even those who voluntarily do it!), perhaps because they themselves don’t like the idea of other people bitching about them.

While monitoring staff is an essential function of management, there must be an easier way. Below I describe a method of collecting the relevant data, without the potentially disruptive side effects. It is based on a Hamming code, a type of linear error-correcting code used in information theory.

First, I contend that the only thing that really matters in a monitoring context is whether someone is below an acceptable threshold or above it. It is obvious when someone is clearly above a threshold (at which their wage is set), because they will ask for a promotion or leave.

This makes the job easier—all we need to do is rate people on whether they can or can’t do certain aspects of their job. And so the process works like this:

1: Each person in the organisation nominates a list of people they feel able to give an assessment on.

2: Management creates a list of criteria, which needn’t be applicable to all levels of the job function. There are always two wordings of each criterion.

3: Each person then completes a questionnaire, in a room by themselves. This questionnaire gives three names (which have been randomly drawn from the list provided at step 1), and a criterion to assess the three names against.

4: The person then has to write whether there is an odd or an even number of people in the list of three who satisfy the threshold. They do not specify who they consider competent, nor how many. This may be replaced with a question asking how many of the three satisfy the threshold, which improves quality slightly.

5: The manager ranks each person for each of the necessary criteria.

By Hamming’s 1950 insight, it turns out this is sufficient to determine with an arbitrary degree of accuracy (for large groups) who exactly is incompetent. By auditing employees on several characteristics, it’s possible to then create a ranking of employees. This ranking should, in general, match up with the pay and hierarchy of the organisation. That is, people in more senior positions should not be failing on criteria passed by junior employees, while junior employees should not be showing “strategic leadership”.
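To make the idea concrete, here is a toy sketch of the decoding step. It is not a Hamming code proper: it treats each questionnaire as a parity check on three people and solves the resulting system by Gaussian elimination over GF(2). It assumes every answer is truthful and error-free, which real appraisals would not be; handling noisy answers would need extra redundancy. The staff vector, the number of forms and the helper `solve_gf2` are all hypothetical.

```r
# Solve A x = b over GF(2) by Gauss-Jordan elimination.
# Returns a 0/1 vector, with NA where an entry is not pinned down by the checks.
solve_gf2 <- function(A, b) {
  M <- cbind(A, b) %% 2
  n <- ncol(A)
  pivot_row  <- 1
  pivot_cols <- integer(0)
  for (col in seq_len(n)) {
    if (pivot_row > nrow(M)) break
    rows <- which(M[pivot_row:nrow(M), col] == 1)
    if (length(rows) == 0) next
    r <- rows[1] + pivot_row - 1
    tmp <- M[pivot_row, ]; M[pivot_row, ] <- M[r, ]; M[r, ] <- tmp  # swap rows
    # clear this column from every other row
    for (o in setdiff(which(M[, col] == 1), pivot_row)) {
      M[o, ] <- (M[o, ] + M[pivot_row, ]) %% 2
    }
    pivot_cols <- c(pivot_cols, col)
    pivot_row  <- pivot_row + 1
  }
  x <- rep(NA_real_, n)
  for (k in seq_along(pivot_cols)) {
    # determined only if the reduced row involves no other unknown
    if (sum(M[k, seq_len(n)]) == 1) x[pivot_cols[k]] <- M[k, n + 1]
  }
  x
}

set.seed(1)
n_staff   <- 8
competent <- rbinom(n_staff, 1, 0.7)   # hypothetical truth: 1 = meets the threshold
n_forms   <- 40
A <- matrix(0, n_forms, n_staff)
for (f in seq_len(n_forms)) A[f, sample(n_staff, 3)] <- 1  # three random names per form
answers   <- as.vector(A %*% competent) %% 2               # 1 = "odd", 0 = "even"

recovered <- solve_gf2(A, answers)
```

In this noise-free setting, every entry of `recovered` that is not `NA` agrees with `competent`, even though no form ever states who is competent or how many are.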


The ten blogs you probably don’t read and probably should

I am a hopeless addict of economics blogs. So I thought I’d compile a short list of blogs that are relatively un-read, despite being better than the blogs which seem to get a lot of attention.

  1. David Warsh’s Economic Principles. Warsh has about a thousand subscribers on Google Reader, versus forty-five thousand for the drivel produced by Brad DeLong. His posts are long, well written, and, importantly, not written by Brad DeLong.
  2. James Hamilton’s and Menzie Chinn’s Econbrowser, which has 3000 odd subscribers relative to the 70,000 had by Paul Krugman. Unlike Krugman, this blog has nice, considered long posts which consider the data above politics. It is also the best econ-blog.
  3. Andrew Gelman’s blog, which has 4000 subscribers relative to Freakonomics’ 13000. While this is not strictly an econ-blog, it contains more thought than most of them, and has a lively, well-behaved, considerate comments section.
  4. John Cochrane’s new blog, which has a thousand subscribers. It’s perhaps a bit too early to tell, but the posts are long, fairly well considered, and, except for one spat with DeLong, have largely avoided mudslinging. Cochrane’s a pretty serious character in the field, and I really hope the blog turns into him imparting his wisdom on the readers.
  5. Austin Contrarian is a great little blog (130 subscribers) on town-planning issues, especially considering the intersection between urban economics, local politics, and law. These issues actually affect our lives far more than what DeLong writes about, and so you should subscribe! Unfortunately, it’s been a bit quiet lately, but I’m sure with enough pestering, Chris will blog frequently again!
  6. Harry Clarke’s blog. Again, another small blog (130 subscribers) though highly respected in the Australian econ-blogging community. His longer posts on environmental econ are especially good.
  7. Andrew Norton’s blog. Andrew has a small but loyal following in Australia. His posts are data-driven, primarily on higher education. Definitely worth reading.
  8. Zero Intelligence Agents. Drew Conway is a PhD student in political science, who seems to have become an R addict in the meantime. I relate to this, as it’s also become my addiction.
  9. VOX. I really can’t believe this only has 6000 subscribers on Google Reader. Hundreds of authors, thousands of posts, all driven by the latest research. Kind of like a repository for extended abstracts.
  10. ?


Utilities: The low-hanging productivity fruit

Below I outline an idea to address the productivity growth shortfalls in the electricity, gas, and water industries recently experienced in Australia. Especially given the likely increases in electricity and gas costs from the carbon price, now is a good time to reconsider the structure of the retail utilities market.

—————-

In December, Martin Ferguson announced that the Productivity Commission would be commencing an investigation into electricity network regulation. This move is welcome: after large gains in productivity growth following partial or full privatisation in the 1990s, water, electricity and gas distribution have seen their productivity growth rates plummet.

“In electricity, gas and water supply, the level of productivity fell by around one-quarter this decade and, though this fall is largely inscrutable, it is clear that productivity improvements from streamlining workforces and running capital closer to capacity have run their course.” –Dolman, B., 2009

Such a slowdown affects Australians far more than the much-publicised decline of Australian manufacturing. While manufactured goods can generally be imported, our utilities can’t, and so increases in utilities productivity lead to equivalent real price decreases. Because we haven’t seen productivity increases in utilities as we have in other consumer industries (been to Aldi or Costco lately?), the real cost of utilities has been increasing recently, and utilities now take a larger proportion of household final consumption expenditure than at any time in the last half century.

Source: ABS, 5204 Table 42—Household Final Consumption Expenditure (including imputed rents). Data construction is (Water and sewerage services + Electricity and gas services) / HFCE (all current prices).
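
The ratio in the source note is simple enough to rebuild yourself. Here is a minimal sketch, assuming the two utilities series and total HFCE have already been pulled out of the ABS table into plain lists. The function name is my own and the numbers are made-up placeholders, not ABS data.

```python
def utilities_share(water_sewerage, electricity_gas, hfce):
    """Utilities spending as a share of household final consumption expenditure.

    All three inputs are sequences of current-price values for the same periods.
    """
    if not (len(water_sewerage) == len(electricity_gas) == len(hfce)):
        raise ValueError("all series must cover the same periods")
    return [
        (w + e) / total
        for w, e, total in zip(water_sewerage, electricity_gas, hfce)
    ]


# Placeholder numbers for illustration only (NOT ABS data).
water = [2.0, 2.5]
energy = [6.0, 7.5]
total = [400.0, 410.0]
shares = utilities_share(water, energy, total)
print([round(100 * s, 2) for s in shares])  # shares in per cent
```

Because both the numerator and the denominator are in current prices, a rising share can reflect higher relative prices, higher relative quantities, or both.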

Of course, this increase could be driven by an increase in the price of utilities relative to other consumption goods, or an increase in the amount of utilities demanded relative to other goods, or both. However, the data tend to suggest that the large increases in costs are due primarily to price increases (see figures below). While this recent period has also seen large increases in input costs (due to a commodity price surge and a drought), the same forces were at work before the recent increase in prices, which leads me to suspect there’s something else going on.

Source: ABS, 5204, Table 42. 

There are reasons to suspect that economic rents are being made in the utilities retail business. These rents occur because there are fairly real transaction costs in switching providers. That’s why second-tier retailers (i.e. the retailers who don’t own wholesale or distribution businesses) send people door-to-door trying to get you to switch. If rents do exist in this industry, then there are productivity and welfare gains to be had by redesigning the retail market in order to eliminate them.

The cost of choice?

In a paper investigating different models of retirement pensions, Peter Diamond (1996) looks at the Chilean pension model. In the early 1980s, Chile privatised its pension system (resulting in a system similar to Australia’s Superannuation system, only with tightly regulated private fund managers competing to manage the retirement savings of workers). While their returns were good, their management costs were excessive: up to about 3% of the average annual income of an employee. The overheads of these firms were so high partly because the returns to marketing were so high: if you didn’t market, you didn’t get any funds to manage, and you went broke. This means higher fixed costs: Chilean pension-fund managers had 3.5 salespeople for every 1000 accounts, whereas the total employment of Social Security was 0.5 people per 1000 accounts.

Diamond’s more recent insight is that this excessive ‘cost of choice’ can be remedied by the state taking a somewhat more active role in market design. His proposal is for the government to aggregate groups of people with similar levels of risk-tolerance, and run a closed-envelope tender for the business of managing each group’s pooled funds (which simply take some combination of an index and government bonds, depending on the risk profile of the portfolio). Because a) the cost of managing an index is minimal, b) the marketing costs of this type of operation are the cost of submitting a tender, and c) the price elasticity facing any firm becomes very high, the equilibrium management fees for this sort of retirement plan become very low: the US Thrift Savings Plan (TSP), which runs in this fashion, had management fees of 2.5 basis points in 2011; Australian Super had 16-79 basis points + other expenses.
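
To get a feel for what those fee differences mean over a working life, here is a rough sketch. It assumes a single lump sum, a 5% gross annual return, a 40-year horizon, and a fee deducted from the return each year; all of these are my own illustrative assumptions. It also ignores the “other expenses”, so it is a back-of-the-envelope comparison rather than a model of either fund.

```python
def terminal_value(gross_return, fee_bp, years, initial=1.0):
    """Value of `initial` after `years`, with an annual fee in basis points
    deducted from the gross annual return (1 bp = 0.01 percentage points)."""
    net_return = gross_return - fee_bp / 10_000
    return initial * (1 + net_return) ** years


def fee_drag(gross_return, fee_bp, years):
    """Share of the no-fee terminal balance that is lost to fees."""
    no_fee = terminal_value(gross_return, 0.0, years)
    with_fee = terminal_value(gross_return, fee_bp, years)
    return 1 - with_fee / no_fee


# Assumed for illustration: 5% gross annual return over 40 years.
for label, bp in [("2.5 bp", 2.5), ("16 bp", 16), ("79 bp", 79)]:
    print(f"{label}: {fee_drag(0.05, bp, 40):.1%} of the final balance lost to fees")
```

Under these assumptions the drag works out to roughly 1%, 6% and 26% of the final balance respectively. That is why a few dozen basis points matter far more than they sound.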

Diamond extended this idea to insurance in the early 1990s: if you could group people together geographically, in large enough groups to eliminate selection problems, and offer their collective business (well enough defined) to the cheapest insurer, you would see large falls in insurance prices.

But what about choice?

Choice is good to the extent that it offers people the quality they want at a price that is reasonable to them. However, with homogeneous products, there is less of a quality/price trade-off. This means that, if a reduction in the choice of contracts for a homogeneous commodity is accompanied by a reduction in prices, there exists a (reduced) number of choices which can leave everyone’s welfare improved. One need only choose the contract which is closest to one’s ideal contract, and, so long as the person’s preferences aren’t odd, the reduction in price should offset or more than offset the decrease in utility that comes from not being able to purchase the perfect contract.

The flip-side of this is that the fewer choices offered, the greater the saving must be to compensate people (in utility terms) for the reduced choice.

So what to do about utilities retailing?

IF there are large marketing costs in utility retailing, or there are rents in utilities retailing, or both, then prices could almost certainly be decreased by adopting a geographic-pooling scheme.

This would involve drawing squares on a map, and designating each of these squares a “pool”. The members of each pool would then elect a contract “type”, roughly equivalent to the sorts of choices your electricity provider would provide you. The information on aggregated types for each geographic pool would then be sent to the retailers, who could then nominate a price for service which would bind for the following 6 months. Towards the end of the period, the firms could re-contest the pools.

The result of this would be that utility retailing firms would not need a marketing division, and would simply need to offer the lowest price in order to get business. Their monitoring costs would also be reduced, as metering agents would not need to travel as far (every building in the pool would be a customer, rather than some smaller share).

If, during the period, there were things like black-outs or water interruptions, the firm would lose the ability to re-bid for the pool’s contract in the next period.
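
The mechanics of a single pool’s tender can be sketched in a few lines. This is only an illustration of the scheme described above. The `Bid` class, the `award_pool` function, the retailer names and the prices are all hypothetical, and a real design would need rules for ties, reserve prices and so on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    retailer: str
    price: float  # nominated price for the pool's aggregated contract mix


def award_pool(bids, disqualified=frozenset()):
    """Return the lowest-priced bid from a retailer that is still eligible.

    `disqualified` holds retailers that lost the right to re-bid, e.g. because
    of black-outs or water interruptions in the previous period.
    Returns None if no eligible bid was received.
    """
    eligible = [b for b in bids if b.retailer not in disqualified]
    if not eligible:
        return None
    # Ties go to whichever bid appears first; an arbitrary choice for this sketch.
    return min(eligible, key=lambda b: b.price)


# Hypothetical retailers and prices, for illustration only.
bids = [Bid("Retailer A", 0.24), Bid("Retailer B", 0.22), Bid("Retailer C", 0.23)]
first_period = award_pool(bids)
# Suppose the first winner had an outage during its six months: it cannot re-bid.
second_period = award_pool(bids, disqualified={first_period.retailer})
print(first_period.retailer, "then", second_period.retailer)
```

The only way to win the pool is to nominate the lowest price, and the only way to keep contesting it is to avoid service failures. Marketing spend buys nothing.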

The potential problems would lie in the potential for collusion, and the problem of introduction. The returns to collusive behaviour would increase as the potential spoils increase, and so this would need to be monitored. The problem of introduction would be that there are pre-existing contracts made between retailers and households. I don’t know anything about contracts, and so I’m not sure how this would be remedied.

Diamond, P., Proposals to restructure Social Security, The Journal of Economic Perspectives, vol 10, no. 3, 1996

Dolman, B., What happened to Australia’s productivity surge, The Australian Economic Review, vol 42, no. 3, 2009

 


Some productivities are more equal than others: Why subsidising the auto industry is a worse idea than I thought yesterday

Overview:

In the long run, standards of living are almost entirely determined by productivity—the ability to produce more with the same resources.  However, sometimes “productivity improvements” are used as an excuse for bad policy. These types of productivity improvements, while improving productivity, don’t actually improve the welfare maximising consumption possibilities of the population over the long run.

————-

Differences in levels of productivity explain most of the difference in living standards between poor countries and rich countries. However, the way in which it is commonly measured may give politicians the wrong idea when they use public funds to promote pet industries in the name of “improving productivity”.  This is because there is an inherent difference between the productivity measure reported in the National Accounts and that which describes the improvement in “consumption possibilities” (sorry, terrible jargon) of the population. This insight is due to William Nordhaus (2000).

How do we measure productivity?

Productivity is simply the amount of output of a firm or economy, divided by the amount of inputs. A commonly used measure, which I alluded to yesterday, is “labour productivity”, which is total output divided by the amount of labour used (we normally use an estimate of the number of hours worked for this).

It makes sense that improving productivity should improve living standards. Say I am living on an island, and I eat only corn. In the first year I produce 40 bushels of corn; in the second year I produce 50. It’s clear that I am able to eat more the second year. This seems to be how politicians think about productivity, which is fair, as this is also how the ABS publishes it.

There are three main problems with carrying the island analogy across to an entire economy. The first is that in a real economy, we have many different products and industries, and these change over time. The second is that we trade with foreigners. The third is that while improvements in productivity are more feasible in some industries than others, “feasible” isn’t the benchmark which should be used.

The first one and a half of these problems are what Nordhaus discusses in his paper, which I encourage interested people to read.

The implication of there being many sectors is that “productivity”, as measured by the ABS, can be attributable to:

1) The pure productivity effect. This is when firms produce more with the same amount of inputs. You measure this by holding industry shares constant, and looking at the increase in productivity. Some people call this the “Shift effect”.

2) The so-called Baumol effect. This is when, although no industries improve their rates of productivity, a relatively productive industry increases in size by a greater amount than an unproductive industry. To measure this, we hold the levels of productivity constant and look at the difference in industry shares, which is why we call this the “Share effect”. William Baumol gets the naming rights as he noticed that industries with relatively high productivity growth seemed to grow faster than industries with low rates of productivity growth, and this improved the total productivity of the economy.

3) The Denison effect. This happens due to an interaction between the first two. You can think of it as the pure effect of having some people leave the unproductive sector and start working in the more productive sector. If we say that both sectors grow at the same rate, this means that the average productivity of the remaining workers in the unproductive industry is higher, as is the productivity of the workers who left to work in the productive sector (even though the average productivity of those in the productive sector has decreased). The net effect is the Denison effect.
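
To make the three pieces concrete, here is a textbook shift-share decomposition of labour productivity. It follows the descriptions above: shares held constant, productivity levels held constant, and the interaction of the two. It works in levels rather than growth rates, so it is a sketch of the logic and not Nordhaus’s exact formula. The two-sector numbers are invented.

```python
def shift_share(hours0, output0, hours1, output1):
    """Decompose the change in aggregate labour productivity between two periods.

    Inputs are per-sector lists of hours worked and output in periods 0 and 1.
    Returns (within, between, interaction); the three terms sum exactly to the
    change in aggregate output per hour.
    """
    total_h0, total_h1 = sum(hours0), sum(hours1)
    within = between = interaction = 0.0
    for h0, y0, h1, y1 in zip(hours0, output0, hours1, output1):
        p0, p1 = y0 / h0, y1 / h1              # sector output per hour
        s0, s1 = h0 / total_h0, h1 / total_h1  # sector share of total hours
        within += s0 * (p1 - p0)               # 1) shares held constant
        between += p0 * (s1 - s0)              # 2) productivity levels held constant
        interaction += (p1 - p0) * (s1 - s0)   # 3) both change together
    return within, between, interaction


# Invented two-sector economy: sector 0 is low-productivity, sector 1 is high.
hours0, output0 = [80.0, 20.0], [80.0, 60.0]  # 1.0 and 3.0 units per hour
hours1, output1 = [70.0, 30.0], [77.0, 96.0]  # 1.1 and 3.2 units per hour
within, between, interaction = shift_share(hours0, output0, hours1, output1)
aggregate_change = sum(output1) / sum(hours1) - sum(output0) / sum(hours0)
print(within, between, interaction, aggregate_change)
```

In this made-up example most of the measured gain comes from hours moving into the high-productivity sector. Much less comes from either sector getting better at what it does. The rest of the argument turns on exactly that distinction.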

The next problem raised by Nordhaus is that if you want to measure productivity as it actually affects livelihoods you shouldn’t look so much at production as you should at consumption. While this can’t be wholly divorced from the trade argument I will make below, I will try.

Let’s suppose for a sec that the country does not trade, and that all industries produce final consumption goods. Then which components of productivity above should we include in the “welfare maximising” productivity measure? The reader should realise that the first two (the pure productivity and Baumol effects) both imply that the country makes more stuff without more effort. As such, they improve welfare. The third effect occurs due to differences in inter-industry productivity, which may be due to heterogeneity in the inputs. To the extent this is true, we’re not comparing apples with apples; the inputs we use in a sector with vastly lower productivity aren’t likely to be the same inputs we use in a more productive industry. Consequently our welfare maximising productivity growth does not include the Denison effect (Nordhaus’s argument is slightly more nuanced).

For simplicity, the model used by Nordhaus to deduce the above relies on there being perfectly competitive markets. One of the main consequences of that assumption is that the producers produce consumption goods exactly in concordance with the desires of consumers. In real life, things ain’t so simple. If governments encourage certain types of production (by subsidising) and discourage others (by taxing/imposing tariffs) there is no reason to expect the levels of production of goods to match up with what people want to buy.

The unfortunate result of policies like the car subsidy package is that, while they are perhaps inspired by claims of higher productivity in heavy industry, the resulting changes in productivity are largely due to the Denison effect, and so don’t improve productivity in a meaningful way as much as advertised. That is, the policy doesn’t lead to higher consumption of cars in Australia, and almost certainly reduces the consumption of other things. This becomes a stronger argument once we accept that we trade with foreigners.

So what does trade imply?

A difficulty is that some sectors are able to improve in productivity, while others cannot. This latter category includes the likes of hairdressers and string quartets: non-tradeable goods and services with almost no scope for productivity changes. Of those sectors able to improve in technology, we can break them into non-tradeable and tradeable sectors.

In Australia, there are some sectors like banking, telecoms and other utilities which have had strong growth in productivity, and are almost untraded. As a paper by my old boss Ben Dolman points out, there is still much scope for improvement in these sectors (by international standards). Because gains in productivity in these sectors are reaped by Australians, productivity improvements in these sectors are very likely to lead to welfare improvements.

Other industries, like mining and manufacturing, are traded, and there is also scope for improvements to productivity. In mining, Australia’s productivity is above world norms (though this is likely a remnant of our coal being at surface rather than deep underground). However, in manufacturing Australian productivity is about half the US’s. This is likely due to scale (see table below, from Dolman’s Productivity Commission working paper).

Australian wages are in exchange-rate weighted terms about the same as the US’s. American workers are also better trained, and have more capital per worker. So why would any foreigner in their right mind buy an Australian manufactured good over an American one? It will almost certainly be more expensive, and of equal or lesser quality. To close this price differential, Australian manufacturers would have to more or less double their productivity, an almost impossible task without the scale possible in the US.
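
The “double their productivity” claim is just unit labour cost arithmetic. A quick sketch, using index numbers as stand-ins (equal wages, Australian output per hour at half the US level) and ignoring capital and materials costs, which is my simplification:

```python
def unit_labour_cost(hourly_wage, output_per_hour):
    """Labour cost embodied in one unit of output."""
    return hourly_wage / output_per_hour


# Illustrative index numbers only: equal wages, Australian productivity half the US level.
us = unit_labour_cost(hourly_wage=1.0, output_per_hour=1.0)
au = unit_labour_cost(hourly_wage=1.0, output_per_hour=0.5)
print(au / us)  # 2.0: each Australian unit carries twice the labour cost
```

With wages held equal, the only way to bring that ratio back to one is for Australian output per hour to double.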

Put a bit more plainly, this means that the only way that a policy like car subsidies would improve Australian welfare is if it improved productivity by so much that Australian cars weren’t so expensive by world standards, and this resulted in more exports, and this allowed us to buy at least one more of the goods that we gave up in order to subsidise the car industry.

But industry policy aimed at boosting Australian productivity (the good kind, not the fake kind) needn’t be so difficult. Simply reduce bottlenecks in infrastructure impeding the industries which do already export—like mining. Each dollar invested there will improve Australian welfare by multiples of the dollars used to make V8 sports-cars nobody wants.

 

–Nordhaus, W., “Alternative methods of measuring productivity growth”, Cowles Foundation Discussion Paper 1282, 2000

–Dolman, B., Parham, D., Zheng, S., “Can Australia Match US Productivity Performance?”, Productivity Commission Staff Working Paper, 2007.
