Fun with Point of Sale data from Rex Tremendae

[Edit: charts below updated, with feedback from Tim Cameron.]

About a year ago, my brother and I signed our first commercial lease, a claim to use 18 square metres up the dodgy end of Flinders Lane for up to nine years. Today, Rex Tremendae, our cafe, is ticking along quite well. Our fantastic customers—basically everyone’s a regular—appear to like us, and on the whole it’s beginning to look as though it won’t be the world’s worst investment.

Below are a few charts from some analysis I pulled together, based on transaction-level data from our point-of-sale system. The time period covered is October 2014 – January 2015. All the charts are produced in ggplot2 (from within R). Y axes are sales in dollars, but I’ve censored these as I’d prefer not to tell our competitors how much we’re selling!

Note that the figures have been scaled down to fit within this awful WordPress theme. Click through for full size.

How does the weekly cycle look?

When ordering inputs (milk, pastries, deli items etc.) we need to have a bit of an idea which days are more likely to be busy. One thing is for sure—there is a cycle through the week, with Thursdays and Fridays quite a bit busier. The red line is my very rough estimate of total daily running costs—of course, costs increase with sales, but marginal costs in the cafe business are very small compared to fixed (or semi-fixed) costs.


Are same-day sales going in the right direction?

The big question here is how sales are going, comparing like days with each other. I’ve fitted a linear trend-line to each series, with 95% confidence intervals around each line. On the whole, the slopes are in the right direction, though we don’t yet have enough observations to be sure it’s not just noise. Again, the red line gives some indication of average daily running costs.



When are we busy?

Rex is very small, and has no indoor seating. While we do make the most delicious toasties on the planet (salami, chèvre, tomato tapenade and spinach is my fav), our customers don’t buy many of them. Instead, they seem happy to queue out the door for the delicious coffee (roasted by Rob, my brother) each morning.

The plot below illustrates the sales through the day. The x-axis is the time, the y-axis is the sales (in dollars, again censored), and the curve fit through them is a smoothed sales profile. The red horizontal line gives the average hourly cost through the day. As you can see, afternoons aren’t an especially profitable time for us (though the true costs of being open then tend to be lower, as one of the staff knocks off, or Rob uses the time to go and roast beans/work with wholesale clients).



Are we winning business within certain times?

Another question I’ve been asking is whether the after-lunch coffee market is improving. If there’s no improvement, then we need to think about changes in strategy. Thankfully, there does seem to be some improvement over time.

The chart below illustrates sales through the day and sales-by-the-hour over time. Each box represents an hour of the day; the points on the left come from October, and the points on the right are from December. The lines fitted are the linear trend lines with 95% confidence bands. As you can see, there is considerable growth in the 10–11 am segment, and from 1–4 pm, though the trend-lines in the afternoon aren’t too strong.


How bad is January really?

Bad. I’d heard about January being a crap month in hospitality, but our January was really quite crappy. It didn’t help that Rob and Effi (his partner) spent half the month in Germany seeing Effi’s folks. Or that all of our customers were on holidays. Or that the weather was bad. It was crap!

The chart below illustrates this. The height of the lines indicates the sales per hour, and the X axis is the time of day. The red line is January.



That’s all for now folks. If there are any cuts of the data you’d like to see, do let me know.



Comments (4)

3 things we should see in tomorrow’s macroeconometric modelling

Macroeconometric modelling is a funny sort of field. The stakes of doing it wrong (or right) are extremely high, while the data used are infrequently and poorly measured. In many cases, estimating a model on a large-enough sample to do useful inference involves including observations from a long time in the past—I’m talking the 1960s and 70s—and believing that the data were both correctly measured and coming from the same economy.

A skeptical macroeconometrician may ask: “how much of my view about how the world works today should I inform using data points from the 1960s, 70s, or 80s?” and they’d have a good point.

Here’s a field where it’s basically impossible to know anything—at least to any scientific standard—that has enormous impact and policy relevance. It’s really no wonder that it attracts a ‘spectrum of personalities’, vying with one another for the ears of our political leaders.

At the same time, macroeconometrics done right is useful. There is not nothing to be learned from history, so long as macroeconometricians are honest about what can and cannot be learned from historical data.

To these ends, I thought I’d put together a list of 3 characteristics that we should expect in tomorrow’s empirical macro models, with a few notes on how to implement them. All of these exist already, but are not standard features of commonly used empirical macro models.

1. Model uncertainty and sensible confidence intervals

Most readers here would expect any forecast to come with forecast confidence intervals, normally 95%. The implication to the reader is that the forecaster is “95% sure” that future values will fall inside the confidence band. An alternative interpretation may be that “95% of possible futures” fall inside the confidence band.

Almost all of the time, these confidence bands are poorly constructed, resulting in the reader being too sure about the future. This is because confidence intervals constructed the usual way—using historical forecast errors—assume that the underlying economic model is true. That is, using the normal approach, a 95% confidence band contains 95% of potential futures given the underlying economic model is a perfect representation of the world.

Of course, economic models are not perfect representations of the world, and so the 95% confidence band here is useless. I highly doubt that, had the Australian Treasury used its current technique for constructing confidence intervals over the last decade, the resulting bands would have included 95% of the realised outcomes.


Introducing model uncertainty—uncertainty over how well the model actually represents the world—helps to overcome this. There are ways of introducing model uncertainty to a macro model: by bootstrapping (which I have issues with, as I don’t believe that historical data come from the same model), or, more commonly, by using Bayesian techniques with priors that reflect how little we actually know. These tools are used quite frequently by many macro modellers, though, unfortunately, not by many who matter.

2. Coherent weighting schemes/model shifts

When building an empirical macro model, often one of the most difficult choices is how much data to include. Macroeconomic data are recorded fairly infrequently—monthly for unemployment and trade, quarterly for prices and the national accounts, annually for state accounts, etc. Many of these series don’t really move about too much, which makes it difficult to pin down the relationships between macro variables. This means that the empirical macroeconomist often needs to estimate their models on long histories.

This is a tough choice: include a long history and you end up estimating a useless value (the average relationship between variables for the whole period, rather than their relationship today), include a short history and you end up throwing out a lot of data that may have value. One common work-around is to use a weighting scheme that gives more importance to recent observations, and less importance to historical observations. But is the recent past really a better predictor than the distant past? Can we learn nothing from history?

Once you’re in the world of time-series modelling, you’re implicitly saying that relationships between historical variables are of some use. If this is the case, then why not go the whole way and say that more can be learned from more relevant histories? 

One fairly simple way of doing this is to simply give more weight to relevant histories when we build our models. But how do you know which histories are relevant and which are not? My method is to do the following:

1. Train a random forest on the relevant dependent variable, using a wide range of independent variables. The random forest is a tool from machine learning that will throw out irrelevant independent variables, so you can afford to put many in.

2. Save the proximity matrix from the random forest. This symmetric matrix gives us a measure of similarity between two observations. Importantly, it captures how similar two observations are in all the ways that matter for predicting the dependent variable. I have written on this elsewhere; I consider it one of the most important tools for the future of inferential economic research.

Here are the first five rows and columns of a proximity matrix from the demonstration below.

        1979Q3    1979Q4    1980Q1    1980Q2    1980Q3
1979Q3  1.000000  0.240876  0.164179  0.197080  0.402878
1979Q4  0.240876  1.000000  0.169355  0.212598  0.222222
1980Q1  0.164179  0.169355  1.000000  0.132231  0.103704
1980Q2  0.197080  0.212598  0.132231  1.000000  0.115108
1980Q3  0.402878  0.222222  0.103704  0.115108  1.000000

3. Run your regression model, taking the appropriate row of the proximity matrix to be the weighting vector. This will normally be the last row, as you’re interested in finding similar histories to today.

It’s really that simple.
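For the curious, the three steps above can be sketched in Python with scikit-learn (all names and the toy data here are mine; scikit-learn doesn’t ship a proximity matrix, so it is computed by hand from shared terminal leaves):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy data: y depends on x0, with the relationship shifting by "regime" (x1).
n = 200
X = rng.normal(size=(n, 3))
y = X[:, 0] * (1 + 0.5 * np.sign(X[:, 1])) + rng.normal(scale=0.1, size=n)

# Step 1: train a random forest on the dependent variable.
forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Step 2: build the proximity matrix -- the share of trees in which
# two observations land in the same terminal leaf.
leaves = forest.apply(X)                     # (n, n_trees) leaf indices
prox = np.zeros((n, n))
for t in range(leaves.shape[1]):
    prox += (leaves[:, [t]] == leaves[:, t]).astype(float)
prox /= leaves.shape[1]

# Step 3: run the regression, weighting each observation by its
# proximity to the last observation ("today").
weights = prox[-1]
model = LinearRegression().fit(X, y, sample_weight=weights)
```

Under this weighting, histories that look like today (in all the ways the forest found useful for prediction) dominate the estimate, and dissimilar histories contribute almost nothing.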

So how much of the data actually gets used in this method? To illustrate, I’ve put together a little demo (code and data—which downloads automatically on running the script—available here). In it, I’m trying to model labour productivity growth, in particular, how much it appears to be affected by changes to unemployment. Note that this is for illustrative purposes, and I’m not making any claims about whether the parameter is well identified.

The figure below illustrates the weights that are being given to historical data observations, along with the fitted values and predicted values.



If we use this method, we can see how the relationship between changes in unemployment and changes in productivity varies over time, when we give more weight to relevant histories. The line in the middle is what we would estimate today, if we gave equal weight to all observations. As we can see, in some histories, changes to productivity do appear to move together with changes in unemployment.




These charts wrap up my spiel on using relevant histories, though I’ll probably write some more on it in the future.


3. The ability to inform the user when the model should not be used

One of the major shortcomings of macro models today is that they lack an intuitive way of knowing when a forecast or policy simulation should not be performed because the model was estimated on data from a different world. Instead, a model is typically just a bunch of coefficients (or occasionally distributions) that we multiply with our hypothetical x variables. It doesn’t care whether those x variables are nothing like the ones that the model was estimated on.

This is one of the big areas of abuse of models, sometimes with catastrophic consequences. We estimate a model on the good times, and wonder why it doesn’t work during the bad. Wouldn’t it be wonderful if the model just reported enormous confidence bands whenever it was being asked to do something unreasonable?

Well, this can be done too, using the weighting scheme discussed above. If we’re in the middle of an unusual economy, then there will be very few histories proximate to the present, and confidence intervals can be adjusted accordingly (as the model is effectively being estimated on fewer data-points). On the other hand, if we’re estimating the behaviour of a fairly regular economy, we have lots of relevant histories and our confidence intervals will be smaller.
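One simple way to make that adjustment concrete (a sketch of mine, not something standard in the macro literature) is to compute an effective sample size from the proximity weights using Kish’s formula, and scale standard errors accordingly:

```python
import numpy as np

def effective_n(weights):
    """Kish's effective sample size: when the proximity weights
    concentrate on a few histories, the effective n collapses."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()

# A 'regular' economy: many proximate histories, spread-out weights.
regular = np.ones(100)
# An 'unusual' economy: only a handful of proximate histories.
unusual = np.r_[np.ones(5), np.zeros(95)]

# Standard errors scale roughly with 1/sqrt(effective n), so the
# unusual economy gets confidence bands several times wider.
print(effective_n(regular))   # 100.0
print(effective_n(unusual))   # 5.0
```

The model then reports its own ignorance automatically: ask it about an economy unlike anything in its history and the effective sample size, and hence the confidence it claims, shrinks.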


There are many things we should be asking of our macroeconomic modellers. My top three are that they appropriately model what they do not know, that they stop building models on useless data, and that they do not use their models out of context. That’s not too much to ask.

Comments (3)

The Kangaroo Jack effect: what happens when nobody says no

Some big ideas are objectively terrible, yet are followed through on, resulting in a predictably terrible product. Kangaroo Jack comes to mind [1], as does the Bollywood production of Fight Club, in which some budding entrepreneurs start a for-profit fight club (market research and all). Here are some other terrible ideas from the startup world.

The interesting thing about these terrible ideas is that there were presumably several people who could have just said “no”. Political scientists call these people veto players: the financiers, the production houses, the directors, producers and so on. It’s an interesting dynamic that sees veto players allow so many people to pour so much human effort into these thought farts.

I’m not sure why these ideas end up going ahead. It could be some combination of:

– Hindsight bias. Maybe Kangaroo Jack, the Bollywood Fight Club etc. weren’t obviously terrible ideas at the outset?
– In some communities—think Hollywood and Silicon Valley—maybe everyone owes everyone, so people work on each other’s projects in order to curry favour.
– A lot of people do say no, yet the persistence of the bad idea’s champions ends up finding the dumb money or weak/stupid veto players.
– Both film production firms and VCs just play the numbers game, aware that many apparently dumb ideas end up paying off. Indeed, the Kangaroo Jack effect may be a feature of investment markets in which most investments make nothing, and a few make a huge amount.

I’d be interested to hear of your explanations/examples.

[1] Kangaroo Jack, despite being a terrible film, did turn a small profit, but “Bollywood version of Fight Club effect” doesn’t quite sound the same. The opportunity cost of funding Kangaroo Jack isn’t not making Kangaroo Jack; it’s not making The LEGO movie. I’m pretty sure the world is worse off due to Kangaroo Jack.

Comments (1)

Update from Chicago

Here’s my much-promised, less-delivered update on Chicago: what we’re doing, how it’s going, and if we’re going to come back.

In reverse order, yes. Sue and I need to leave the country at the end of August, when my visa (but not hers) expires. But gee, it’ll not be without regrets. Escaping an apocalyptic Melbourne winter to come to this great city has been immensely enjoyable. More on that below. So yes, all the rumours are false; we’re not staying on, even if we really want to.

The reason for us making the trip over here is so that I could take up the Eric and Wendy Schmidt Data Science for Social Good fellowship, being run through the University of Chicago. The fellowship itself came about from Eric Schmidt’s involvement in the Obama 2012 campaign, the data-intensive side of which was being run by Rayid Ghani.

So the story goes, Schmidt, who has made a little bit of money running Google, wanted Ghani to put the skills of young data scientists to the public good, and funded the fellowship. In return, Ghani and his team run the fellowship, which brings together some seriously bright folk from around the world (but mainly the US) to Chicago for 12 weeks over summer. These fellows work in groups of four with partner organisations—mainly not-for-profits and government agencies—to lend their skills to help solve some difficult problems.

The team I’ve landed in is fantastic. The members are Diana Palsetia, Pete Lendwehr, and Sam Zhang. Diana is a PhD student at Northwestern who specialises in high-performance parallel computing on large datasets. She’s the sort of person who interjects sparingly, from the back of the room, with extremely useful insights. Pete is a PhD student at Carnegie Mellon University, specialising in ‘advanced computation’, but seems to spend as much time pondering theatre and hunting down excellent coffee beans. He also turns my ideas—good and bad—into deployable Python code. Sam is a brilliant 21-year-old who just finished up at Swarthmore, an elite liberal arts college. He’s an all-round hacker, writer and statistician who makes me reconsider the wisdom of having wasted so much of my 20s.

Our team has been partnered with two organisations: the first is Enroll America, a well-funded not-for-profit tasked with getting as many people signed up for health insurance under the Affordable Care Act/Obamacare as possible; the second is Get Covered Illinois, which is being run out of the Illinois Governor’s office and is attempting to do the same, though only in Illinois.

Both of these organisations have limited budgets, and the same aim—get uninsured people covered. The big question for them is who should they target with their limited resources? There are some people who will never take out insurance no matter how much they’re pestered, and sad as it is, it’s a waste of money trying to convince them to do so. There are other folk who are far more interested in taking out insurance, but have not because, frankly, the system takes a bit of work. And you can always do it tomorrow, right?

There is no shortage of data here. As many of the people working on this problem (mainly on the Enroll America side) also worked on the Obama 2012 campaign, they use similar datasets to those used to find persuadable voters. That means that some of these datasets are quite big—a row for almost every American, with plenty of details (mostly best guesses) on each.

The approach that our team is taking is to build several statistical models to help Enroll America and GCI work out who they should be spending money contacting. The first model gives, for each person, the probability that that person is uninsured. There is no point contacting someone who is insured. What surprised us is how many people you would not expect to have health insurance actually do, which makes it hard to build a predictive model that reliably sorts the insured from the uninsured.

The second model tells us how persuadable someone is, given their probability of being uninsured. Thankfully, Enroll America ran a randomised control trial in March, in which they randomly selected a ‘control group’ who would not be pestered during the telephone and email campaign. They then compared this group to a ‘matched treatment’ group of similar folk who were pestered, looking at the difference in insurance rates after the enrolment period ended. The result was quite profound: people who were pestered by email and phone were about 6% more likely to have taken up health insurance.

While the ‘treatment effect’ of being pestered is about 6% on average, the interesting question for our team is working out what the treatment effect might be for an individual person. This is an extremely difficult problem, towards which we have been devoting most of our time. Our current solution is here.

There are other problems that we’ve not done as much work on. For instance, what is the best contact language? Where should tabling events be held? How can we best guess someone’s income (which will determine how large a subsidy they will receive)? These are for the coming weeks.

Sue and Emi have also been busy, making friends in our neighbourhood–right next to the University of Chicago–and spending long days at the beach. Emi has learned to run, and Sue has learned to spot enclosed playgrounds.

A few words on Chicago. Picture this: one in three days in this city you can freeze to death without trying hard. Yet almost 10 million people decide to live in and around the city. Why would so many people make such a choice, surely crazy to the outsider?

I’m not 100 per cent sure, but it must have something to do with the fact that it somehow combines being an extremely large city with a small-town feel. Traffic is no worse—probably better—than Melbourne. Public Transit certainly isn’t Singapore, but is cheap and effective (especially during the rush). The music, theatre and intellectual scenes are full and exciting. The beaches are fun, the food great, and the people are extremely friendly; some combination of northern-Midwestern Nice and Southern hospitality, a remnant of the Great Migrations. Finally—this was unexpected—the summer is delicious. 29C every day, often cooled by large storms at night. Splendid.

The 10 million people who live in Chicagoland aren’t mad. They could live in the Eternal Spring that is Southern California, but don’t. If Southern California had Chicago’s winter, nobody would live there.

Comments (4)

A method of estimating out-of-sample continuous treatment effects using experimental data

The whole point of the below is to prioritise who gets treatment first. Who do we canvass first? Who gets the first dose of medicine? Average Treatment Effects simply don’t help here, and other methods do very poorly if you have only experimental data at a slice in time (so you can’t do difference-in-differences or some other unit-level estimate of treatment effect).

So here’s my algorithm. The high-level objective is to estimate a continuous treatment effect for out of sample observations that is essentially a function of individual-level characteristics. For an observation in the test set, the estimated treatment effect is a convex combination of the (unobserved, but estimated) unit-level treatment effects in the training set, where the weights are given by similarity to the units in the training set.

Here’s a rough outline.

First, the true causal effect for each observation, where Y is the outcome, the indicator before the pipe is the group (t = treatment, 0 = control), the indicator after the pipe is whether they received the treatment, and i indexes the unit.


Yt|t(i) – Yt|0(i)* = treatment effect on the treated

Y0|t(i)* – Y0|0(i) = treatment effect on the untreated. Stars indicate that we never observe that outcome.

Let’s say that the conditional ignorability assumption holds (so t is independent of unobserved variables), as it does in a randomised control trial. Sid Chib’s method is to completely bypass matching, and instead predict the counterfactual. He does this by building two models F(X(i)) and G(X(i)), where

Yt|t(i) = F(Xt(i)) + e(i) tries to predict the outcome variable for the treatment group

Y0|0(i) = G(X0(i)) + u(i) tries to predict the outcome variable for the control group

and then ‘predict’ the counterfactual by putting the control’s Xs into the treatment model, and vice versa. We could use any predictive model for this step, though better models should give better estimates.

Yt|0(i)* = F(X0(i))

Y0|t(i)* = G(Xt(i))

We then substitute these fitted values into the relations given above to arrive at a vector of estimated treatment effects for each unit, T(i)*.
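In code, the two-model counterfactual step might look like this (a Python sketch with simulated data; the names are mine, and any predictive model could stand in for F and G):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Simulated RCT: treatment adds an effect that varies with x0.
n = 1000
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 0.5
tau = 1.0 + X[:, 0]                        # true unit-level effect
y = X[:, 0] + X[:, 1] + treated * tau + rng.normal(scale=0.1, size=n)

Xt, yt = X[treated], y[treated]            # treatment group
X0, y0 = X[~treated], y[~treated]          # control group

# F predicts treated outcomes; G predicts control outcomes.
F = GradientBoostingRegressor(random_state=0).fit(Xt, yt)
G = GradientBoostingRegressor(random_state=0).fit(X0, y0)

# 'Predict' each group's unobserved counterfactual with the other model.
tau_treated = yt - G.predict(Xt)           # effect on the treated
tau_control = F.predict(X0) - y0           # effect on the untreated
T_star = np.r_[tau_treated, tau_control]   # estimated unit-level effects
```

The vector `T_star` is the unit-level treatment-effect estimate fed into the random forest in the next step.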

Here’s where the method gets fun. Define X as the rbind of Xt and X0. Now we can run a randomForest, rF, to predict the treatment effects given the Xs.

T(i)* = rF(X(i)) + v(i).

The lovely thing about random forests is that one of the outputs is the fantastic proximity matrix. For a sample of N, this is a symmetric N*N matrix whose i,j-th (and j,i-th) entry is the proportion of trees in the forest in which observations i and j share a terminal leaf. Now, what does it mean to share a leaf? In each tree, two observations share a leaf when they fall down the same branches, which only happens when they are similar in several ways. So a high proximity score indicates that two observations are similar in all the ways that help predict the treatment effect.

Now, let’s say we do a survey and collect some (N1) more folks’ Xs, called X1. The whole point of this is to work out for each of these guys how much our intervention is expected to work on each of them (that is, not on average). In the problem I’m working on now, we want to know who we should pester first to buy health insurance.

Call X2 the rbind of X1 and X. We can then run X2 through the saved rF and store the resulting proximity matrix. It is partitioned into four submatrices:

PROX = [A,B;C,D], where A is the N1*N1 proximity matrix of the new observations, B is the N1*N matrix that maps the proximity of the new observations to the training set, C is the transpose of B, and D is the old proximity matrix.

Now take B and normalise each row so that it sums to 1; call this B*. These rows are our weights for the convex combination of in-sample estimated treatment effects. The estimated treatment effect for the new observations is

Tnew* = B* T*
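The last step is just a matrix product. Here is a Python sketch with stand-in numbers (the proximity block B below is random, purely to show the shapes and the row normalisation):

```python
import numpy as np

rng = np.random.default_rng(2)

N, N1 = 200, 10                        # training and new observations
T_star = rng.normal(loc=1.0, size=N)   # in-sample estimated unit effects
B = rng.random((N1, N))                # proximity of new obs to training set

# Normalise each row to sum to one, so each new estimate is a
# convex combination of in-sample treatment effects.
B_star = B / B.sum(axis=1, keepdims=True)

T_new = B_star @ T_star                # one estimated effect per new person
```

Sorting `T_new` then gives the prioritisation we were after: pester the people with the largest expected treatment effect first.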


Validation here can be done by k-fold cross-validation. We get an unbiased estimate of the average treatment effect for the test set the usual way, then estimate it using the method above. The difference between the (observed) test-set average treatment effect and the implied ATE from the method above should be close to zero on average across folds.

Comments (1)

Updated bond yields gif


Comments off

Jahangir Hosseini has not eaten in 42 days

A good family friend, Jahangir Hosseini, has been on hunger strike for 42 days now. His story is below.


At the age of 16, I got involved in politics. I was heavily involved in the Iranian revolution and started an anti-Shah group comprised of young people. In my capacity as the leader of this group, I organised secret meetings and led political discussions.

I started working for the National Iranian Oil Company (NIOC) in Kharg Island when I was aged 18. I was elected as the leader of the NIOC, Kharg Island when I was 18. I resolved workplace disputes, introduced initiatives such as family counselling, advocated for women’s and workers’ rights and drew attention to the demands of the workers I represented.

I fought for the wives of workers to be flown to major cities to access health care and to give birth. I campaigned tirelessly for workers on 12 month contracts to be granted permanent contracts, for their years of service to be recognised and for them to receive superannuation. I also fought for contract workers to have the same leave entitlements as permanent staff and for workers to work 40 hour weeks rather than 48 hour weeks, in line with international standards.

I organised regular strikes which heavily impacted upon garbage collection, power and other things. After seeing how active and strong our union was and after hearing about the victories we achieved, workers from other corporations and businesses at Kharg Island signed their names on petitions electing me as the leader of their respective union and forwarded such petitions to the board that managed Kharg Island. I effectively became the leader of a large coalition of unions.

At the time, Kharg Island was the largest offshore crude oil terminal. The Iranian regime was worried that the export of oil would cease if the workers continued to go on strike and could not afford any disruptions.

When I was 21, I was arrested by the Revolutionary Guards. The Iranian regime arranged for my dismissal from the workplace and made attempts to silence me. They perceived the strikes as anti-revolutionary and regarded my actions as provoking workers. I was told it was not an appropriate time to focus on workplace issues and workers’ rights and that our main focus as a nation should be on winning the war against Iraq.

I was barred from working in the private and public sector, could not pursue higher education, could not access my long service leave or superannuation, was prevented from utilising health services and could no longer play for the state soccer team. I have all relevant documentation including a summons to appear at the revolutionary court.

I was imprisoned for 2 years and was convicted on 11 counts which included my involvement with internal and external political organisations (PMOI), provoking workers, organising demonstrations and strikes and anarchy. I developed high blood pressure and was hospitalised. I fled to Turkey, from hospital, and maintained a secret involvement with the PMOI.

I was soon accompanied in Turkey by my wife and 8 month old daughter. Due to security concerns, we fled to Greece and lived in a refugee camp. The UN recognised us as political refugees.

While in Greece, I was heavily involved in refugee issues and politics. I continued my political activism and organised hunger strikes and demonstrations. I went on hunger strike for 55 days for the purposes of condemning human rights violations in Iran and to draw attention to refugee rights. In 1988, I was on hunger strike for 12 days and called for an end to executions in Iran.

I fought for refugee rights and advocated on behalf of many Iranian refugees, all of whom were relocated to Canada and America. After 18 months, we were the only family relocated to Australia. We arrived in Melbourne on 14 January 1989. We had no contacts in Australia and were taken to a hostel where we had to fight for our rights.

We came into contact with PMOI supporters in Australia and have maintained our political activism until today.

I have organised protests condemning the Iranian government for its human rights violations, advocated for refugee rights and regularly spoke to members of parliament and community organisations about refugee rights, the status of women and human rights violations in Iran. The Age and Channel 10 interviewed me following protests against the deportation of Iranian refugees.

Following the signing of a memorandum of understanding by the Australian Coalition government and the Iranian regime, on 3 June 2003, our home was raided by Australian Federal Police. Six houses were raided in Sydney and five in Brisbane. Ours was the only house raided in Victoria. No one was charged. We were all victims of a dirty deal.

Following the raids in Australia, on 17 June 2003, the French police raided the offices of the National Council of Resistance of Iran (NCRI) and arrested its members including its president elect, Mrs Maryam Rajavi. In June 2003, I staged a hunger strike outside the French Embassy for 15 days. I supported international calls for Mrs Rajavi and NCRI members to be released. This hunger strike took place precisely 14 days after the raid on our family home. The raid did not serve as deterrence and instead motivated me to increase my political activism.

On 19 September 2013, due to my opposition to the 1 September 2013 massacre at Camp Ashraf which led to the murder of 52 unarmed Iranians and the abduction of 7, I decided to commence a hunger strike. I was the first to start a hunger strike despite being a father to a 3 year old boy, a 22 year old daughter and a 27 year old daughter. I have been on hunger strike for 40 days as of 28 October 2013 and have had to deal with cold weather. I have slept in a van after being told on day 12 that we could no longer camp outside Casselden Place.

I am mentally strong but physically weak. I made the decision to remain on hunger strike until the hostages are released and am willing to risk my own life. This is the bare minimum I can do for human rights.

Comments off

Treasuries yield curve over the last month



– Updated 20-10-2013

Comments off

The same chart again, but with TIPS yields also


Comments off

US Treasury yields over time


Comments off
