Zen of modeling

  1. Your model should have some theoretical basis.
  2. Your model, when simulated, should produce outcomes with a density similar to that of the observed values. Likewise, your model should not place weight on the impossible (like negative quantities, or non-binary values for a binary outcome). It should place non-zero weight on possible but unlikely outcomes.
  3. Think deeply about what is a random variable and what is not. A good rule of thumb: random variables are those things we do not know for certain out of sample. Your model is a joint density over the random variables.
  4. You never have enough observations to distinguish one possible data generating process from another process that has different implications. You should model both, giving both models weight in decision-making.
  5. The point of estimating a model on a big dataset is to estimate a rich model (one with many parameters). Using millions of observations to estimate a model with dozens of parameters is a waste of electricity.
  6. Unless you have run a very large, very well-designed experiment, your problem has unobserved confounding information. If this problem does not occupy a lot of your time, you are doing something wrong.
  7. Fixed effects normally aren’t. Mean reversion applies to most things, including unobserved information. Don’t be afraid to shrink.
  8. Relationships observed in one group can almost always help us form a better understanding of relationships in another group. Learn and use partial pooling techniques to benefit from this.
  9. For decision-making, your estimated standard deviations are too small, your estimated degrees of freedom are too big, or you have confused one for the other. Remember, the uncertainty produced by your model is the amount of uncertainty you should have if your model is correct and the process you are modeling does not change.
  10. You always have more information than exists in your data. Be a Bayesian, and use this outside information in your priors.
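
To make points 7 and 8 concrete, here is a minimal sketch of partial pooling. It uses a simple empirical-Bayes shrinkage of group means, not a full hierarchical model. The function name `partial_pool`, the toy data, and the assumption of a known, shared within-group variance `sigma2` are all illustrative choices, not something prescribed above.

```python
from statistics import mean, pvariance


def partial_pool(groups, sigma2):
    """Shrink each group's sample mean toward the grand mean.

    groups: dict mapping a group name to a list of observations
    sigma2: within-group variance, assumed known and shared (a simplification)
    """
    group_means = {g: mean(ys) for g, ys in groups.items()}
    grand_mean = mean(list(group_means.values()))

    # Method-of-moments estimate of the between-group variance: spread of the
    # group means minus the average sampling noise, floored at zero.
    spread = pvariance(list(group_means.values()))
    avg_noise = mean([sigma2 / len(ys) for ys in groups.values()])
    tau2 = max(spread - avg_noise, 0.0)

    pooled = {}
    for g, ys in groups.items():
        noise = sigma2 / len(ys)
        # Weight near 1: trust the group's own data; near 0: trust the pool.
        weight = tau2 / (tau2 + noise) if tau2 > 0 else 0.0
        pooled[g] = weight * group_means[g] + (1 - weight) * grand_mean
    return pooled


# Hypothetical data: group "b" has a single observation.
groups = {"a": [10, 12, 11, 9], "b": [20], "c": [14, 15, 16, 15, 14, 16]}
pooled = partial_pool(groups, sigma2=4.0)
for name, estimate in pooled.items():
    print(name, round(estimate, 2))
```

Groups with few observations are pulled hardest toward the grand mean, which is the "don't be afraid to shrink" idea in miniature. A full Bayesian treatment would also put priors on the variances rather than plugging in point estimates.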

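Point 2 can be checked by simulation. The sketch below compares two candidate models for a quantity that cannot be negative: a normal and a gamma with the same mean and standard deviation. The specific numbers (mean 2, standard deviation 1.5), the seed, and the helper name `share_impossible` are assumptions made up for illustration.

```python
import random


def share_impossible(draws):
    """Fraction of simulated outcomes that are negative (impossible here)."""
    return sum(d < 0 for d in draws) / len(draws)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
mu, sd = 2.0, 1.5       # hypothetical moments of a non-negative quantity
n = 10_000

# Candidate 1: normal model. Its support includes negative values.
normal_draws = [rng.gauss(mu, sd) for _ in range(n)]

# Candidate 2: gamma model matched to the same mean and standard deviation.
# random.gammavariate takes (shape, scale), so the mean is shape * scale.
shape = (mu / sd) ** 2
scale = sd ** 2 / mu
gamma_draws = [rng.gammavariate(shape, scale) for _ in range(n)]

print("normal, share below zero:", share_impossible(normal_draws))
print("gamma, share below zero:", share_impossible(gamma_draws))
```

The normal model puts a noticeable share of its simulated outcomes below zero, while the gamma model puts none there. Comparing simulated draws with the observed data in this way is a cheap first test of whether a model's density is plausible.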