Descriptive Models vs Inference Models and Knowing your Audience
“All models are wrong but some are useful” — George Box
Tailoring your model to your audience is important. Some investors are more interested in targeted indicators like R² and mean-squared error, and for them a highly descriptive model is the best fit. Highly descriptive models are the kind that try all the terms with all the fits, in multiple combinations, to tease out the right target. They are also used for describing the landscape of certain statistics when you don’t have a good working model from the field. This kind of model is exploratory and descriptive rather than predictive.
Predictive models are closer to the scientific equations that you learn in school. They take certain theoretical values and build out hypotheses about what is driving your target and what is more or less important. Predictive models can be built off your descriptive models. They can also be built off academic fields, simulations, simplified mathematical equations and many other things. What matters is the trade-off between descriptive power and inferential power. If you are just looking for the best possible explanation for the data you have, run a more descriptive model. If you are concerned about the interpretability of your model, you should aim for a more inferential model.
Below is a set of non-linear structures that will help you describe why your model is swinging wildly from one extreme to another. Hopefully these will help you make sense of your data in the aggregate.
Log and log-log plots
Why use log plots?
Say your whole department worked insanely hard last year and is given a big raise (woohoo!). How much sense would it make to add a flat $1000 to every paycheck over the course of the year? It’d be great for the intern you hired over the summer making only $10,000. That would represent a 10% increase in their pay. The department head making $100k per year, though, would feel cheated at only a 1% yearly increase. That’s why yearly raises are typically awarded as a percentage of what you’re already making, for example a 5% increase over the previous year. The same is true for inflation, which is the yearly percentage decay of the value of your money. That’s why money over time is typically (but not always!) plotted as a log plot. We say log(n) is the “natural” unit of money. More money begets more money.
Log plots capture compounding effects. This is useful any time the result of an effect builds on its previous result, such as how interest grows over time, or how populations are modeled to grow as there are more reproducing pairs to procreate. This gives a characteristic exponential curve.
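To make the compounding idea concrete, here is a minimal sketch using only Python's standard library. The starting salary of $50,000 is a made-up value for illustration; the 5% rate comes from the raise example. The dollar change grows every year, while the change in log(salary) stays constant, which is why compounding looks like a straight line on a log plot.

```python
import math

start_salary = 50_000   # hypothetical starting value, for illustration only
rate = 0.05             # 5% raise per year, as in the raise example

# Salary after 0, 1, ..., 5 years of compounding raises.
salaries = [start_salary * (1 + rate) ** year for year in range(6)]

# Year-over-year change in dollars: grows every year.
dollar_steps = [b - a for a, b in zip(salaries, salaries[1:])]

# Year-over-year change in log(salary): the same every year.
log_steps = [math.log(b) - math.log(a) for a, b in zip(salaries, salaries[1:])]

for year, (d, l) in enumerate(zip(dollar_steps, log_steps), start=1):
    print(f"year {year}: +${d:,.2f}  (log step {l:.4f})")
```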
A log plot of base x can be transformed into a log plot of any other base y, as long as both bases are positive and not equal to 1, because log_y(v) = log_x(v) / log_x(y). So log_10 and log_2 plots have the same shape and differ only by a constant scale factor! Since we can effectively use any base, we need a “natural” base to default to.
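A quick numerical check of the base conversion, sketched with Python's standard `math` module. The sample values are arbitrary. It shows that log_2 of a value is just log_10 of that value multiplied by the constant 1 / log_10(2), so switching bases only rescales the axis.

```python
import math

values = [1, 10, 100, 1000]              # arbitrary positive sample values

log10_values = [math.log10(v) for v in values]
log2_values = [math.log2(v) for v in values]

# Change of base: log_2(v) = log_10(v) / log_10(2)
scale = 1 / math.log10(2)
rescaled = [scale * x for x in log10_values]

for v, direct, converted in zip(values, log2_values, rescaled):
    print(f"log2({v}) = {direct:.6f}, rescaled log10 = {converted:.6f}")
```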
What is e?
e is the base of the natural log function ln. Here’s how (and why) to use it.
Remember that any log base can be transformed into any other log base. Using base e is not necessarily better than using base 2 or base 10 or base pi. However, using the exponential with base e gives us a constant (k). This value falls out of the equation and represents the growth constant of your exponential! In the previous example, your yearly pay raise of 5% would be represented as k ≈ 0.05 per year (exactly k = ln(1.05) ≈ 0.0488 if the raise is applied once a year).
Rate of growth (k)
ln(ŷ) = a + kX
where ŷ > 0
(Remember that you cannot take the log of 0 or of a negative value. Transform your data so that it contains only positive numbers.)
e^ln(ŷ) = e^(a + kX)
ŷ = e^a * e^(kX)
ŷ = Ae^(kX), where A = e^a
The k for an exponential function with base e is special. It represents the rate at which the exponential grows (k>0) or decays (k<0). If your feature is time, it represents the rate at which your bank account appreciates over time.
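The derivation above suggests a simple recipe: take the log of the target, fit a straight line, and read k off the slope. Below is a minimal sketch with NumPy, using synthetic, noise-free data. The $40,000 starting salary and the variable names are assumptions made for illustration, not part of the original example.

```python
import numpy as np

years = np.arange(0, 11)                 # feature X: years since hire
salary = 40_000 * 1.05 ** years          # hypothetical data: 5% raise each year

# Fit ln(y) = a + k*X with least squares (a degree-1 polynomial fit).
# np.polyfit returns the highest-degree coefficient first: slope, then intercept.
k, a = np.polyfit(years, np.log(salary), 1)

A = np.exp(a)                            # back-transform the intercept: A = e^a
predicted = A * np.exp(k * years)        # y_hat = A * e^(k*X)

print(f"k = {k:.4f}, A = {A:,.0f}")
```

On this data the fitted k comes out as ln(1.05), close to but slightly below 0.05, and A recovers the starting salary. With real, noisy data the fit minimizes error in log space, so errors on large values are weighted differently than in a direct fit of the exponential.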
Say your target doesn’t depend on each feature only linearly, but also on how features combine with one another. How do you model that in a linear regression? An interaction term takes one feature and multiplies it by another feature. It represents synergy or anti-synergy between features. Ideally your features are independent from one another and contribute significantly to your process. In nature it is fairly rare to see . This could be indicative of an underlying feature that is driving both interactions.
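One way to sketch an interaction term is to add the product of two features as an extra column in an ordinary least-squares fit. The example below uses NumPy with synthetic, noise-free data. The feature names `x1` and `x2` and all coefficient values are hypothetical, chosen only so the recovered coefficients can be checked against known ones.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed so the sketch is repeatable
x1 = rng.uniform(0, 10, size=200)        # hypothetical feature 1
x2 = rng.uniform(0, 10, size=200)        # hypothetical feature 2

# Synthetic target with a known interaction coefficient of 0.5 (no noise).
y = 2.0 + 1.5 * x1 - 3.0 * x2 + 0.5 * x1 * x2

# Design matrix: intercept, both features, and their product.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

# Ordinary least squares; the model is still linear in its coefficients.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b1, b2, b_interaction = coef

print(f"interaction coefficient = {b_interaction:.3f}")
```

A positive interaction coefficient corresponds to synergy (the features amplify each other) and a negative one to anti-synergy.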
Elasticity constant (k)
Lastly, a regression for future research. Harmonic regression uses sines and cosines to model periodic data. If you know your data is periodic, you can use a harmonic function.
In the FiveThirtyEight model above, they used the fact that they knew the data was harmonic to derive a k value, the “spring constant” term, which they turned into voter elasticity.
Of course we’re only scratching the surface. Here we reviewed exponential functions, multiplicative interaction terms, and harmonic terms, along with the constants that go with them. But remember that the growth/decay term does not necessarily need to be a constant, and there’s no reason your model should only include one non-linear element. Going forward we can combine these and use even more advanced structures.
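As a rough illustration of harmonic regression in general (not a reconstruction of the FiveThirtyEight model), here is a minimal NumPy sketch. It assumes the period is already known, and the data, period, amplitude and phase are all made-up values. The trick is that a shifted sine wave is linear in a sine term and a cosine term, so ordinary least squares can fit it.

```python
import numpy as np

period = 12.0                              # assumed known period (e.g. months)
t = np.arange(48, dtype=float)             # four hypothetical cycles

# Synthetic periodic target: mean 10, amplitude 3, phase shift 0.7 rad.
y = 10.0 + 3.0 * np.sin(2 * np.pi * t / period + 0.7)

# A shifted sine is linear in sin and cos terms:
#   R*sin(w*t + phi) = (R*cos(phi))*sin(w*t) + (R*sin(phi))*cos(w*t)
omega = 2 * np.pi / period
X = np.column_stack([np.ones_like(t), np.sin(omega * t), np.cos(omega * t)])

(mean, b_sin, b_cos), *_ = np.linalg.lstsq(X, y, rcond=None)

# Convert the two linear coefficients back to amplitude and phase.
amplitude = np.hypot(b_sin, b_cos)
phase = np.arctan2(b_cos, b_sin)

print(f"mean = {mean:.2f}, amplitude = {amplitude:.2f}, phase = {phase:.2f} rad")
```

If the period is not known in advance, it has to be estimated first (for example from a periodogram), which is outside the scope of this sketch.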