While I was working on a trading desk I got exposed to a new-to-me statistical tool: the **exponentially-weighted** moving average. I fear that programmers often reach for a normal, window-based moving average when an exponential one would serve them better.

But let's back up a little: A moving average is a statistical transformation we can apply to time-series data to smooth out spikes. This can be very useful for things like preventing flaky alerts or smoothing out financial data.

Say we have a function `f` that represents some spiky data over time:

This could be any metric over time: a server's memory usage, an oven's temperature, Apple's stock price, etc. Maybe smoothing it can remove noise and give us a better understanding of what's going on?

We can define a **windowed moving average** `windowAvg(f, windowSize, t)` at each time `t` to be the average of `f` at all the points that fall in the previous `windowSize` interval. The goal is to choose a value for `windowSize` that smooths `f` exactly as much as one wants. So a `windowAvg(f, 5min, t)` would look like this:

This is helpful! We can more easily see a trend in the bump between times 36 and 46. We don't have to be distracted by outliers like 8 and 41 quite as much. But our smoothing function also introduces artifacts of its own: with a window size of 5 minutes, you can see a distinct drop between times 12 and 13.

These plateaus can be confusing for readers of the graph: it's unclear if the drop at time 13 is indicative of a drop in `f` at that time or just a spike that's leaving the window. The drop isn't telling us anything about what's happening with `f` at time 13; the width of this plateau depends only on `windowSize`. It's *just* an artifact of the averaging function. You can try changing the window size with the slider above to see for yourself.
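To make the plateau concrete, here is a minimal Python sketch of the windowed average described above. It assumes `f` is sampled at integer times starting from 0, and the `spiky` function is made-up data (flat, with one spike at time 8), not the series from the graphs.

```python
def window_avg(f, window_size, t):
    """Average of f over the window_size most recent integer times ending at t."""
    # Assumption: the series starts at time 0, so early windows are shorter.
    start = max(0, t - window_size + 1)
    points = [f(s) for s in range(start, t + 1)]
    return sum(points) / len(points)


def spiky(t):
    """Hypothetical data: flat at 1.0 with a single spike at time 8."""
    return 10.0 if t == 8 else 1.0


smoothed = [window_avg(spiky, 5, t) for t in range(16)]
for t, value in enumerate(smoothed):
    print(t, round(value, 2))
```

With this toy data the average sits on a flat plateau for as long as the spike is inside the window, then drops the moment the spike leaves it, even though `spiky` itself does nothing interesting at that time.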

On the other hand, an **exponentially-weighted** moving average does not have this downside. We can define `expAvg(f, r, t)` by blending the value of `f` at time `t` (i.e. `f(t)`) with the previous value of `expAvg` at time `t-1` (i.e. `expAvg(f, r, t-1)`). We also get to choose a **rate** `r`, between 0 and 1, that controls how much our averaging function adjusts to the new value of `f` versus trusting the previously calculated average. So, recursively:

`expAvg(f, r, t) = r*f(t) + (1-r)*expAvg(f, r, t-1)`

with a base case:

`expAvg(f, r, 0) = 0`
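As a rough sketch, the recursion and base case above translate almost directly into Python. This version assumes integer time steps and uses a loop rather than literal recursion so that large values of `t` don't hit the recursion limit; the constant function in the example is made up for illustration.

```python
def exp_avg(f, r, t):
    """Exponentially-weighted moving average of f at integer time t."""
    avg = 0.0  # base case: expAvg(f, r, 0) = 0
    for s in range(1, t + 1):
        # expAvg(f, r, s) = r*f(s) + (1-r)*expAvg(f, r, s-1)
        avg = r * f(s) + (1 - r) * avg
    return avg


def constant(t):
    """Hypothetical data: always 10.0."""
    return 10.0


print([exp_avg(constant, 0.5, t) for t in range(4)])
```

One thing the sketch makes visible: because the base case is 0, the first few values are pulled toward 0 before the average catches up with the data.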

The goal is to choose an `r` that gives you an appropriately smooth graph. The higher the rate, the more quickly our moving average will accept new spikes. You can try playing with it here:

Doesn't that seem more representative of our underlying `f`?

This still isn't perfect. You'll notice that when `r = 0.4`, parts of `expAvg` look suspiciously like exponentially-increasing or -decreasing functions. Unfortunately, this is also an artifact of this new smoothing function rather than an insight into the underlying function `f`, but I'd argue it's less of an issue for readers than the boxy plateaus we get in our windowed version.
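Here is a small Python sketch of where that exponential shape comes from. It assumes a made-up input that jumps from 0 to 1 and then stays put; `exp_avg_steps` is a hypothetical helper that applies the same update rule to a list of samples.

```python
def exp_avg_steps(values, r, start=0.0):
    """Apply the expAvg update to each sample in turn, returning every intermediate average."""
    avg = start
    out = []
    for x in values:
        avg = r * x + (1 - r) * avg
        out.append(avg)
    return out


r = 0.4
step = [1.0] * 6  # hypothetical f: jumps to 1.0 and stays there
response = exp_avg_steps(step, r)

# The remaining gap to the new level shrinks by a factor of (1 - r) each step.
gaps = [1.0 - a for a in response]
print([round(g, 4) for g in gaps])
```

Even though the input is perfectly flat after the jump, the average approaches it along a decaying exponential curve, which is the shape that shows up in the graph.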

The beautiful part about `expAvg` is that it's very easy to calculate with a computer! To calculate `windowAvg`, one has to store all the points from the past window in memory in order to average them. But for an exponential moving average *you only need two numbers*: the last value we calculated for `expAvg` and the time it was calculated.
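A minimal sketch of that difference in state, assuming samples arrive one at a time at evenly spaced intervals (so the time of the last update is implicit and this version doesn't store it). The class names are hypothetical.

```python
from collections import deque


class WindowAverage:
    """Streaming windowed average: has to remember every point in the window."""

    def __init__(self, window_size):
        # deque with maxlen drops the oldest point automatically when full.
        self.points = deque(maxlen=window_size)

    def update(self, x):
        self.points.append(x)
        return sum(self.points) / len(self.points)


class ExpAverage:
    """Streaming exponential average: only remembers the last average."""

    def __init__(self, r):
        self.r = r
        self.avg = 0.0  # same base case as expAvg(f, r, 0) = 0

    def update(self, x):
        self.avg = self.r * x + (1 - self.r) * self.avg
        return self.avg


window = WindowAverage(3)
exp = ExpAverage(0.5)
for sample in [3.0, 6.0, 9.0, 12.0]:  # made-up samples
    print(window.update(sample), exp.update(sample))
```

The memory used by `WindowAverage` grows with the window size, while `ExpAverage` stays the same size no matter how much history it is effectively summarizing.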

Exponential moving averages are such a powerful concept because they allow us to model and think about real-world events more accurately. As an example, how sleepy you are right now can be thought of as a function of how much sleep you've had in the past. Let's try to model it! Does sleepiness just depend on how much sleep you got last night?

Nope. If you don't sleep for 2 days, you'll still be tired even after an 8-hour sleep. We've got to take into account how you've slept even further back. Can we use a windowed moving average, say, the average you've slept for the last 5 days?

Of course not! That would weight last night's sleep just as highly as that of 5 nights ago. We want to weight more recent events more heavily than older ones, maybe even... exponentially so? Yes! We can model sleepiness as an exponentially weighted average of sleep — taking the past into account but only diminishingly so the further back you go.
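To see the "diminishingly so" part in numbers: unrolling the recursive definition gives the sample from `k` steps ago a weight of `r * (1-r)^k`. The sketch below assumes a made-up rate and made-up hours of sleep, purely for illustration.

```python
def exp_weights(r, n):
    """Weight given to the sample k steps in the past, for k = 0..n-1."""
    return [r * (1 - r) ** k for k in range(n)]


r = 0.5  # hypothetical rate
nights = [8.0, 4.0, 0.0, 0.0, 8.0]  # hypothetical hours slept, most recent night first

weights = exp_weights(r, len(nights))
rested = sum(w * h for w, h in zip(weights, nights))
windowed = sum(nights) / len(nights)

print(weights)   # each night counts half as much as the one after it
print(rested)    # exponentially weighted hours of sleep
print(windowed)  # plain 5-night average, every night weighted equally
```

In this toy example the good night of sleep five nights ago barely registers in the exponential version, while the windowed average counts it exactly as much as last night.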

Well there you have it — all my thoughts on exponentially weighted averages! Hope it adds to your day ☺️