September 3, 2020

How Good is my Forecast?

If you have enough data, the Forecast Forge addon will estimate how accurate your forecast is likely to be.

How it works

We don’t know what will happen in the future, so it is impossible to be certain how good or bad your forecast will be. But we can use the same forecasting algorithm to make a forecast for the recent past and then compare it against what actually happened to see how accurate it was.

For example, you might pretend you don’t know what happened between April 2019 and April 2020 (and I think we’d all like to imagine this didn’t happen at all!) and feed only the data from April 2017 to March 2019 into the forecasting algorithm.

Then you can compare the results of this forecast with the actual data for 2019/20 to see how well the forecasting algorithm predicts your data.

    You have this data
2018       2019        2020
 /----------/-----------/-----
                             |~~~~~~
               And you want to forecast this

    Use this data
2018       2019        2020
 /----------/-----------/-----
                       |~~~~~~
                 To forecast this
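If you want to run a single backtest like this yourself outside the addon, here is a minimal sketch in Python. The file name and column names are placeholders, and the stand-in forecast just repeats the previous year of training data; Forecast Forge uses its own forecasting algorithm for this step.

    import pandas as pd

    # Daily data with "date" and "sessions" columns (placeholder names)
    df = pd.read_csv("daily_sessions.csv", parse_dates=["date"])

    holdout_days = 365                 # pretend the last year hasn't happened yet
    train = df.iloc[:-holdout_days]    # data fed to the forecasting algorithm
    actual = df.iloc[-holdout_days:]   # what really happened

    # Stand-in forecast: repeat the last year of training data
    forecast = train["sessions"].iloc[-holdout_days:].to_numpy()

    # Daily errors: actual minus forecast, one value per day of the holdout
    errors = actual["sessions"].to_numpy() - forecast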

Measuring Error

The Forecast Forge addon shows you four different ways of measuring the error. They are each useful in different circumstances.

Every error metric is based on the daily errors: the difference between the actual value and the forecast value for each day in the forecast.

1. ME - Mean Error

Take all the error values and find the mean.

This is the simplest error metric but it doesn’t always tell you the full story because positive and negative errors (where the forecast over- and under-estimates) can cancel each other out.

The main thing the Mean Error tells you is whether the forecast tends to overestimate (positive error) or underestimate (negative error).
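As a tiny worked example with made-up daily errors, a forecast that is badly wrong every day can still have a Mean Error of zero because the over- and under-estimates cancel out:

    import numpy as np

    # Made-up daily errors: the forecast is off by around 50 every day...
    errors = np.array([50.0, -48.0, 52.0, -51.0, 49.0, -52.0])

    me = errors.mean()
    print(me)  # 0.0 -- the positive and negative errors cancel out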

2. MAE - Mean Absolute Error

Take the absolute value of the errors (i.e. make them all positive) and then find the mean.

This fixes the problem with Mean Error described above.
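Using the same made-up errors as above, the Mean Absolute Error shows the forecast really is off by about 50 per day:

    import numpy as np

    errors = np.array([50.0, -48.0, 52.0, -51.0, 49.0, -52.0])

    mae = np.abs(errors).mean()  # make every error positive, then average
    print(mae)                   # about 50.3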

3. RMSE - Root Mean Squared Error

Square all the error values, find the mean of this and then take the square root.

This is a very commonly used error metric in machine learning. I strongly suggest you try to minimise this error when working to improve your forecasts unless you have a very good reason not to.

However, RMSE can be a bit harder to interpret than the other error metrics, so once you have your model figured out you can report MAE or MAPE to clients who aren’t elbows deep in forecasting.
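Because the errors are squared before averaging, a few very bad days push RMSE up much more than they push MAE up, which is part of why it is harder to explain. With the same made-up errors as above:

    import numpy as np

    errors = np.array([50.0, -48.0, 52.0, -51.0, 49.0, -52.0])

    rmse = np.sqrt((errors ** 2).mean())  # square, average, then square root
    print(rmse)                           # about 50.4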

4. MAPE - Mean Absolute Percentage Error

Express each error as a percentage of the actual value, take the absolute values and then find the mean.

This is a very useful error metric because it is a percentage; it doesn’t matter what scale the values being forecast are.

For example, imagine I tell you that I’ve made a forecast for average order value (AOV) and that my MAE is 15. Is this good or bad?

It is impossible to say without knowing more about the average order value. If it is very high (e.g. over $200) then 15 is quite good. If it is very low (e.g. $20) then 15 is very bad!

But if you have a MAPE of 10% then you don’t need to know how big or small the AOV is to assess how much of a problem the error might be.
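Continuing the AOV example with made-up numbers, errors of around 15 on order values of around 200 work out at a MAPE of roughly 7%:

    import numpy as np

    actual   = np.array([210.0, 195.0, 220.0, 205.0])  # made-up daily AOV figures
    forecast = np.array([195.0, 210.0, 205.0, 220.0])

    pct_errors = (actual - forecast) / actual  # each error as a fraction of the actual value
    mape = np.abs(pct_errors).mean() * 100     # absolute value, average, then a percentage
    print(mape)                                # about 7.2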

For more detail on running backtests manually or using other error metrics read the Backtesting Forecasts to Estimate Future Accuracy post.

I Think You’ll Find It’s a Bit More Complicated Than That

As with just about everything, it’s a bit more complicated than that!

Rather than run just one backtest, the addon runs up to five and then averages the results. This is just in case one of the backtest periods is exceptional in some way; running more than one test makes the estimates more accurate.

2018       2019        2020
 /----------/-----------/-----
                       |~~~~~~ Backtest 1
                    |~~~~~~    Backtest 2
                |~~~~~~        Backtest 3

This is a process known as Timeseries Cross Validation.

Each backtest is known as a fold. You can see the number of folds that were used below the error metrics table.
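For anyone who wants to reproduce something like this by hand, here is a minimal sketch of the same idea in Python. The file name, column name, fold length and fold spacing are all assumptions, and the stand-in forecast just repeats the most recent training data; the addon uses its own forecasting algorithm and up to five folds.

    import numpy as np
    import pandas as pd

    df = pd.read_csv("daily_sessions.csv", parse_dates=["date"])
    values = df["sessions"].to_numpy()

    horizon = 90   # length of each backtest period, in days (assumed)
    n_folds = 3    # the addon uses up to five

    fold_rmse = []
    for fold in range(n_folds):
        # Each fold ends one horizon earlier than the previous one
        end = len(values) - fold * horizon
        train, actual = values[:end - horizon], values[end - horizon:end]

        # Stand-in forecast: repeat the most recent training data
        forecast = train[-horizon:]

        errors = actual - forecast
        fold_rmse.append(np.sqrt((errors ** 2).mean()))

    # Average the per-fold errors for a more stable estimate
    print(np.mean(fold_rmse))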