The Anvil

A place to hammer out ideas

Substitute Days and Broken Machine Learning

January 4, 2021

Christmas 2020 was on a Friday which meant that Boxing Day was on a Saturday. In the UK Boxing Day is a public holiday but because it fell on a weekend the following Monday (the 28th December) was a “substitute day” public holiday.

The database of country holidays used by Forecast Forge treats these two things as different holidays. i.e. in 2020 there was a Boxing Day holiday and an extra substitute day Boxing Day on the 28th.

The last year with a substitute Boxing Day was 2015 so unless the historical data used in the forecast includes Christmas 2015 then the Forecast Forge algorithm will not have any relevant training data to make a prediction for the holiday effect in 2020.

I tweeted this observation last week and Peter O’Neill replied with this:

Which I think is a very interesting question.

The algorithm is behaving exactly as it is designed to so in one (important!) respect it is not broken at all. But on the other hand if it isn’t doing what the user expects then even if it isn’t fair to describe the procedure as “broken” it is certainly far from ideal.

And this gets to the heart of why I think it is an interesting question; it is about how users interact with machine learning systems and what expectations are reasonable for such an interaction. And this is exactly the area where the user interface for Forecast Forge sits!

Let’s talk specifically about the substitute day bank holiday problem even though there must be about a million other similar issues.

Off the top of my head I can think of six different ways of modelling this:

  1. Ignore it (i.e. treat the 26th as the bank holiday and the substitute day as a normal day)
  2. Move the entire holiday effect over onto the substitute day
  3. Apply the same holiday effect to both days (i.e. having a substitute day holiday gives you much more uplift than not)
  4. Split the holiday effect evenly between the two days
  5. Split the holiday effect unevenly between the two days. This sounds like a great solution, especially if you can use machine learning to estimate the optimal split. But in practice you will need data from previous substitute bank holidays to do this.
  6. Treat the substitute holiday as a separate holiday

It isn’t obvious to me that any one of these is always going to be better than all the others.

For a bricks and mortar store I can imagine that, in a non-Covid year, they would get a lot more footfall with a substitute day than in a normal year. For an online brand I can see it not making very much difference - particularly for Boxing Day which lies in the dead period between Christmas and New Year.

The good news is that Forecast Forge allows you to use any of these options by setting up appropriate helper columns. The bad news is that it doesn’t communicate at all about which of the models you are using with the default “country holidays” model.

It is a huge challenge for me to build a user interface that knows when a modelling choice has other options that the user might want to explore. People don’t use Forecast Forge because they want to have to think about this for every possible modelling decision; people like this need to code their own models. But, just as importantly, people do use Forecast Forge because they want a bit more control and customisation rather than a forecasting service where they get the result and can’t do anything about what it says even if it is obviously wrong.

And not knowing what modelling decisions have been made behind the scenes and what other options are possible makes this much harder than I would like for users.

As Peter says:

And this would be the holy grail for human/machine learning interaction. It already exists for those who are confident with both coding and converting whatever levers a business might want to pull into mathematics. But these people are a rare (and expensive!) breed; I wouldn’t class myself as top tier at either discipline.

I have made design decisions with Forecast Forge that do restrict the levers that are available for pulling. My hope is that I have left the most important levers and that by building it into a spreadsheet all my users have more flexibility in what they can do with it than in other tools.

But it is still short of what I wish it could be. Is there a way to hide the unnecessary complexity whilst making all the necessary complexity visible?


Read more

A Forecasting Workflow

December 2, 2020

Here is how I make forecasts using Forecast Forge. There are a lot of similarities with how forecasting is done by data scientists so some of this will be useful even if you aren’t a Forecast Forge subscriber.

1. Look at the data!

I cannot stress enough how important it is to plot the data! For forecasting problems it isn’t even that complicated; putting the date along the x-axis and the metrics you are interested in on the y-axis will be enough in most cases.

Don’t be like the students who didn’t find the hidden gorilla during their data analysis! (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02133-w)

Read more

Forecasting the Effect of Lockdowns

November 23, 2020

A Covid-19 vaccination is now clearly visible on the horizon but there are still significant uncertainties about when exactly this will become available and what will happen in the meantime.

In the UK I think there is likely to be a period of national lockdown in January. And maybe more in the Spring too. Other countries in Europe are in a similar situation and who knows what will be going on in the USA by then?

It is important for businesses to be able to make good estimates for what their demand will look like at these times. I will show you how to do this in Forecast Forge.

The impact of lockdown #1 in the UK

Read more

Cumulative Forecasts and Measuring Impact

November 3, 2020

If you have a really good forecasting model that has, historically, produced good results then you can use it to estimate the effect of a change.

For example, physicists can use Newtonian Physics to forecast where planets will be in the sky and things like solar eclipses. If these predictions suddenly started being wrong (after they have been right for hundreds of years) people would conclude that something fairly drastic had changed.

Similarly, if you have made a good forecast in Forecast Forge but then it starts predicting badly this could be because something has changed. You can turn this reasoning around and also say “if something has changed then the forecast won’t work as well.”

You can watch a quick video of me demonstrating how you can do this in Forecast Forge. Or read on below for more details and commentary.

Look at this example of daily transaction data:


Read more

Extending Prophet for fun and profit - Part 2

October 27, 2020

This post is part of a series on extending the Prophet forecasting model so that it can do other things. You should read part one on the default Prophet forecasting model first.

What Prophet Can’t Do

For me, one of the most useful features of Prophet is how easily you can add regression features to the model. This enables you to provide information to the machine learning model about things like the dates when email campaigns were sent or when your PPC budget increased; and you can use your marketing calendar to add information about when these things will happen in the future too.

Being able to do this in a spreadsheet is one of the main things that makes Forecast Forge great.



Read more

Get updates in your inbox