Here you can find all the Forecasting Forge learning resources.

Improving Forecasts

August 12, 2020

This tutorial follows on from the Your First Forecast resource. If you haven’t completed that yet, you can make a copy of this Google Sheet to use as a starting point.

Forecasting problems

At the end of the last tutorial we had made a forecast for the daily number of pageviews for the Easter page on wikipedia.

There are two main problems with this forecast:

  1. Sometimes the values are negative and pageviews can never be a negative number.
  2. The number of pageviews on Easter Sunday is much lower than it has been in previous years.

The two methods I will teach you to help solve these problems are:

  1. Data Transformations

  2. Adding Regressors

Read more

Installation and Setup

August 11, 2020

Welcome to Forecast Forge! This tutorial starts you off with installing and configuring the Google Sheets Addon.

Installing the Addon

Visit https://gsuite.google.com/marketplace/app/forecast_forge/1071689147475 and click the big blue “Install” button to install the addon.

You might need permission from your GSuite Domain administrator to do this.

Sometimes there are problems loading the page if you are signed in to more than one Google Account in your browser. Sign out of your other accounts or login in an incognito window and then try again.

Activating the Addon

New addons need to be “activated” to be used in a spreadsheet.

You will have to do this step (but not subsequent steps) for each spreadsheet where you want to use the FORGE_FORECAST custom function.

Choose a Google Sheet and wait for it to load fully. Or click here to open a new blank workbook.

When the spreadsheet has fully loaded you can click “Forecast Forge” then “Activate” from the Add-ons dropdown menu.

The first time you use Forecast Forge a window will pop up asking you to grant the addon the permissions it needs to work with your data.

Read more

Creating your First Forecast

August 11, 2020

If you haven’t installed and configured the addon yet then you should read the Installation and Setup tutorial first.

Watch the Video

Or read about it in more detail below:

Getting the data

The algorithms which power the FORGE_FORECAST custom function work across a wide range of metrics. But, for your first forecast, it is helpful to have a concrete example of something.

I will show you how to forecast the number of pageviews for the wikipedia page for Easter.

You can download the daily pageview data as a CSV from Wikimedia pageviews dataset on ToolForge.

Or make a copy of this Google Sheet which contains the same data (up to the date of writing).

Read more

The Power of Spreadsheets

August 5, 2020

The simplest way to do a machine learning powered forecast is to take historical data for the thing you want to forecast (e.g. last 3 years of revenue) and feed it into an algorithm to predict into the future (e.g. revenue for the next 12 months).

The algorithm might be very complicated, but the system is quite simple.

+----------------------------+
|      Historical Data       |
|       for the thing        |
|    you want to forecast    |
+----------------------------+
  |
  |
  v
+----------------------------+
| Machine Learning Algorithm |
+----------------------------+
  |
  |
  v
+----------------------------+
|     Future Predictions     |
+----------------------------+

The algorithm can only know about things it receives as input (and it won’t necessarily learn every pattern in the input) which in the system above is just one single metric.

In business it is rare to find metrics where the future values depend only on the historical values. There are always other factors to consider.

If you can find one of these other factors where:

  1. The algorithm, training on historical data, can learn the relationship between your factor and the metric you want to forecast
  2. You know the values this factor will take in the future

then we call this factor a regressor and it can be used to help make better predictions about the future.

Splitting the “Machine Learning Algorithm” box into “training” and “predicting” makes this clearer:

                        +------------------------+
                        |    Historical Data     |
                        |     for the thing      |
                        |  you want to forecast  |
                        +------------------------+
                          |
                          |
                          v
+-----------------+     +------------------------+
| Historical Data |     |                        |
|     for the     |     | Training the Algorithm |
|    regressor    | --> |                        |
+-----------------+     +------------------------+
                          |
                          |
                          v
+-----------------+     +------------------------+
|  Future Values  |     |   Trained Algorithm    |
|     of the      |     |  used for predictions  |
|    regressor    | --> |                        |
+-----------------+     +------------------------+
                          |
                          |
                          v
                        +------------------------+
                        |   Future Predictions   |
                        +------------------------+

Next I will talk about some example regressor variables.

Regressor Examples

Suppose you want to forecast revenue for the paid search channel of an ecommerce business.

One very important factor for this is how much money you spend on advertising. What has happened in the past won’t tell you very much about the future if the amount of money spent on advertising changes a lot.

If you have a budget plan for the next year then you can use this as a regressor to help predict what the revenue will be. This has the added advantage of allowing you to run “what-if” analyses to predict the impact of budget changes.

Another common regressor is to tell the algorithm when the store has been is a sale or period of heavy discounting. These periods don’t necessarily occur with any regularity (most forecasting algorithms can learn these patterns without a regressor as long as the occur at the same time each year) but you will know when they happened in the past and the dates for the future and you can feed this into the algorithm to improve forecast accuracy.

A similar example is when new product ranges or lines launch. In fashion the periods shortly after new season ranges launch can be quite different to the weeks preceding and following it. And often these events take place at slightly different times each year so the algorithm won’t spot the pattern unless you help it out by providing the past and future dates as a regressor.

The Care and Feeding of Algorithms

I’ve talked a lot about “helping” the algorithm or “telling” it certain pieces of information.

This whole product is about enabling you to do just that.

We run the machine learning algorithms on our servers so the missing piece is finding the best way for you to input data and see the outputs.

  • All inputs (whether metrics to be forecast or regressors) are lists of numbers
  • You must be able to easily add new inputs or see how the results change if you remove an input
  • For regressors, it is useful to be able to change the future values so that you can perform “what-if” analyses

So the user interface must support the entry and editing of lots of lists of numbers. Which sounds very like a spreadsheet!

Building our machine learning product into a spreadsheet has other advantages: 1. People already know how to use a spreadsheet 2. Businesses already have lots of data stored in spreadsheets

Rather than trying to code my own spreadsheet it seemed sensible to piggyback on the work of others and create something that integrates with existing solutions.

Disadvantages

However, it isn’t all plane sailing; there are a small number of disadvantages to doing things this way: 1. I don’t have the skills to write an addon for Excel 2. Google Sheet’s custom functions have a timeout of 30 seconds. This limits the algorithms it is possible to use because they must train and predict within the 30 second window.

There might be workarounds for the timing issue: if I can make the machine learning available outside of a custom function (e.g. from a menu) then the timeout restriction does not apply.

Read more