August 5, 2020

The Power of Spreadsheets

The simplest way to do a machine learning powered forecast is to take historical data for the thing you want to forecast (e.g. last 3 years of revenue) and feed it into an algorithm to predict into the future (e.g. revenue for the next 12 months).

The algorithm might be very complicated, but the system is quite simple.

+----------------------------+
|      Historical Data       |
|       for the thing        |
|    you want to forecast    |
+----------------------------+
  |
  |
  v
+----------------------------+
| Machine Learning Algorithm |
+----------------------------+
  |
  |
  v
+----------------------------+
|     Future Predictions     |
+----------------------------+

The algorithm can only know about things it receives as input (and it won’t necessarily learn every pattern in the input) which in the system above is just one single metric.

In business it is rare to find metrics where the future values depend only on the historical values. There are always other factors to consider.

If you can find one of these other factors where:

  1. The algorithm, training on historical data, can learn the relationship between your factor and the metric you want to forecast
  2. You know the values this factor will take in the future

then we call this factor a regressor and it can be used to help make better predictions about the future.

Splitting the “Machine Learning Algorithm” box into “training” and “predicting” makes this clearer:

                        +------------------------+
                        |    Historical Data     |
                        |     for the thing      |
                        |  you want to forecast  |
                        +------------------------+
                          |
                          |
                          v
+-----------------+     +------------------------+
| Historical Data |     |                        |
|     for the     |     | Training the Algorithm |
|    regressor    | --> |                        |
+-----------------+     +------------------------+
                          |
                          |
                          v
+-----------------+     +------------------------+
|  Future Values  |     |   Trained Algorithm    |
|     of the      |     |  used for predictions  |
|    regressor    | --> |                        |
+-----------------+     +------------------------+
                          |
                          |
                          v
                        +------------------------+
                        |   Future Predictions   |
                        +------------------------+

Next I will talk about some example regressor variables.

Regressor Examples

Suppose you want to forecast revenue for the paid search channel of an ecommerce business.

One very important factor for this is how much money you spend on advertising. What has happened in the past won’t tell you very much about the future if the amount of money spent on advertising changes a lot.

If you have a budget plan for the next year then you can use this as a regressor to help predict what the revenue will be. This has the added advantage of allowing you to run “what-if” analyses to predict the impact of budget changes.

Another common regressor is to tell the algorithm when the store has been is a sale or period of heavy discounting. These periods don’t necessarily occur with any regularity (most forecasting algorithms can learn these patterns without a regressor as long as the occur at the same time each year) but you will know when they happened in the past and the dates for the future and you can feed this into the algorithm to improve forecast accuracy.

A similar example is when new product ranges or lines launch. In fashion the periods shortly after new season ranges launch can be quite different to the weeks preceding and following it. And often these events take place at slightly different times each year so the algorithm won’t spot the pattern unless you help it out by providing the past and future dates as a regressor.

The Care and Feeding of Algorithms

I’ve talked a lot about “helping” the algorithm or “telling” it certain pieces of information.

This whole product is about enabling you to do just that.

We run the machine learning algorithms on our servers so the missing piece is finding the best way for you to input data and see the outputs.

So the user interface must support the entry and editing of lots of lists of numbers. Which sounds very like a spreadsheet!

Building our machine learning product into a spreadsheet has other advantages: 1. People already know how to use a spreadsheet 2. Businesses already have lots of data stored in spreadsheets

Rather than trying to code my own spreadsheet it seemed sensible to piggyback on the work of others and create something that integrates with existing solutions.

Disadvantages

However, it isn’t all plane sailing; there are a small number of disadvantages to doing things this way: 1. I don’t have the skills to write an addon for Excel 2. Google Sheet’s custom functions have a timeout of 30 seconds. This limits the algorithms it is possible to use because they must train and predict within the 30 second window.

There might be workarounds for the timing issue: if I can make the machine learning available outside of a custom function (e.g. from a menu) then the timeout restriction does not apply.