January 10, 2023

Working with Google Analytics 4 Data

For standard (non-Premium) users, Google will stop processing new Universal Analytics (UA) hits from 1st July 2023. After this date all your new data will be in Google Analytics 4 (GA4) and there will be no new data in your Universal Analytics profiles.

Forecast Forge will work just as well with GA4 data as it does with data collected using UA but (and I think you knew there was a “but” coming here) the algorithm requires at least two years of history so that it can make good estimates of how the timeseries varies by season (time of year, day of week etc.). For the earliest of early adopters this is not a problem, since GA4 launched in beta (as “App + Web” properties) at the end of July 2019. But for most people, myself and most of my clients included, Google Analytics 4 properties have been set up within the last two years and there is not yet sufficient data to make good forecasts.

This is a problem. Here are several solutions; the one that is best for you will depend on how much GA4 history you have and how similar your GA4 tracking is to your old UA setup.

Option 1. Don’t Do Anything Fancy

This is a great option if your GA4 data is very similar to your Universal Analytics data.

Simply export your UA data into Google Sheets and then add the new GA4 data at the bottom. Assuming you’ve run UA and GA4 in parallel for a time, you still have to decide where the cutoff between the two should be. The good news is that if the GA4 numbers are this similar to UA then it doesn’t much matter where you put it. I’d probably include as much GA4 data as possible, i.e. starting from the date at which it becomes good quality, but this isn’t necessary to get a decent forecast.
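If you would rather build the combined column in a script before pasting it into Sheets, here is a minimal Python sketch of the stitching step. The function name `stitch`, the cutoff date and all of the numbers are made up for illustration; it assumes you have one value per day from each export.

```python
from datetime import date, timedelta

def stitch(ua, ga4, cutoff):
    """Use UA values before `cutoff` and GA4 values from `cutoff` onwards.

    `ua` and `ga4` map date -> metric value (for example daily sessions).
    """
    combined = {d: v for d, v in ua.items() if d < cutoff}
    combined.update({d: v for d, v in ga4.items() if d >= cutoff})
    return dict(sorted(combined.items()))

# Made-up numbers: five days of UA, with GA4 running in parallel for the last three.
start = date(2023, 1, 1)
ua = {start + timedelta(days=i): 100 + i for i in range(5)}
ga4 = {start + timedelta(days=i): 101 + i for i in range(2, 5)}

# Hypothetical cutoff: the first day the GA4 data looks trustworthy.
combined = stitch(ua, ga4, cutoff=date(2023, 1, 3))
for day, sessions in combined.items():
    print(day.isoformat(), sessions)
```

Moving the cutoff only changes which source supplies the overlapping days, which is why it matters so little when the two sets of numbers agree.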

Option 2. Use a Binary Regressor Column

The setup for this is very similar to option 1 above; export your UA data into Google Sheets and then add the new GA4 data at the bottom. The extra step is to add a regressor column with a 0 for all the days with UA data and a 1 for all the days with GA4 data.

This helps the Forecast Forge algorithm distinguish between the UA and GA4 data and learn the difference between them.

Use this option if your GA4 data is different to your UA data by a constant amount (e.g. +250 sessions/day or similar).
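As a rough sketch of what the finished sheet might contain, the Python below builds the date, metric and regressor columns. The cutoff date, the session numbers and the helper `source_flag` are all hypothetical. It also assumes that the rows you want to forecast should carry a 1, since those days will be tracked by GA4.

```python
from datetime import date, timedelta

def source_flag(day, cutoff):
    """0 for days covered by UA data, 1 for days covered by GA4 data."""
    return 0 if day < cutoff else 1

cutoff = date(2023, 1, 3)              # hypothetical first day of GA4 data
start = date(2023, 1, 1)
sessions = [100, 102, 351, 355, 349]   # made-up values; GA4 reads about 250 higher

rows = []
for i, value in enumerate(sessions):
    day = start + timedelta(days=i)
    rows.append((day.isoformat(), value, source_flag(day, cutoff)))

# Assumption: future days are GA4 days, so the flag stays at 1 and the
# metric cell is left blank for the forecast to fill in.
horizon = 2
for i in range(len(sessions), len(sessions) + horizon):
    day = start + timedelta(days=i)
    rows.append((day.isoformat(), "", source_flag(day, cutoff)))

for row in rows:
    print(row)
```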

Option 3. Use UA Data to Estimate Seasonality

This approach is more complicated and I only recommend using it if you have at least one year of GA4 data and nothing major has changed in the last three months.

First make a forecast using your UA data and then use the historical data plus the forecast as a regressor column for your GA4 data. For this to work the forecast on your UA data must do a good job of representing the seasonal variation in your business.

The idea here is that the seasonality in your GA4 data follows the same pattern as your UA data so you can apply the seasonality estimate from your UA forecast to the GA4 data; the regressor column is what gives Forecast Forge this information.

This method is more complicated than the previous two, but the advantage is that there doesn’t need to be any relationship between the UA data and GA4 data other than that they have the same seasonal pattern. So you can use this option if you want to forecast a new GA4 metric which doesn’t have any equivalent in UA (as long as you are sure about the seasonal pattern).
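The fiddly part is lining the UA values up against the GA4 dates. Here is a minimal Python sketch of that step, assuming you have already exported the UA history and the UA forecast as date/value pairs. `build_regressor` and all the numbers are invented for illustration, and preferring actual values over forecast values where the two overlap is my assumption rather than a requirement.

```python
from datetime import date, timedelta

def build_regressor(ua_history, ua_forecast, dates):
    """Return one UA-derived value per requested date.

    Actual UA values win over forecast values where both exist.
    """
    lookup = {**ua_forecast, **ua_history}
    missing = [d for d in dates if d not in lookup]
    if missing:
        raise ValueError(f"no UA value for {missing[0]}")
    return [lookup[d] for d in dates]

start = date(2023, 1, 1)
# Made-up data: UA actuals for 1-4 Jan, UA forecast for 4-7 Jan.
ua_history = {start + timedelta(days=i): 100 + i for i in range(4)}
ua_forecast = {start + timedelta(days=i): 200 + i for i in range(3, 7)}

# The GA4 rows (history plus the days to forecast) run from 3 Jan to 7 Jan.
ga4_dates = [start + timedelta(days=i) for i in range(2, 7)]
regressor = build_regressor(ua_history, ua_forecast, ga4_dates)
print(regressor)
```

Raising an error for a date with no UA value is deliberate; a silent gap in the regressor column is much harder to spot later.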

Option 4. Remove Trend from UA Data First

Option 3 glosses over a detail: the UA forecast doesn’t just contain information about seasonality but also an estimate of the future trend. This might be fine or it might be totally not what you want, particularly if there have been any big changes since the end of your UA data.

You want to adjust the UA data so that it only contains the seasonal data and not anything to do with the trend. I have a feature in my backlog to make doing this really easy with the Forecast Decomposition Report but until that launches here is a simple alternative:

Use the TREND function in Google Sheets to estimate a linear trend for your data. The difference between your actual data and this trend estimate is an estimate of the seasonal component (plus any noise). You can see an example of using the TREND function in the official Google Sheets docs.

You could do this for your entire history of UA data and for the forecasted values as well, but the Forecast Forge trend is more complicated than anything the TREND function can fit, so I suggest the following approach:

  1. Use your historical UA data to make a forecast for the next year
  2. Apply the TREND function only to this forecasted data
  3. Take the difference between the forecast and the output of TREND; this is the estimate for the seasonal effect for each day of the year
  4. You can then take this estimate and add it as a regressor column alongside your GA4 data. Be careful to make sure the days match up! (you’ll have to bodge a bit for leap years)
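
Steps 2 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions and not how Forecast Forge works internally: `linear_trend` is a hand-rolled least-squares line standing in for TREND, the forecast values are made up, and the leap-year bodge (reusing the value for 28 February) is just one possible choice.

```python
from datetime import date, timedelta

def linear_trend(values):
    """Least-squares straight line through values at x = 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in range(n)]

def seasonal_effect(forecast):
    """Steps 2 and 3: forecast minus its linear trend, keyed by (month, day)."""
    days = sorted(forecast)
    values = [forecast[d] for d in days]
    trend = linear_trend(values)
    return {(d.month, d.day): v - t for d, v, t in zip(days, values, trend)}

def regressor_for(dates, effect):
    """Step 4: look up the seasonal effect for each GA4 date.

    Leap-year bodge (an assumption): 29 February reuses the value for
    28 February when the forecast year had no 29 February.
    """
    out = []
    for d in dates:
        key = (d.month, d.day)
        if key == (2, 29) and key not in effect:
            key = (2, 28)
        out.append(effect[key])
    return out

# Made-up UA forecast for 2023: a steady rise plus a weekend bump.
start = date(2023, 1, 1)
forecast = {}
for i in range(365):
    day = start + timedelta(days=i)
    forecast[day] = 100 + 0.5 * i + (20 if day.weekday() >= 5 else 0)

effect = seasonal_effect(forecast)
regressor = regressor_for([date(2024, 2, 28), date(2024, 2, 29)], effect)
print(regressor)
```

Matching on month and day keeps the time-of-year pattern aligned but shifts any day-of-week pattern by a day or two each year, so check that this suits your data before relying on it.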

This is way more complicated than the previous three methods so I would only recommend it if the simpler options don’t work. But it is a powerful tool and can be very useful in other circumstances where you don’t have enough data to estimate seasonality; for example you can use it with Google Trends data or sector data from the ONS or another statistics agency.

