Ensemble Time Series Forecasting at Scale

Written by nitishgaddam54 | Published 2023/02/28

TL;DR: In this article, we cover general methods for putting time series forecasting models into practice. We also build an ensemble approach that backtests a variety of models and chooses the best one. According to VentureBeat, about 90% of forecasting models never go into production, which has been attributed to pipelines that are complex and unscalable and to a lack of exploration.

Recently, shifting macroeconomic factors have made it difficult for many businesses to produce reliable forecasts. According to VentureBeat, about 90% of forecasting models do not go into production. This has been attributed to complex pipelines that do not scale and to a generally non-exploratory approach.

We also find that a single model is rarely the answer to the full range of forecasting needs, because forecasting for an organization involves many moving parts.

In this article, we will cover general methods for putting time series forecasting models into practice while creating an ensemble approach that backtests a variety of models and chooses the best one.

Table of Contents

  • Business Problem
  • Approach
  • Pipeline Overview
  • Benchmarking
  • Automation
  • Conclusion

Business Problem

  • Predict the revenue a product would make over the financial year

  • Estimate the revenue impact on the forecast of a change in assumptions (like take rate or CPI)

  • A scale to understand Contribution to Growth

  • Anomaly Detection

For a typical organization, forecasting is a process that goes hand in hand with budgeting, planning, and allocation. We typically pick a few metrics that we want to forecast and run a time series model on their historical data. During this process, we decompose the forecast into its trend, seasonality, and residual components. The result is then used to tune the model and arrive at a forecast.
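The decomposition step can be illustrated with a minimal sketch, assuming an additive model (series = trend + seasonality + residual) and a known seasonal period. The function name `decompose_additive` and the centered moving average used for the trend are illustrative choices, not the method of any particular library, and the series should cover at least two full cycles.

```python
from statistics import mean

def decompose_additive(series, period):
    """Split a series into trend, seasonal, and residual parts (additive model)."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average; left as None near the edges.
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: half weight on the two end points of the window.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[i] = total / period
        else:
            trend[i] = mean(window)
    # Seasonality: average detrended value at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)  # center the seasonal effects around zero
    seasonal = [raw[i % period] - offset for i in range(n)]
    # Residual: what remains after removing trend and seasonality.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual

# Synthetic example: linear growth plus a repeating 4-step pattern.
pattern = [10, -5, -10, 5]
series = [100 + 2 * t + pattern[t % 4] for t in range(24)]
trend, seasonal, residual = decompose_additive(series, period=4)
```

Large residuals after this split are a natural place to start looking for anomalies or for missing regressors.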

Based on the nature of the business problem at hand, we can also encode assumptions as regressors that help inform the forecast. These assumptions are based on features like actual and intended spend, take rate, conversion efficiency, economic data such as publicly available CPI forecasts, custom seasonality flags, etc.
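To make the idea of a regressor concrete, here is a minimal sketch that assumes revenue depends linearly on a single regressor (spend) and compares two intended-spend scenarios. The data, the helper names `fit_line` and `forecast_with_regressor`, and the single-regressor linear form are all hypothetical simplifications; a library such as Prophet would take the regressor alongside trend and seasonality instead.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def forecast_with_regressor(intercept, slope, planned_values):
    """Forecast the target from planned future values of the regressor."""
    return [intercept + slope * v for v in planned_values]

# Hypothetical monthly history (not real data).
spend = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]
revenue = [52.0, 58.0, 67.0, 55.0, 76.0, 82.0]
intercept, slope = fit_line(spend, revenue)

# Two assumption sets for the next two months: planned spend vs. a 20% cut.
baseline = forecast_with_regressor(intercept, slope, [20.0, 22.0])
reduced = forecast_with_regressor(intercept, slope, [16.0, 17.6])
impact = [b - r for b, r in zip(baseline, reduced)]
```

The difference between the two scenario forecasts is one way to express the revenue impact of a change in assumptions.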

Approach

  • Data preprocessing & preparation

  • Forecast Decomposition into Trend, Seasonality, Holidays and Error

  • Improve Accuracy by Cross Validation

  • Top Down Approach vs Bottom Up Approach

  • Running Backtests

Next, we settle on the overall approach we want to follow. Some of the questions we ask during this step are how much data we have, how clean it is, and how to split it into train and test datasets. At this stage, we also decide which models to try and which metric to use when benchmarking them against each other in backtests. Sometimes we may also want to combine the results of multiple models in an ensemble.
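The backtest-and-choose idea can be sketched as follows, assuming a single chronological holdout, MAPE as the benchmark metric, and three simple baseline models with a 7-day period. These baselines and the name `ensemble_mean` are stand-ins for illustration; in practice the candidates would be models such as Prophet or ARIMA.

```python
def naive(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon

def seasonal_naive(train, horizon, period=7):
    """Repeat the last full seasonal cycle."""
    return [train[-period + (h % period)] for h in range(horizon)]

def moving_average(train, horizon, window=7):
    """Repeat the mean of the last `window` observations."""
    avg = sum(train[-window:]) / window
    return [avg] * horizon

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def backtest(series, models, horizon):
    """Hold out the last `horizon` points and score every candidate on them."""
    train, test = series[:-horizon], series[-horizon:]
    forecasts = {name: model(train, horizon) for name, model in models.items()}
    scores = {name: mape(test, f) for name, f in forecasts.items()}
    # Simple ensemble: average the candidate forecasts point by point.
    ensemble = [sum(vals) / len(vals) for vals in zip(*forecasts.values())]
    scores["ensemble_mean"] = mape(test, ensemble)
    best = min(scores, key=scores.get)
    return best, scores

MODELS = {
    "naive": naive,
    "seasonal_naive": seasonal_naive,
    "moving_average": moving_average,
}

# Synthetic daily series with a weekly pattern (not real data).
weekly = [100, 120, 130, 125, 140, 200, 180]
series = [weekly[t % 7] for t in range(35)]
best, scores = backtest(series, MODELS, horizon=7)
```

The split is chronological on purpose: shuffling the rows before splitting would leak future information into training.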

The top-down vs. bottom-up choice determines whether we forecast at a higher level and then break that forecast down, or forecast at the product level and aggregate the results into the overall forecast.
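The two directions can be compared with a minimal sketch. It assumes the top-down split uses each product's historical revenue share, which is one common convention rather than something the article prescribes; the product names and numbers are hypothetical.

```python
def bottom_up(product_forecasts):
    """Sum product-level forecasts to get the overall forecast."""
    return sum(product_forecasts.values())

def top_down(total_forecast, historical_revenue):
    """Split an overall forecast across products by historical revenue share."""
    total_history = sum(historical_revenue.values())
    return {
        product: total_forecast * revenue / total_history
        for product, revenue in historical_revenue.items()
    }

# Hypothetical numbers for three products.
history = {"product_a": 600.0, "product_b": 300.0, "product_c": 100.0}
allocated = top_down(1200.0, history)

product_level = {"product_a": 700.0, "product_b": 380.0, "product_c": 130.0}
overall = bottom_up(product_level)
```

When the two totals disagree, the gap is itself useful: it shows how much the product-level models and the aggregate model differ in their view of growth.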

Pipeline Overview

  • Data Prep using BigQuery

  • Set up VM Instance to deploy Pipeline
  • Machine Learning Libraries like scikit-learn, Prophet, ARIMA etc.
  • Control pipeline using Bash Script
  • Deploy Persistent Machines on Cloud Storage

For a typical forecasting project, we start by querying the data and creating a dataset to use as the model input. In certain cases, we also create features that make sense for training (for example, converting categorical variables to numeric ones) and then run a model after defining our target variable. Models like Facebook Prophet, ARIMA, MVAR, etc. can all be used during this step.

We then look at the results by charting them as well as aggregating the data to multiple granularities (like converting daily forecasts to weekly & monthly).
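Rolling a daily forecast up to coarser granularities can be sketched as below, assuming the forecast is a mapping from calendar date to value and that weeks start on Monday. The helper names are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate(daily, key):
    """Sum daily values into buckets chosen by `key` (a function of the date)."""
    totals = defaultdict(float)
    for day, value in daily.items():
        totals[key(day)] += value
    return dict(totals)

def week_start(day):
    """Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())

def month_start(day):
    """First day of the month containing `day`."""
    return day.replace(day=1)

# Hypothetical flat daily forecast for one week starting Monday 2023-02-27.
daily_forecast = {date(2023, 2, 27) + timedelta(days=i): 10.0 for i in range(7)}
weekly_forecast = aggregate(daily_forecast, week_start)
monthly_forecast = aggregate(daily_forecast, month_start)
```

Summing is appropriate for additive metrics like revenue; a rate such as take rate would need a weighted average instead.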


Benchmarking

  • Cross Validation

  • Errors: MAPE (Mean Absolute Percentage Error)

  • Comparison (Compare to other metrics)

After running a model, we cross-validate it, using a metric like MAPE to understand its performance. We optimize the forecast against this metric, run backtests, and then perform hyperparameter tuning to find the parameters that best fit the data.
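As a rough illustration of cross-validation and tuning for time series, the sketch below uses rolling-origin evaluation: the model is refit on an expanding window and scored on the next point, and MAPE is the mean of |actual - forecast| / |actual| expressed as a percentage. Simple exponential smoothing with a single parameter `alpha` is assumed here only because it is small enough to show in full; the grid and `min_train` values are arbitrary.

```python
def ses_forecast(train, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = train[0]
    for value in train[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def rolling_origin_mape(series, alpha, min_train):
    """Average one-step MAPE over expanding training windows."""
    errors = []
    for cutoff in range(min_train, len(series)):
        predicted = ses_forecast(series[:cutoff], alpha)
        actual = series[cutoff]  # assumed non-zero
        errors.append(abs((actual - predicted) / actual))
    return 100.0 * sum(errors) / len(errors)

def tune_alpha(series, grid, min_train=5):
    """Grid search: pick the alpha with the lowest cross-validated MAPE."""
    scores = {alpha: rolling_origin_mape(series, alpha, min_train) for alpha in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic upward-trending series (not real data).
series = [100 + 5 * t for t in range(20)]
best_alpha, scores = tune_alpha(series, grid=[0.2, 0.5, 0.8])
```

On a steadily trending series like this one, larger `alpha` values track recent data more closely and score better, which is the kind of behavior the tuning step is meant to surface.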

Automation

  • Forecasting at Scale

  • Predict the same data at hourly, daily, or weekly aggregations

  • Run for Multiple Merchants in Parallel using the same Models

  • Test Models in Production

Once the core models and forecasting approaches have been selected, the next step is to devise a scalable way to run the pipeline. This can be achieved with a tool such as Airflow or with simple cron jobs to schedule the runs. During this process, it is important to understand which files need to be executed and which models need to be run.
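Running the same model for multiple merchants in parallel can be sketched with the standard library. This assumes each merchant's history fits in memory and uses a placeholder model (the mean of the last seven observations); `forecast_merchant` and `run_all` are hypothetical names, and the scheduler that triggers the run is out of scope here.

```python
from concurrent.futures import ThreadPoolExecutor

def forecast_merchant(merchant_id, history, horizon=7):
    """Placeholder model: repeat the mean of the last 7 observations."""
    recent = history[-7:]
    level = sum(recent) / len(recent)
    return merchant_id, [level] * horizon

def run_all(histories, max_workers=4):
    """Fit and forecast every merchant concurrently with the same model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(forecast_merchant, merchant_id, history)
            for merchant_id, history in histories.items()
        ]
        # Each future returns a (merchant_id, forecast) pair.
        return dict(future.result() for future in futures)

# Hypothetical histories for two merchants.
histories = {
    "merchant_1": [10.0] * 10,
    "merchant_2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
}
forecasts = run_all(histories)
```

Threads keep the sketch simple. For CPU-heavy model fitting, `ProcessPoolExecutor` from the same module offers the same `submit` interface and may be a better fit.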

An effective pipeline follows a sequence of steps: data preparation, then data cleaning and model preparation, then the model run. Data validation and scoring can then be carried out before rerunning the model and backtesting the results. The results can be written to a data table that feeds a dashboard, or exported as an Excel sheet to test assumptions and gain a better understanding of the forecast.
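The sequence of steps can be sketched as a chain of small functions. This is a simplified outline under stated assumptions: the input is a list of row dictionaries, the model is a placeholder mean forecast, validation is a basic sanity check, and the output is CSV text rather than a database table or Excel file. All function names are hypothetical.

```python
import csv
import io

def prepare(rows):
    """Data preparation: keep only the fields the model needs."""
    return [{"day": r["day"], "revenue": r.get("revenue")} for r in rows]

def clean(rows):
    """Data cleaning: drop rows with missing or non-positive revenue."""
    return [r for r in rows if r["revenue"] is not None and r["revenue"] > 0]

def run_model(rows, horizon):
    """Model run: placeholder forecast equal to the mean of the history."""
    level = sum(r["revenue"] for r in rows) / len(rows)
    return [level] * horizon

def validate(forecast):
    """Data validation: reject empty or negative forecasts."""
    if not forecast or any(value < 0 for value in forecast):
        raise ValueError("forecast failed validation")
    return forecast

def to_csv(forecast):
    """Output: serialize the forecast so a dashboard or spreadsheet can read it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["step_ahead", "forecast"])
    for step, value in enumerate(forecast, start=1):
        writer.writerow([step, round(value, 2)])
    return buffer.getvalue()

def run_pipeline(rows, horizon=3):
    """Run the steps in order: prepare, clean, model, validate, write."""
    return to_csv(validate(run_model(clean(prepare(rows)), horizon)))

# Hypothetical input rows, including one with a missing value.
rows = [
    {"day": "2023-01-01", "revenue": 100.0},
    {"day": "2023-01-02", "revenue": None},
    {"day": "2023-01-03", "revenue": 200.0},
]
output = run_pipeline(rows, horizon=2)
```

Keeping each step as its own function makes it straightforward to map steps onto scheduler tasks later and to rerun a single step when it fails.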

Conclusion

In conclusion, forecasting is an integral part of business decision-making. It provides organizations with insights into market trends, potential risks, and growth opportunities. It is crucial to ensure that any forecasting automation implemented adds value to the organization by considering factors such as forecast accuracy, data reliability, and potential impact on the business. By doing so, we can provide effective and efficient forecasting automation that supports the organization’s goals and objectives, enabling them to make informed decisions that drive success.

Some of the questions we ask ourselves when automating a forecast are:

  • Who asked for the forecast?
  • How many man-hours does it save?
  • What is the business impact?
  • What is the operational impact?

Lead image generated with Stable Diffusion.


Written by nitishgaddam54 | Experienced Data Scientist; Skilled in experimental design, data-intensive apps, predictive modeling & ML for startups.
Published by HackerNoon on 2023/02/28