Saturday, November 9, 2024
No menu items!
HomeCloud ComputingHow to use custom holidays for time-series forecasting in BigQuery ML

How to use custom holidays for time-series forecasting in BigQuery ML

Time-series forecasting is one of the most important models across a variety of industries, such as retail, telecom, entertainment, manufacturing. It serves many use cases such as forecasting revenues, predicting inventory levels and many others. It’s no surprise that time series is one of the most popular models in BigQuery ML. Defining holidays is important in any time-series forecasting model to accommodate for variations and fluctuations in the time-series data. In this blog post we will discuss how you can take advantage of the recent enhancements to define custom holidays and get better explainability for your forecasting models in BigQuery ML.

You could already specify HOLIDAY_REGION when creating a time-series model. The model would use the holiday information within that HOLIDAY_REGION to capture the holiday effect. However, we heard from our customers that they are looking to understand the holiday effect in detail — which holidays are used in modeling, what is the contribution of individual holidays in the model as well as the ability to customize or create their own holidays for modeling.

To address these, we recently launched the preview of custom holiday modeling capabilities in ARIMA_PLUS and ARIMA_PLUS_XREG. With these capabilities, you can now do the following:

Access to all the built-in holiday data by querying the BigQuery public dataset bigquery-public-data.ml_datasets.holidays_and_events_for_forecasting or by using the table value function ML.HOLIDAY_INFO. You can inspect the holiday data used for fitting your forecasting model

Customize the holiday data (e.g. primary date and holiday effect window) using standard GoogleSQL to improve time series forecasting accuracy

Explain the contribution of each holiday to the forecasting result

Before we dive into using these features, let’s first understand custom holiday modeling and why one might need it. Let’s say you want to forecast the number of daily page views of the Wikipedia page for Google I/O, Google’s flagship event for developers. Given the large attendance of Google I/O you can expect significantly increased traffic to this page around the event days. Given that these are Google-specific dates and not included in the default HOLIDAY_REGION, the forecasted page views will not provide a good explanation for the spikes around those dates. So you need the ability to specify custom holidays in your model so that you get better explainability for your forecasting. With custom holiday modeling features, you can now build more powerful and accurate time-series forecasting models using BigQuery ML.

The following sections show some examples of the new custom holiday modeling in forecasting in BigQuery ML. In this example, we explore the bigquery-public-data.wikipedia dataset, which has the daily pageviews for Google I/O, create a custom holiday for Google I/O event, and then use the model to forecast the daily pageviews based on its historical data and factoring in the customized holiday calendar.

“The bank would like to utilize a custom holiday calendar as it has ‘tech holidays’ due to various reasons like technology freezes, market instability freeze etc. And, it would like to incorporate those freeze calendars while training the ML model for Arima,” said a data scientist of a large US based financial institution.

An example: forecast wikipedia daily pageviews for Google I/O

Step 1. Create the dataset

BigQuery hosts hourly wikipedia page view data across all languages. As a first step, we aggregate them by day and all languages.

code_block[StructValue([(u’code’, u”CREATE OR REPLACE TABLE `bqml_tutorial.googleio_page_views`rnASrnSELECTrn DATETIME_TRUNC(datehour, DAY) AS date,rn SUM(views) AS viewsrnFROMrn `bigquery-public-data.wikipedia.pageviews_*`rnWHERErn datehour >= ‘2017-01-01’rn AND datehour < ‘2023-01-01’rn AND title = ‘Google_I/O’rnGROUP BYrn DATETIME_TRUNC(datehour, DAY)”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea499868750>)])]

Step 2: Forecast without custom holiday

Now we do a regular forecast. We use the daily page view data from 2017 to 2021 and forecast into the year of 2022.

code_block[StructValue([(u’code’, u”CREATE OR REPLACE MODEL `bqml_tutorial.forecast_googleio`rn OPTIONS (rn model_type = ‘ARIMA_PLUS’,rn holiday_region = ‘US’,rn time_series_timestamp_col = ‘date’,rn time_series_data_col = ‘views’,rn data_frequency = ‘DAILY’,rn horizon = 365)rnASrnSELECTrn *rnFROMrn `bqml_tutorial.googleio_page_views`rnWHERErn date < ‘2022-01-01′;”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea4998a7fd0>)])]

We can visualize the result from ml.explain_forecast using Looker Studio and get the following graph:

As we can see, the forecasting model is capturing the general trend pretty well. However, it is not capturing the increased traffic that are related to previous Google I/O events and not able to generate an accurate forecast for 2022 either.

Step 3: Forecast with custom holiday

As we can see from below, Google I/O happened during these dates between 2017 and 2022. We would like to instruct the forecasting model to consider these dates as well.

code_block[StructValue([(u’code’, u”CREATE OR REPLACE MODEL `bqml_tutorial.forecast_googleio_with_custom_holiday`rn OPTIONS (rn model_type = ‘ARIMA_PLUS’,rn holiday_region = ‘US’,rn time_series_timestamp_col = ‘date’,rn time_series_data_col = ‘views’,rn data_frequency = ‘DAILY’,rn horizon = 365)rnAS (rn training_data AS (rn SELECTrn *rn FROMrn `bqml_tutorial.googleio_page_views`rn WHERErn date < ‘2022-01-01’rn ),rn custom_holiday AS (rn SELECTrn ‘US’ AS region,rn ‘GoogleIO’ AS holiday_name,rn primary_date,rn 1 AS preholiday_days,rn 2 AS postholiday_daysrn FROMrn UNNEST(rn [rn DATE(‘2017-05-17’),rn DATE(‘2018-05-08’),rn DATE(‘2019-05-07’),rn — cancelled in 2020 due to pandemicrn DATE(‘2021-05-18’),rn DATE(‘2022-05-11′)])rn AS primary_datern )rn);”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea4998a7790>)])]

As we can see, we provide a full list of Google I/O’s event dates to our forecasting model. Besides, we also adjust the holiday effect window to cover four  days around the event date to better capture some potential view traffic before and after the event.

After visualizing in Looker Studio, we get the following chart:

As we can see from the chart, our custom holiday significantly helped boost the performance of our forecasting model and now it is perfectly capturing the increase of page views caused by Google I/O.

Step 4: Explain fine-grained holiday effect

You can further inspect the holiday effect contributed by each individual holidays by using ml.explain_forecast:

code_block[StructValue([(u’code’, u’SELECTrn time_series_timestamp,rn holiday_effect_GoogleIO,rn holiday_effect_US_Juneteenth,rn holiday_effect_Christmas,rn holiday_effect_NewYearrnFROMrn ml.explain_forecast(rn modelrn bqml_tutorial.forecast_googleio_with_custom_holiday,rn STRUCT(365 AS horizon))rnWHERE holiday_effect != 0;’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea4999dd110>)])]

The results look similar to the following. As we can see, Google I/O indeed contributes a great amount of holiday effect to the overall forecast result for those custom holidays.

Step 5: Compare model performance

At the end, we use ml.evaluate to compare the performance of the previous model created without custom holiday and the new model created with custom holiday. Specifically, we would like to see how the new model performs when it comes to forecasting a future custom holiday, and hence we are setting the time range on the week of Google I/O in 2022.

code_block[StructValue([(u’code’, u’SELECTrn “original” AS model_type,rn *rnFROMrn ml.evaluate(rn MODEL bqml_tutorial.forecast_googleio,rn (rn SELECTrn *rn FROMrn `bqml_tutorial.googleio_page_views`rn WHERErn date >= ‘2022-05-08’rn AND date < ‘2022-05-12’rn ),rn STRUCT(rn 365 AS horizon,rn TRUE AS perform_aggregation))rnUNION ALLrnSELECTrn “with_custom_holiday” AS model_type,rn *rnFROMrn ml.evaluate(rn MODELrn bqml_tutorial.forecast_googleio_with_custom_holiday,rn (rn SELECTrn *rn FROMrn `bqml_tutorial.googleio_page_views`rn WHERErn date >= ‘2022-05-08’rn AND date < ‘2022-05-12’rn ),rn STRUCT(rn 365 AS horizon,rn TRUE AS perform_aggregation));’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3ea4a4c6f9d0>)])]

We get the following result, which demonstrates the great performance boost of the new model:

Conclusion

In the previous example, we demonstrated how to use custom holidays in forecasting and evaluate its impact on a forecasting model. The public dataset and the ML.HOLIDAY_INFO table value function is also helpful for understanding what holidays are used to fit your model. Some gains brought by this feature are as follows:

You can configure custom holidays easily using standard GoogleSQL, enjoying BigQuery scalability, data governance, etc.

You get elevated transparency and explainability of time series forecasting in BigQuery.

What’s next?

Custom holiday modeling in forecasting models is now available for you to try in preview. Check out the tutorial in BigQuery ML to learn how to use it. For more information, please refer to the documentation.

Acknowledgements: Thanks to Xi Cheng, Haoming Chen, Jiashang Liu, Amir Hormati, Mingge Deng, Eric Schmidt and Abhinav Khushraj from the BigQuery ML team. Also thanks to Weijie Shen, Jean Ortega from the Fargo team of Resource Efficiency Data Science team.

Related Article

How to do multivariate time series forecasting in BigQuery ML

Multivariate time series forecasting allows BigQuery users to use external covariate along with target metric for forecasting.

Read Article

Cloud BlogRead More

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments