Eric D. Brown / Oct 08 2019

Forecasting Time Series Data With Prophet III

Introduction

This is the third in a series of posts about using Prophet to forecast time series data. Follow this link for parts 1 & 2 of Forecasting Time-Series Data With Prophet.

In those previous posts, I looked at forecasting monthly sales data 24 months into the future. In this post, I wanted to look at using the ‘holiday’ construct found within the Prophet library to try to better forecast around specific events. If we look at our sales data (you can find it here), there’s an obvious pattern each December. That pattern could be there for a variety of reasons, but let's assume that it's due to a promotion that is run every December.

Import necessary libraries

import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

Matplotlib must be manually registered with Pandas due to a conflict between Prophet and Pandas.

pd.plotting.register_matplotlib_converters()

Read in the data

Read the data in from the retail sales CSV file in the examples folder then set the index to the 'date' column. We are also parsing dates in the data file.

sales_df = pd.read_csv('retail_sales.csv', index_col='date', parse_dates=True)
sales_df.head()
date        sales
2009-10-01  338630
2009-11-01  339386
2009-12-01  400264
2010-01-01  314640
2010-02-01  311022
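
If you don't have the CSV handy, the same read step can be tried against an in-memory copy of the five rows shown above. This is only a minimal sketch: the csv_text string is a stand-in I typed in for illustration, not the real file.

```python
import io

import pandas as pd

# Stand-in for retail_sales.csv: only the first five rows shown above.
csv_text = """date,sales
2009-10-01,338630
2009-11-01,339386
2009-12-01,400264
2010-01-01,314640
2010-02-01,311022
"""

# index_col='date' makes the date column the index;
# parse_dates=True asks pandas to parse that index as datetimes.
sample_df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

print(type(sample_df.index).__name__)
print(sample_df['sales'].max())
```

If the index does not come back as a DatetimeIndex, the date column was probably not parsed, and the plots later on would treat the dates as plain strings.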

Prepare for Prophet

As explained in the previous Prophet posts, for Prophet to work, we need to change the names of these columns to ds and y. First, we reset the index so that the date becomes a regular column.

df = sales_df.reset_index()
df.head()
   date        sales
0  2009-10-01  338630
1  2009-11-01  339386
2  2009-12-01  400264
3  2010-01-01  314640
4  2010-02-01  311022

Let's rename the columns as required by fbprophet. Additionally, fbprophet doesn't like the index to be a datetime; it wants to see ds as a non-index column, so we'll keep the default integer index rather than setting a different one.

df=df.rename(columns={'date':'ds', 'sales':'y'})
df.head()
   ds          y
0  2009-10-01  338630
1  2009-11-01  339386
2  2009-12-01  400264
3  2010-01-01  314640
4  2010-02-01  311022
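
Before handing a dataframe to Prophet, it can help to confirm it has the layout described above. The sketch below uses a hypothetical helper, check_prophet_frame, which is my own and not part of Prophet. The datetime check is a stricter assumption of mine; as far as I know, Prophet will also parse date strings on its own.

```python
import pandas as pd


def check_prophet_frame(frame):
    """Raise ValueError if `frame` lacks the 'ds'/'y' layout described above.

    Hypothetical helper for illustration; it is not part of Prophet.
    """
    missing = {'ds', 'y'} - set(frame.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # Stricter than strictly necessary: insist that ds already holds datetimes.
    if not pd.api.types.is_datetime64_any_dtype(frame['ds']):
        raise ValueError("'ds' should hold datetimes")
    return frame


raw = pd.DataFrame({
    'date': pd.to_datetime(['2009-10-01', '2009-11-01', '2009-12-01']),
    'sales': [338630, 339386, 400264],
})
prepared = check_prophet_frame(raw.rename(columns={'date': 'ds', 'sales': 'y'}))
print(list(prepared.columns))
```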

Now's a good time to take a look at your data. Plot the data using Pandas' plot function.

df.set_index('ds').y.plot().figure

Reviewing the Data

We can see from this data that there is a spike in the same month each year. While the spike could be due to many different reasons, let's assume it's because there's a major promotion that this company runs every year at that time, which is in December for this dataset.

Because we know this promotion occurs every December, we want to use this knowledge to help Prophet better forecast those months, so we'll use Prophet's holiday construct (explained here).

The holiday construct is a Pandas dataframe with the name of the holiday and the date of the holiday. For this example, the construct would look like this:

promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})

This promotions dataframe consists of promotion dates for December in 2009 through 2015. The lower_window and upper_window values are set to zero to indicate that we don't want Prophet to consider any dates other than the ones listed.

promotions
   holiday             ds          lower_window  upper_window
0  december_promotion  2009-12-01  0             0
1  december_promotion  2010-12-01  0             0
2  december_promotion  2011-12-01  0             0
3  december_promotion  2012-12-01  0             0
4  december_promotion  2013-12-01  0             0
5  december_promotion  2014-12-01  0             0
6  december_promotion  2015-12-01  0             0
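
Typing the dates out by hand works for seven rows, but the same frame can be generated. Here is a minimal sketch, assuming a hypothetical helper named december_promotions (my own name, not a Prophet function) that uses the same column names as the promotions frame above.

```python
import pandas as pd


def december_promotions(first_year, last_year):
    """Build a holidays-style frame with one row per December 1st.

    Hypothetical helper; the column names follow the promotions frame above.
    """
    dates = pd.to_datetime(
        [f'{year}-12-01' for year in range(first_year, last_year + 1)]
    )
    return pd.DataFrame({
        'holiday': 'december_promotion',
        'ds': dates,
        'lower_window': 0,
        'upper_window': 0,
    })


promos = december_promotions(2009, 2015)
print(len(promos))
print(promos['ds'].dt.month.unique().tolist())
```

One reason to generate the dates this way: Prophet's documentation asks for future occurrences of a holiday as well as past ones, so if the promotion is expected to keep running, passing a later last_year would cover the forecast horizon.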

To continue, we need to log-transform our data:

df['y'] = np.log(df['y'])
df.tail()
    ds          y
67  2015-05-01  13.044650453675313
68  2015-06-01  13.013059541513272
69  2015-07-01  13.033991074775358
70  2015-08-01  13.030993424699561
71  2015-09-01  12.973670775134828
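
Since everything from here on is on a log scale, it's worth seeing that the transform is reversible. A small sketch using the first five sales values from the data above:

```python
import numpy as np
import pandas as pd

sales = pd.Series([338630, 339386, 400264, 314640, 311022], name='y')

logged = np.log(sales)      # natural log, as in the transform above
restored = np.exp(logged)   # np.exp undoes it

print(logged.round(4).tolist())
print(np.allclose(restored, sales))
```

We'll rely on np.exp later to get the forecasts back to the original sales scale.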

Running Prophet

Now, let's set Prophet up to begin modeling our data using our promotions dataframe as part of the forecast

Note: Since we are using monthly data, Prophet would normally print a message saying Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. That message is harmless when working with monthly data, and it goes away when weekly_seasonality is set explicitly in the instantiation of Prophet, as the code below does with weekly_seasonality=True.

model = Prophet(holidays=promotions, weekly_seasonality=True, daily_seasonality=True)
model.fit(df)

We've instantiated and fit the model; now we need to build some future dates to forecast into.

future = model.make_future_dataframe(periods=24, freq = 'm')
future.tail()
    ds
91  2017-04-30
92  2017-05-31
93  2017-06-30
94  2017-07-31
95  2017-08-31
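
Notice that these future dates fall on the last day of each month, while the historical ds values (and the promotion dates) fall on the first. The sketch below uses plain pandas, not Prophet, to show the two kinds of date stamps side by side. As far as I understand, holidays are matched by date, so with zero-width windows a promotion dated 2015-12-01 would not line up with a future row dated 2015-12-31. Passing freq='MS' (month start) to make_future_dataframe is one option to consider; that is a suggestion on my part, not what was run for the results in this post.

```python
import pandas as pd

last_observed = pd.Timestamp('2015-09-01')   # last ds in the data above

# Month-end stamps, which is what the freq='m' call above produced.
month_end = pd.date_range(start=last_observed, periods=3,
                          freq=pd.offsets.MonthEnd())

# Month-start stamps; skip the first one because it is the last observed date.
month_start = pd.date_range(start=last_observed, periods=4,
                            freq=pd.offsets.MonthBegin())[1:]

print(month_end.strftime('%Y-%m-%d').tolist())
print(month_start.strftime('%Y-%m-%d').tolist())
```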

To forecast this future data, we need to run it through Prophet's model.

forecast = model.predict(future)

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:

forecast.tail()

We really only want to look at yhat, yhat_lower and yhat_upper, so we can do that with:

forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
    ds          yhat                yhat_lower          yhat_upper
91  2017-04-30  13.069092385393887  12.666347160467316  13.461268900286738
92  2017-05-31  13.071883163433725  12.637346529844663  13.491574989616737
93  2017-06-30  13.05773483351489   12.590041629879847  13.506115228104957
94  2017-07-31  13.06466187201633   12.565657922306391  13.540365994068482
95  2017-08-31  13.012393257087005  12.479345577381606  13.525504713353449
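
These values are still on the log scale. To read them as sales figures, exponentiate the three forecast columns. A minimal sketch using only the last row of the table above:

```python
import numpy as np
import pandas as pd

# Last forecast row from the table above (log scale).
logged_tail = pd.DataFrame({
    'ds': pd.to_datetime(['2017-08-31']),
    'yhat': [13.012393257087005],
    'yhat_lower': [12.479345577381606],
    'yhat_upper': [13.525504713353449],
})

value_cols = ['yhat', 'yhat_lower', 'yhat_upper']
original_scale = logged_tail.copy()
original_scale[value_cols] = np.exp(logged_tail[value_cols])

print(original_scale[value_cols].round(0))
```

The interval is no longer symmetric around yhat once it is back on the original scale, which is expected after undoing a log transform.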

Plotting Prophet results

Prophet has a plotting mechanism called plot. This plot functionality draws the original data (black dots), the model (blue line) and the error of the forecast (shaded blue area).

model.plot(forecast);

Personally, I'm not a fan of this visualization, but I'm not going to build my own in this post...you can see how I do that here.

Additionally, Prophet lets us take a look at the components of our model, including the holidays. This component plot is important because it lets you see the trend and the seasonality (identified in the yearly pane).

model.plot_components(forecast);

Comparing holidays vs no-holidays forecasts

Let's re-run our prophet model without our promotions/holidays for comparison.

model_no_holiday = Prophet()
model_no_holiday.fit(df);
future_no_holiday = model_no_holiday.make_future_dataframe(periods=24, freq = 'm')
future_no_holiday.tail()
    ds
91  2017-04-30
92  2017-05-31
93  2017-06-30
94  2017-07-31
95  2017-08-31
forecast_no_holiday = model_no_holiday.predict(future_no_holiday)

Let's compare the two forecasts now. Note: I doubt there will be much difference in these models due to the small amount of data, but it's a good example to see the process. We'll set the indexes and then join the forecast dataframes into a new dataframe called compared_df.

forecast.set_index('ds', inplace=True)
forecast_no_holiday.set_index('ds', inplace=True)
compared_df = forecast.join(forecast_no_holiday, rsuffix="_no_holiday")
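
Both forecast dataframes have the same column names, which is why the join needs rsuffix. Here is a toy sketch with made-up numbers (not forecast output) showing what happens to the overlapping column:

```python
import pandas as pd

dates = pd.to_datetime(['2017-07-31', '2017-08-31'])

# Toy frames with made-up numbers, each indexed by ds.
with_holiday = pd.DataFrame({'yhat': [1.0, 2.0]},
                            index=pd.Index(dates, name='ds'))
without_holiday = pd.DataFrame({'yhat': [1.5, 2.5]},
                               index=pd.Index(dates, name='ds'))

# The left frame keeps its column names; overlapping names from the
# right frame get the suffix appended.
joined = with_holiday.join(without_holiday, rsuffix='_no_holiday')
print(list(joined.columns))
```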

We are only really interested in the yhat values, so let's remove all the rest and convert the logged values back to their original scale.

compared_df = np.exp(compared_df[['yhat', 'yhat_no_holiday']])

Now, let's take the percentage difference between the model with holidays and the one without, and then the average of that difference.

compared_df['diff_per'] = 100*(compared_df['yhat'] - compared_df['yhat_no_holiday']) / compared_df['yhat_no_holiday']
compared_df.tail()
ds          yhat                yhat_no_holiday     diff_per
2017-04-30  474061.52194792114  469583.26560153335  0.9536660853216669
2017-05-31  475386.3702518066   467836.5237404679   1.6137787727593058
2017-06-30  468707.8037344029   477502.74244912295  -1.8418614036875616
2017-07-31  471965.8319525281   467920.13808058767  0.8646120443834493
2017-08-31  447930.451468062    454689.61942474794  -1.4865454736436081
compared_df['diff_per'].mean()
31627.529378734773
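
The percentage calculation can be checked by hand against the five rows shown in the table above. This sketch only covers those five rows, so its mean is not the mean over the full forecast. I've also added the mean of the absolute differences, which is my own addition: positive and negative differences cancel each other out in a plain mean.

```python
import pandas as pd

tail = pd.DataFrame({
    'yhat': [474061.52194792114, 475386.3702518066, 468707.8037344029,
             471965.8319525281, 447930.451468062],
    'yhat_no_holiday': [469583.26560153335, 467836.5237404679,
                        477502.74244912295, 467920.13808058767,
                        454689.61942474794],
}, index=pd.to_datetime(['2017-04-30', '2017-05-31', '2017-06-30',
                         '2017-07-31', '2017-08-31']))

# Same formula as above: difference relative to the no-holiday forecast.
tail['diff_per'] = (100 * (tail['yhat'] - tail['yhat_no_holiday'])
                    / tail['yhat_no_holiday'])

print(tail['diff_per'].mean())         # signed mean; differences can cancel
print(tail['diff_per'].abs().mean())   # mean size of the difference
```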

This isn't an enormous difference (<1%), but there is some difference between using holidays and not using holidays.

If you know there are holidays or events happening that might help or hurt your forecasting efforts, Prophet allows you to easily incorporate them into your modeling.