Kaggle M5 Forecasting: A Comparison of Traditional and Machine Learning Forecasting Methods (Part 2) - Alibaba Cloud Developer Community

2022-12-13

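Double Exponential Smoothing

Single exponential smoothing uses only one smoothing coefficient, whereas double exponential smoothing introduces a second smoothing coefficient to capture the trend in the data. In the code, the trend component is enabled with trend='add'. The seasonal_periods argument is passed as well, but statsmodels only uses it when a seasonal component is specified, so it has no effect on this model.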

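The code is as follows:
# train, test, predictions and stats are assumed to come from the earlier part of this series.
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing

t0 = time.time()
model_name = 'Double Exponential Smoothing'
# train (seasonal_periods is ignored here because no seasonal component is set)
doubleExpSmooth_model = ExponentialSmoothing(train['demand'], trend='add', seasonal_periods=7).fit()
t1 = time.time() - t0
# predict the next 28 days
predictions[model_name] = doubleExpSmooth_model.forecast(28).values
# visualize the last 28 training days, the test data and the forecast
fig, ax = plt.subplots(figsize=(25, 4))
train[-28:].plot(x='date', y='demand', label='Train', ax=ax)
test.plot(x='date', y='demand', label='Test', ax=ax)
predictions.plot(x='date', y=model_name, label=model_name, ax=ax)
# evaluate with RMSE
score = np.sqrt(mean_squared_error(test['demand'], predictions[model_name].values))
print('RMSE for {}: {:.4f}'.format(model_name, score))
# DataFrame.append was removed in pandas 2.0, so the result row is added with pd.concat
stats = pd.concat([stats, pd.DataFrame([{'Model Name': model_name, 'Execution Time': t1, 'RMSE': score}])], ignore_index=True)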

Forecast results of double exponential smoothing
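
To make the role of the two smoothing coefficients more concrete, here is a minimal pure-Python sketch of additive-trend (Holt) smoothing. It is not how statsmodels implements the model: the function name holt_linear_forecast, the fixed alpha and beta values and the toy demand list are all assumptions made for illustration, and statsmodels estimates the coefficients from the data instead of taking them as inputs.

```python
def holt_linear_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with additive-trend (Holt) smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Simple initialisation: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # alpha blends the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # beta blends the newest level change with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical demand that grows by exactly 2 units per step.
demand = [10.0, 12.0, 14.0, 16.0, 18.0]
forecast = holt_linear_forecast(demand, alpha=0.5, beta=0.5, horizon=3)
print(forecast)  # [20.0, 22.0, 24.0]
```

Because the toy series is perfectly linear, the forecast simply continues the line. On real demand data the two coefficients control how quickly the level and the trend react to new observations.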

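Triple Exponential Smoothing

Triple exponential smoothing adds a further smoothing coefficient so that the model reflects both the trend and the seasonal variation in the data.

The code is as follows: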

t0 = time.time()
model_name = 'Triple Exponential Smoothing'
# train: additive trend plus additive seasonality with a 7-day cycle
tripleExpSmooth_model = ExponentialSmoothing(train['demand'], trend='add', seasonal='add', seasonal_periods=7).fit()
t1 = time.time() - t0
# predict the next 28 days
predictions[model_name] = tripleExpSmooth_model.forecast(28).values
# visualize the last 28 training days, the test data and the forecast
fig, ax = plt.subplots(figsize=(25, 4))
train[-28:].plot(x='date', y='demand', label='Train', ax=ax)
test.plot(x='date', y='demand', label='Test', ax=ax)
predictions.plot(x='date', y=model_name, label=model_name, ax=ax)
# evaluate with RMSE
score = np.sqrt(mean_squared_error(test['demand'], predictions[model_name].values))
print('RMSE for {}: {:.4f}'.format(model_name, score))
# DataFrame.append was removed in pandas 2.0, so the result row is added with pd.concat (pandas imported as pd)
stats = pd.concat([stats, pd.DataFrame([{'Model Name': model_name, 'Execution Time': t1, 'RMSE': score}])], ignore_index=True)

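Forecast results of triple exponential smoothing

The forecast shows that triple exponential smoothing is able to learn the seasonal pattern in the data.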
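
The way a seasonal component lets the model pick up a weekly pattern can be sketched in a few lines of pure Python. The function below follows one common textbook formulation of additive Holt-Winters smoothing with a deliberately simple initialisation. The name holt_winters_additive, the coefficient values and the toy weekly pattern are assumptions for illustration, and statsmodels may initialise and optimise the model differently.

```python
def holt_winters_additive(series, period, alpha, beta, gamma, horizon):
    """Additive-trend, additive-seasonal smoothing; returns `horizon` forecasts."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    first = series[:period]
    second = series[period:2 * period]
    # Simple initialisation from the first two cycles.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonals = [y - level for y in first]
    for t in range(period, len(series)):
        y = series[t]
        season = seasonals[t % period]
        previous_level = level
        # Level: smooth the deseasonalised observation.
        level = alpha * (y - season) + (1 - alpha) * (previous_level + trend)
        # Trend: smooth the change in level.
        trend = beta * (level - previous_level) + (1 - beta) * trend
        # Seasonal: smooth the deviation of the observation from the level.
        seasonals[t % period] = gamma * (y - level) + (1 - gamma) * season
    n = len(series)
    return [level + h * trend + seasonals[(n + h - 1) % period]
            for h in range(1, horizon + 1)]


# Hypothetical demand: the same weekly shape repeated for four weeks.
weekly_pattern = [5.0, 3.0, 4.0, 6.0, 8.0, 12.0, 11.0]
forecast = holt_winters_additive(weekly_pattern * 4, period=7,
                                 alpha=0.3, beta=0.1, gamma=0.2, horizon=7)
# The forecast for the next week follows the weekly shape.
print([round(value, 6) for value in forecast])
```

A model without the seasonal term would have no way to represent the day-of-week offsets stored in seasonals, which is why the double exponential smoothing forecast cannot follow a weekly cycle.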

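ARIMA

To use ARIMA, we first need to determine three parameters: p, d and q.

p is the order of the AR (autoregressive) term, i.e. the number of lagged values used.
d is the number of times the series must be differenced to make it stationary.
q is the order of the MA (moving average) term, i.e. the number of lagged forecast errors used.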
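
Of the three parameters, d is the easiest to illustrate by hand. The following sketch applies first-order differencing d times to two made-up series. The helper name difference and the example values are assumptions for illustration only.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    result = list(series)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


linear = [3, 5, 7, 9, 11]        # constant slope
quadratic = [1, 4, 9, 16, 25]    # the slope itself grows

print(difference(linear, d=1))     # [2, 2, 2, 2]
print(difference(quadratic, d=1))  # [3, 5, 7, 9]
print(difference(quadratic, d=2))  # [2, 2, 2]
```

A linear trend disappears after one round of differencing (d = 1), while the quadratic series still trends after one round and needs d = 2. Each round also shortens the series by one observation.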

Determining the ARIMA parameters automatically

The auto_arima function from the pmdarima package can determine the parameters ARIMA needs automatically.

from pmdarima import auto_arima

t0 = time.time()
model_name = 'ARIMA'
# stepwise search over non-seasonal orders; d=None lets auto_arima choose the differencing order
arima_model = auto_arima(train['demand'], start_p=0, start_q=0,
                         max_p=14, max_q=3,
                         seasonal=False,
                         d=None, trace=True, random_state=2020,
                         error_action='ignore',   # we don't want to know if an order does not work
                         suppress_warnings=True,  # we don't want convergence warnings
                         stepwise=True)
arima_model.summary()

Output of auto_arima

With p, d and q determined, we can move on to training and forecasting:

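# train: refit the order selected by auto_arima on the training data
arima_model.fit(train['demand'])
t1 = time.time() - t0
# predict (np.asarray drops any pandas index so the values are assigned by position)
predictions[model_name] = np.asarray(arima_model.predict(n_periods=28))
# visualize the last 28 training days, the test data and the forecast
fig, ax = plt.subplots(figsize=(25, 4))
train[-28:].plot(x='date', y='demand', label='Train', ax=ax)
test.plot(x='date', y='demand', label='Test', ax=ax)
predictions.plot(x='date', y=model_name, label=model_name, ax=ax)
# evaluate with RMSE
score = np.sqrt(mean_squared_error(test['demand'], predictions[model_name].values))
print('RMSE for {}: {:.4f}'.format(model_name, score))
# DataFrame.append was removed in pandas 2.0, so the result row is added with pd.concat (pandas imported as pd)
stats = pd.concat([stats, pd.DataFrame([{'Model Name': model_name, 'Execution Time': t1, 'RMSE': score}])], ignore_index=True)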

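ARIMA forecast results

The simple ARIMA model used here ignores seasonality and is a (5, 1, 3) model, i.e. p = 5, d = 1 and q = 3. It uses 5 lagged values to predict the current value (p = 5), the series is differenced once to make it stationary (d = 1), and the moving-average part uses 3 lagged forecast errors (q = 3).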
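
How the three numbers of a (5, 1, 3) model enter a prediction can be sketched as a one-step forecast with known coefficients. This is only an illustration of the arithmetic: the function name arima_d1_one_step, the coefficient values, the history and the residuals below are all made up, whereas a fitted model estimates its coefficients and residuals from the training data.

```python
def arima_d1_one_step(history, residuals, ar_coefs, ma_coefs, const=0.0):
    """One-step-ahead forecast for ARIMA(p, 1, q) with known coefficients."""
    p, q = len(ar_coefs), len(ma_coefs)
    # d = 1: the model works on first differences rather than raw values.
    diffs = [b - a for a, b in zip(history, history[1:])]
    if len(diffs) < p or len(residuals) < q:
        raise ValueError("not enough history for the requested order")
    # AR part: weighted sum of the p most recent differences (newest first).
    ar_part = sum(c * diffs[-i] for i, c in enumerate(ar_coefs, start=1))
    # MA part: weighted sum of the q most recent forecast errors (newest first).
    ma_part = sum(c * residuals[-j] for j, c in enumerate(ma_coefs, start=1))
    next_diff = const + ar_part + ma_part
    # Undo the differencing to return to the original scale.
    return history[-1] + next_diff


# Hypothetical inputs for an order (5, 1, 3) model.
history = [10, 12, 11, 13, 14, 13, 15]      # gives 6 differences, enough for p = 5
residuals = [1.0, -1.0, 2.0]                # last 3 forecast errors, for q = 3
ar_coefs = [0.5, 0.25, 0.0, 0.0, 0.0]       # 5 AR coefficients
ma_coefs = [0.5, 0.0, 0.0]                  # 3 MA coefficients

print(arima_d1_one_step(history, residuals, ar_coefs, ma_coefs))  # 16.75
```

The AR part contributes 0.5 * 2 + 0.25 * (-1) = 0.75, the MA part contributes 0.5 * 2.0 = 1.0, and the predicted difference of 1.75 is added back to the last observed value 15.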

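SARIMA

SARIMA extends ARIMA with additional seasonal parameters so that the model can reflect seasonal variation in the data.

Calling auto_arima with seasonal=True and m=7 determines the SARIMA parameters automatically.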

t0 = time.time()
model_name='SARIMA'
sarima_model = auto_arima(train['demand'], start_p=0, start_q=0,
                         max_p=14, max_q=3,
                         seasonal=True, m=7,
                         d=None, trace=True,random_state=2020,
                         out_of_sample_size=28,
                         error_action='ignore',   # we don't want to know if an order does not work
                         suppress_warnings=True,  # we don't want convergence warnings
                         stepwise=True)
sarima_model.summary()

Output of auto_arima

With the parameters determined, we move on to training and forecasting:

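# train: refit the order selected by auto_arima on the training data
sarima_model.fit(train['demand'])
t1 = time.time() - t0
# predict (np.asarray drops any pandas index so the values are assigned by position)
predictions[model_name] = np.asarray(sarima_model.predict(n_periods=28))
# visualize the last 28 training days, the test data and the forecast
fig, ax = plt.subplots(figsize=(25, 4))
train[-28:].plot(x='date', y='demand', label='Train', ax=ax)
test.plot(x='date', y='demand', label='Test', ax=ax)
predictions.plot(x='date', y=model_name, label=model_name, ax=ax)
# evaluate with RMSE
score = np.sqrt(mean_squared_error(test['demand'], predictions[model_name].values))
print('RMSE for {}: {:.4f}'.format(model_name, score))
# DataFrame.append was removed in pandas 2.0, so the result row is added with pd.concat (pandas imported as pd)
stats = pd.concat([stats, pd.DataFrame([{'Model Name': model_name, 'Execution Time': t1, 'RMSE': score}])], ignore_index=True)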

SARIMA forecast results
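
One of the seasonal ingredients SARIMA adds is seasonal differencing with period m, which compares each value with the value one season earlier. The sketch below shows the idea for m = 7 on made-up data. The helper name seasonal_difference and the toy weekly pattern are assumptions for illustration.

```python
def seasonal_difference(series, m):
    """Subtract the value one season (m steps) earlier from each observation."""
    return [series[t] - series[t - m] for t in range(m, len(series))]


weekly_pattern = [5, 3, 4, 6, 8, 12, 11]
# Three weeks of hypothetical demand: the same weekly shape, rising by 1 unit each week.
demand = [value + week for week in range(3) for value in weekly_pattern]

print(seasonal_difference(demand, m=7))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

After differencing at lag 7 the weekly shape is gone and only the week-over-week change remains, which is much easier for the non-seasonal part of the model to describe.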

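SARIMAX

With the methods above, forecasts can only be based on the history of the series itself. SARIMAX introduces exogenous regressors (the X stands for eXogenous), which makes it possible to use data beyond the time series being forecast.

In this example, we bring in the sell_price data to help improve the forecast.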

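t0 = time.time()
model_name = 'SARIMAX'
# pmdarima >= 1.8 names the exogenous argument X; older releases called it exogenous
sarimax_model = auto_arima(train['demand'], start_p=0, start_q=0,
                           max_p=14, max_q=3,
                           seasonal=True, m=7,
                           X=train[['sell_price']].values,
                           d=None, trace=True, random_state=2020,
                           out_of_sample_size=28,
                           error_action='ignore',   # we don't want to know if an order does not work
                           suppress_warnings=True,  # we don't want convergence warnings
                           stepwise=True)
sarimax_model.summary()

Output of auto_arima

Once auto_arima has determined the parameters SARIMAX needs, we can train and forecast directly.

# train: the exogenous regressor must be passed again, otherwise the model is refit without it
sarimax_model.fit(train['demand'], X=train[['sell_price']].values)
t1 = time.time() - t0
# predict: needs sell_price for the 28 forecast days (assumes test has a sell_price column like train)
predictions[model_name] = np.asarray(sarimax_model.predict(n_periods=28, X=test[['sell_price']].values))
# visualize the last 28 training days, the test data and the forecast
fig, ax = plt.subplots(figsize=(25, 4))
train[-28:].plot(x='date', y='demand', label='Train', ax=ax)
test.plot(x='date', y='demand', label='Test', ax=ax)
predictions.plot(x='date', y=model_name, label=model_name, ax=ax)
# evaluate with RMSE
score = np.sqrt(mean_squared_error(test['demand'], predictions[model_name].values))
print('RMSE for {}: {:.4f}'.format(model_name, score))
# DataFrame.append was removed in pandas 2.0, so the result row is added with pd.concat (pandas imported as pd)
stats = pd.concat([stats, pd.DataFrame([{'Model Name': model_name, 'Execution Time': t1, 'RMSE': score}])], ignore_index=True)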

SARIMAX forecast results

The forecast results suggest that bringing in additional data helps reduce the error.
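
One way to picture what an exogenous regressor contributes is to look at the regression part of the model in isolation. The sketch below fits a straight line of demand against price with ordinary least squares in pure Python. It is not SARIMAX itself, which combines such a regression with a seasonal ARIMA model for the remaining variation. The function name fit_line and all numbers are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the slope to be identified")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: demand falls by 4 units for every unit of price.
sell_price = [2.0, 2.5, 3.0, 3.5, 4.0]
demand = [20.0 - 4.0 * price for price in sell_price]

intercept, slope = fit_line(sell_price, demand)
future_price = 3.25  # the regressor must be known for the period being forecast
print(intercept + slope * future_price)  # 7.0
```

The last two lines show a practical consequence of using exogenous data: a forecast can only be made for periods where the regressor, here the price, is already known or can itself be forecast.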
