0. Data loading
1. Data preprocessing
2. Data analysis
(3) Demand forecasting with SARIMAX time series analysis - subset of exogenous variables (Colab)
- Uses the 8 variables with the highest Random Forest feature importances
- Fails on a local machine (LinAlgError: LU decomposition error.)
- Not sure why the run with all exogenous variables works, though..
['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']
# Feature selection
exog_tr = X_train_feRSM[['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']]
exog_te = X_test_feRSM[['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']]
# Parameter setting
trend_diff_order = 0
seasonal_diff_order, seasonal_order = 0, 12
# SARIMAX
fit_ts_sarimax = sm.tsa.SARIMAX(Y_train_feR, trend='c', order=(1,trend_diff_order,1),
                                seasonal_order=(1,seasonal_diff_order,1,seasonal_order),
                                exog=exog_tr).fit()
pred_tr_ts_sarimax = fit_ts_sarimax.predict()
# Forecast the test period once, then read both the mean and the confidence interval from it
forecast_te = fit_ts_sarimax.get_forecast(len(Y_test_feR), exog=exog_te)
pred_te_ts_sarimax = forecast_te.predicted_mean
pred_te_ts_sarimax_ci = forecast_te.conf_int()
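The eight exogenous columns used above were picked by Random Forest feature importance. A minimal sketch of that ranking step follows, assuming the importances are already available as name/score pairs. The function name `top_k_features`, the `temp` column, and the scores are made up for illustration and are not the values from this analysis.

```python
def top_k_features(names, importances, k):
    """Return the k feature names with the largest importance scores."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical scores, only to show the call; with scikit-learn they would
# come from a fitted model's feature_importances_ attribute.
names = ['Hour', 'weather', 'workingday', 'temp']
scores = [0.50, 0.20, 0.25, 0.05]
print(top_k_features(names, scores, k=3))
```

The returned list can then be used to index the training and test feature frames, as done with `exog_tr` and `exog_te`.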
(4) Demand forecasting with SARIMAX time series analysis - all exogenous variables
# Parameter Setting
trend_diff_order = 0
seasonal_diff_order, seasonal_order = 0, 12
# SARIMAX
fit_ts_sarimax = sm.tsa.SARIMAX(Y_train_feR, trend='c', order=(1,trend_diff_order,1),
                                seasonal_order=(1,seasonal_diff_order,1,seasonal_order),
                                exog=X_train_feRSM).fit()
pred_tr_ts_sarimax = fit_ts_sarimax.predict()
# Forecast the test period once, then read both the mean and the confidence interval from it
forecast_te = fit_ts_sarimax.get_forecast(len(Y_test_feR), exog=X_test_feRSM)
pred_te_ts_sarimax = forecast_te.predicted_mean
pred_te_ts_sarimax_ci = forecast_te.conf_int()
(5) Auto-SARIMAX - choose (p,d,q)(P,D,Q) by AIC
from itertools import product
from tqdm import tqdm

# Select variables
exog_tr = X_train_feRSM[['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']]
exog_te = X_test_feRSM[['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']]
# Parameter setting
p, q = range(1,3), range(1,3)
d = range(0,1)
P, Q = range(1,3), range(1,3)
D = range(1,2)
m = 12
trend_pdq = list(product(p, d, q))
seasonal_pdq = [(candi[0], candi[1], candi[2], m) for candi in list(product(P, D, Q))]
## SARIMAX
AIC = []
SARIMAX_order = []
for trend_param in tqdm(trend_pdq):
    for seasonal_params in seasonal_pdq:
        try:
            result = sm.tsa.SARIMAX(Y_train_feR, trend='c',
                                    order=trend_param, seasonal_order=seasonal_params, exog=exog_tr).fit()
            # end='\r' is an argument of print(), not of format()
            print('Fit SARIMAX: trend_order={} seasonal_order={} AIC={}, BIC={}'.format(trend_param, seasonal_params, result.aic, result.bic), end='\r')
            AIC.append(result.aic)
            SARIMAX_order.append([trend_param, seasonal_params])
        except Exception:
            # Skip order combinations that fail to fit
            continue
# Parameter selection
best_idx = AIC.index(min(AIC))
best_order, best_seasonal_order = SARIMAX_order[best_idx]
print('The smallest AIC is {} for model SARIMAX{}x{}'.format(min(AIC), best_order, best_seasonal_order))
# Auto-SARIMAX fitting
fit_ts_sarimax = sm.tsa.SARIMAX(Y_train_feR, trend='c', order=best_order,
                                seasonal_order=best_seasonal_order, exog=exog_tr).fit()
pred_tr_ts_sarimax = fit_ts_sarimax.predict()
# Forecast the test period once, then read both the mean and the confidence interval from it
forecast_te = fit_ts_sarimax.get_forecast(len(Y_test_feR), exog=exog_te)
pred_te_ts_sarimax = forecast_te.predicted_mean
pred_te_ts_sarimax_ci = forecast_te.conf_int()
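To see how much work the search loop above does, the candidate grid can be enumerated on its own, without fitting anything. This sketch reuses the same ranges and adds a small helper, `pick_best` (a hypothetical name), that returns the order pair with the lowest AIC. The AIC numbers in the usage lines are placeholders, not fitted results.

```python
from itertools import product

p, q = range(1, 3), range(1, 3)
d = range(0, 1)
P, Q = range(1, 3), range(1, 3)
D = range(1, 2)
m = 12

trend_pdq = list(product(p, d, q))
seasonal_pdq = [(sp, sd, sq, m) for sp, sd, sq in product(P, D, Q)]
# Every (trend order, seasonal order) pair the nested loops would try.
candidates = list(product(trend_pdq, seasonal_pdq))
print(len(trend_pdq), len(seasonal_pdq), len(candidates))


def pick_best(orders, aics):
    """Return (order, aic) for the smallest AIC; both lists are index-aligned."""
    best_idx = min(range(len(aics)), key=aics.__getitem__)
    return orders[best_idx], aics[best_idx]


# Placeholder AIC values, one per candidate, just to exercise the helper.
fake_aics = [1000.0 + i for i in range(len(candidates))]
fake_aics[5] = 900.0
best_order, best_aic = pick_best(candidates, fake_aics)
print(best_order, best_aic)
```

With these ranges the grid holds 4 x 4 = 16 SARIMAX fits, so widening any range multiplies the run time accordingly.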
(6) Auto-ARIMA - MemoryError
import pmdarima as pm
# Parameter setting
trend_diff_order = 0
seasonal_diff_order, seasonal_order = 0, 24
## Auto-ARIMA
# Note: newer pmdarima versions name the exogenous argument X instead of exogenous
fit_ts_autoarima = pm.auto_arima(Y_train_feR,
                                 stationary=False,
                                 with_intercept=True,
                                 start_p=0, d=None, start_q=0,
                                 max_p=2, max_d=1, max_q=2,
                                 seasonal=True, m=24,
                                 start_P=0, D=None, start_Q=0,
                                 max_P=2, max_D=1, max_Q=2,
                                 max_order=30, maxiter=3,
                                 stepwise=False,
                                 exogenous=X_train_feRSM,
                                 information_criterion='aic',
                                 trace=True, suppress_warnings=True)
# In-sample predictions for the training period
pred_tr_ts_autoarima = fit_ts_autoarima.predict_in_sample(exogenous=X_train_feRSM)
# Out-of-sample forecast for the test period, computed once together with its confidence interval
pred_te_ts_autoarima, pred_te_ts_autoarima_ci = fit_ts_autoarima.predict(n_periods=len(Y_test_feR),
                                                                         exogenous=X_test_feRSM,
                                                                         return_conf_int=True)
# Evaluation
Score_ts_autoarima, Resid_tr_ts_autoarima, Resid_te_ts_autoarima = evaluation_trte(Y_train_feR, pred_tr_ts_autoarima,
                                                                                   Y_test_feR, pred_te_ts_autoarima, graph_on=True)
display(Score_ts_autoarima)
ax = pd.DataFrame(Y_test_feR).plot(figsize=(12,4))
pd.DataFrame(pred_te_ts_autoarima,
             index=Y_test_feR.index, columns=['prediction']).plot(kind='line',
                                                                  xlim=(Y_test_feR.index.min(),Y_test_feR.index.max()),
                                                                  linewidth=3, fontsize=20, ax=ax)
# Shade the forecast confidence interval
ci_te = pd.DataFrame(pred_te_ts_autoarima_ci, index=Y_test_feR.index)
ax.fill_between(ci_te.index, ci_te.iloc[:,0], ci_te.iloc[:,1], color='k', alpha=0.15)
plt.show()
# Residual diagnostics
error_analysis(Resid_tr_ts_autoarima, ['Error'], Y_train_feR, graph_on=True)
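The MemoryError in this section is plausibly tied to the seasonal period m=24, although that is not confirmed here. As a rough back-of-the-envelope check, assume the state vector of the underlying state-space model has length about max(p + m*P, q + m*Q + 1) + d + m*D. That formula is my reading of how statsmodels builds SARIMAX and should be treated as an assumption. The function names and the `n_obs` value below are hypothetical.

```python
def approx_state_dim(p, d, q, P, D, Q, m):
    """Approximate state-vector length for SARIMA(p,d,q)(P,D,Q,m).

    Assumption: length ~ max(p + m*P, q + m*Q + 1) + d + m*D.
    """
    arma_part = max(p + m * P, q + m * Q + 1)
    diff_part = d + m * D
    return arma_part + diff_part


def approx_cov_bytes(k_states, n_obs, bytes_per_float=8):
    """Bytes for one k_states x k_states float64 matrix kept per time step."""
    return k_states * k_states * n_obs * bytes_per_float


n_obs = 10_000  # hypothetical number of hourly training rows
for m in (12, 24):
    # Largest candidate allowed by max_p = max_q = max_P = max_Q = 2, max_d = max_D = 1.
    k = approx_state_dim(p=2, d=1, q=2, P=2, D=1, Q=2, m=m)
    gib = approx_cov_bytes(k, n_obs) / 2**30
    print(f"m={m}: ~{k} states, ~{gib:.2f} GiB per covariance array")
```

Under these assumptions the largest m=24 candidate needs several times the memory of the m=12 one, and stepwise=False fits every combination in the grid. Lowering max_P and max_Q, or setting stepwise=True, are things worth trying.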
3. Comparison of analysis results
# Evaluation (run this first so that Score_ts_sarimax exists before it is displayed)
Score_ts_sarimax, Resid_tr_ts_sarimax, Resid_te_ts_sarimax = evaluation_trte(Y_train_feR, pred_tr_ts_sarimax,
                                                                             Y_test_feR, pred_te_ts_sarimax, graph_on=True)
display(Score_ts_sarimax)
display(fit_ts_sarimax.summary())
#display(fit_ts_autoarima.summary())
# Visualization
ax = pd.DataFrame(Y_test_feR).plot(figsize=(12,4))
pd.DataFrame(pred_te_ts_sarimax, index=Y_test_feR.index, columns=['prediction']).plot(kind='line',
                                                                                      xlim=(Y_test_feR.index.min(),Y_test_feR.index.max()),
                                                                                      linewidth=3, fontsize=20, ax=ax)
# Shade the forecast confidence interval
ci_te = pd.DataFrame(pred_te_ts_sarimax_ci, index=Y_test_feR.index)
ax.fill_between(ci_te.index, ci_te.iloc[:,0], ci_te.iloc[:,1], color='k', alpha=0.15)
plt.show()
# Residual diagnostics
error_analysis(Resid_tr_ts_sarimax, ['Error'], Y_train_feR, graph_on=True)
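`evaluation_trte` is a helper defined elsewhere in this series and is not shown here. As a stand-in, and assuming it reports the usual error metrics, the sketch below computes MAE, MSE, and MAPE plus the residuals for one pair of actual and predicted sequences. The name `score_forecast` and the numbers are hypothetical.

```python
def score_forecast(actual, predicted):
    """Return (scores, residuals) for two equal-length numeric sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    residuals = [a - f for a, f in zip(actual, predicted)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    # MAPE is undefined where the actual value is 0, so those points are skipped.
    pct = [abs(r / a) for r, a in zip(residuals, actual) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"MAE": mae, "MSE": mse, "MAPE": mape}, residuals


# Made-up hourly counts, only to show the call.
scores, resid = score_forecast([100, 200, 400], [110, 180, 400])
print(scores)
```

Calling it once on the training pair and once on the test pair gives a train/test score table comparable in spirit to `Score_ts_sarimax`. Skipping zero actuals matters for hourly rental counts, where some hours can have no rentals.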
(3) TS analysis results - SARIMAX (subset of X) / SARIMAX (all of X)