
Lecture 15. Kaggle Bike Demand Forecasting (SARIMAX / Auto-ARIMA)

by 5ole 2021. 4. 5.

 

 

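Lecture 14. Kaggle Bike Demand Forecasting (RF/SARIMA)

Demand forecasting with machine learning analysis. Demand forecasting with time series analysis - SARIMA. 0. Data load: Bike Sharing Demand, Forecast use of a city bikeshare system, www.kaggle.com. Two years of data accumulated hourly. Y is count,..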

5ohyun.tistory.com

 

0. Data load

1. Data preprocessing

 


 

2. Data analysis

 

 

(3) Demand forecasting with SARIMAX time series analysis - subset of the exogenous variables (Colab)

 

  • Uses the 8 variables with the highest feature importances from the Random Forest
  • Fails when run locally (LinAlgError: LU decomposition error.)
  • It is unclear why the fit succeeds when all exogenous variables are included..

['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']
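
The post does not identify the cause of the LinAlgError. One thing that may be worth checking (an assumption, not a diagnosis) is whether the selected exogenous columns are nearly linearly dependent, since that can make the matrices used during fitting hard to factorize. The sketch below uses synthetic data and a hypothetical helper, `collinearity_report`, which is not part of the original notebook.

```python
import numpy as np

def collinearity_report(exog):
    """Return (n_columns, rank, condition_number) for a 2-D exog array."""
    X = np.asarray(exog, dtype=float)
    # Standardize columns so the condition number is not dominated by scale
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    return X.shape[1], int(np.linalg.matrix_rank(Z)), float(np.linalg.cond(Z))

# Synthetic stand-ins for exogenous variables (not the bike data)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
independent = np.column_stack([a, b])
redundant = np.column_stack([a, b, a + b])  # third column is an exact linear combination

print(collinearity_report(independent))
print(collinearity_report(redundant))
```

A rank below the number of columns, or a very large condition number, would suggest dropping or combining columns before fitting. With the notebook's data the call would be `collinearity_report(exog_tr)`.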

 

 

# Feature selection
import statsmodels.api as sm

# The 8 variables with the highest Random Forest feature importances
exog_cols = ['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']
exog_tr = X_train_feRSM[exog_cols]
exog_te = X_test_feRSM[exog_cols]

# Parameter setting
trend_diff_order = 0
seasonal_diff_order, seasonal_order = 0, 12

# SARIMAX
fit_ts_sarimax = sm.tsa.SARIMAX(Y_train_feR, trend='c', order=(1,trend_diff_order,1), 
                                seasonal_order=(1,seasonal_diff_order,1,seasonal_order), 
                                exog=exog_tr).fit()

# In-sample predictions, then one test forecast reused for the mean and the interval
pred_tr_ts_sarimax = fit_ts_sarimax.predict()
forecast_te = fit_ts_sarimax.get_forecast(len(Y_test_feR), exog=exog_te)
pred_te_ts_sarimax = forecast_te.predicted_mean
pred_te_ts_sarimax_ci = forecast_te.conf_int()

 

 

(4) Demand forecasting with SARIMAX time series analysis - all exogenous variables

 

# Parameter setting
trend_diff_order = 0
seasonal_diff_order, seasonal_order = 0, 12

# SARIMAX with every column of X_train_feRSM as an exogenous variable
fit_ts_sarimax = sm.tsa.SARIMAX(Y_train_feR, trend='c', order=(1,trend_diff_order,1), 
                                seasonal_order=(1,seasonal_diff_order,1,seasonal_order), 
                                exog=X_train_feRSM).fit()

# In-sample predictions, then one test forecast reused for the mean and the interval
pred_tr_ts_sarimax = fit_ts_sarimax.predict()
forecast_te = fit_ts_sarimax.get_forecast(len(Y_test_feR), exog=X_test_feRSM)
pred_te_ts_sarimax = forecast_te.predicted_mean
pred_te_ts_sarimax_ci = forecast_te.conf_int()

 

(5) Auto-SARIMAX - select (p,d,q)(P,D,Q) by AIC

 

from itertools import product
from tqdm import tqdm

# Select variables
exog_cols = ['count_seasonal', 'weather', 'count_lag2', 'count_diff', 'Quater_ver2', 'Hour', 'workingday', 'DayofWeek']
exog_tr = X_train_feRSM[exog_cols]
exog_te = X_test_feRSM[exog_cols]

# Parameter setting
p, q = range(1,3), range(1,3)
d = range(0,1)
P, Q = range(1,3), range(1,3)
D = range(1,2)
m = 12
trend_pdq = list(product(p, d, q))
seasonal_pdq = [(candi[0], candi[1], candi[2], m) for candi in product(P, D, Q)]

## SARIMAX grid search
AIC = []
SARIMAX_order = []

for trend_param in tqdm(trend_pdq):
    for seasonal_params in seasonal_pdq:
        try:
            result = sm.tsa.SARIMAX(Y_train_feR, trend='c', 
                                    order=trend_param, seasonal_order=seasonal_params, exog=exog_tr).fit()
            # end='\r' belongs to print(), not to str.format()
            print('Fit SARIMAX: trend_order={} seasonal_order={} AIC={}, BIC={}'.format(
                trend_param, seasonal_params, result.aic, result.bic), end='\r')
            AIC.append(result.aic)
            SARIMAX_order.append([trend_param, seasonal_params])
        except Exception:
            # Skip order combinations that fail to fit
            continue

# Parameter selection
best_order, best_seasonal_order = SARIMAX_order[AIC.index(min(AIC))]
print('The smallest AIC is {} for model SARIMAX{}x{}'.format(min(AIC), best_order, best_seasonal_order))

# Auto-SARIMAX fitting
fit_ts_sarimax = sm.tsa.SARIMAX(Y_train_feR, trend='c', order=best_order, 
                                seasonal_order=best_seasonal_order, exog=exog_tr).fit()

# In-sample predictions, then one test forecast reused for the mean and the interval
pred_tr_ts_sarimax = fit_ts_sarimax.predict()
forecast_te = fit_ts_sarimax.get_forecast(len(Y_test_feR), exog=exog_te)
pred_te_ts_sarimax = forecast_te.predicted_mean
pred_te_ts_sarimax_ci = forecast_te.conf_int()
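
To see how many models the grid search in (5) fits, the candidate orders can be enumerated on their own, without fitting anything. This is a minimal sketch with the same ranges as above; `build_grid` and `best_by_aic` are hypothetical helper names, and the AIC values in the selection example are made up purely to show the selection step.

```python
from itertools import product

def build_grid(p, d, q, P, D, Q, m):
    """Return every (order, seasonal_order) pair the grid search would try."""
    trend_pdq = list(product(p, d, q))
    seasonal_pdq = [(sp, sd, sq, m) for sp, sd, sq in product(P, D, Q)]
    return list(product(trend_pdq, seasonal_pdq))

def best_by_aic(results):
    """Pick the (aic, order, seasonal_order) tuple with the smallest AIC."""
    return min(results, key=lambda r: r[0])

grid = build_grid(range(1, 3), range(0, 1), range(1, 3),
                  range(1, 3), range(1, 2), range(1, 3), m=12)
print(len(grid))   # number of SARIMAX fits the loop attempts
print(grid[0])     # first candidate

# Made-up AIC values, only to illustrate the selection step
fake_results = [(1520.3, (1, 0, 1), (1, 1, 1, 12)),
                (1498.7, (2, 0, 1), (1, 1, 2, 12))]
print(best_by_aic(fake_results))
```

Note that this grid keeps d fixed at 0 and D fixed at 1, so only the AR and MA orders are actually searched.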
    

 

(6) Auto-ARIMA - MemoryError

 

import pmdarima as pm

# Parameter setting
trend_diff_order = 0
seasonal_diff_order, seasonal_order = 0, 24

## Auto-ARIMA
# Depending on the installed pmdarima version, the exogenous data may need to be passed as X= instead of exogenous=
fit_ts_autoarima = pm.auto_arima(Y_train_feR, 
                                 stationary=False,
                                 with_intercept=True,
                                 start_p=0, d=None, start_q=0,
                                 max_p=2, max_d=1, max_q=2,
                                 seasonal=True, m=seasonal_order,
                                 start_P=0, D=None, start_Q=0,
                                 max_P=2, max_D=1, max_Q=2,
                                 max_order=30, maxiter=3,
                                 stepwise=False,
                                 exogenous=X_train_feRSM,
                                 information_criterion='aic',
                                 trace=True, suppress_warnings=True)

# In-sample predictions for the training period
# (predict(n_periods=...) forecasts beyond the training data, so it is not used for the train predictions)
pred_tr_ts_autoarima = fit_ts_autoarima.predict_in_sample(exogenous=X_train_feRSM)

# Test forecast and confidence interval from a single call
pred_te_ts_autoarima, pred_te_ts_autoarima_ci = fit_ts_autoarima.predict(n_periods=len(Y_test_feR), 
                                                                         exogenous=X_test_feRSM, 
                                                                         return_conf_int=True)

# Evaluation (evaluation_trte is a helper defined outside this section)
Score_ts_autoarima, Resid_tr_ts_autoarima, Resid_te_ts_autoarima = evaluation_trte(Y_train_feR, pred_tr_ts_autoarima, 
                                                                                   Y_test_feR, pred_te_ts_autoarima, graph_on=True)
display(Score_ts_autoarima)

# Visualization: actual values, forecast, and confidence band
ax = pd.DataFrame(Y_test_feR).plot(figsize=(12,4))
pd.DataFrame(pred_te_ts_autoarima, 
             index=Y_test_feR.index, columns=['prediction']).plot(kind='line',
                                                                  xlim=(Y_test_feR.index.min(),Y_test_feR.index.max()),
                                                                  linewidth=3, fontsize=20, ax=ax)
ci_te = pd.DataFrame(pred_te_ts_autoarima_ci, index=Y_test_feR.index)
ax.fill_between(ci_te.index, ci_te.iloc[:,0], ci_te.iloc[:,1], color='k', alpha=0.15)
plt.show()

# Residual diagnostics (error_analysis is a helper defined outside this section)
error_analysis(Resid_tr_ts_autoarima, ['Error'], Y_train_feR, graph_on=True)
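
The post does not say what triggered the MemoryError in the Auto-ARIMA run above. One plausible contributor (an assumption) is the size of the state-space representation, which grows with the seasonal period m. The sketch below gives a rough estimate based on my understanding of how a SARIMAX state vector is sized when differencing is handled inside the model. The helper names and the observation count are hypothetical, and the byte figure covers only one covariance array stored per time step.

```python
def approx_state_dim(order, seasonal_order):
    """Rough state dimension of a SARIMA model with in-model differencing."""
    p, d, q = order
    P, D, Q, m = seasonal_order
    # Combined AR / MA polynomial lengths once the seasonal lags are expanded
    k_order = max(p + P * m, q + Q * m + 1)
    # Differencing adds d states plus D * m states
    return d + D * m + k_order

def approx_cov_bytes(k_states, n_obs, bytes_per_float=8):
    """Bytes for one (k_states x k_states) float matrix stored per observation."""
    return k_states ** 2 * n_obs * bytes_per_float

n_obs = 10_000  # hypothetical number of hourly training observations
for m in (12, 24):
    k = approx_state_dim((2, 1, 2), (2, 1, 2, m))
    print(m, k, round(approx_cov_bytes(k, n_obs) / 1e6), "MB")
```

Under these assumptions, doubling m roughly doubles the state dimension and roughly quadruples the per-array memory, and with stepwise=False many such candidates are fitted in one run.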

 

3. Comparison of analysis results

 

# Evaluation (run first so that Score_ts_sarimax exists before it is displayed)
Score_ts_sarimax, Resid_tr_ts_sarimax, Resid_te_ts_sarimax = evaluation_trte(Y_train_feR, pred_tr_ts_sarimax, 
                                                                             Y_test_feR, pred_te_ts_sarimax, graph_on=True)

display(Score_ts_sarimax)

display(fit_ts_sarimax.summary())

#display(fit_ts_autoarima.summary())

# Visualization: actual values, forecast, and confidence band
ax = pd.DataFrame(Y_test_feR).plot(figsize=(12,4))
pd.DataFrame(pred_te_ts_sarimax, index=Y_test_feR.index, columns=['prediction']).plot(kind='line',
                                                                           xlim=(Y_test_feR.index.min(),Y_test_feR.index.max()),
                                                                           linewidth=3, fontsize=20, ax=ax)
ci_te = pd.DataFrame(pred_te_ts_sarimax_ci, index=Y_test_feR.index)
ax.fill_between(ci_te.index, ci_te.iloc[:,0], ci_te.iloc[:,1], color='k', alpha=0.15)
plt.show()

# Residual diagnostics
error_analysis(Resid_tr_ts_sarimax, ['Error'], Y_train_feR, graph_on=True)

 

(3) TS analysis results - SARIMAX (subset of X) / SARIMAX (all of X)

 

 

Subset of variables / all variables

 

Subset of variables / all variables
Train (subset of variables / all variables)
Test (subset of variables / all variables)

 

 

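Subset of variables / stationarity, non-stationarity, not normally distributed, autocorrelation present, not homoscedastic
All variables / stationarity, non-stationarity, not normally distributed, autocorrelation present, not homoscedastic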

 

 

 
