Time Series Prediction using Statsmodels

Pranati Bhanu
3 min read · Apr 4, 2020


Photo by Saffu on Unsplash

Introduction

A time series is a sequence of data observations measured at successive points in time, for example the price of a share every day at 10 am. Suppose we have share-price data from 1st March to 31st March. We can train a time series model on this data and use it to forecast the share price on 1st April or later dates.
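In pandas, such a series is naturally a `Series` indexed by timestamps. The sketch below uses synthetic prices (a random walk, not real share data) for March 2020. It shows the structure and the first date a model trained on it would forecast.

```python
import numpy as np
import pandas as pd

# Hypothetical daily share prices for March 2020 (synthetic random walk, not real data)
dates = pd.date_range("2020-03-01", "2020-03-31", freq="D")
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates, name="price")

# Each observation is tied to a timestamp, and the timestamps are evenly spaced (daily)
print(prices.head())

# The first date a model trained on this series would forecast
next_date = prices.index[-1] + prices.index.freq
print(next_date)  # 2020-04-01 00:00:00
```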

A few popular time series analysis models are listed below:

  1. ARIMA (Auto Regressive Integrated Moving Average)
  2. SARIMA (Seasonal Auto Regressive Integrated Moving Average)
  3. SES (Simple Exponential Smoothing)
  4. HWES (Holt-Winters Exponential Smoothing)
  5. VAR (Vector Auto Regression)

In this article we will discuss Auto Regressive Integrated Moving Average (ARIMA) and implement it using statsmodels, a Python package for statistical modelling.

ARIMA is a time series modelling technique that predicts future values of a series from the series' own past values and past forecast errors. It can work with a relatively small number of data points (as few as about 50), and it works best when the data follows a consistent pattern over time with few outliers.
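To make this concrete, an ARIMA(p, d, q) model first differences the series d times. It then models each differenced value as a linear combination of its own p previous values (the AR part) and the q previous forecast errors (the MA part). Using the lag operator $L$ (with $L y_t = y_{t-1}$) and white noise $\varepsilon_t$, one standard way to write it is:

```latex
\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1 - L)^d \, y_t
  = c + \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t
```

For the order (0, 1, 1) used later in this article, we have p = 0, d = 1 and q = 1. The equation then reduces to $y_t - y_{t-1} = c + \varepsilon_t + \theta_1 \varepsilon_{t-1}$, so the change from one day to the next depends on today's shock and yesterday's shock.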

Implementation

Let us import the required libraries first. For this implementation we need pandas, numpy, datetime and the ARIMA class from statsmodels, imported as below.

import pandas as pd
import numpy as np
# The older statsmodels.tsa.arima_model.ARIMA has been removed in recent statsmodels releases
from statsmodels.tsa.arima.model import ARIMA
from datetime import datetime

Next, we read the time series data, which is stored in a CSV file, using pandas. I also created a parser function that converts the dates, stored as YYYYMMDD, into datetime objects.

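def parser(x):
    # Dates are stored as YYYYMMDD, e.g. 20200301
    return datetime.strptime(str(x), '%Y%m%d')

# squeeze=True was removed in pandas 2.0 and date_parser is deprecated, so squeeze and parse explicitly
input_series = pd.read_csv('Time-Series-Data.csv', header=0, index_col=0).squeeze('columns')
input_series.index = pd.DatetimeIndex(input_series.index.map(parser))
input_series.head()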

Below is a screenshot of the first five rows of the data. Note that the data has two columns: the date, which becomes the index, and the value observed on that day.

time series data top 5 rows
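If you don't have the original Time-Series-Data.csv, you can generate a stand-in file with the same layout: a YYYYMMDD date column followed by a value column. The sketch below writes one and reads it back into a date-indexed Series. Several details are assumptions. The file name `sample-time-series.csv` and the column names `date` and `value` are hypothetical, because the real header isn't shown. The values are synthetic.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Write a synthetic stand-in file (hypothetical name and columns): YYYYMMDD dates, one value per day
dates = pd.date_range("2020-03-01", "2020-03-31", freq="D")
values = 100 + np.random.default_rng(1).normal(0, 1, len(dates)).cumsum()
pd.DataFrame({"date": dates.strftime("%Y%m%d"), "value": values.round(2)}).to_csv(
    "sample-time-series.csv", index=False
)

def parser(x):
    # Convert a YYYYMMDD value (read back by pandas as an int) into a datetime
    return datetime.strptime(str(x), "%Y%m%d")

# Read it back: first column becomes the index, the single remaining column becomes a Series
input_series = pd.read_csv("sample-time-series.csv", header=0, index_col=0).squeeze("columns")
input_series.index = pd.DatetimeIndex(input_series.index.map(parser))
print(input_series.head())
```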

Implementing the model

We have already imported the ARIMA class from statsmodels above. Here we instantiate it as model_arima with order=(0,1,1), which means no autoregressive terms (p=0), one order of differencing (d=1) and one moving-average term (q=1). We then call the fit method on this instance and store the result in the model_fitted variable.

model_arima = ARIMA(input_series, order=(0,1,1))
# Unlike the old arima_model.ARIMA, fit() on the newer ARIMA class does not accept disp
model_fitted = model_arima.fit()
print(model_fitted.summary())

In the screenshot above you can see the ARIMA model results produced when we trained the model.

Making Prediction

After training the model successfully, it's time to make predictions. Below we predict the value one step ahead of where the training data ends. For example, if the training data ends on 31st March, the prediction below is the value for 1st April.

Time Series Prediction

Conclusion

In this article we discussed what time series data is and how we can predict the next value in a series using a time series model.

We implemented one of the most popular of these models, ARIMA, using statsmodels.

I hope you find this article useful.


Pranati Bhanu

Data Scientist. Working with data to find the insights.