Time Series Analysis: ARIMA Models in Python

Time series analysis is widely used to forecast future points in a series. AutoRegressive Integrated Moving Average (ARIMA) models are among the most popular approaches to time series forecasting. In this tutorial, we will learn how to build and evaluate ARIMA models for time series forecasting in Python.

What is an ARIMA Model?

The ARIMA model is a statistical model utilized for analyzing and predicting time series data. The ARIMA approach explicitly caters to standard structures found in time series, providing a simple yet powerful method for making skillful time series forecasts.

ARIMA stands for AutoRegressive Integrated Moving Average. It combines three key aspects:

  • Autoregression (AR): A model that uses the correlation between the current observation and lagged observations. The number of lagged observations is referred to as the lag order or p.
  • Integrated (I): The use of differencing of raw observations to make the time series stationary. The number of differencing operations is referred to as d.
  • Moving Average (MA): A model that uses the relationship between the current observation and the residual errors from a moving average model applied to past observations. The size of the moving average window is referred to as the order or q.

The ARIMA model is defined with the notation ARIMA(p,d,q) where p, d, and q are substituted with integer values to specify the exact model being used.
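To make the three components more concrete, here is a minimal sketch (not part of the original tutorial) that simulates an ARIMA(1,1,0)-style series with NumPy. The coefficient phi = 0.6, the series length, and the random seed are arbitrary values assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
phi = 0.6  # assumed AR(1) coefficient, chosen only for illustration

# AR part (p = 1): each difference depends on the previous difference plus noise
noise = rng.normal(size=n)
w = np.zeros(n)
for t in range(1, n):
    w[t] = phi * w[t - 1] + noise[t]

# I part (d = 1): the observed series is the running sum of the differences
series = np.cumsum(w)

# First-order differencing undoes the integration and recovers w[1:]
recovered = np.diff(series)

# Least-squares estimate of phi from the lag-1 regression on the differenced data
phi_hat = np.dot(recovered[:-1], recovered[1:]) / np.dot(recovered[:-1], recovered[:-1])
print("estimated phi:", round(phi_hat, 3))
```

The estimate should land near the assumed 0.6, which shows how differencing (d) exposes the autoregressive structure (p) that the model then fits. There is no moving average term here, so q = 0.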

Key assumptions when adopting an ARIMA model:

  • The time series was generated from an underlying ARIMA process.
  • The parameters p, d, q must be appropriately specified based on the raw observations.
  • The time series must be stationary after it has been differenced d times; the d term applies this differencing as part of the model.
  • The residuals should be uncorrelated and normally distributed if the model fits well.
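As a rough illustration of the stationarity and uncorrelated-residuals assumptions, the sketch below compares the lag-1 autocorrelation of a random walk with that of its first difference. The `autocorr` helper is a hypothetical function written for this example, and the data is synthetic. For a formal test, statsmodels offers the augmented Dickey-Fuller test (`adfuller`), which is not used here.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered)

rng = np.random.default_rng(42)
steps = rng.normal(size=1000)  # uncorrelated shocks
levels = np.cumsum(steps)      # random walk: non-stationary

print("lag-1 autocorrelation of levels:     ", round(autocorr(levels), 3))
print("lag-1 autocorrelation of differences:", round(autocorr(np.diff(levels)), 3))
```

The levels are almost perfectly correlated with their own past, a typical sign of non-stationarity, while the differenced series shows autocorrelation close to zero. The same check applied to the residuals of a fitted model gives a quick indication of whether they are uncorrelated.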

In summary, the ARIMA model provides a structured and configurable approach for modeling time series data for purposes like forecasting. Next we will look at fitting ARIMA models in Python.

Python Code Example

In this tutorial, we will use Netflix Stock Data from Kaggle to forecast the Netflix stock price using the ARIMA model.

Data Loading

We will load our stock price dataset with the “Date” column as index.

import pandas as pd

net_df = pd.read_csv("Netflix_stock_history.csv", index_col="Date", parse_dates=True)
net_df.head(3)
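To see what `index_col` and `parse_dates` do without downloading the dataset, here is a small sketch that reads a CSV from an in-memory string. The three rows are made-up placeholder values, not real Netflix prices; only the column layout is meant to mirror the columns used in this tutorial.

```python
import io
import pandas as pd

# Made-up placeholder rows, NOT real Netflix prices; only the layout matters here
csv_text = """Date,Open,Close,Volume
2020-01-02,100.0,101.5,1000
2020-01-03,101.5,99.0,1200
2020-01-06,99.0,102.0,900
"""

demo_df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# The index is parsed into datetimes, so date-based slicing by label works
print(type(demo_df.index).__name__)
print(demo_df.loc["2020-01-03":, "Close"])
```

A datetime index keeps the rows in time order and lets pandas label the x-axis with dates when plotting.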

Data Visualization

We can use the pandas plot method to visualize how the stock price and trading volume change over time. The closing price appears to grow roughly exponentially.

net_df[["Close","Volume"]].plot(subplots=True, layout=(2,1));

Rolling Forecast ARIMA Model

We split the dataset chronologically into training (first 90%) and test (last 10%) sets, fit an ARIMA(1,1,0) model on the training data, and forecast the first test observation.

A single ARIMA model fitted once and used to forecast the whole test period gave a poor result: the forecast was essentially a flat line. We therefore switch to a rolling forecast.

Note: The code example is a modified version of the notebook by BOGDAN IVANYUK.

from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error, mean_absolute_error
import math

# chronological 90/10 split (no shuffling)
split_idx = int(len(net_df) * 0.9)
train_data, test_data = net_df[:split_idx], net_df[split_idx:]

train_arima = train_data['Open']
test_arima = test_data['Open']

history = list(train_arima)
y = test_arima

# make first prediction
predictions = list()
model = ARIMA(history, order=(1,1,0))
model_fit = model.fit()
yhat = model_fit.forecast()[0]
predictions.append(yhat)
# y is indexed by date, so use .iloc for positional access
history.append(y.iloc[0])
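The flat line produced by a single multi-step forecast has a simple explanation. In an ARIMA(1,1,0) model without a constant, each forecast difference is the previous one multiplied by the AR coefficient, so the differences shrink geometrically and the forecast settles at a constant level. The sketch below shows this with hand-picked numbers; `arima110_forecast` is a hypothetical helper, and the values 100.0, 2.0, and 0.5 are assumptions rather than estimates from the Netflix data.

```python
def arima110_forecast(last_value, last_diff, phi, steps):
    """Multi-step point forecasts for ARIMA(1,1,0) with no constant term."""
    forecasts = []
    value, diff = last_value, last_diff
    for _ in range(steps):
        diff = phi * diff     # forecast difference shrinks by phi each step
        value = value + diff  # integrate back to the price level
        forecasts.append(value)
    return forecasts

path = arima110_forecast(last_value=100.0, last_diff=2.0, phi=0.5, steps=30)

# Geometric-series limit of the forecast path
limit = 100.0 + 2.0 * 0.5 / (1 - 0.5)

print("first forecasts:", path[:4])
print("last forecast:  ", path[-1], "limit:", limit)
```

After a few steps the forecasts barely move, which is the flat line seen in the plot. Refitting and forecasting only one step at a time avoids this decay.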

Because each forecast depends on the observations that precede it, time series data often calls for a rolling forecast. One way to do this is to re-create and refit the model each time a new observation arrives.

To keep track of all observations, we manually maintain a list called history. It initially contains the training data, and the newest actual observation is appended on each iteration. Since every forecast is only one step ahead and uses the latest data, this approach usually tracks the series much more closely than a single multi-step forecast.

# rolling forecasts
for i in range(1, len(y)):
    # refit on all observations seen so far and predict one step ahead
    model = ARIMA(history, order=(1,1,0))
    model_fit = model.fit()
    yhat = model_fit.forecast()[0]
    predictions.append(yhat)
    # add the actual observation to the history (positional access via .iloc)
    obs = y.iloc[i]
    history.append(obs)
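The structure of the rolling loop is independent of ARIMA. The following sketch isolates it with a pluggable one-step forecaster; `walk_forward` and `naive_forecast` are hypothetical names, and the numbers are toy values. A naive "repeat the last value" forecaster is also a useful baseline to compare any ARIMA result against.

```python
def walk_forward(train, test, forecast_one_step):
    """One-step-ahead rolling forecast over the test set."""
    history = list(train)
    predictions = []
    for actual in test:
        # predict the next value using only data seen so far
        predictions.append(forecast_one_step(history))
        # then reveal the true value and add it to the history
        history.append(actual)
    return predictions

def naive_forecast(history):
    """Baseline forecaster: predict that the next value equals the last one."""
    return history[-1]

train = [10.0, 11.0, 12.0]   # toy values for illustration
test = [13.0, 12.5, 14.0]
preds = walk_forward(train, test, naive_forecast)
print(preds)
```

Each prediction is made before the corresponding actual value is appended, so no future information leaks into the forecast.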

Model Evaluation

Our rolling forecast ARIMA model improved substantially over the simple implementation, which produced only a flat line.

# report performance
mse = mean_squared_error(y, predictions)
print('MSE: '+str(mse))
mae = mean_absolute_error(y, predictions)
print('MAE: '+str(mae))
rmse = math.sqrt(mean_squared_error(y, predictions))
print('RMSE: '+str(rmse))

MSE: 116.89611817706545
MAE: 7.690948135967959
RMSE: 10.811850821069696
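For reference, the three metrics are simple to compute by hand. The sketch below spells out the formulas on a tiny made-up example (the values are illustrative, not taken from the Netflix data). It also shows why the RMSE reported above is the square root of the MSE, which puts the error back in the same unit as the price.

```python
import math

actual = [3.0, 5.0, 7.0]     # toy values for illustration
predicted = [2.0, 5.0, 9.0]

errors = [a - p for a, p in zip(actual, predicted)]

mse = sum(e ** 2 for e in errors) / len(errors)   # mean squared error
mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
rmse = math.sqrt(mse)                             # root mean squared error

print("MSE:", mse, "MAE:", mae, "RMSE:", rmse)
```

MSE penalizes large errors more heavily than MAE, so an RMSE noticeably above the MAE suggests that a few forecasts missed by a wide margin.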

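Let's visualize and compare the actual values with the predicted ones. The predicted prices track the actual prices closely, which is expected because each prediction is only one step ahead and uses every observation up to the previous one.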

import matplotlib.pyplot as plt

plt.figure(figsize=(16,8))
plt.plot(net_df.index[-600:], net_df['Open'].tail(600), color='green', label = 'Train Stock Price')
plt.plot(test_data.index, y, color = 'red', label = 'Real Stock Price')
plt.plot(test_data.index, predictions, color = 'blue', label = 'Predicted Stock Price')
plt.title('Netflix Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Netflix Stock Price')
plt.legend()
plt.grid(True)
plt.savefig('arima_model.pdf')
plt.show()

Conclusion

In this short tutorial, we provided an overview of ARIMA models and how to implement them in Python for time series forecasting. The ARIMA approach provides a flexible and structured way to model time series data that relies on prior observations as well as past prediction errors. If you're interested in a comprehensive analysis of the ARIMA model and Time Series analysis, I recommend taking a look at Stock Market Forecasting Using Time Series Analysis.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.
