Introduction

The aim of this report is to perform some analysis on the electricity consumption data of Turkey, to fit AR & MA models to the data and finally to provide 24 hour ahead forecasts. The data used belongs to EPIAS (Enerji Piyasalari Isletme A.S).

The source code can be found here.

Preparations

Preparation part consists of installing the required packages, reading and taking a first look at the data.

Installing the packages:

# requiring the necessary packages
library(dplyr)
library(ggplot2)
library(ggfortify)
library(lubridate)
library(forecast)
library(timeSeries)
library(zoo)

Reading the data:

# reading the data
df <- read.csv('gercek_tuketim.csv', sep = '"', header = TRUE)

First look at the data:

head(df)

##    X      Tarih. X.1  Saat X.2 X. X.3 Tuketim.Miktari..MWh. X.4 X.5 X.6
## 1 NA 01.01.2016,  NA 00:00  NA  ,  NA             26.277,24  NA  NA  NA
## 2 NA 01.01.2016,  NA 01:00  NA  ,  NA             24.991,82  NA  NA  NA
## 3 NA 01.01.2016,  NA 02:00  NA  ,  NA             23.532,61  NA  NA  NA
## 4 NA 01.01.2016,  NA 03:00  NA  ,  NA             22.464,78  NA  NA  NA
## 5 NA 01.01.2016,  NA 04:00  NA  ,  NA             22.002,91  NA  NA  NA
## 6 NA 01.01.2016,  NA 05:00  NA  ,  NA             21.957,08  NA  NA  NA

Here are the problems that draws the attention after the first look into the data:

The date and hour columns have to be merged.
There are several nonsense columns.
Some renaming and formatting has to be done.

To solve these problems, some data manpulation had to be done.

Data Manipulations

Data manipulations for the dataset can be found below.

# tidying the data
df <- df %>% 
  select(c('Tarih.', 'Saat', 'Tuketim.Miktari..MWh.')) %>%
  rename(Date = 'Tarih.', Hour = 'Saat', Consumption = 'Tuketim.Miktari..MWh.') 

df$Date <- gsub(",", "", df$Date)
df$Consumption <- as.numeric(gsub(",", ".", gsub("\\.", "", df$Consumption)))

df <- df %>%
  mutate(Datetime = as.POSIXct(paste(df$Date, df$Hour),format="%d.%m.%Y %H:%M"), 'GMT') %>%
  select(Datetime, Consumption)

Let’s have a look at the final version of the table.

head(df)

##              Datetime Consumption
## 1 2016-01-01 00:00:00    26277.24
## 2 2016-01-01 01:00:00    24991.82
## 3 2016-01-01 02:00:00    23532.61
## 4 2016-01-01 03:00:00    22464.78
## 5 2016-01-01 04:00:00    22002.91
## 6 2016-01-01 05:00:00    21957.08

At this moment dataset is ready to be used for the analysis.

Analysis

Before starting to analyze the underlying trends and seasonalities in the data, something caught my attention while I was trying to plot the data. Although I downloaded the data directly from EPIAS, there was a missing day. To find the missing day, I wrote the following piece of code:

# missing data
missing_day <- setdiff(seq(as.Date('2016-01-01'),as.Date('2020-04-26'), by=1), as_date(df$Datetime))
as.Date(missing_day, origin="1970-01-01")

## [1] "2016-03-27"

As you can see, the data corresponding the day ‘27-03-2016’ is missing. After this fun fact, we can begin with the analysis.

Plotting the Data

Before decomposing the data and fitting any models to it, we should examine the trends and seasonalities by plotting the data in different ways.

First of all, to plot the hourly consumption data:

Figure 1: Hourly Consumption

By looking at the Figure 1, there seems to be some seasonality, however, to understand better we need more plots. Let’s see the daily average values on a plot.

Figure 2: Daily Consumption

Let’s also plot monthly averages for a cleaner plot.

Figure 3: Monthly Consumption

By looking at this plot, we can say that electricity consumption in Turkey tends to increase in January and June, however, it tends to decrease in October.

These graphs did give us a better understanding of the data, however, to better understand the seasonalities in the data, we need more graphs.

Let’s see how consumption behaves in different hours of a day:

Figure 4: Consumption in Different Hours

As wee can see from Figure 4, the consumption decreases drastically between 11pm and 6 am.

Similarly, let’s plot how consumption behaves in different days of a week.

Figure 5: Consumption in Different Days

We can clearly see that consumption level decreases on Saturday and Sunday by looking at the Figure 5.

Finally, let’s see how consumption behaves in different months of a year.

Figure 6: Consumption in Different Months

Again, consumption seems to increase in January, July and August and seems to decrease in April and October.

Now it is time to decompose the time series data and fit two models to it.

Decomposing the Data

Since hourly data generally have a daily seasonality, a weekly seasonality and an annual seasonality, I have chosen to create an forecast::msts object which is and extended version of forecast::ts object which allows multiple seasonal periods as input. More about forecast::msts object can be found here.

Figure 7: Creation of msts Object

Now we can deasonalize the data using the forecast::mstl() function. forecast::mstl() function is also the multiple seasons version of the forecast::stl() function.

Let’s see the components of the data:

Figure 8: Components of Data

Now to remove the seasonal and trend components from the data, let’s use forecast::seasadj() function to remove the seasonal components and differencing multiple times to remove the trend.

Figure 9: Stationary Version of Data

Now the data seems to be stationary after all of these steps and it is time to fit the models.

Fitting the Models

AR Model

After multiple tries, I decided to use an ARIMA(8,0,0) model which gave me the lowest AIC value.

fit_AR <- Arima(consumption, order = c(8, 0, 0))

Let’s see the model’s properties:

fit_AR

## Series: consumption 
## ARIMA(8,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1      ar2     ar3      ar4     ar5      ar6      ar7     ar8
##       1.7419  -0.9631  0.3121  -0.4426  0.5916  -0.2932  -0.1562  0.1593
## s.e.  0.0051   0.0103  0.0113   0.0110  0.0110   0.0113   0.0103  0.0051
##             mean
##       32694.0259
## s.e.     94.2521
## 
## sigma^2 estimated as 851992:  log likelihood=-312115.3
## AIC=624250.7   AICc=624250.7   BIC=624336.1

Let’s plot the fitted model on top of the actual data.

Figure 10: AR Model Fitted

MA Model

After multiple tries, I decided to use an ARIMA(0,0,8) model which gave me the lowest AIC value.

fit_MA <- Arima(consumption, order = c(0, 0, 8))

Let’s see the model’s properties:

fit_MA

## Series: consumption 
## ARIMA(0,0,8) with non-zero mean 
## 
## Coefficients:
##          ma1     ma2     ma3     ma4     ma5     ma6     ma7     ma8
##       1.7299  1.9592  1.9948  1.6624  1.4739  1.3162  0.8532  0.2639
## s.e.  0.0048  0.0085  0.0107  0.0131  0.0153  0.0145  0.0106  0.0049
##             mean
##       32692.8562
## s.e.     59.4398
## 
## sigma^2 estimated as 890595:  log likelihood=-312954.5
## AIC=625928.9   AICc=625928.9   BIC=626014.3

Let’s plot the fitted model on top of the actual data.

Figure 11: MA Model Fitted

Forecasting

Since AR model gives us as lower AIC value, I am going to make the 24 ahead forecasts with the AR model.

Figure 12: MA Model Fitted

Figure 12 displays the 24 hour forecasts along with the past 7 days’ data.

Summary

In this report, the electricity consumption data was examined, decomposed and used for fitting AR and MA models. The AR model found to be better between the two and was chosen to provide forecasts with. The nature of these algorithms prevented better forecasts however using ARIMA models with all 3 parameters might provide better forecasts.

Forecasting Electricity Consumption with AR & MA Models

Baran Dogru - IE360 - Spring 2020