The aim of this report is to perform some analysis on the electricity consumption data of Turkey, to fit AR & MA models to the data and finally to provide 24 hour ahead forecasts. The data used belongs to EPIAS (Enerji Piyasalari Isletme A.S).
The source code can be found here.
Preparation part consists of installing the required packages, reading and taking a first look at the data.
# requiring the necessary packages
library(dplyr)
library(ggplot2)
library(ggfortify)
library(lubridate)
library(forecast)
library(timeSeries)
library(zoo)
# reading the data
df <- read.csv('gercek_tuketim.csv', sep = '"', header = TRUE)
head(df)
## X Tarih. X.1 Saat X.2 X. X.3 Tuketim.Miktari..MWh. X.4 X.5 X.6
## 1 NA 01.01.2016, NA 00:00 NA , NA 26.277,24 NA NA NA
## 2 NA 01.01.2016, NA 01:00 NA , NA 24.991,82 NA NA NA
## 3 NA 01.01.2016, NA 02:00 NA , NA 23.532,61 NA NA NA
## 4 NA 01.01.2016, NA 03:00 NA , NA 22.464,78 NA NA NA
## 5 NA 01.01.2016, NA 04:00 NA , NA 22.002,91 NA NA NA
## 6 NA 01.01.2016, NA 05:00 NA , NA 21.957,08 NA NA NA
Here are the problems that draws the attention after the first look into the data:
The date and hour columns have to be merged.
There are several nonsense columns.
Some renaming and formatting has to be done.
To solve these problems, some data manpulation had to be done.
Data manipulations for the dataset can be found below.
# tidying the data
df <- df %>%
select(c('Tarih.', 'Saat', 'Tuketim.Miktari..MWh.')) %>%
rename(Date = 'Tarih.', Hour = 'Saat', Consumption = 'Tuketim.Miktari..MWh.')
df$Date <- gsub(",", "", df$Date)
df$Consumption <- as.numeric(gsub(",", ".", gsub("\\.", "", df$Consumption)))
df <- df %>%
mutate(Datetime = as.POSIXct(paste(df$Date, df$Hour),format="%d.%m.%Y %H:%M"), 'GMT') %>%
select(Datetime, Consumption)
Let’s have a look at the final version of the table.
head(df)
## Datetime Consumption
## 1 2016-01-01 00:00:00 26277.24
## 2 2016-01-01 01:00:00 24991.82
## 3 2016-01-01 02:00:00 23532.61
## 4 2016-01-01 03:00:00 22464.78
## 5 2016-01-01 04:00:00 22002.91
## 6 2016-01-01 05:00:00 21957.08
At this moment dataset is ready to be used for the analysis.
Before starting to analyze the underlying trends and seasonalities in the data, something caught my attention while I was trying to plot the data. Although I downloaded the data directly from EPIAS, there was a missing day. To find the missing day, I wrote the following piece of code:
# missing data
missing_day <- setdiff(seq(as.Date('2016-01-01'),as.Date('2020-04-26'), by=1), as_date(df$Datetime))
as.Date(missing_day, origin="1970-01-01")
## [1] "2016-03-27"
As you can see, the data corresponding the day ‘27-03-2016’ is missing. After this fun fact, we can begin with the analysis.
Before decomposing the data and fitting any models to it, we should examine the trends and seasonalities by plotting the data in different ways.
First of all, to plot the hourly consumption data:
Figure 1: Hourly Consumption
By looking at the Figure 1, there seems to be some seasonality, however, to understand better we need more plots. Let’s see the daily average values on a plot.
Figure 2: Daily Consumption
Let’s also plot monthly averages for a cleaner plot.
Figure 3: Monthly Consumption
By looking at this plot, we can say that electricity consumption in Turkey tends to increase in January and June, however, it tends to decrease in October.
These graphs did give us a better understanding of the data, however, to better understand the seasonalities in the data, we need more graphs.
Let’s see how consumption behaves in different hours of a day:
Figure 4: Consumption in Different Hours
As wee can see from Figure 4, the consumption decreases drastically between 11pm and 6 am.
Similarly, let’s plot how consumption behaves in different days of a week.
Figure 5: Consumption in Different Days
We can clearly see that consumption level decreases on Saturday and Sunday by looking at the Figure 5.
Finally, let’s see how consumption behaves in different months of a year.
Figure 6: Consumption in Different Months
Again, consumption seems to increase in January, July and August and seems to decrease in April and October.
Now it is time to decompose the time series data and fit two models to it.
Since hourly data generally have a daily seasonality, a weekly seasonality and an annual seasonality, I have chosen to create an forecast::msts
object which is and extended version of forecast::ts
object which allows multiple seasonal periods as input. More about forecast::msts
object can be found here.
Figure 7: Creation of msts Object
Now we can deasonalize the data using the forecast::mstl()
function. forecast::mstl()
function is also the multiple seasons version of the forecast::stl()
function.
Let’s see the components of the data:
Figure 8: Components of Data
Now to remove the seasonal and trend components from the data, let’s use forecast::seasadj()
function to remove the seasonal components and differencing multiple times to remove the trend.
Figure 9: Stationary Version of Data
Now the data seems to be stationary after all of these steps and it is time to fit the models.
After multiple tries, I decided to use an ARIMA(8,0,0) model which gave me the lowest AIC value.
fit_AR <- Arima(consumption, order = c(8, 0, 0))
Let’s see the model’s properties:
fit_AR
## Series: consumption
## ARIMA(8,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ar5 ar6 ar7 ar8
## 1.7419 -0.9631 0.3121 -0.4426 0.5916 -0.2932 -0.1562 0.1593
## s.e. 0.0051 0.0103 0.0113 0.0110 0.0110 0.0113 0.0103 0.0051
## mean
## 32694.0259
## s.e. 94.2521
##
## sigma^2 estimated as 851992: log likelihood=-312115.3
## AIC=624250.7 AICc=624250.7 BIC=624336.1
Let’s plot the fitted model on top of the actual data.
Figure 10: AR Model Fitted
After multiple tries, I decided to use an ARIMA(0,0,8) model which gave me the lowest AIC value.
fit_MA <- Arima(consumption, order = c(0, 0, 8))
Let’s see the model’s properties:
fit_MA
## Series: consumption
## ARIMA(0,0,8) with non-zero mean
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 ma6 ma7 ma8
## 1.7299 1.9592 1.9948 1.6624 1.4739 1.3162 0.8532 0.2639
## s.e. 0.0048 0.0085 0.0107 0.0131 0.0153 0.0145 0.0106 0.0049
## mean
## 32692.8562
## s.e. 59.4398
##
## sigma^2 estimated as 890595: log likelihood=-312954.5
## AIC=625928.9 AICc=625928.9 BIC=626014.3
Let’s plot the fitted model on top of the actual data.
Figure 11: MA Model Fitted
Since AR model gives us as lower AIC value, I am going to make the 24 ahead forecasts with the AR model.
Figure 12: MA Model Fitted
Figure 12 displays the 24 hour forecasts along with the past 7 days’ data.
In this report, the electricity consumption data was examined, decomposed and used for fitting AR and MA models. The AR model found to be better between the two and was chosen to provide forecasts with. The nature of these algorithms prevented better forecasts however using ARIMA models with all 3 parameters might provide better forecasts.