In many countries, the production and distribution of electricity are organized in a regulated market in which producers must follow rules imposed by the market regulator. Producers make several decisions in these markets that can affect both the market and their own short-run profit. One of the most important requires producers to report, for each hour of the next day, the amount of electricity they can supply and the minimum price they are willing to accept for it. The next day's electricity price is then determined by algorithms that take these notified amounts as inputs. The production decisions therefore have a direct influence on the producer's profit or loss.
The Turkish electricity market, operated by EPIAS, is an example of such a market. In this market, producers are obliged to declare the hourly amounts of electricity to be produced and the corresponding minimum prices by 12 PM of the previous day. As mentioned above, the declared amounts are of high importance to the producers. As a simple illustration, a producer that has committed to a given amount may face higher demand than expected. To meet the demand, the producer may buy electricity from another producer. In that case, however, the producer incurs additional purchasing costs, and supplying customers becomes less profitable than it would have been had the producer declared a higher amount. A similar argument applies to unforeseen low demand. As these examples show, basing the declared amounts on a forecast with minimal error is a primary aim for producers in the market.
Because of this market setting and the importance of these decisions, day-ahead forecasting of hourly electricity demand has attracted interest from both industry and academia in recent years. The aim of this study is similar: to develop an hourly electricity load forecasting model by analyzing consumption and limited temperature data from 2017 until mid-January 2021. The model is to be evaluated on a test period between January 30th and February 13th, with forecast performance measured in a day-ahead setting.
Several models from various domains can be used for forecasting. Shah et al. (2019) note that many techniques are employed to forecast hourly demand series, such as autoregressive models, moving averages, seasonal ARIMA models, spline methods, exponential smoothing, Holt-Winters methods and regression. Alongside methods from time series and statistical modeling, forecasting techniques based on machine learning and data mining algorithms, such as neural networks and decision trees, have also become more widely used in the past several years with the increasing popularity of machine learning. Nevertheless, for this study a method from either time series analysis or regression is more suitable.
The choice of method can be guided by a few facts and observations about hourly electricity consumption series in general. First, Tepedino et al. (2014) indicate that such series exhibit daily, weekly and yearly seasonality. Although both time series and regression methods are suitable for forecasting seasonal series, seasonality is not the only feature of electricity consumption. Consumption may also vary because of two other factors: calendar effects and temperature (Calili et al., 2016). Although temperature is seasonal in general, it is also subject to unexpected behavior, which may have become more frequent with the rise in global temperatures over the last few decades. Regarding calendar effects, Chapagain et al. (2020) claim that consumption tends to be lower on national holidays, which may even reduce consumption throughout the week in which the holiday occurs. However, Ziel (2018) states that although industrial consumption falls on holidays, electricity consumption may rise, especially in tourist destinations. Furthermore, Turkey also has religious holidays that fall on different dates each year, another feature that may worsen the performance of time series models. Hence, because of the complications introduced by temperature and special days, linear regression is considered a more suitable candidate for modelling the hourly electricity consumption of Turkey in this study.
From this discussion it follows that hour of the day, day of the week, yearly seasonality, temperature and the occurrence of special days such as national and religious holidays can all be used as inputs to the model. Nonetheless, before developing a model it is useful to analyze the data with suitable visualizations, to check whether these general patterns also hold for the series at hand and whether the data has additional features that call for modelling. When the first few observations of the data are inspected,
## Date Hour Consumption T_1 T_2 T_3 T_4 T_5 T_6 T_7 tmax
## 1: 2017-01-01 0 27223.06 -15.88 4.18 0.89 -18.96 -14.77 -10.68 2.16 10.94
## 2: 2017-01-01 1 25825.90 -15.88 4.18 0.89 -18.96 -14.77 -10.68 2.16 10.94
## 3: 2017-01-01 2 24252.68 -15.88 4.18 0.89 -18.96 -14.77 -10.68 2.16 10.94
## 4: 2017-01-01 3 22915.47 -15.88 4.18 0.89 -18.96 -14.77 -10.68 2.16 10.94
## tmin t_avg_max t_avg_min t_avg tdiff index
## 1: -18.96 6.165833 -13.21625 -4.431667 29.9 1
## 2: -18.96 6.165833 -13.21625 -4.431667 29.9 2
## 3: -18.96 6.165833 -13.21625 -4.431667 29.9 3
## 4: -18.96 6.165833 -13.21625 -4.431667 29.9 4
The observations are recorded hourly. The data also includes hourly temperature recordings from several locations in Turkey, such as Antalya, İstanbul, Adana and Eskişehir. The series starts on January 1st 2017 and runs through January 28th 2021. Since the observations cover every hour of every day for approximately four years, the visualizations are built from both the hourly series and the daily sum series, to reduce the complexity introduced by the large number of observations. When the hourly observations are plotted with respect to time,
This plot shows that the series has yearly seasonality, with increased consumption in winter and even higher consumption in summer. Moreover, in almost every year there are two points where consumption drops considerably, one at the beginning and the other at the end of summer. These drops appear to shift a few days earlier as the years pass, which suggests that they correspond to the religious holidays, when industrial activity is considerably reduced for at least two or three days because many workplaces close. Another notable point is the unexpected decrease in 2020 before the summer, which may be a consequence of lockdowns. Furthermore, at regular intervals there are clusters where consumption is slightly lower than usual; these might be weekends. To investigate the effects of the COVID period and weekends more clearly, the daily aggregated consumption series can be plotted,
This visualization shows a decrease in the level of consumption in 2020 that starts in mid-March and ends at the beginning of summer, by which point the levels are in line with observations from the same period in earlier years. Furthermore, observations closer to summer seem to be less affected than those in April, signaling that the effect of the lockdown faded gradually. In Turkey, the lockdown triggered by the pandemic started in approximately mid to late March, restrictions were tightened through March and April, and the return to normal life was gradual, starting from mid-May and accelerating at the beginning of June. The effect of the pandemic on the consumption series in Turkey is thus mostly in line with the lockdown, with effects that fade over time. To illustrate the effect of the day of week,
As the plots above and this detailed plot show, the level of consumption is affected by the day of the week, with lower consumption on Saturdays and even lower levels on Sundays, probably because industrial activity is lower still. To check the seasonality in terms of serial correlation,
As the plot shows, there is strong serial correlation between consecutive hours, between the same hours on consecutive days, and between the same hours on the same day of the week. The latter two appear as peaks at lags 24 and 168. Consequently, the hourly electricity consumption series has daily and weekly seasonality, and consumption in neighboring hours of a day is also highly positively correlated.
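As a minimal sketch of how such an autocorrelation check could be reproduced, the base R function `acf()` can be applied to an hourly series and read off at lags 24 and 168. The series below is synthetic (a daily cycle plus a weekend drop), and the names `consumption` and `acf_at` are illustrative assumptions, not objects from this study.

```r
# Synthetic hourly series (an assumption for illustration, not the study's data):
# a daily cycle, a weekend drop, and noise over eight weeks.
set.seed(1)
n_hours <- 24 * 7 * 8
hour_of_day <- (seq_len(n_hours) - 1) %% 24
day_index <- (seq_len(n_hours) - 1) %/% 24
is_weekend <- (day_index %% 7) >= 5
consumption <- 30000 +
  4000 * sin(2 * pi * (hour_of_day - 6) / 24) -
  3000 * is_weekend +
  rnorm(n_hours, sd = 500)

# Sample autocorrelations up to a bit more than one week; plot = FALSE returns the values.
acf_hourly <- acf(consumption, lag.max = 200, plot = FALSE)

# acf() stores lag 0 first, so lag k sits at position k + 1.
acf_at <- function(k) acf_hourly$acf[k + 1]
round(c(lag1 = acf_at(1), lag12 = acf_at(12), lag24 = acf_at(24), lag168 = acf_at(168)), 3)
```

On a series with daily and weekly cycles, the values at lags 24 and 168 stand out against the value at lag 12, which is the pattern described for the real data.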
The autocorrelation plot of the daily aggregated series also demonstrates the underlying weekly positive serial correlation in the data, as the correlations peak at multiples of 7. In addition, although the coefficient is lower than the one between consecutive hours, there is still considerable positive serial correlation at lags 1 and 2, which can be exploited while constructing the model. When the hourly observations are gathered in a histogram,
The distribution could be approximated by a normal distribution, with some departures. First, the center of the distribution is somewhat ambiguous, as most of the middle classes contain a similar number of observations. Furthermore, the two tails thin out differently, with a longer left tail. This may have arisen from the slight upward trend, the sharp drop in the lockdown period, or the higher levels occurring in midsummer, which is a comparatively short part of the year. Explaining the exact reasons requires more analysis that falls outside the scope of this study. Hence, although the fit is not perfect, the observations are treated as coming from a roughly normal distribution. If year-by-year histograms of daily summed consumption are visualized,
In these year-by-year histograms, the left tail is longer in every year, for the possible reasons given above. The observations are not a perfect fit for a normal distribution because neighboring classes contain similar numbers of observations. The year 2019 seems to be a better fit, perhaps because the market tends toward a steadier state over time. The distribution in 2020, however, is heavily distorted, probably because of the lockdown periods and the pandemic reducing production in spring. Lastly, if the daily mean temperature levels of the provided cities of Turkey are visualized,
The plot is very similar to the plot of the daily consumption series. However, because of the unexpected or outlying observations in the series, it is better to include one or more temperature-related regressors in the model rather than let the overlapping seasonality account for temperature. Although consumption and average temperature show a similar yearly seasonality, a more thorough analysis of the relationship between the two variables is carried out in the model building section.
From the general facts discussed and the analysis above, some of the predictors are already decided: day of week, season or month of the year, an indicator for the lockdown period, and an indicator for special days in the calendar, possibly along with lagged variables and a trend component. Temperature aggregated by some function, such as the average, the maximum, or the difference between minimum and maximum, can also be added to the model. Before building a model, however, it should be recalled that the series also shows daily seasonality. Hodge (2020) details the impact of hour of the day on the share of consumption, indicating that the shares may vary from season to season or from day to day. Given these observations, it may be more practical to model the daily consumption series first and then distribute the predictions over the hours, using a predefined vector of shares or a linear regression model that treats each season or day of the week separately. This approach is practical and less complex, since the number of observations is reduced. Furthermore, the high serial correlation at lag 168 (one week) in the autocorrelation plot of the hourly series seems to support the argument. In fact, if the hourly shares of daily consumption are computed and their autocorrelation is plotted,
The autocorrelation at lag 168 is as striking as the one at lag 1. Although there are arguments in favor of this approach, its downside is that the interaction between hourly temperature and the consumption share is lost. A regression model with temperature as a predictor of the share could be used for the distribution, but the predicted shares may not add up to 1, which complicates matters. As DiPersio et al. (2017) point out, there are also many models in the literature that fit 24 separate time series, one for each hour of the day. However, the amount of work and the possibility of needing different predictors for different hours add to the complexity of that approach. Hence, the aim in this study is to forecast the daily series and partition the consumption into 24 hours with suitable methods.
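The idea of partitioning a daily forecast with a share vector can be sketched as follows. This is only one possible reading of "proper methods": it averages the hourly shares per day of week on synthetic data, and the names `hours`, `profile` and `daily_forecast` are hypothetical.

```r
# Synthetic hourly data (assumption): two weeks with a fixed daily shape plus noise.
set.seed(2)
hours <- expand.grid(Hour = 0:23, DayIndex = 1:14)
hours$Day <- ((hours$DayIndex - 1) %% 7) + 1
hours$Consumption <- 25000 + 5000 * sin(2 * pi * (hours$Hour - 6) / 24) +
  rnorm(nrow(hours), sd = 300)

# Share of each hour within its own day.
daily_total <- ave(hours$Consumption, hours$DayIndex, FUN = sum)
hours$Share <- hours$Consumption / daily_total

# Average share profile per day of week: a 24 x 7 matrix whose columns each sum to 1.
profile <- tapply(hours$Share, list(hours$Hour, hours$Day), mean)

# Spread a hypothetical daily forecast for day-of-week 3 over its 24 hours.
daily_forecast <- 750000
hourly_forecast <- daily_forecast * profile[, "3"]
```

Because every day's shares sum to 1, the averaged profile also sums to 1 when each day of week is represented by complete days, so the hourly values add back up to the daily forecast. This avoids the problem noted above for regression-based shares.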
As a first step, some variables are added to the daily aggregated data. To represent the fading effect of the lockdown, a new column is created: for each day between mid-March and June 2020, it holds the last day index in this interval minus the index of that day, and it is 0 for all other dates. As discussed above, the lockdown period lowered electricity consumption and this effect became less pronounced over time. The subtraction therefore represents this fading.
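A minimal sketch of such a lockdown variable is given below. The exact window boundaries are not stated in the text, so `lock_start` and `lock_end` are assumptions chosen only for illustration, as are the column names `covid` and `covid_flag`.

```r
# One row per day; 'index' numbers the days consecutively.
daily <- data.frame(Date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day"))
daily$index <- seq_len(nrow(daily))

# Assumed window boundaries (the text only says mid-March until June 2020).
lock_start <- as.Date("2020-03-16")
lock_end <- as.Date("2020-05-31")

in_lockdown <- daily$Date >= lock_start & daily$Date <= lock_end
last_index <- max(daily$index[in_lockdown])

# Ramp: largest on the first lockdown day, falling to 0 on the last; 0 elsewhere.
daily$covid <- ifelse(in_lockdown, last_index - daily$index, 0)

# Plain 0/1 indicator for the same window, for comparison.
daily$covid_flag <- as.integer(in_lockdown)
```

With these assumed dates the window covers 77 days, so the ramp starts at 76 and decreases by one per day.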
Moreover, although special days have been discussed, their exact effects have not been examined. To see whether consumption actually decreases or perhaps increases on these days, the observations are sorted by consumption in non-decreasing order,
## Date Consumption
## 1: 2020-05-24 457272.1
## 2: 2020-05-25 460777.9
## 3: 2020-05-26 485418.7
## 4: 2020-04-12 512841.8
## 5: 2019-06-04 529731.9
## 6: 2020-05-23 531441.5
## 7: 2019-06-05 534571.8
## 8: 2020-04-11 542413.6
These few observations make the effect of the lockdown on electricity consumption clear. There are also some days from 2019 that coincide with a religious holiday. When the same procedure is repeated after excluding observations from January 1st 2020 onward,
## Date Consumption
## 1: 2019-06-04 529731.9
## 2: 2019-06-05 534571.8
## 3: 2018-06-16 543908.5
## 4: 2018-06-15 549178.6
## 5: 2017-06-25 554357.8
## 6: 2018-06-17 565114.6
## 7: 2019-06-06 566460.8
## 8: 2017-09-01 566858.0
## 9: 2017-06-26 572619.3
## 10: 2017-09-02 586048.2
Consumption clearly decreases on special days. When the days with the highest energy consumption are listed,
## Date Consumption
## 1: 2018-08-07 964024.0
## 2: 2018-08-09 964520.9
## 3: 2017-08-11 964719.7
## 4: 2018-08-08 965719.8
## 5: 2017-08-09 966648.8
## 6: 2019-08-01 968047.9
## 7: 2018-08-01 968743.4
## 8: 2017-07-26 969673.1
## 9: 2018-08-03 977870.3
## 10: 2018-08-02 979215.0
Consumption peaks in late July and August, so the calendar effects in Turkey do not seem to contribute much to increases in consumption. As a result, treating all holidays as a single type of special day is a suitable simplification. Special days, including national and religious holidays and the likely bridge days granted because of a sufficiently long holiday in the same week (such as a Friday after a three-day holiday), are therefore marked with a binary indicator variable. Adding day of week and month of year along with special days and the fading lockdown effect, the model is,
##
## Call:
## lm(formula = Consumption ~ as.factor(month(Date)) + as.factor(Day) +
## covid + special, data = daily_data %>% filter(Date < "2021-01-29"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -154953 -20125 987 18896 170261
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 740452 3638 203.502 < 2e-16 ***
## as.factor(month(Date))2 -22468 4381 -5.129 3.3e-07 ***
## as.factor(month(Date))3 -59288 4326 -13.706 < 2e-16 ***
## as.factor(month(Date))4 -91620 4480 -20.450 < 2e-16 ***
## as.factor(month(Date))5 -104108 4288 -24.282 < 2e-16 ***
## as.factor(month(Date))6 -63190 4323 -14.618 < 2e-16 ***
## as.factor(month(Date))7 59904 4266 14.042 < 2e-16 ***
## as.factor(month(Date))8 56289 4308 13.065 < 2e-16 ***
## as.factor(month(Date))9 -12317 4305 -2.861 0.00428 **
## as.factor(month(Date))10 -84475 4266 -19.801 < 2e-16 ***
## as.factor(month(Date))11 -49234 4307 -11.431 < 2e-16 ***
## as.factor(month(Date))12 -9248 4268 -2.167 0.03041 *
## as.factor(Day)2 103268 3421 30.190 < 2e-16 ***
## as.factor(Day)3 121612 3417 35.592 < 2e-16 ***
## as.factor(Day)4 124935 3416 36.569 < 2e-16 ***
## as.factor(Day)5 129128 3417 37.794 < 2e-16 ***
## as.factor(Day)6 123490 3422 36.087 < 2e-16 ***
## as.factor(Day)7 80517 3420 23.542 < 2e-16 ***
## covid -1586 109 -14.548 < 2e-16 ***
## special -160843 4743 -33.915 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35250 on 1469 degrees of freedom
## Multiple R-squared: 0.8254, Adjusted R-squared: 0.8231
## F-statistic: 365.4 on 19 and 1469 DF, p-value: < 2.2e-16
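For readers who want to reproduce the structure of this regression, the sketch below fits a model of the same shape with base R on synthetic data. The day-of-week coding (1 = Sunday), the list of special dates and all numeric effects are assumptions made for illustration; the lockdown regressor is left out because the synthetic window contains no lockdown.

```r
set.seed(3)
daily <- data.frame(Date = seq(as.Date("2018-01-01"), as.Date("2019-12-31"), by = "day"))
daily$Month <- factor(as.integer(format(daily$Date, "%m")))
# Day of week coded 1 = Sunday ... 7 = Saturday (an assumed coding).
daily$Day <- factor(as.POSIXlt(daily$Date)$wday + 1)
# Hypothetical special days: a handful of fixed dates.
special_dates <- c("01-01", "04-23", "05-01", "10-29")
daily$special <- as.integer(format(daily$Date, "%m-%d") %in% special_dates)

# Synthetic consumption: weekday uplift, summer uplift, holiday drop, noise.
weekday_uplift <- ifelse(daily$Day %in% c("1", "7"), 0, 100000)
summer_uplift <- ifelse(daily$Month %in% c("7", "8"), 60000, 0)
daily$Consumption <- 650000 + weekday_uplift + summer_uplift -
  150000 * daily$special + rnorm(nrow(daily), sd = 20000)

# Month and day-of-week enter as factors, special days as a 0/1 regressor.
fit <- lm(Consumption ~ Month + Day + special, data = daily)
round(coef(fit)[c("Day3", "special")])
```

Calling `summary(fit)` on such an object gives a coefficient table laid out like the one above, with the first month and first day of week absorbed into the intercept.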
Although all predictors are significant, it is worth asking whether the lockdown effect would be better represented by an indicator variable, since the increase in consumption at the end of this period might also result from the transition to summer, as happens every year,
##
## Call:
## lm(formula = Consumption ~ as.factor(month(Date)) + as.factor(Day) +
## covid + special, data = daily_data %>% filter(Date < "2021-01-29"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -160188 -19715 -339 18193 161689
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 740561 3510 210.997 < 2e-16 ***
## as.factor(month(Date))2 -22307 4226 -5.279 1.49e-07 ***
## as.factor(month(Date))3 -61269 4142 -14.794 < 2e-16 ***
## as.factor(month(Date))4 -88925 4304 -20.662 < 2e-16 ***
## as.factor(month(Date))5 -89413 4267 -20.953 < 2e-16 ***
## as.factor(month(Date))6 -63715 4170 -15.279 < 2e-16 ***
## as.factor(month(Date))7 59865 4115 14.548 < 2e-16 ***
## as.factor(month(Date))8 55479 4156 13.348 < 2e-16 ***
## as.factor(month(Date))9 -12369 4153 -2.978 0.00295 **
## as.factor(month(Date))10 -84565 4115 -20.549 < 2e-16 ***
## as.factor(month(Date))11 -49073 4155 -11.812 < 2e-16 ***
## as.factor(month(Date))12 -9089 4117 -2.208 0.02743 *
## as.factor(Day)2 102705 3300 31.125 < 2e-16 ***
## as.factor(Day)3 121217 3296 36.776 < 2e-16 ***
## as.factor(Day)4 124636 3296 37.819 < 2e-16 ***
## as.factor(Day)5 128927 3296 39.118 < 2e-16 ***
## as.factor(Day)6 123149 3301 37.307 < 2e-16 ***
## as.factor(Day)7 80424 3299 24.377 < 2e-16 ***
## covid -83345 4539 -18.362 < 2e-16 ***
## special -154511 4592 -33.644 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34000 on 1469 degrees of freedom
## Multiple R-squared: 0.8375, Adjusted R-squared: 0.8354
## F-statistic: 398.5 on 19 and 1469 DF, p-value: < 2.2e-16
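The comparison between the two encodings of the lockdown effect can be sketched with `sigma()` and the adjusted R-squared from `summary()`. The data here is synthetic and built with a constant drop during an assumed window, so the indicator wins by construction; on real data either encoding could fit better. The names `covid_ramp` and `covid_flag` are illustrative.

```r
set.seed(4)
daily <- data.frame(Date = seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day"))
daily$index <- seq_len(nrow(daily))
# Assumed lockdown window, for illustration only.
in_lockdown <- daily$Date >= as.Date("2020-03-16") & daily$Date <= as.Date("2020-05-31")
daily$covid_ramp <- ifelse(in_lockdown, max(daily$index[in_lockdown]) - daily$index, 0)
daily$covid_flag <- as.integer(in_lockdown)

# Synthetic truth (assumption): a constant drop during the window.
daily$Consumption <- 750000 - 80000 * daily$covid_flag + rnorm(nrow(daily), sd = 20000)

fit_ramp <- lm(Consumption ~ covid_ramp, data = daily)
fit_flag <- lm(Consumption ~ covid_flag, data = daily)

# Residual standard error and adjusted R-squared side by side.
comparison <- data.frame(
  model = c("ramp", "indicator"),
  sigma = c(sigma(fit_ramp), sigma(fit_flag)),
  adj_r2 = c(summary(fit_ramp)$adj.r.squared, summary(fit_flag)$adj.r.squared)
)
comparison
```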
This model seems to be a better fit, with a lower residual standard error. When the residuals are checked,
##
## Breusch-Godfrey test for serial correlation of order up to 23
##
## data: Residuals
## LM test = 860.08, df = 23, p-value < 2.2e-16
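The Breusch-Godfrey statistic reported here can be understood through its auxiliary regression: the residuals are regressed on the original regressors plus their own lags, and the test statistic is the number of observations times the R-squared of that regression. The sketch below is a base R illustration of that idea on synthetic data with autocorrelated errors; `bg_lm_test` is a hypothetical helper written for this example, not the function that produced the output above, and it pads the first lagged values with zeros.

```r
set.seed(5)
n <- 300
x <- rnorm(n)
# AR(1) errors with coefficient 0.6 make the residuals serially correlated by construction.
e <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 1 + 2 * x + e
fit <- lm(y ~ x)

bg_lm_test <- function(model, order) {
  res <- unname(residuals(model))
  n_obs <- length(res)
  # One column per lag; the first k values of lag k are padded with zeros.
  lagged <- vapply(
    seq_len(order),
    function(k) c(rep(0, k), res[seq_len(n_obs - k)]),
    numeric(n_obs)
  )
  # Auxiliary regression of the residuals on the regressors and the lagged residuals.
  aux <- lm.fit(cbind(model.matrix(model), lagged), res)
  # n * R^2; the residuals have mean zero because the model has an intercept.
  statistic <- n_obs * sum(aux$fitted.values^2) / sum(res^2)
  list(statistic = statistic, df = order,
       p.value = pchisq(statistic, df = order, lower.tail = FALSE))
}

bg <- bg_lm_test(fit, order = 7)
```

A very small p-value, as in the output above, indicates that the lagged residuals explain a substantial part of the current residuals.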
Although the residuals fit a normal distribution well, there are problems: an increasing trend in the residuals and strong positive autocorrelation in the errors. These indicate that some information is left in the residuals that could be captured by the model. The serial correlation is discussed in the coming parts; to model the trend, the index of the day is added as a predictor,
##
## Call:
## lm(formula = Consumption ~ as.factor(month(Date)) + as.factor(Day) +
## trend + covid + special, data = daily_data %>% filter(Date <
## "2021-01-29"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -156602 -17492 -140 17207 154759
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.265e+05 3.764e+03 193.024 < 2e-16 ***
## as.factor(month(Date))2 -1.973e+04 4.127e+03 -4.782 1.91e-06 ***
## as.factor(month(Date))3 -5.785e+04 4.053e+03 -14.273 < 2e-16 ***
## as.factor(month(Date))4 -8.399e+04 4.229e+03 -19.860 < 2e-16 ***
## as.factor(month(Date))5 -8.506e+04 4.186e+03 -20.322 < 2e-16 ***
## as.factor(month(Date))6 -6.337e+04 4.063e+03 -15.599 < 2e-16 ***
## as.factor(month(Date))7 5.961e+04 4.009e+03 14.869 < 2e-16 ***
## as.factor(month(Date))8 5.464e+04 4.050e+03 13.491 < 2e-16 ***
## as.factor(month(Date))9 -1.382e+04 4.049e+03 -3.412 0.000662 ***
## as.factor(month(Date))10 -8.660e+04 4.016e+03 -21.565 < 2e-16 ***
## as.factor(month(Date))11 -5.170e+04 4.058e+03 -12.741 < 2e-16 ***
## as.factor(month(Date))12 -1.231e+04 4.027e+03 -3.057 0.002277 **
## as.factor(Day)2 1.026e+05 3.215e+03 31.927 < 2e-16 ***
## as.factor(Day)3 1.211e+05 3.211e+03 37.720 < 2e-16 ***
## as.factor(Day)4 1.245e+05 3.211e+03 38.775 < 2e-16 ***
## as.factor(Day)5 1.288e+05 3.211e+03 40.106 < 2e-16 ***
## as.factor(Day)6 1.231e+05 3.216e+03 38.283 < 2e-16 ***
## as.factor(Day)7 8.037e+04 3.214e+03 25.008 < 2e-16 ***
## trend 1.933e+01 2.164e+00 8.935 < 2e-16 ***
## covid -9.704e+04 4.680e+03 -20.736 < 2e-16 ***
## special -1.547e+05 4.474e+03 -34.570 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 33130 on 1468 degrees of freedom
## Multiple R-squared: 0.8459, Adjusted R-squared: 0.8438
## F-statistic: 402.9 on 20 and 1468 DF, p-value: < 2.2e-16
As the decrease in the residual standard error shows, adding the trend has improved the training performance of the model. The increase in consumption levels might result from a growing number of facilities and organizations, or of activities that rely on electricity. However, it is surprising to see this effect over a span of only a few years. When the residuals of the model are checked,
##
## Breusch-Godfrey test for serial correlation of order up to 24
##
## data: Residuals
## LM test = 830.68, df = 24, p-value < 2.2e-16
The only problem with the errors seems to be the strong positive correlation at most lags. Before discussing this, however, it is worth examining the relation between the residuals and particular aggregations of the temperature recordings from the given cities, to check whether the residuals contain additional information that can be incorporated into the model. The aggregations considered are the minimum and maximum temperature of the day recorded at any of the centers, the difference between these two, the average temperature of the day, the average temperature in the hottest city and the average temperature in the coldest city.
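The aggregations listed above could be computed per day along the following lines. The sketch uses three hypothetical stations with synthetic readings (the data in this report has seven, `T_1` to `T_7`), and the helper name `aggregate_day` is an assumption; the output column names mirror those in the data preview.

```r
set.seed(6)
hourly <- data.frame(
  Date = rep(as.Date("2017-01-01") + 0:2, each = 24),
  Hour = rep(0:23, times = 3)
)
# Three hypothetical stations with synthetic readings in degrees Celsius.
hourly$T_1 <- rnorm(nrow(hourly), mean = -5, sd = 3)
hourly$T_2 <- rnorm(nrow(hourly), mean = 5, sd = 3)
hourly$T_3 <- rnorm(nrow(hourly), mean = 0, sd = 3)
temp_cols <- c("T_1", "T_2", "T_3")

aggregate_day <- function(day_rows) {
  temps <- as.matrix(day_rows[, temp_cols])
  city_means <- colMeans(temps)          # daily average per station
  data.frame(
    Date = day_rows$Date[1],
    tmax = max(temps),                   # highest reading at any station
    tmin = min(temps),                   # lowest reading at any station
    tdiff = max(temps) - min(temps),     # spread between the two
    t_avg = mean(temps),                 # average over all stations and hours
    t_avg_max = max(city_means),         # average in the hottest station
    t_avg_min = min(city_means)          # average in the coldest station
  )
}

# One row of aggregates per calendar day.
daily_temp <- do.call(rbind, lapply(split(hourly, hourly$Date), aggregate_day))
```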
Before visualizing any relation, two factors that may mislead the interpretation of the aggregated temperature recordings should be noted. The first is that the temperatures are given in degrees Celsius; in winter they fall below zero and change sign, and such behavior might distort the correlation coefficient of any temperature-related predictor. A correlation that is actually statistically significant might then appear insignificant in the model because of the sign change. Therefore, all aggregated temperature observations are converted to absolute temperature by adding 273, ensuring that this potentially misleading factor is eliminated while the relative differences in temperature stay the same. The second point is that the effect of an increase or decrease in temperature on electricity consumption changes over the year. The earlier plots show that the lowest levels occur in spring and autumn, when temperatures do not require any air conditioning, whereas in winter and summer such devices, which use a lot of electricity, are heavily used for different reasons. Hence, an increase in temperature may decrease the use of electrical devices in winter, while it usually increases consumption in summer. As this example shows, it is more sensible to inspect the effect of temperature on the residuals separately for each month.
With this in mind, when the maximum temperature is plotted against the residuals of the previous model,
This visualization reveals that in some months the maximum temperature affects electricity consumption positively or negatively. For the first few months of the year, however, the effects do not seem very strong, so it is practical to look for a more comprehensive variable. When the temperature difference is plotted against the residuals,
These plots do not reveal any correspondence between the variable and the residuals. This may be because the minimum temperature recording of a day can influence electricity consumption in a way similar to the maximum recording. Since the average temperatures in the hottest and coldest centers may overlap to some degree with the minimum and maximum recordings, examining the relationship between the average temperature and the residuals may be more beneficial,
This regressor may be more useful, since a correlation between the variable and the residuals is visible in more than half of the months. Since the relation in some months is ambiguous from the plot, it is worth checking the correlation coefficient,
## Month Correlation
## 1: 1 -0.006194685
## 2: 2 -0.188760847
## 3: 3 -0.227813506
## 4: 4 0.034827367
## 5: 5 0.374449877
## 6: 6 0.260390864
## 7: 7 0.214410902
## 8: 8 0.098867477
## 9: 9 0.436002904
## 10: 10 0.263964397
## 11: 11 -0.447796905
## 12: 12 -0.107974928
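A table of this kind can be produced by splitting the daily data by month and applying `cor()` within each group. The sketch below does this on synthetic data in which residuals rise with temperature in summer and fall with it in winter; the column names `t_avg` and `resid` and all numeric effects are assumptions for illustration.

```r
set.seed(7)
daily <- data.frame(Date = seq(as.Date("2018-01-01"), as.Date("2019-12-31"), by = "day"))
daily$Month <- as.integer(format(daily$Date, "%m"))
day_of_year <- as.integer(format(daily$Date, "%j"))
# Hypothetical average temperature on the shifted scale (Celsius + 273).
daily$t_avg <- 273 + 12 - 12 * cos(2 * pi * (day_of_year - 15) / 365) +
  rnorm(nrow(daily), sd = 3)

# Hypothetical residuals: positive temperature slope in summer, negative in winter, none otherwise.
slope <- ifelse(daily$Month %in% 6:9, 5000,
                ifelse(daily$Month %in% c(1, 2, 11, 12), -5000, 0))
temp_dev <- daily$t_avg - ave(daily$t_avg, daily$Month)
daily$resid <- slope * temp_dev + rnorm(nrow(daily), sd = 15000)

# Correlation between temperature and residuals, computed separately for each month.
by_month <- split(daily, daily$Month)
monthly_cor <- data.frame(
  Month = as.integer(names(by_month)),
  Correlation = sapply(by_month, function(d) cor(d$t_avg, d$resid))
)
monthly_cor
```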
These outputs indicate considerable correlation between the two measures in May, September and November. For better performance in the testing period, predictors for average temperature in February and March are also added to the model, although their coefficients are rather small. Furthermore, the previous plot showed a positive correlation between the average temperature and the residuals for June and July, so the small correlation coefficients for these months are surprising. This may be caused by low outliers from the religious and national holidays that usually fall in these months. Consequently, a model that includes average temperature predictors for these months is also tried, and the predictors are dropped if they do not prove statistically significant. Lastly, as the plot suggests and further investigation confirms, temperature is added as a predictor for May only when the recording is above 290, which corresponds to about 17 degrees Celsius under the 273 offset, warm weather that may require air conditioning. Although a missing relationship is to be expected for some months in spring and fall, when the weather is neither too cold nor too warm, it is unexpected for December and January, which are usually the coldest months of the year. Nonetheless, when a model is built with separate average temperature predictors for February, March, May, June, July, September and November,
##
## Call:
## lm(formula = Consumption ~ as.factor(month(Date)) + as.factor(Day) +
## trend + covid + special + tavg2 + tavg3 + tavg5 + tavg6 +
## tavg7 + tavg9 + tavg11, data = daily_data %>% filter(Date <
## "2021-01-29"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -152957 -14619 869 15301 158631
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.259e+05 3.259e+03 222.719 < 2e-16 ***
## as.factor(month(Date))2 1.126e+06 2.464e+05 4.571 5.27e-06 ***
## as.factor(month(Date))3 1.403e+06 2.675e+05 5.243 1.81e-07 ***
## as.factor(month(Date))4 -8.414e+04 3.660e+03 -22.987 < 2e-16 ***
## as.factor(month(Date))5 -9.807e+04 4.445e+03 -22.064 < 2e-16 ***
## as.factor(month(Date))6 -4.217e+06 3.710e+05 -11.367 < 2e-16 ***
## as.factor(month(Date))7 -2.941e+06 5.047e+05 -5.829 6.87e-09 ***
## as.factor(month(Date))8 5.509e+04 3.504e+03 15.722 < 2e-16 ***
## as.factor(month(Date))9 -4.185e+06 3.142e+05 -13.318 < 2e-16 ***
## as.factor(month(Date))10 -8.661e+04 3.474e+03 -24.932 < 2e-16 ***
## as.factor(month(Date))11 2.412e+06 2.685e+05 8.986 < 2e-16 ***
## as.factor(month(Date))12 -1.251e+04 3.484e+03 -3.592 0.000339 ***
## as.factor(Day)2 1.028e+05 2.783e+03 36.934 < 2e-16 ***
## as.factor(Day)3 1.211e+05 2.781e+03 43.530 < 2e-16 ***
## as.factor(Day)4 1.247e+05 2.778e+03 44.873 < 2e-16 ***
## as.factor(Day)5 1.288e+05 2.779e+03 46.360 < 2e-16 ***
## as.factor(Day)6 1.238e+05 2.784e+03 44.459 < 2e-16 ***
## as.factor(Day)7 8.099e+04 2.781e+03 29.121 < 2e-16 ***
## trend 1.992e+01 1.877e+00 10.613 < 2e-16 ***
## covid -9.604e+04 4.071e+03 -23.593 < 2e-16 ***
## special -1.585e+05 3.884e+03 -40.808 < 2e-16 ***
## tavg2 -4.119e+03 8.854e+02 -4.652 3.59e-06 ***
## tavg3 -5.183e+03 9.491e+02 -5.461 5.56e-08 ***
## tavg5 9.233e+01 1.768e+01 5.223 2.01e-07 ***
## tavg6 1.411e+04 1.260e+03 11.197 < 2e-16 ***
## tavg7 1.010e+04 1.698e+03 5.947 3.41e-09 ***
## tavg9 1.415e+04 1.066e+03 13.275 < 2e-16 ***
## tavg11 -8.682e+03 9.457e+02 -9.180 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28660 on 1461 degrees of freedom
## Multiple R-squared: 0.8852, Adjusted R-squared: 0.8831
## F-statistic: 417.3 on 27 and 1461 DF, p-value: < 2.2e-16
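One way to build month-specific temperature regressors such as `tavg2` or `tavg7` is to multiply the average temperature by a month indicator, so that the variable equals the temperature inside the month and 0 elsewhere. This construction is an assumption about how the predictors were defined, and the data below is synthetic. Under this construction the month dummy acts as the intercept of that month's temperature line, which would be consistent with the large month coefficients seen for the months that have a temperature predictor.

```r
set.seed(8)
daily <- data.frame(Date = seq(as.Date("2018-01-01"), as.Date("2019-12-31"), by = "day"))
daily$Month <- as.integer(format(daily$Date, "%m"))
day_of_year <- as.integer(format(daily$Date, "%j"))
# Hypothetical average temperature on the shifted scale (Celsius + 273).
daily$t_avg <- 273 + 12 - 12 * cos(2 * pi * (day_of_year - 15) / 365) +
  rnorm(nrow(daily), sd = 3)

# Month-specific regressors: the temperature inside the month, 0 elsewhere.
for (m in c(2, 7, 11)) {
  daily[[paste0("tavg", m)]] <- daily$t_avg * (daily$Month == m)
}

# Synthetic consumption with opposite temperature slopes in winter and summer.
slope <- rep(0, nrow(daily))
slope[daily$Month == 2] <- -5000
slope[daily$Month == 7] <- 10000
slope[daily$Month == 11] <- -8000
temp_dev <- daily$t_avg - ave(daily$t_avg, daily$Month)
daily$Consumption <- 700000 + slope * temp_dev + rnorm(nrow(daily), sd = 20000)

fit <- lm(Consumption ~ factor(Month) + tavg2 + tavg7 + tavg11, data = daily)
round(coef(fit)[c("tavg2", "tavg7", "tavg11")])
```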
The model is considerably improved over the previous one, with all predictors statistically significant. Furthermore, although some are close, most of the temperature coefficients differ considerably from each other, indicating the value of differentiating the regressor by month. When the residuals of the model are checked,
##
## Breusch-Godfrey test for serial correlation of order up to 31
##
## data: Residuals
## LM test = 662.6, df = 31, p-value < 2.2e-16
The same observations and problems as in the previous model are present. From this point, to reduce the autocorrelation in the residuals, the residuals from two or seven days earlier can be added to the forecast of the day, scaled by a coefficient estimated by regression. A lag of two days is chosen because it is the most recent day available at forecast time in an actual forecasting setting for this problem. However, adding the residuals scaled by the same coefficient for every day might be misleading. For example, in such a setting the residual of Tuesday is added to the forecast of Thursday, two days with similar consumption profiles and amounts, whereas the residual of Sunday is added to the forecast of Tuesday, two days with considerably different consumption behavior. Two measures are considered to prevent this problem. First, adding the lag-2 residual as a separate predictor for each day of the week might be useful, since the effect can then be consistent for each pair of days. Second, for those days on which the residual from two days earlier proves not statistically significant, the residual of the same day in the previous week can be used, which removes the difference in character between the two days. Nevertheless, the residuals do not seem to be additionally correlated at lag 7, so the lag-2 residuals are tried first, and if any of these predictors is problematic, the lag-7 regressor is used for that day instead. When the model is built under the first scenario,
##
## Call:
## lm(formula = Consumption ~ as.factor(month(Date)) + as.factor(Day) +
## trend + covid + special + tavg2 + tavg3 + tavg5 + tavg6 +
## tavg7 + tavg9 + tavg11 + lag2_mon + lag2_tue + lag2_wed +
## lag2_thu + lag2_fri + lag2_sat + lag2_sun, data = daily_data %>%
## filter(Date < "2021-01-29"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -157494 -10853 707 12647 143073
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.273e+05 2.916e+03 249.406 < 2e-16 ***
## as.factor(month(Date))2 1.005e+06 2.184e+05 4.603 4.53e-06 ***
## as.factor(month(Date))3 1.462e+06 2.371e+05 6.168 8.98e-10 ***
## as.factor(month(Date))4 -8.526e+04 3.256e+03 -26.182 < 2e-16 ***
## as.factor(month(Date))5 -9.734e+04 3.960e+03 -24.583 < 2e-16 ***
## as.factor(month(Date))6 -4.112e+06 3.293e+05 -12.485 < 2e-16 ***
## as.factor(month(Date))7 -3.291e+06 4.486e+05 -7.337 3.62e-13 ***
## as.factor(month(Date))8 5.325e+04 3.120e+03 17.070 < 2e-16 ***
## as.factor(month(Date))9 -3.729e+06 2.810e+05 -13.270 < 2e-16 ***
## as.factor(month(Date))10 -8.723e+04 3.090e+03 -28.233 < 2e-16 ***
## as.factor(month(Date))11 2.214e+06 2.382e+05 9.295 < 2e-16 ***
## as.factor(month(Date))12 -1.504e+04 3.097e+03 -4.855 1.34e-06 ***
## as.factor(Day)2 1.023e+05 2.472e+03 41.373 < 2e-16 ***
## as.factor(Day)3 1.206e+05 2.467e+03 48.891 < 2e-16 ***
## as.factor(Day)4 1.243e+05 2.465e+03 50.446 < 2e-16 ***
## as.factor(Day)5 1.284e+05 2.465e+03 52.094 < 2e-16 ***
## as.factor(Day)6 1.233e+05 2.470e+03 49.925 < 2e-16 ***
## as.factor(Day)7 8.074e+04 2.467e+03 32.724 < 2e-16 ***
## trend 1.980e+01 1.669e+00 11.859 < 2e-16 ***
## covid -9.704e+04 3.615e+03 -26.842 < 2e-16 ***
## special -1.486e+05 3.507e+03 -42.382 < 2e-16 ***
## tavg2 -3.685e+03 7.849e+02 -4.696 2.91e-06 ***
## tavg3 -5.398e+03 8.413e+02 -6.416 1.89e-10 ***
## tavg5 8.364e+01 1.568e+01 5.333 1.12e-07 ***
## tavg6 1.375e+04 1.119e+03 12.291 < 2e-16 ***
## tavg7 1.127e+04 1.509e+03 7.465 1.43e-13 ***
## tavg9 1.260e+04 9.534e+02 13.215 < 2e-16 ***
## tavg11 -7.985e+03 8.392e+02 -9.516 < 2e-16 ***
## lag2_mon 4.106e-01 6.100e-02 6.732 2.40e-11 ***
## lag2_tue 4.109e-01 7.259e-02 5.660 1.82e-08 ***
## lag2_wed 3.684e-01 5.721e-02 6.440 1.62e-10 ***
## lag2_thu 6.068e-01 6.047e-02 10.036 < 2e-16 ***
## lag2_fri 5.933e-01 6.623e-02 8.958 < 2e-16 ***
## lag2_sat 6.088e-01 6.684e-02 9.107 < 2e-16 ***
## lag2_sun 2.653e-01 5.464e-02 4.855 1.33e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25390 on 1452 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.9103, Adjusted R-squared: 0.9082
## F-statistic: 433.2 on 34 and 1452 DF, p-value: < 2.2e-16
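The day-specific `lag2_*` regressors in the call above could be built along the following lines. This is only a minimal sketch on synthetic data, not the code behind the report: the helper `make_lag2_by_day` and the vector `resid_prev` (residuals of the previous model) are hypothetical names, and `Day` is assumed to be coded 1 to 7 with 1 standing for Sunday.

```r
# Hypothetical helper: spread a residual series lagged by two days into
# seven columns, one per weekday of the day being forecast.
make_lag2_by_day <- function(resid_prev, day) {
  n <- length(resid_prev)
  # Residual from two days earlier; the first two days have no lag-2 value
  lag2 <- c(NA, NA, resid_prev[seq_len(n - 2)])
  day_names <- c("sun", "mon", "tue", "wed", "thu", "fri", "sat")
  # Column d holds the lag-2 residual on days of type d and zero elsewhere
  out <- sapply(seq_along(day_names), function(d) ifelse(day == d, lag2, 0))
  colnames(out) <- paste0("lag2_", day_names)
  as.data.frame(out)
}

# Synthetic example: 14 days starting on a Sunday (assumed Day = 1)
set.seed(1)
resid_prev <- round(rnorm(14, sd = 1000))
day <- rep(1:7, times = 2)
lag2_cols <- make_lag2_by_day(resid_prev, day)
head(cbind(day, lag2_cols), 4)
```

In this sketch the first two rows contain a missing value, so a regression fitted on such columns would drop two observations.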
As can be seen from the increase in R-squared and the decrease in residual standard error, the model’s training performance has improved. All predictors are statistically significant, and separating the lagged residuals by day of the week has paid off for some days, as can be understood from the differences between the coefficients. When the residuals of the model are checked,
##
## Breusch-Godfrey test for serial correlation of order up to 38
##
## data: Residuals
## LM test = 496.44, df = 38, p-value < 2.2e-16
The residuals are still a nearly perfect fit to a normal distribution, with a few outliers in the left tail. Moreover, the serial correlation at all lags after 1 seems to have decreased to levels that are barely statistically significant. Lastly, the outliers that remain in the residuals can be dealt with. It should be kept in mind that the reasons underlying the exceptional behavior of these observations have not been investigated by the researchers. They might be holidays that were overlooked or other special days unknown to the researchers, such as major holidays affecting a particular community or important examination days. Nevertheless, to improve the model’s performance, those days are treated as outliers, and indicator variables are added to the model to mark whether a day is a small outlier, a large outlier, or behaves normally. Days with residuals below the 0.07 quantile are denoted as small outliers, and days with residuals above the 0.93 quantile are denoted as large outliers. The choice of these boundaries was not examined in detail and could be left to the user or to researchers in another study. When a model is constructed that way,
##
## Call:
## lm(formula = Consumption ~ as.factor(month(Date)) + as.factor(Day) +
## trend + covid + special + tavg2 + tavg3 + tavg5 + tavg6 +
## tavg7 + tavg9 + tavg11 + lag2_mon + lag2_tue + lag2_wed +
## lag2_thu + lag2_fri + lag2_sat + lag2_sun + outlier_small +
## outlier_large, data = daily_data %>% filter(Date < "2021-01-29"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -100561 -10196 -317 10754 95630
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.297e+05 1.793e+03 407.036 < 2e-16 ***
## as.factor(month(Date))2 8.619e+05 1.339e+05 6.436 1.66e-10 ***
## as.factor(month(Date))3 1.433e+06 1.454e+05 9.861 < 2e-16 ***
## as.factor(month(Date))4 -8.512e+04 1.996e+03 -42.645 < 2e-16 ***
## as.factor(month(Date))5 -9.950e+04 2.429e+03 -40.959 < 2e-16 ***
## as.factor(month(Date))6 -4.064e+06 2.023e+05 -20.082 < 2e-16 ***
## as.factor(month(Date))7 -3.328e+06 2.752e+05 -12.094 < 2e-16 ***
## as.factor(month(Date))8 5.312e+04 1.913e+03 27.771 < 2e-16 ***
## as.factor(month(Date))9 -3.659e+06 1.724e+05 -21.225 < 2e-16 ***
## as.factor(month(Date))10 -8.986e+04 1.895e+03 -47.407 < 2e-16 ***
## as.factor(month(Date))11 2.281e+06 1.460e+05 15.622 < 2e-16 ***
## as.factor(month(Date))12 -1.587e+04 1.900e+03 -8.354 < 2e-16 ***
## as.factor(Day)2 1.010e+05 1.515e+03 66.649 < 2e-16 ***
## as.factor(Day)3 1.199e+05 1.512e+03 79.278 < 2e-16 ***
## as.factor(Day)4 1.222e+05 1.512e+03 80.863 < 2e-16 ***
## as.factor(Day)5 1.264e+05 1.514e+03 83.503 < 2e-16 ***
## as.factor(Day)6 1.220e+05 1.515e+03 80.539 < 2e-16 ***
## as.factor(Day)7 8.069e+04 1.513e+03 53.345 < 2e-16 ***
## trend 1.988e+01 1.026e+00 19.382 < 2e-16 ***
## covid -9.449e+04 2.228e+03 -42.408 < 2e-16 ***
## special -1.497e+05 2.266e+03 -66.066 < 2e-16 ***
## tavg2 -3.174e+03 4.812e+02 -6.597 5.86e-11 ***
## tavg3 -5.305e+03 5.158e+02 -10.284 < 2e-16 ***
## tavg5 8.704e+01 9.619e+00 9.049 < 2e-16 ***
## tavg6 1.358e+04 6.872e+02 19.755 < 2e-16 ***
## tavg7 1.140e+04 9.259e+02 12.312 < 2e-16 ***
## tavg9 1.235e+04 5.848e+02 21.125 < 2e-16 ***
## tavg11 -8.222e+03 5.145e+02 -15.982 < 2e-16 ***
## lag2_mon 4.990e-01 3.743e-02 13.330 < 2e-16 ***
## lag2_tue 4.224e-01 4.455e-02 9.483 < 2e-16 ***
## lag2_wed 4.270e-01 3.523e-02 12.121 < 2e-16 ***
## lag2_thu 6.766e-01 3.710e-02 18.236 < 2e-16 ***
## lag2_fri 6.397e-01 4.062e-02 15.750 < 2e-16 ***
## lag2_sat 6.407e-01 4.098e-02 15.636 < 2e-16 ***
## lag2_sun 2.498e-01 3.351e-02 7.456 1.53e-13 ***
## outlier_small -5.738e+04 1.664e+03 -34.484 < 2e-16 ***
## outlier_large 4.809e+04 1.689e+03 28.478 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15560 on 1450 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.9663, Adjusted R-squared: 0.9655
## F-statistic: 1156 on 36 and 1450 DF, p-value: < 2.2e-16
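The quantile rule described for `outlier_small` and `outlier_large` can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the report: `flag_outliers` is a hypothetical helper, the residuals are synthetic, and the 0.07 and 0.93 cut-offs are the ones named in the text.

```r
# Hypothetical helper: flag the lowest and highest residuals as outliers
flag_outliers <- function(resid, lower = 0.07, upper = 0.93) {
  cuts <- unname(quantile(resid, probs = c(lower, upper), na.rm = TRUE))
  data.frame(
    outlier_small = as.integer(!is.na(resid) & resid < cuts[1]),
    outlier_large = as.integer(!is.na(resid) & resid > cuts[2])
  )
}

set.seed(42)
resid <- rnorm(1000, sd = 25000)  # synthetic residuals, not the report's data
flags <- flag_outliers(resid)
colMeans(flags)                   # share of days flagged in each tail
```

The text does not state how the two indicators are set for the day being forecast, when its residual is not yet known, so the sketch covers only the training-side construction.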
It can be seen that adding these variables produced a jump in the performance of the model, probably because the coefficients for normal days are better adjusted once the effect of the outliers is modelled separately. Once the residuals are checked,
##
## Breusch-Godfrey test for serial correlation of order up to 40
##
## data: Residuals
## LM test = 181.66, df = 40, p-value < 2.2e-16
The variance of the residuals has decreased. The model’s residuals are a good fit to a normal distribution and, excluding the autocorrelation at lag 1, they are not serially correlated. This model can be used as the final model. However, it can be checked whether any remaining predictable behavior in the residuals can be captured by ARIMA,
## Series: daily_data[3:1489, residual]
## ARIMA(0,0,1) with zero mean
##
## Coefficients:
## ma1
## 0.2801
## s.e. 0.0253
##
## sigma^2 estimated as 219411191: log likelihood=-16389.5
## AIC=32783.01 AICc=32783.02 BIC=32793.62
The only component is a moving average term of order 1. However, since the forecasting setting rules out using any information from the day on which the forecast is provided, ARIMA does not bring additional improvement to the model, at least not on the forecasting side.
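This reasoning can be made explicit for the fitted MA(1) form. The notation is introduced here only for illustration: $e_t$ is the regression residual on day $t$, $\varepsilon_t$ is the white-noise innovation, $\theta \approx 0.28$ is the `ma1` estimate above, and $s$ is the last day whose residual is available when the forecast is made.

```latex
e_t = \varepsilon_t + \theta\,\varepsilon_{t-1},
\qquad
\hat{e}_{s+h \mid s} =
\begin{cases}
\theta\,\varepsilon_s, & h = 1,\\[2pt]
0, & h \ge 2 .
\end{cases}
```

Since the most recent usable residual is two days old in this setting, the relevant horizon is $h = 2$, for which the MA(1) correction is zero.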
After a suitable model has been found, distributing the daily consumption over the 24 hours of the day is another task to handle before testing the model. As a practical approach, a matrix of percentages for the 24 hours of each of the 7 days of the week is found to be useful. However, it should be kept in mind that consumption characteristics may vary from season to season, just as they have been shown to vary by day of the week. One contributor to this difference can be the reduced sunshine duration in winter, which has become more pronounced since the country decided to stop changing its clocks in spring and autumn. Since the sun has not yet risen when people commute to work or school in winter, more electricity is consumed just to light the streets, which may be a considerable contributor to demand. For such reasons, the hourly profile of a day may differ with the season. Hence, as the test period is mostly in February, the matrix has been created by averaging the hourly consumption percentages observed in February in the data. Another assumption in constructing these 7 vectors of percentages is that the distribution of consumption over the hours has reached a steady state over time. To check this claim in the testing period, the performance of the model on both the daily and the hourly series will be analyzed and discussed using some well-known metrics.
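The 7 by 24 percentage matrix described above can be sketched in base R as follows. The data are synthetic and the column names (`hour`, `date_id`, `day`, `consumption`) are hypothetical. To mimic the February-only version, the rows would first be restricted to February before averaging.

```r
# Synthetic hourly data covering 4 weeks (hypothetical column names)
set.seed(7)
hourly <- expand.grid(hour = 0:23, date_id = 1:28)
hourly$day <- ((hourly$date_id - 1) %% 7) + 1   # weekday code 1..7
hourly$consumption <- 30000 + 5000 * sin(pi * hourly$hour / 24) +
  rnorm(nrow(hourly), sd = 500)

# Share of each hour in the total of its own day
daily_total <- ave(hourly$consumption, hourly$date_id, FUN = sum)
hourly$share <- hourly$consumption / daily_total

# 7 x 24 matrix: average share by weekday (rows) and hour (columns)
share_mat <- tapply(hourly$share, list(hourly$day, hourly$hour), mean)

# Spread a daily forecast for weekday 3 over its 24 hours
daily_forecast <- 850000
hourly_forecast <- daily_forecast * share_mat["3", ]
```

Because every day contributes shares that sum to one, each row of `share_mat` also sums to one, so the hourly forecasts add up to the daily forecast.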
Before testing, it might be useful to convert the fitted values for daily consumption into hourly fitted amounts. Although it was mentioned that each month or season may require its own vector of percentages, an averaged vector using percentages from 2017-2021 is used for this conversion to reduce complexity. It should be kept in mind that the actual distribution model would probably handle the residuals better than what is visualized now,
The residuals seem stationary, with the variance changing over some time intervals. This problem might have been caused by using the same percentage vector for all seasons, as similar errors are likely to be committed for observations in the same season. Many more analyses could be carried out on the residuals, such as plotting them against some of the predictors. However, since the model will be evaluated on a test period, part of these analyses will be carried out on the test data.
When the model with the same predictors is refitted repeatedly in a loop while the data set is updated, the accuracy of the daily forecasts for the two-week period is,
## n mean sd error FBias MAPE MAD WMAPE
## 1 14 851479.4 56383.32 -12382.29 0.007596589 0.02581095 22615.89 0.0265607
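For reference, the columns of this table can be computed as in the sketch below. The formulas are the conventional definitions and are an assumption here, since the function used in the report is not shown. `accuracy_summary` is a hypothetical name, and the error is taken as actual minus forecast.

```r
# Hypothetical helper with conventional definitions (assumed, not from the source)
accuracy_summary <- function(actual, forecast) {
  error <- actual - forecast
  data.frame(
    n     = length(actual),
    mean  = mean(actual),
    sd    = sd(actual),
    FBias = sum(error) / sum(actual),           # signed bias relative to total load
    MAPE  = mean(abs(error / actual)),          # mean absolute percentage error
    MAD   = mean(abs(error)),                   # mean absolute deviation
    WMAPE = sum(abs(error)) / sum(abs(actual))  # MAD divided by the mean load
  )
}

# Toy numbers for illustration only
actual   <- c(100, 200, 400)
forecast <- c(110, 190, 400)
acc <- accuracy_summary(actual, forecast)
acc
```

Under these definitions WMAPE equals MAD divided by the mean, which agrees with the reported daily figures (22615.89 / 851479.4 is roughly 0.0266).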
WMAPE is around 2.7% and FBias is very close to zero, indicating that on average the model neither over-predicts nor under-predicts and that the residuals are scattered around zero. The model seems adequate, since Deb et al. (2017) list MAPEs between 1% and 2% for various popular techniques used for the same task. However, as mentioned before, converting the daily forecasts to hourly forecasts may have introduced some distortion, so if the accuracy of the hourly forecasts is checked,
## n mean sd error FBias MAPE MAD WMAPE
## 1 336 35159.79 4291.239 -5.184517 -0.00139363 0.0285163 1020.597 0.0290274
It can be seen that the MAPE and WMAPE values are between 2.5% and 3%. This demonstrates that not much information is lost when the daily forecasts are partitioned into hours. Moreover, it is natural for accuracy to worsen when predicting an hourly series because of the additional level of detail involved. Although the metrics have been compared with those of some popular techniques in the literature, it is useful to develop a baseline model to show the improvement attained by the linear regression. If the consumption from 168 hours earlier, that is, one week earlier, is used as a baseline forecast,
## n mean sd error FBias MAPE MAD WMAPE
## 1 336 35159.79 4291.239 86.44 -0.03768008 0.04102423 1482.362 0.0421607
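The 168-hour baseline can be reproduced in spirit with a few lines of base R. The sketch below uses a synthetic series and hypothetical names (`naive_lag`, `week_pattern`), and it only illustrates the mechanics of the lag, not the numbers reported above.

```r
# Seasonal naive baseline: forecast each hour with the value one week earlier
naive_lag <- function(x, lag = 168) {
  n <- length(x)
  stopifnot(n > lag)
  c(rep(NA, lag), x[seq_len(n - lag)])
}

# Synthetic series: one week of hourly load, then the same week shifted up by 1000
week_pattern <- 30000 + 4000 * sin(2 * pi * (0:167) / 24)
x <- c(week_pattern, week_pattern + 1000)

baseline <- naive_lag(x)
test_idx <- 169:336                       # hours that have a value one week back
wmape <- sum(abs(x[test_idx] - baseline[test_idx])) / sum(x[test_idx])
wmape
```

A baseline of this kind needs no fitting, which is what makes it a convenient yardstick for the regression.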
As can be seen, the MAPE and WMAPE values of the baseline method are around 4%. It can be said that the model adds information and practical value in forecasting the hourly consumption of electricity in Turkey. If the predicted versus actual consumption values are visualized,
It is obvious that there are some problems with the forecasting methodology. The most striking ones are the inadequacy of the model in predicting unexpectedly high consumption, its failure to capture the hourly consumption percentages within a day, and a misleadingly high forecast on one Saturday. The possible reasons for and solutions to some of these problems will be discussed in the next section, along with other possible shortcomings of the model. For now, if the residuals in the test period are plotted over time,
Although it was stated that the FBias implies that on average the model neither over-predicts nor under-predicts, this plot shows that under-prediction and over-prediction are clustered at specific times. It can be argued that there are problems with the residuals of the first week in particular, as the model under-predicted the realized consumption most of the time. Since the daily profiles are visible in the previous plot, it can be claimed that most of this under-prediction occurred after midday. Hence, it can be stated that most of the peaks visible in the residual plot come from the noon and afternoon hours. In fact, if the residuals are drawn with respect to the hours,
This box plot of the residuals by hour displays the problems with prediction around noon and in the afternoon. As the plot shows, the model has tended to predict less than the realized consumption in the hours between 10 and 15. Although there are problems in other hours too, it should be noted that no prediction is perfect. Lastly, if the residuals are plotted against the average temperature in that hour,
As the plot shows, there may be a weak correlation between the two measures. This is interesting because, in winter, temperature is not expected to be positively correlated with electricity consumption. Nonetheless, the relation might result from a third factor that is not included in the model, since the warmest hours in winter are usually around noon, the interval in which the model performed poorly. However, it can still be suggested that using hourly average temperature could bring some improvement to the model.
It was discussed before that the model has some problems. While some of them might have influenced the performance of the model in the test period, there can also be shortcomings that have not been reflected in the results. The inefficiencies of the model can be separated into two categories. One category involves problems introduced by the hourly distribution of electricity consumption, whereas the other includes problems caused by inadequate or misleading explanation of the variation in the series by the linear regression model.
The problems in partitioning the daily total into hourly percentages may stem from several different causes. It was argued before that temperature affects not only the daily amount of electricity consumed but also its hourly distribution. Since the effect of temperature is reflected in the model for approximately half of the months, it can be claimed that this addition might capture the relationship between hourly consumption levels and temperature. However, a simple example opposes this argument. In winter, people tend to turn off or turn down the heating at night when the temperature difference between afternoon and night is not very large. Both kinds of days may be observed, some showing this behavior and some showing a gradual difference in temperature between day and night. Although a similar scenario may apply in summer with air conditioners, such effects can be absent in relatively mild weather in spring or autumn, which may change the hourly profile of a day. Hence, it can be claimed that the effect of temperature is not the same for all hours of the day and that it varies with the season; furthermore, there might be days with different profiles within the same season. Though not related to temperature, a similar case may apply to the hourly profiles of holidays, since one of the main contributors to consumption in working hours, namely industrial production, is missing. For all these reasons, aggregating the hourly percentages of consumed electricity based only on past data, month and day of the week might be misleading or inadequate in some ways. Although modelling the total daily consumption is a more practical approach, the need to fine-tune the hourly percentages is a downside of this technique.
Clearly, with hourly percentages that are better adjusted using hourly temperature recordings and the special features of the day, the model might perform better. However, this would require additional work, which may impair the practicality and ease of use that the model offers.
There are also some problems in the linear regression model itself. As observed before, the model was unable to predict high electricity consumption levels in the test period. Different reasons might contribute to this inadequacy. One possible reason is the missing temperature information for January. Nonetheless, when the residuals are analyzed, it can be seen that this under-prediction was more serious in February than in January. As a result, it can be stated that some additional information is missing from the model. Although the average of the temperature observations is a predictor for February in the model, the recordings represent only 7 cities in Turkey. If the temperatures across Turkey in the first week of February 2021 are checked for more locations, it can be observed that a considerable number of cities had lower temperatures than in the second week of February or than the average of that winter. Hence, it can be argued that the model is not able to fully capture the temperature information, which leads to problems when temperatures reach more extreme values.
As can be understood from this discussion, a more detailed analysis of hourly percentages based on the type of day and on hourly temperature information can be considered as a future addition to the model. Moreover, temperature information from more locations can be used, with city-specific weights given to each observation, since temperature recordings in cities with larger populations, more industrial development and more tourism may be more important in determining the total consumption of Turkey. Another future improvement can come from trying different aggregations of the temperature observations for days with different characteristics, as consumption might be more influenced by the day-night temperature difference on some days or in some locations, while being more affected by the average temperature in other periods or cities. Lastly, for the proposed approach, modelling special days through more detailed examination, or distinguishing types of special days based on their consumption profiles, might increase the efficiency of the model. Nevertheless, it should be kept in mind that these proposed methods might call for additional detailed examination of the data or of related data with more advanced tools.
All in all, it can be said that predicting hourly electricity consumption is a popular problem in the literature with a wide range of possible approaches, linear regression being one of them. It can be claimed that a linear regression of the daily load that uses trend, seasonality and autoregressive factors brings some improvement over baseline solutions. Moreover, by incorporating calendar and temperature effects, the model may be more practical and easier to use than time series models. Nonetheless, the deficiencies of the proposed model include inadequate attention to the aggregation of temperature and to the effect of temperature on the hourly percentages, as well as the lack of detailed analysis and differentiation of special days. Although at least some of these additions would very likely improve the performance of the model, it should be kept in mind that almost all of them require more detailed analysis. Hence, it can be stated that the proposed model is a useful method that is efficient in time and workload for forecasting hourly electricity demand.
Ziel, F. (2018). Modeling Public Holidays in Load Forecasting: a German Case Study
Hodge, T. (2020). Hourly Electricity Consumption Varies Throughout the Day and Across Seasons
The RMD File can be found Here.