
1. Introduction

This project aims to forecast Trendyol's sales for each day, using a forecast produced on the previous day. We were given historical data covering the daily sales of 8 different products listed on Trendyol, with category-level, brand-level and site-level attributes. The products were La Roche Posay Facial Cleanser, Sleepy Wet Towel, Xiaomi Bluetooth Headphone, Fakir Stick Vacuum Cleaner, Trendyolmilla Leggings, Oral-B Rechargeable Toothbrush, Trendyolmilla Bikini Top and Koton Coat. We also had to take into account special occasions that cause irregular sales volumes, such as Black Friday or the quarantine. To understand the problem and the methods used in the project properly, we needed to understand Trendyol's business and various modelling techniques for forecasting.
Trendyol is an e-commerce website that sells a wide variety of products, ranging from fashion to supermarket goods. It also has its own brand, Trendyolmilla, which offers affordable, fashionable clothes to its customers. The company was founded in 2010 in Istanbul and has grown rapidly since then. In 2018, Alibaba, one of the world's biggest e-commerce companies, invested in Trendyol and became a partner. Trendyol is known to its customers for its frequent special offers, which make it one of the most used e-commerce websites in Turkey. That is why, although the coronavirus affected its business, the company managed to get through the quarantine period, which was also the period in which this project was carried out.

2. Approach

We used 3 different approaches during the project submission period. Before applying any of them, we discarded the data before 18 March, since we believe the coronavirus pandemic changed people's e-commerce behaviour. To compare the approaches, data between 18 March and 14 May was used as training data and the rest as test data. The number of data points grew as days passed, but we always tried to keep 70-80% of the data as training data. The approaches were compared using MAE. MAPE would normally be our preferred performance measure; however, for some SKUs it evaluated to infinity because there were days with 0 sales. That is why we used MAE instead. MAE is also adequate here because the comparison is done SKU by SKU, so errors are never compared across products with different sales volumes.
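The difference between the two measures can be seen in a small base-R sketch. The helper names `mae` and `mape` and the sales figures are made up for illustration; they are not taken from the project data.

```r
# Mean absolute error: average size of the forecast errors, in units sold.
mae <- function(actual, forecast) {
  mean(abs(actual - forecast))
}

# Mean absolute percentage error: each error is divided by the actual value,
# so a day with 0 sales causes a division by zero.
mape <- function(actual, forecast) {
  100 * mean(abs((actual - forecast) / actual))
}

# Made-up daily sales for one SKU, including a day with no sales.
actual   <- c(3, 0, 5, 2)
forecast <- c(2, 1, 5, 4)

mae_value  <- mae(actual, forecast)   # (1 + 1 + 0 + 2) / 4
mape_value <- mape(actual, forecast)  # the 1 / 0 term makes this Inf

print(mae_value)
print(mape_value)
```

A single zero-sales day is enough to make MAPE unusable for that SKU, while MAE stays finite.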

2.1 Naïve Approach

In the first week of the competition, we used the naïve approach on its own, without comparing it to anything else, to understand the behaviour of the approach and the data. Then we compared it with forecasts from the basic auto.arima() function. The auto.arima() function in R combines unit root tests, minimisation of the AICc and MLE to obtain an ARIMA model, so at first it looked like a convenient choice. The naïve approach forecasts a day's sales from the most recent observed value: in its usual form it sets today's value to yesterday's value. In our project, it sets tomorrow's value to yesterday's value, since the most recent data available to us is 2 days older than the day being forecast. The naïve approach gave a lower MAE than the auto.arima() forecasts for every SKU, so we decided to continue with it. This approach was used for 17 days of the competition in total.
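As a minimal sketch of the lagged naïve forecast described above, the following base-R function copies the value observed `lag` days earlier. The function name `naive_forecast` and the sales figures are assumptions made for illustration; `lag = 2` corresponds to forecasting tomorrow from yesterday's value.

```r
# Naive forecast with a reporting lag: the forecast for day t is the actual
# value observed `lag` days earlier. Days without enough history get NA.
naive_forecast <- function(sales, lag = 1) {
  n <- length(sales)
  if (lag >= n) return(rep(NA_real_, n))
  c(rep(NA_real_, lag), sales[seq_len(n - lag)])
}

sales <- c(4, 6, 5, 7, 3)                  # made-up daily sold counts

fc_lag1 <- naive_forecast(sales, lag = 1)  # NA 4 6 5 7
fc_lag2 <- naive_forecast(sales, lag = 2)  # NA NA 4 6 5

# MAE over the days that actually have a forecast
mae_lag2 <- mean(abs(sales - fc_lag2), na.rm = TRUE)
print(mae_lag2)
```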

2.2 Double Layered Auto.arima() Approach

For 17 days, we used the first approach. Then we built the double-layered model. The idea was to find possible drivers (regressors) that could help predict future sales values. The given data contains 8 possible regressors about the SKUs and Trendyol. We were told that price is computed by dividing revenue by sold count, so it did not seem a reasonable regressor for sold count. Using the category-level information and Trendyol's visit count looked too risky, since they can be affected by many factors. Among the rest, we concluded that basket count was likely to mislead our forecasts, because a customer can keep a product in their basket for a long time. We think that visit count and favored count are good drivers of sold count since, intuitively, they are more likely to be directly correlated with it. Consequently, visit count and favored count were used as regressors. The regressors themselves are forecast with the basic auto.arima() function.
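The two layers can be sketched as follows. To stay free of extra packages, this sketch swaps auto.arima() for base R's `arima()` with fixed orders, so it illustrates only the structure and is not the model selection used in the project. The function name `two_layer_forecast`, the chosen orders and the simulated series are all assumptions.

```r
# Layer 1: forecast each regressor one day ahead.
# Layer 2: regress sold count on the regressors with ARIMA errors and feed
#          the layer-1 forecasts in as the future regressor values.
two_layer_forecast <- function(sold, visits, favored) {
  # Random-walk models for the regressors (a fixed choice for this sketch;
  # the report lets auto.arima() pick the model instead).
  visit_next   <- predict(arima(visits,  order = c(0, 1, 0)), n.ahead = 1)$pred
  favored_next <- predict(arima(favored, order = c(0, 1, 0)), n.ahead = 1)$pred

  xreg_train <- cbind(visits = visits, favored = favored)
  xreg_next  <- cbind(visits  = as.numeric(visit_next),
                      favored = as.numeric(favored_next))

  # Regression with AR(1) errors (order fixed by assumption).
  fit  <- arima(sold, order = c(1, 0, 0), xreg = xreg_train, method = "ML")
  pred <- as.numeric(predict(fit, n.ahead = 1, newxreg = xreg_next)$pred)

  max(0, round(pred))  # sales cannot be negative; submit a whole number
}

# Simulated series standing in for one SKU's history.
set.seed(42)
n       <- 60
visits  <- round(200 + cumsum(rnorm(n, sd = 5)))
favored <- round(30 + cumsum(rnorm(n, sd = 2)))
sold    <- pmax(0, round(0.05 * visits + 0.2 * favored + rnorm(n)))

next_day <- two_layer_forecast(sold, visits, favored)
print(next_day)
```

Because the sales forecast depends on forecast regressors, any error in the first layer carries over into the second.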

When the MAE values were compared, the double-layered auto.arima() approach gave better results, and it was better for every SKU. So this approach was used for 3 days, until a further improvement was introduced.

2.3 Adding Linear Regression on Double Layered Auto.arima() Approach

In this approach, linear regression is used to forecast the future sales values of every SKU. Again, the regressors are forecast using the basic auto.arima() function. After introducing the linear regression model, some SKUs had better MAE values and some did not, so we decided to continue with whichever model had the smallest MAE for each SKU. For every SKU, we compared the MAE values of 3 models: double-layered auto.arima(), linear regression, and the average of the forecasts of these two models, which is called ensembling the forecasts. Whichever had the best MAE value was selected for that SKU. For SKUs “3904356” & “5926527”, the values did not change significantly, since their sales are almost always 0 and rarely 1, so we decided to continue with the double-layered auto.arima() approach. For SKUs “32939029”, “4066298” & “6676673”, the double-layered auto.arima() approach gave the smallest MAE among the 3 models. Linear regression gave the best MAE only for SKU “85004”. Finally, ensembling the forecasts of linear regression and double-layered auto.arima() gave the smallest MAE for SKUs “31515569” & “7061886”. This final approach was used for the last 10 days of the competition.
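The per-SKU selection rule can be sketched in base R as below. The helper names `mae` and `pick_model` and the numbers are made up for illustration; the ensemble is the plain average of the two forecasts, as described above.

```r
mae <- function(actual, forecast) mean(abs(actual - forecast))

# Return the name of the candidate with the smallest test-set MAE,
# together with the MAE of every candidate.
pick_model <- function(actual, arima_fc, lm_fc) {
  candidates <- list(
    arima    = arima_fc,
    lm       = lm_fc,
    ensemble = 0.5 * (arima_fc + lm_fc)  # simple average of the two forecasts
  )
  scores <- vapply(candidates, function(fc) mae(actual, fc), numeric(1))
  list(best = names(scores)[which.min(scores)], mae = scores)
}

# Made-up test-period values for one SKU.
actual   <- c(10, 12, 9, 11)
arima_fc <- c(12, 14, 11, 13)  # consistently 2 too high
lm_fc    <- c(8, 10, 7, 9)     # consistently 2 too low

choice <- pick_model(actual, arima_fc, lm_fc)
print(choice$best)
print(choice$mae)
```

In this toy case the two models err in opposite directions, so averaging cancels the errors and the ensemble is selected. When both models err in the same direction, the better single model is selected instead.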

3. Results & Conclusion

Forecasting is estimating the future values of data based on its past values. It can be challenging depending on the structure of the data, real-world events and so on. In our project, we faced this challenge because of the Covid-19 pandemic. We believe that the e-commerce behaviour of consumers has changed, so we disregarded the data before 18 March. Also, because consumer behaviour was so uncertain, it was hard to forecast sales accurately in such a period. Still, we implemented 3 different approaches throughout the project. We believe that our forecasts would have been more accurate if there had not been a pandemic. It also taught us that we should be ready for this kind of unexpected event in real life. As a result, our best score in a day was 25, and the worst one was 7 out of 26. We ranked 12th out of 24 groups. Our daily score list can be found below:

12 8 15 24 13 15 7 24 9 15 8 7 12 23 7 10 0 10 19 12 24 14 22 9 25 15 16 8 21

4. Code

# install the required packages first
require(jsonlite)
require(httr)
require(data.table)

get_token <- function(username, password, url_site){
  
  post_body = list(username=username,password=password)
  post_url_string = paste0(url_site,'/token/')
  result = POST(post_url_string, body = post_body)
  
  # error handling (wrong credentials)
  if(result$status_code==400){
    print('Check your credentials')
    return(0)
  }
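  else if (result$status_code==201){
    output = content(result)
    token = output$key
  }
  else {
    # any other status: stop here instead of returning an undefined token
    stop(paste('Unexpected status code:', result$status_code))
  }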
  
  return(token)
}

get_data <- function(start_date='2020-03-18', token, url_site){
  
  # note: username and password are not arguments; they are read from the global environment
  post_body = list(start_date=start_date,username=username,password=password)
  post_url_string = paste0(url_site,'/dataset/')
  
  header = add_headers(c(Authorization=paste('Token',token,sep=' ')))
  result = GET(post_url_string, header, body = post_body)
  output = content(result)
  data = data.table::rbindlist(output)
  data[,event_date:=as.Date(event_date)]
  data = data[order(product_content_id,event_date)]
  return(data)
}


send_submission <- function(predictions, token, url_site, submit_now=F){
  
  format_check=check_format(predictions)
  if(!format_check){
    return(FALSE)
  }
  
  post_string="list("
  for(i in 1:nrow(predictions)){
    post_string=sprintf("%s'%s'=%s",post_string,predictions$product_content_id[i],predictions$forecast[i])
    if(i<nrow(predictions)){
      post_string=sprintf("%s,",post_string)
    } else {
      post_string=sprintf("%s)",post_string)
    }
  }
  
  submission = eval(parse(text=post_string))
  json_body = jsonlite::toJSON(submission, auto_unbox = TRUE)
  submission=list(submission=json_body)
  
  print(submission)
  # {"31515569":2.4,"32939029":2.4,"4066298":2.4,"6676673":2.4,"7061886":2.4,"85004":2.4} 
  
  if(!submit_now){
    print("You did not submit.")
    return(FALSE)      
  }
  
  
  header = add_headers(c(Authorization=paste('Token',token,sep=' ')))
  post_url_string = paste0(url_site,'/submission/')
  result = POST(post_url_string, header, body=submission)
  
  if (result$status_code==201){
    print("Successfully submitted. Below you can see the details of your submission")
  } else {
    print("Could not submit. Please check the error message below, contact the assistant if needed.")
  }
  
  print(content(result))
  
}

check_format <- function(predictions){
  
  # accept either a data.frame or a data.table
  if(is.data.frame(predictions) | is.data.table(predictions)){
    if(all(c('product_content_id','forecast') %in% names(predictions))){
      if(is.numeric(predictions$forecast)){
        print("Format OK")
        return(TRUE)
      } else {
        print("forecast information is not numeric")
        return(FALSE)                
      }
    } else {
      print("Wrong column names. Please provide 'product_content_id' and 'forecast' columns")
      return(FALSE)
    }
    
  } else {
    print("Wrong format. Please provide data.frame or data.table object")
    return(FALSE)
  }
  
}

# this part is main code
subm_url = 'http://167.172.183.67'

u_name = "Group7"
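# the password is read from an environment variable rather than hard-coded;
# SUBMISSION_PASSWORD is a placeholder name
p_word = Sys.getenv("SUBMISSION_PASSWORD")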
submit_now = FALSE

username = u_name
password = p_word

# url_site is spelled out in full instead of relying on partial argument matching
token = get_token(username=u_name, password=p_word, url_site=subm_url)
data = get_data(token=token,url_site=subm_url)


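# submission step: p1..p8 must already hold the one-day-ahead forecasts
# produced by one of the implementation sections further down
predictions=unique(data[,list(product_content_id)])
predictions[,forecast:=rbind(p1,p2,p3,p4,p5,p6,p7,p8)]

send_submission(predictions, token, url_site=subm_url, submit_now=F)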
# keep only the data from 18 March 2020 onwards, then split it by SKU
data <- data[event_date>="2020-03-18"]
SKU1 <- data[product_content_id=="31515569"]
SKU2 <- data[product_content_id=="32939029"]
SKU3 <- data[product_content_id=="3904356"]
SKU4 <- data[product_content_id=="4066298"]
SKU5 <- data[product_content_id=="5926527"]
SKU6 <- data[product_content_id=="6676673"]
SKU7 <- data[product_content_id=="7061886"]
SKU8 <- data[product_content_id=="85004"]
# train/test split: before 14 May 2020 is train, from 14 May 2020 onwards is test
SKU1test <- SKU1[event_date>="2020-05-14"]
SKU1train <- SKU1[event_date<"2020-05-14"]
SKU2test <- SKU2[event_date>="2020-05-14"]
SKU2train <- SKU2[event_date<"2020-05-14"]
SKU3test <- SKU3[event_date>="2020-05-14"]
SKU3train <- SKU3[event_date<"2020-05-14"]
SKU4test <- SKU4[event_date>="2020-05-14"]
SKU4train <- SKU4[event_date<"2020-05-14"]
SKU5test <- SKU5[event_date>="2020-05-14"]
SKU5train <- SKU5[event_date<"2020-05-14"]
SKU6test <- SKU6[event_date>="2020-05-14"]
SKU6train <- SKU6[event_date<"2020-05-14"]
SKU7test <- SKU7[event_date>="2020-05-14"]
SKU7train <- SKU7[event_date<"2020-05-14"]
SKU8test <- SKU8[event_date>="2020-05-14"]
SKU8train <- SKU8[event_date<"2020-05-14"]
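# install forecast and xts only if they are missing, then load them
if(!requireNamespace("forecast", quietly = TRUE)) install.packages("forecast")
library(forecast)
if(!requireNamespace("xts", quietly = TRUE)) install.packages("xts")
library(xts)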
#APPROACH1: Naive vs auto.arima()
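# tumdata holds the dataset as returned by get_data(), without the filtering applied to `data`
tumdata = get_data(token=token,url_site=subm_url)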
SKU1tum <- tumdata[product_content_id=="31515569"]
SKU2tum <- tumdata[product_content_id=="32939029"]
SKU3tum <- tumdata[product_content_id=="3904356"]
SKU4tum <- tumdata[product_content_id=="4066298"]
SKU5tum <- tumdata[product_content_id=="5926527"]
SKU6tum <- tumdata[product_content_id=="6676673"]
SKU7tum <- tumdata[product_content_id=="7061886"]
SKU8tum <- tumdata[product_content_id=="85004"]
SKU1naivefc<- subset(SKU1tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU2naivefc<- subset(SKU2tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU3naivefc<- subset(SKU3tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU4naivefc<- subset(SKU4tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU5naivefc<- subset(SKU5tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU6naivefc<- subset(SKU6tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU7naivefc<- subset(SKU7tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU8naivefc<- subset(SKU8tum,event_date>="2020-05-13"&event_date<"2020-05-28",select = "sold_count")
SKU1naivefc <- ts(SKU1naivefc)
SKU2naivefc <- ts(SKU2naivefc)
SKU3naivefc <- ts(SKU3naivefc)
SKU4naivefc <- ts(SKU4naivefc)
SKU5naivefc <- ts(SKU5naivefc)
SKU6naivefc <- ts(SKU6naivefc)
SKU7naivefc <- ts(SKU7naivefc)
SKU8naivefc <- ts(SKU8naivefc)
accuracy(SKU1naivefc,SKU1test$sold_count)
accuracy(SKU2naivefc,SKU2test$sold_count)
accuracy(SKU3naivefc,SKU3test$sold_count)
accuracy(SKU4naivefc,SKU4test$sold_count)
accuracy(SKU5naivefc,SKU5test$sold_count)
accuracy(SKU6naivefc,SKU6test$sold_count)
accuracy(SKU7naivefc,SKU7test$sold_count)
accuracy(SKU8naivefc,SKU8test$sold_count)
fc1 <- auto.arima(SKU1train$sold_count,stepwise = FALSE)
fc2 <- auto.arima(SKU2train$sold_count,stepwise = FALSE)
fc3 <- auto.arima(SKU3train$sold_count,stepwise = FALSE)
fc4 <- auto.arima(SKU4train$sold_count,stepwise = FALSE)
fc5 <- auto.arima(SKU5train$sold_count,stepwise = FALSE)
fc6 <- auto.arima(SKU6train$sold_count,stepwise = FALSE)
fc7 <- auto.arima(SKU7train$sold_count,stepwise = FALSE)
fc8 <- auto.arima(SKU8train$sold_count,stepwise = FALSE)
# forecast as many days as there are in the test set
horizon= nrow(SKU1test)
pred1 <- forecast(fc1,h=horizon)$mean
pred2 <- forecast(fc2,h=horizon)$mean
pred3 <- forecast(fc3,h=horizon)$mean
pred4 <- forecast(fc4,h=horizon)$mean
pred5 <- forecast(fc5,h=horizon)$mean
pred6 <- forecast(fc6,h=horizon)$mean
pred7 <- forecast(fc7,h=horizon)$mean
pred8 <- forecast(fc8,h=horizon)$mean
p1 <- round(pred1)
p2 <- round(pred2)
p3 <- round(pred3)
p4 <- round(pred4)
p5 <- round(pred5)
p6 <- round(pred6)
p7 <- round(pred7)
p8 <- round(pred8)

accuracy(p1,SKU1test$sold_count)
accuracy(p2,SKU2test$sold_count)
accuracy(p3,SKU3test$sold_count)
accuracy(p4,SKU4test$sold_count)
accuracy(p5,SKU5test$sold_count)
accuracy(p6,SKU6test$sold_count)
accuracy(p7,SKU7test$sold_count)
accuracy(p8,SKU8test$sold_count)
#APPROACH 2: Double-layered auto.arima()
fc1 <- auto.arima(SKU1train$sold_count,xreg = cbind(SKU1train$visit_count,SKU1train$basket_count),stepwise = FALSE)
fc2 <- auto.arima(SKU2train$sold_count,xreg = cbind(SKU2train$visit_count,SKU2train$basket_count),stepwise = FALSE)
fc3 <- auto.arima(SKU3train$sold_count,xreg = cbind(SKU3train$visit_count,SKU3train$basket_count),stepwise = FALSE)
fc4 <- auto.arima(SKU4train$sold_count,xreg = cbind(SKU4train$visit_count,SKU4train$basket_count),stepwise = FALSE)
fc5 <- auto.arima(SKU5train$sold_count,xreg = cbind(SKU5train$visit_count,SKU5train$basket_count),stepwise = FALSE)
fc6 <- auto.arima(SKU6train$sold_count,xreg = cbind(SKU6train$visit_count,SKU6train$basket_count),stepwise = FALSE)
fc7 <- auto.arima(SKU7train$sold_count,xreg = cbind(SKU7train$visit_count,SKU7train$basket_count),stepwise = FALSE)
fc8 <- auto.arima(SKU8train$sold_count,xreg = cbind(SKU8train$visit_count,SKU8train$basket_count),stepwise = FALSE)
horizon =nrow(SKU1test)
pred1 <- forecast(fc1,xreg=cbind(SKU1test$visit_count,SKU1test$basket_count),h=horizon)$mean
pred2 <- forecast(fc2,xreg=cbind(SKU2test$visit_count,SKU2test$basket_count),h=horizon)$mean
pred3 <- forecast(fc3,xreg=cbind(SKU3test$visit_count,SKU3test$basket_count),h=horizon)$mean
pred4 <- forecast(fc4,xreg=cbind(SKU4test$visit_count,SKU4test$basket_count),h=horizon)$mean
pred5 <- forecast(fc5,xreg=cbind(SKU5test$visit_count,SKU5test$basket_count),h=horizon)$mean
pred6 <- forecast(fc6,xreg=cbind(SKU6test$visit_count,SKU6test$basket_count),h=horizon)$mean
pred7 <- forecast(fc7,xreg=cbind(SKU7test$visit_count,SKU7test$basket_count),h=horizon)$mean
pred8 <- forecast(fc8,xreg=cbind(SKU8test$visit_count,SKU8test$basket_count),h=horizon)$mean
pred1[pred1<0] <- 0
pred2[pred2<0] <- 0
pred3[pred3<0] <- 0
pred4[pred4<0] <- 0
pred5[pred5<0] <- 0
pred6[pred6<0] <- 0
pred7[pred7<0] <- 0
pred8[pred8<0] <- 0
p1 <- round(pred1)
p2 <- round(pred2)
p3 <- round(pred3)
p4 <- round(pred4)
p5 <- round(pred5)
p6 <- round(pred6)
p7 <- round(pred7)
p8 <- round(pred8)
accuracy(p1,SKU1test$sold_count)
accuracy(p2,SKU2test$sold_count)
accuracy(p3,SKU3test$sold_count)
accuracy(p4,SKU4test$sold_count)
accuracy(p5,SKU5test$sold_count)
accuracy(p6,SKU6test$sold_count)
accuracy(p7,SKU7test$sold_count)
accuracy(p8,SKU8test$sold_count)
predictions[,forecast:=rbind(p1,p2,p3,p4,p5,p6,p7,p8)]
#IMPLEMENTATION OF APPROACH 2
fc1 <- auto.arima(SKU1$sold_count,xreg = cbind(SKU1$visit_count,SKU1$basket_count),stepwise = FALSE)
fc2 <- auto.arima(SKU2$sold_count,xreg = cbind(SKU2$visit_count,SKU2$basket_count),stepwise = FALSE)
fc3 <- auto.arima(SKU3$sold_count,xreg = cbind(SKU3$visit_count,SKU3$basket_count),stepwise = FALSE)
fc4 <- auto.arima(SKU4$sold_count,xreg = cbind(SKU4$visit_count,SKU4$basket_count),stepwise = FALSE)
fc5 <- auto.arima(SKU5$sold_count,xreg = cbind(SKU5$visit_count,SKU5$basket_count),stepwise = FALSE)
fc6 <- auto.arima(SKU6$sold_count,xreg = cbind(SKU6$visit_count,SKU6$basket_count),stepwise = FALSE)
fc7 <- auto.arima(SKU7$sold_count,xreg = cbind(SKU7$visit_count,SKU7$basket_count),stepwise = FALSE)
fc8 <- auto.arima(SKU8$sold_count,xreg = cbind(SKU8$visit_count,SKU8$basket_count),stepwise = FALSE)
visitfc1 <- forecast(auto.arima(SKU1$visit_count,stepwise = FALSE),h=1)$mean
visitfc2 <- forecast(auto.arima(SKU2$visit_count,stepwise = FALSE),h=1)$mean
visitfc3 <- forecast(auto.arima(SKU3$visit_count,stepwise = FALSE),h=1)$mean
visitfc4 <- forecast(auto.arima(SKU4$visit_count,stepwise = FALSE),h=1)$mean
visitfc5 <- forecast(auto.arima(SKU5$visit_count,stepwise = FALSE),h=1)$mean
visitfc6 <- forecast(auto.arima(SKU6$visit_count,stepwise = FALSE),h=1)$mean
visitfc7 <- forecast(auto.arima(SKU7$visit_count,stepwise = FALSE),h=1)$mean
visitfc8 <- forecast(auto.arima(SKU8$visit_count,stepwise = FALSE),h=1)$mean
basketfc1 <- forecast(auto.arima(SKU1$basket_count,stepwise = FALSE),h=1)$mean
basketfc2 <- forecast(auto.arima(SKU2$basket_count,stepwise = FALSE),h=1)$mean
basketfc3 <- forecast(auto.arima(SKU3$basket_count,stepwise = FALSE),h=1)$mean
basketfc4 <- forecast(auto.arima(SKU4$basket_count,stepwise = FALSE),h=1)$mean
basketfc5 <- forecast(auto.arima(SKU5$basket_count,stepwise = FALSE),h=1)$mean
basketfc6 <- forecast(auto.arima(SKU6$basket_count,stepwise = FALSE),h=1)$mean
basketfc7 <- forecast(auto.arima(SKU7$basket_count,stepwise = FALSE),h=1)$mean
basketfc8 <- forecast(auto.arima(SKU8$basket_count,stepwise = FALSE),h=1)$mean
pred1 <- forecast(fc1,xreg=cbind(visitfc1,basketfc1),h=1)$mean
pred2 <- forecast(fc2,xreg=cbind(visitfc2,basketfc2),h=1)$mean
pred3 <- forecast(fc3,xreg=cbind(visitfc3,basketfc3),h=1)$mean
pred4 <- forecast(fc4,xreg=cbind(visitfc4,basketfc4),h=1)$mean
pred5 <- forecast(fc5,xreg=cbind(visitfc5,basketfc5),h=1)$mean
pred6 <- forecast(fc6,xreg=cbind(visitfc6,basketfc6),h=1)$mean
pred7 <- forecast(fc7,xreg=cbind(visitfc7,basketfc7),h=1)$mean
pred8 <- forecast(fc8,xreg=cbind(visitfc8,basketfc8),h=1)$mean
pred1[pred1<0] <- 0
pred2[pred2<0] <- 0
pred3[pred3<0] <- 0
pred4[pred4<0] <- 0
pred5[pred5<0] <- 0
pred6[pred6<0] <- 0
pred7[pred7<0] <- 0
pred8[pred8<0] <- 0
p1 <- round(pred1)
p2 <- round(pred2)
p3 <- round(pred3)
p4 <- round(pred4)
p5 <- round(pred5)
p6 <- round(pred6)
p7 <- round(pred7)
p8 <- round(pred8)
predictions[,forecast:=rbind(p1,p2,p3,p4,p5,p6,p7,p8)]
#APPROACH 3 : Adding Linear Regression to Double-layered auto.arima()
lmfc1 <- lm(sold_count~ visit_count+basket_count, data=SKU1train)
validlmfc1 <- as.data.frame(cbind(SKU1test$visit_count,SKU1test$basket_count))
colnames(validlmfc1) <- cbind("visit_count","basket_count")
lmpred1 <- predict(lmfc1,validlmfc1)
lmpred1[lmpred1<0] <- 0
lmp1 <- round(lmpred1)
accuracy(lmp1,SKU1test$sold_count)
arimafc1 <- auto.arima(SKU1train$sold_count,xreg = cbind(SKU1train$visit_count,SKU1train$basket_count),stepwise = FALSE)
horizon =nrow(SKU1test)
arimapred1 <- forecast(arimafc1,xreg=cbind(SKU1test$visit_count,SKU1test$basket_count),h=horizon)$mean
arimapred1[arimapred1<0] <- 0
arimap1 <- round(arimapred1)
accuracy(arimap1,SKU1test$sold_count)
mixedp1 <- 0.5*(arimap1+lmp1)
accuracy(mixedp1,SKU1test$sold_count)
lmfc2 <- lm(sold_count~ visit_count+basket_count, data=SKU2train)
validlmfc2 <- as.data.frame(cbind(SKU2test$visit_count,SKU2test$basket_count))
colnames(validlmfc2) <- cbind("visit_count","basket_count")
lmpred2 <- predict(lmfc2,validlmfc2)
lmpred2[lmpred2<0] <- 0
lmp2 <- round(lmpred2)
accuracy(lmp2,SKU2test$sold_count)
arimafc2 <- auto.arima(SKU2train$sold_count,xreg = cbind(SKU2train$visit_count,SKU2train$basket_count),stepwise = FALSE)
horizon =nrow(SKU2test)
arimapred2 <- forecast(arimafc2,xreg=cbind(SKU2test$visit_count,SKU2test$basket_count),h=horizon)$mean
arimapred2[arimapred2<0] <- 0
arimap2 <- round(arimapred2)
accuracy(arimap2,SKU2test$sold_count)
mixedp2 <- 0.5*(arimap2+lmp2)
accuracy(mixedp2,SKU2test$sold_count)
lmfc3 <- lm(sold_count~ visit_count+basket_count, data=SKU3train)
validlmfc3 <- as.data.frame(cbind(SKU3test$visit_count,SKU3test$basket_count))
colnames(validlmfc3) <- cbind("visit_count","basket_count")
lmpred3 <- predict(lmfc3,validlmfc3)
lmpred3[lmpred3<0] <- 0
lmp3 <- round(lmpred3)
accuracy(lmp3,SKU3test$sold_count)
arimafc3 <- auto.arima(SKU3train$sold_count,xreg = cbind(SKU3train$visit_count,SKU3train$basket_count),stepwise = FALSE)
horizon =nrow(SKU3test)
arimapred3 <- forecast(arimafc3,xreg=cbind(SKU3test$visit_count,SKU3test$basket_count),h=horizon)$mean
arimapred3[arimapred3<0] <- 0
arimap3 <- round(arimapred3)
accuracy(arimap3,SKU3test$sold_count)
mixedp3 <- 0.5*(arimap3+lmp3)
accuracy(mixedp3,SKU3test$sold_count)
lmfc4 <- lm(sold_count~ visit_count+basket_count, data=SKU4train)
validlmfc4 <- as.data.frame(cbind(SKU4test$visit_count,SKU4test$basket_count))
colnames(validlmfc4) <- cbind("visit_count","basket_count")
lmpred4 <- predict(lmfc4,validlmfc4)
lmpred4[lmpred4<0] <- 0
lmp4 <- round(lmpred4)
accuracy(lmp4,SKU4test$sold_count)
arimafc4 <- auto.arima(SKU4train$sold_count,xreg = cbind(SKU4train$visit_count,SKU4train$basket_count),stepwise = FALSE)
horizon =nrow(SKU4test)
arimapred4 <- forecast(arimafc4,xreg=cbind(SKU4test$visit_count,SKU4test$basket_count),h=horizon)$mean
arimapred4[arimapred4<0] <- 0
arimap4 <- round(arimapred4)
accuracy(arimap4,SKU4test$sold_count)
mixedp4 <- 0.5*(arimap4+lmp4)
accuracy(mixedp4,SKU4test$sold_count)
lmfc5 <- lm(sold_count~ visit_count+basket_count, data=SKU5train)
validlmfc5 <- as.data.frame(cbind(SKU5test$visit_count,SKU5test$basket_count))
colnames(validlmfc5) <- cbind("visit_count","basket_count")
lmpred5 <- predict(lmfc5,validlmfc5)
lmpred5[lmpred5<0] <- 0
lmp5 <- round(lmpred5)
accuracy(lmp5,SKU5test$sold_count)
arimafc5 <- auto.arima(SKU5train$sold_count,xreg = cbind(SKU5train$visit_count,SKU5train$basket_count),stepwise = FALSE)
horizon =nrow(SKU5test)
arimapred5 <- forecast(arimafc5,xreg=cbind(SKU5test$visit_count,SKU5test$basket_count),h=horizon)$mean
arimapred5[arimapred5<0] <- 0
arimap5 <- round(arimapred5)
accuracy(arimap5,SKU5test$sold_count)
mixedp5 <- 0.5*(arimap5+lmp5)
accuracy(mixedp5,SKU5test$sold_count)
lmfc6 <- lm(sold_count~ visit_count+basket_count, data=SKU6train)
validlmfc6 <- as.data.frame(cbind(SKU6test$visit_count,SKU6test$basket_count))
colnames(validlmfc6) <- cbind("visit_count","basket_count")
lmpred6 <- predict(lmfc6,validlmfc6)
lmpred6[lmpred6<0] <- 0
lmp6 <- round(lmpred6)
accuracy(lmp6,SKU6test$sold_count)
arimafc6 <- auto.arima(SKU6train$sold_count,xreg = cbind(SKU6train$visit_count,SKU6train$basket_count),stepwise = FALSE)
horizon =nrow(SKU6test)
arimapred6 <- forecast(arimafc6,xreg=cbind(SKU6test$visit_count,SKU6test$basket_count),h=horizon)$mean
arimapred6[arimapred6<0] <- 0
arimap6 <- round(arimapred6)
accuracy(arimap6,SKU6test$sold_count)
mixedp6 <- 0.5*(arimap6+lmp6)
accuracy(mixedp6,SKU6test$sold_count)
lmfc7 <- lm(sold_count~ visit_count+basket_count, data=SKU7train)
validlmfc7 <- as.data.frame(cbind(SKU7test$visit_count,SKU7test$basket_count))
colnames(validlmfc7) <- cbind("visit_count","basket_count")
lmpred7 <- predict(lmfc7,validlmfc7)
lmpred7[lmpred7<0] <- 0
lmp7 <- round(lmpred7)
accuracy(lmp7,SKU7test$sold_count)
arimafc7 <- auto.arima(SKU7train$sold_count,xreg = cbind(SKU7train$visit_count,SKU7train$basket_count),stepwise = FALSE)
horizon =nrow(SKU7test)
arimapred7 <- forecast(arimafc7,xreg=cbind(SKU7test$visit_count,SKU7test$basket_count),h=horizon)$mean
arimapred7[arimapred7<0] <- 0
arimap7 <- round(arimapred7)
accuracy(arimap7,SKU7test$sold_count)
mixedp7 <- 0.5*(arimap7+lmp7)
accuracy(mixedp7,SKU7test$sold_count)
lmfc8 <- lm(sold_count~ visit_count+basket_count, data=SKU8train)
validlmfc8 <- as.data.frame(cbind(SKU8test$visit_count,SKU8test$basket_count))
colnames(validlmfc8) <- cbind("visit_count","basket_count")
lmpred8 <- predict(lmfc8,validlmfc8)
lmpred8[lmpred8<0] <- 0
lmp8 <- round(lmpred8)
accuracy(lmp8,SKU8test$sold_count)
arimafc8 <- auto.arima(SKU8train$sold_count,xreg = cbind(SKU8train$visit_count,SKU8train$basket_count),stepwise = FALSE)
horizon =nrow(SKU8test)
arimapred8 <- forecast(arimafc8,xreg=cbind(SKU8test$visit_count,SKU8test$basket_count),h=horizon)$mean
arimapred8[arimapred8<0] <- 0
arimap8 <- round(arimapred8)
accuracy(arimap8,SKU8test$sold_count)
mixedp8 <- 0.5*(arimap8+lmp8)
accuracy(mixedp8,SKU8test$sold_count)
#IMPLEMENTATION OF APPROACH 3
visitfc1 <- forecast(auto.arima(SKU1$visit_count,stepwise = FALSE),h=1)$mean
visitfc2 <- forecast(auto.arima(SKU2$visit_count,stepwise = FALSE),h=1)$mean
visitfc3 <- forecast(auto.arima(SKU3$visit_count,stepwise = FALSE),h=1)$mean
visitfc4 <- forecast(auto.arima(SKU4$visit_count,stepwise = FALSE),h=1)$mean
visitfc5 <- forecast(auto.arima(SKU5$visit_count,stepwise = FALSE),h=1)$mean
visitfc6 <- forecast(auto.arima(SKU6$visit_count,stepwise = FALSE),h=1)$mean
visitfc7 <- forecast(auto.arima(SKU7$visit_count,stepwise = FALSE),h=1)$mean
visitfc8 <- forecast(auto.arima(SKU8$visit_count,stepwise = FALSE),h=1)$mean
basketfc1 <- forecast(auto.arima(SKU1$basket_count,stepwise = FALSE),h=1)$mean
basketfc2 <- forecast(auto.arima(SKU2$basket_count,stepwise = FALSE),h=1)$mean
basketfc3 <- forecast(auto.arima(SKU3$basket_count,stepwise = FALSE),h=1)$mean
basketfc4 <- forecast(auto.arima(SKU4$basket_count,stepwise = FALSE),h=1)$mean
basketfc5 <- forecast(auto.arima(SKU5$basket_count,stepwise = FALSE),h=1)$mean
basketfc6 <- forecast(auto.arima(SKU6$basket_count,stepwise = FALSE),h=1)$mean
basketfc7 <- forecast(auto.arima(SKU7$basket_count,stepwise = FALSE),h=1)$mean
basketfc8 <- forecast(auto.arima(SKU8$basket_count,stepwise = FALSE),h=1)$mean
arimafc1 <- auto.arima(SKU1$sold_count,xreg = cbind(SKU1$visit_count,SKU1$basket_count),stepwise = FALSE)
arimapred1 <- forecast(arimafc1,xreg=cbind(visitfc1,basketfc1),h=1)$mean
arimapred1[arimapred1<0] <- 0
lmfc1 <- lm(sold_count~ visit_count+basket_count, data=SKU1)
validlmfc1 <- as.data.frame(cbind(visitfc1,basketfc1))
colnames(validlmfc1) <- cbind("visit_count","basket_count")
lmpred1 <- predict(lmfc1,validlmfc1)
lmpred1[lmpred1<0] <- 0
pred1 <- 0.5*(arimapred1 + lmpred1)
p1 <- round(pred1)
arimafc2 <- auto.arima(SKU2$sold_count,xreg = cbind(SKU2$visit_count,SKU2$basket_count),stepwise = FALSE)
pred2 <- forecast(arimafc2,xreg=cbind(visitfc2,basketfc2),h=1)$mean
pred2[pred2<0] <- 0
p2 <- round(pred2)
arimafc3 <- auto.arima(SKU3$sold_count,xreg = cbind(SKU3$visit_count,SKU3$basket_count),stepwise = FALSE)
pred3 <- forecast(arimafc3,xreg=cbind(visitfc3,basketfc3),h=1)$mean
pred3[pred3<0] <- 0
p3 <- round(pred3)
arimafc4 <- auto.arima(SKU4$sold_count,xreg = cbind(SKU4$visit_count,SKU4$basket_count),stepwise = FALSE)
pred4 <- forecast(arimafc4,xreg=cbind(visitfc4,basketfc4),h=1)$mean
pred4[pred4<0] <- 0
p4 <- round(pred4)
arimafc5 <- auto.arima(SKU5$sold_count,xreg = cbind(SKU5$visit_count,SKU5$basket_count),stepwise = FALSE)
pred5 <- forecast(arimafc5,xreg=cbind(visitfc5,basketfc5),h=1)$mean
pred5[pred5<0] <- 0
p5 <- round(pred5)
arimafc6 <- auto.arima(SKU6$sold_count,xreg = cbind(SKU6$visit_count,SKU6$basket_count),stepwise = FALSE)
pred6 <- forecast(arimafc6,xreg=cbind(visitfc6,basketfc6),h=1)$mean
pred6[pred6<0] <- 0
p6 <- round(pred6)
arimafc7 <- auto.arima(SKU7$sold_count,xreg = cbind(SKU7$visit_count,SKU7$basket_count),stepwise = FALSE)
arimapred7 <- forecast(arimafc7,xreg=cbind(visitfc7,basketfc7),h=1)$mean
arimapred7[arimapred7<0] <- 0
lmfc7 <- lm(sold_count~ visit_count+basket_count, data=SKU7)
validlmfc7 <- as.data.frame(cbind(visitfc7,basketfc7))
colnames(validlmfc7) <- cbind("visit_count","basket_count")
lmpred7 <- predict(lmfc7,validlmfc7)
lmpred7[lmpred7<0] <- 0
pred7 <- 0.5*(arimapred7 + lmpred7)
p7 <- round(pred7)
lmfc8 <- lm(sold_count~ visit_count+basket_count, data=SKU8)
validlmfc8 <- as.data.frame(cbind(visitfc8,basketfc8))
colnames(validlmfc8) <- cbind("visit_count","basket_count")
lmpred8 <- predict(lmfc8,validlmfc8)
lmpred8[lmpred8<0] <- 0
p8 <- round(lmpred8)
predictions[,forecast:=rbind(p1,p2,p3,p4,p5,p6,p7,p8)]

IE 360 TERM PROJECT / GROUP 7