Introduction & Problem Description

Turkey has been navigating difficult economic conditions lately, marked by rising inflation, a depreciating national currency, and shifting consumer confidence. Amid these difficulties, much speculation and discussion has centered on the real estate market. A widespread belief holds that nationals of nearby countries (Iraq, Iran, Syria, Russia, and Afghanistan, for example) are becoming Turkish citizens and, as a result, driving up property values by making large housing purchases. The rising cost of housing and rent has been attributed to this phenomenon, which is said to have reduced the purchasing power of the domestic population.

The goal of this project is to conduct an empirical investigation into the relationships between the various economic statistics that the Central Bank of the Republic of Turkey provides and the impact that new citizenship is thought to have on house prices and sales. We will investigate whether there is a measurable correlation between the Consumer Confidence Index, household financial conditions, the overall state of the economy, and housing sales in Turkey, and specifically in Istanbul, using a combination of time series data manipulation and regression analysis. We will also investigate the quantity of particular visitors—likely new residents—in order to see how it relates to the dynamics of the property market.

In addition to giving the anecdotal observations a statistical basis, our analysis will shed light on the larger economic ramifications of these changes. Using information from Google Trends and the CBRT, we hope to create a story that makes sense and is in line with the current state of Turkey’s economy.

Research Questions

  1. What impact do changes in the state of the economy have on real estate sales or real rents in Turkey, and in Istanbul in particular?

    This inquiry aims to investigate any potential relationships between consumer sentiment and real estate purchase decisions, offering insight into the psychological effects of economic indicators on housing markets.

  2. Beyond economic effects, is there a statistical connection between the rise in home sales and rental rates in Turkey and the number of new citizenships awarded to foreign nationals, particularly those from Iraq, Syria, Iran, Russia, and Afghanistan?

    This question attempts to clarify whether there is a factual basis for the idea that new citizenships are the driving force behind the changes in the housing market by evaluating immigration statistics along with home sales and price indices.

  3. Do external economic factors, such as exchange rate fluctuations, have a more pronounced effect on the real estate market than internal factors like the specific visitors’ number?

    This inquiry seeks to determine which factors—internal social dynamics or external economic pressures—have a greater bearing on the trends in the real estate market.

  4. Can Google Trends data on citizenship-related search terms predict future trends in house sales in Turkey?

    This inquiry looks at whether there’s a correlation between interest in citizenship/citizenship application searches and real sales by leveraging online search activity.


Required Packages and Default Data Settings

library(ggcorrplot)
## Loading required package: ggplot2
library(readxl)
library(readr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(RColorBrewer)
library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(corrplot)
## corrplot 0.92 loaded
library(viridis)
## Loading required package: viridisLite
library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# Read the combined dataset
data <- read_csv("/Users/ilyada/Desktop/1/Data1_Gen.csv")
## Rows: 48 Columns: 12
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (9): Trend, Consumer_Confidence_Index, Households_Fin_Sit, Turkey_House_Sales, Istanbul_House_Sales, Specific_Visitors_Number, Dolar_A...
## num (2): Real_Rent, Istanbul_House_Prices
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Ensure the Date column is in the appropriate Date format
data$Date <- as.Date(paste0(data$Date, "-01"))
list(data)
## [[1]]
## # A tibble: 48 × 12
##    Date       Trend Consumer_Confidence_Index Households_Fin_Sit Turkey_House_Sales Istanbul_House_Sales Specific_Visitors_Number Real_Rent
##    <date>     <dbl>                     <dbl>              <dbl>              <dbl>                <dbl>                    <dbl>     <dbl>
##  1 2020-01-01     1                      79.4               66.6              51243                13423                   439475      531.
##  2 2020-02-01     2                      79.3               67.1              30472                23714                   311229      536.
##  3 2020-03-01     3                      80.4               67.2              29230                15187                   126930      539.
##  4 2020-04-01     4                      77.4               63.4              30488                14941                    11618      541.
##  5 2020-05-01     5                      75.5               61.3              35310                15247                     8709      543.
##  6 2020-06-01     6                      74.6               59.7              31641                17408                    13744      546.
##  7 2020-07-01     7                      71.5               56.2              25886                15724                    31510      552.
##  8 2020-08-01     8                      68                 56.2              34413                13578                   322331      557.
##  9 2020-09-01     9                      80.1               64.5              26952                18435                   651583      561.
## 10 2020-10-01    10                      85.1               69.1              32899                13944                   671408      565.
## # ℹ 38 more rows
## # ℹ 4 more variables: Dolar_Alis <dbl>, Arabic_Citizenship <dbl>, Istanbul_House_Prices <dbl>, `log(House_Prices)` <dbl>

Remarks

Inspecting Category Variables

Target Variables Correlations:

  1. Arabic_Citizenship: This variable shows a high positive correlation with “Dolar_Alis” (0.953) and “Real_Rent” (0.909), suggesting that as the exchange rate and rents increase, there might be an increase in the citizenship granted, possibly indicating investment-driven citizenships.

  2. Turkey_House_Sales: This variable has a very high positive correlation with “Istanbul_House_Sales” (0.981), implying that house sales in Istanbul are a strong predictor of national house sales, which could be expected if Istanbul represents a large portion of the national market.

  3. Real_Rent: The correlation is strong with “Dolar_Alis” (0.942) and “Arabic_Citizenship” (0.909), indicating that increases in rent may be associated with the exchange rate and possibly the number of citizenships granted to Arabic nationals.
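The correlations quoted above come from a pairwise correlation matrix. As a minimal sketch of how such a matrix can be computed and ranked for one target variable, the block below uses simulated data: the column names mirror the dataset, but every value is made up, and the object names `toy`, `cor_mat`, and `ranked` are illustrative only.

```r
# Simulated monthly data; column names mirror the dataset, values are made up.
set.seed(1)
n <- 48
trend <- seq_len(n)
toy <- data.frame(
  Dolar_Alis         = 6 + 0.45 * trend + rnorm(n, sd = 1),
  Real_Rent          = 500 + 25 * trend + rnorm(n, sd = 60),
  Arabic_Citizenship = 10 + 8 * trend + rnorm(n, sd = 40),
  Turkey_House_Sales = 35000 + rnorm(n, sd = 9000)
)

# Pairwise Pearson correlations, rounded for readability.
cor_mat <- round(cor(toy, use = "pairwise.complete.obs"), 3)
print(cor_mat)

# Correlations of the remaining variables with one target,
# sorted by absolute size (largest first).
target <- "Real_Rent"
others <- setdiff(colnames(cor_mat), target)
ranked <- cor_mat[others, target]
ranked <- ranked[order(abs(ranked), decreasing = TRUE)]
print(ranked)
```

With the real data, the same matrix could be passed to `ggcorrplot()` or `corrplot()`, both of which are loaded above, to draw a heatmap.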

Notes

  • Predictor Variables: In terms of predictor variables for the three target variables, we need to look for those with the highest absolute correlations that are also meaningful from an economic standpoint. For example, exchange rates and perhaps other economic indicators could be strong predictors, but care must be taken to understand the direction of causality.

  • Model Implications: When building the three time series models, we will need to consider not just the correlation but also the potential for multicollinearity, the impact of outliers, the stationarity of the series, and any time-based dependencies such as trends or seasonality.

  • Multicollinearity: There is evidence of multicollinearity, given the high correlation between variables like “Dolar_Alis” and “Real_Rent.” This will need to be addressed in the time series models, possibly by using techniques like Principal Component Analysis (PCA) for dimensionality reduction or regularization to penalize complex models.
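One common way to quantify the multicollinearity mentioned above is the variance inflation factor (VIF). The sketch below computes it by hand in base R on simulated predictors. The helper `vif_manual()` and the data frame `toy` are hypothetical names introduced for illustration, and the data are made up so that two predictors share a trend.

```r
# VIF for each predictor: VIF_j = 1 / (1 - R2_j), where R2_j is the R-squared
# from regressing predictor j on all the other predictors.
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    aux <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Simulated predictors: two share a common upward trend, one does not.
set.seed(2)
n <- 48
trend <- seq_len(n)
toy <- data.frame(
  Dolar_Alis                = 6 + 0.45 * trend + rnorm(n, sd = 1),
  Istanbul_House_Prices     = 5000 + 800 * trend + rnorm(n, sd = 2000),
  Consumer_Confidence_Index = 78 + rnorm(n, sd = 6)
)

vif_values <- vif_manual(toy)
print(round(vif_values, 2))
```

A frequently used rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic collinearity, although the threshold is a convention rather than a formal test.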

Model 1

  1. What impact do changes in the state of the economy have on the selling/prices of real estate or real rents in Turkey, and Istanbul in particular?

Visualization

library(ggplot2)
library(readr)
library(lubridate)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:viridis':
## 
##     viridis_pal
## The following object is masked from 'package:readr':
## 
##     col_factor
# Read the combined dataset
data <- read_csv("/Users/ilyada/Desktop/1/Data1_Gen.csv")
## Rows: 48 Columns: 12
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (9): Trend, Consumer_Confidence_Index, Households_Fin_Sit, Turkey_House_Sales, Istanbul_House_Sales, Specific_Visitors_Number, Dolar_A...
## num (2): Real_Rent, Istanbul_House_Prices
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(data)
##      Date               Trend       Consumer_Confidence_Index Households_Fin_Sit Turkey_House_Sales Istanbul_House_Sales
##  Length:48          Min.   : 1.00   Min.   :63.40             Min.   :44.80      Min.   :14848      Min.   : 6113       
##  Class :character   1st Qu.:12.75   1st Qu.:72.85             1st Qu.:56.20      1st Qu.:29133      1st Qu.:15232       
##  Mode  :character   Median :24.50   Median :79.20             Median :62.40      Median :34862      Median :19174       
##                     Mean   :24.50   Mean   :77.62             Mean   :61.99      Mean   :36893      Mean   :20664       
##                     3rd Qu.:36.25   3rd Qu.:81.60             3rd Qu.:67.12      3rd Qu.:40413      3rd Qu.:24564       
##                     Max.   :48.00   Max.   :91.10             Max.   :77.40      Max.   :77889      Max.   :39432       
##  Specific_Visitors_Number   Real_Rent        Dolar_Alis     Arabic_Citizenship Istanbul_House_Prices log(House_Prices)
##  Min.   :   8709          Min.   : 531.5   Min.   : 5.920   Min.   :  5.00     Min.   : 5056         Min.   :4.172    
##  1st Qu.: 282894          1st Qu.: 578.4   1st Qu.: 7.697   1st Qu.: 36.75     1st Qu.: 6361         1st Qu.:4.464    
##  Median : 550232          Median : 651.9   Median :13.525   Median : 99.00     Median :10935         Median :4.542    
##  Mean   : 616388          Mean   : 840.9   Mean   :14.016   Mean   :141.12     Mean   :18124         Mean   :4.542    
##  3rd Qu.: 950014          3rd Qu.: 968.8   3rd Qu.:18.670   3rd Qu.:217.25     3rd Qu.:28144         3rd Qu.:4.606    
##  Max.   :1525212          Max.   :1972.6   Max.   :29.020   Max.   :457.00     Max.   :44557         Max.   :4.891
# Ensure the Date column is in the appropriate Date format
data$Date <- as.Date(paste0(data$Date, "-01"))

# Extract Year and Month
data$Year <- year(data$Date)
data$Month <- month(data$Date)

1.1. Visualization of Istanbul House Sales

# Plot
ggplot(data, aes(x = factor(Month), y = Istanbul_House_Sales, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "A") +
  labs(x="Months", y="Istanbul House Sales", title="Histogram of Monthly Istanbul House Sales")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Istanbul_House_Sales)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Istanbul House Sales", title="Monthly Istanbul House Sales Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Istanbul_House_Sales)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Istanbul House Sales", title="Yearly Distribution of Istanbul House Sales") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

A clear seasonal pattern can be seen in the visual analysis of Istanbul house sales, with some months consistently registering higher sales than others. Although there is considerable variation from year to year, a year-over-year comparison of house sales indicates a rising trend. The boxplot suggests that sales have fluctuated more over time, which may mean that the housing market is becoming more dynamic and more prone to abrupt changes. Sales peaks may be caused by a number of things, including market incentives, economic policies, or other outside events.
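The seasonal pattern described above is read off the plots by eye. One way to make it explicit is a classical additive decomposition, sketched below with base R's `decompose()` on a simulated monthly series. The series `sales_ts` and its seasonal shape are invented for illustration and are not the project data.

```r
# Simulated monthly sales: level + mild trend + repeating seasonal shape + noise.
set.seed(3)
n_months <- 48
seasonal_shape <- rep(c(-2, -3, 0, 1, 2, 3, 1, 0, 2, 1, 0, 4) * 1000, times = 4)
sales <- 18000 + 80 * seq_len(n_months) + seasonal_shape + rnorm(n_months, sd = 1500)
sales_ts <- ts(sales, start = c(2020, 1), frequency = 12)

# Split the series into trend, seasonal, and remainder components.
parts <- decompose(sales_ts, type = "additive")

# The series starts in January, so the first 12 seasonal values map to Jan..Dec.
seasonal_by_month <- setNames(round(parts$seasonal[1:12]), month.abb)
print(seasonal_by_month)
```

Calling `plot(parts)` draws the observed series together with the three estimated components. With only four years of data, the seasonal estimates rest on four observations per month and should be read cautiously.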

1.2. Visualization of Istanbul House Prices

# Plot
ggplot(data, aes(x = factor(Month), y = Istanbul_House_Prices, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "D") +
  labs(x="Months", y="Istanbul House Prices", title="Histogram of Monthly Istanbul House Prices")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Istanbul_House_Prices)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Istanbul House Prices", title="Monthly Istanbul House Prices Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Istanbul_House_Prices)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Istanbul House Prices", title="Yearly Distribution of Istanbul House Prices") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

The price of houses has been steadily rising throughout the years, with bigger price ranges being seen in each succeeding year. It appears from the steadily rising trend that Istanbul’s housing costs have been rapidly increasing. Every year, the median house price has increased, and the price spread has widened as well, suggesting that average prices are rising along with the range of prices.

1.3. Visualization of Real Rent

# Plot
ggplot(data, aes(x = factor(Month), y = Real_Rent, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "E") +
  labs(x="Months", y="Real Rent", title="Histogram of Monthly Real Rent")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Real_Rent)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Real Rent", title="Monthly Real Rent Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Real_Rent)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Real Rent", title="Yearly Distribution of Real Rent") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

The accompanying visualizations show a gradual growth in Istanbul real rent from 2020 to 2023, with a notable spike in the latter half of 2023. The time series plots highlight the consistent increase over the measured period. The box plots show how the spread of rents has widened over time, indicating rising variability along with the main upward trend.

1.4. Visualization of Specific Visitors’ Number

# Plot
ggplot(data, aes(x = factor(Month), y = Specific_Visitors_Number, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "F") +
  labs(x="Months", y="Specific Visitors Number", title="Histogram of Monthly Specific Visitors Number")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Specific_Visitors_Number)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Specific Visitors Number", title="Monthly Specific Visitors Number Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Specific_Visitors_Number)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Specific Visitors Number", title="Yearly Distribution of Specific Visitors Number") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

The number of visitors increases significantly in the later months of the year, suggesting a seasonal pattern or perhaps an event that attracts more visitors during that time. There’s a clear cyclical pattern with peaks and troughs. The peaks might indicate a time of year with increased visitor activity, which seems to be consistent annually. It's noticeable that there’s an upward trend in the median number of visitors each year, indicating growth over time.

1.5. Visualization of Households’ Financial Situation

# Plot
ggplot(data, aes(x = factor(Month), y = Households_Fin_Sit, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "G") +
  labs(x="Months", y="Households' Financial Situation", title="Histogram of Monthly Households' Financial Situation")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Households_Fin_Sit)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Households' Financial Situation", title="Monthly Households' Financial Situation Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Households_Fin_Sit)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Households' Financial Situation", title="Yearly Distribution of Households' Financial Situation") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

The consistent height of the bars across months suggests a stable financial situation without significant monthly fluctuations. The line graph displays the time series from 2020 through 2023. The financial status of the households fluctuates more dramatically in this graph, with dips and rises. The tendency appears to be increasing over time, which could indicate a general improvement in the state of the economy or could be a reflection of seasonal or economic cycles. The median household financial condition shows a general increasing tendency, suggesting that households may be getting better off on average each year. In some years, however, there are anomalies or a greater range of data points, indicating that households' financial circumstances varied more widely in those years.

1.6. Visualization of Consumer Confidence Index

# Plot
ggplot(data, aes(x = factor(Month), y = Consumer_Confidence_Index, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "B") +
  labs(x="Months", y="Consumer Confidence Index", title="Histogram of Monthly Consumer Confidence Index")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Consumer_Confidence_Index)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Consumer Confidence Index", title="Monthly Consumer Confidence Index Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Consumer_Confidence_Index)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Consumer Confidence Index", title="Yearly Distribution of Consumer Confidence Index") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

Although there is considerable variance across months, the height of the bars indicates that the confidence levels are generally constant throughout each year. There are obvious oscillations, with certain peaks and troughs signifying different consumer confidence levels. After a notable decline, there seems to be a general recovery in confidence, with sporadic declines that may be connected to economic events or seasonal changes. There is some variance from year to year, with certain years exhibiting a greater range of consumer confidence levels, which may signify times of economic instability or transition.

1.7. Visualization of Exchange Rate ($)

# Plot
ggplot(data, aes(x = factor(Month), y = Dolar_Alis, fill = factor(Month))) +
  geom_bar(stat = "identity", color = "black") +
  facet_wrap(~Year, nrow = 3, ncol = 2) +
  theme(legend.position = "none",
        axis.ticks.x = element_blank(),
        axis.text.x = element_blank()) +
  scale_fill_viridis_d(begin = 0.2, end = 0.9, direction = 1, option = "D") +
  labs(x="Months", y="Exchange Rate", title="Histogram of Monthly Exchange Rate")

# Time series line plot with points
ggplot(data, aes(x=Date, y=Dolar_Alis)) +
  geom_line() +
  geom_point(color="coral") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle=45, hjust=1)) +
  labs(x="Date", y="Exchange Rate", title="Monthly Exchange Rate Over Time") +
  scale_y_continuous(labels=scales::comma)

# Create the box plot
ggplot(data, aes(x=Year, y=Dolar_Alis)) +
  geom_boxplot(aes(fill=factor(year(Date)))) +  # Fill box by Year for color distinction
  theme_minimal() +
  theme(legend.position = "none") +  # Remove legend
  labs(x="Year", y="Exchange Rate", title="Yearly Distribution of Exchange Rate") +
  scale_fill_viridis_d()  # Use viridis discrete color scale for better visuals

It is evident that the bars gradually get taller as each year draws to a close, indicating an increasing trend in the exchange rate over time. Starting in late 2021, there is a notable upward trend that continues to rise sharply. The annual median values appear to be rising, which confirms that the exchange rate has increased during these years. The spread and range of the boxes indicate that the exchange rate is volatile and that its volatility differs from year to year.

Comparisons

  • The peaks in Istanbul house sales do not appear to correlate directly with the patterns observed in the specific visitors graph, suggesting that the factors driving real estate sales are different from those influencing visitor numbers.

  • The household financial situation and consumer confidence graphs both show signs of recovery and growth over time, but the confidence index has more pronounced fluctuations. This could mean that while the overall financial situation is improving, consumer sentiment is more sensitive to short-term factors.

  • The upward trends in the financial situation and consumer confidence might be expected to correlate with increased house sales, as improved financial circumstances and confidence can lead to more real estate investment. However, the house sales graph indicates that other factors are also at play, given its variability.

  • The simultaneous increase in the cost of homes, rent, and currency may point to a larger economic trend in Istanbul, such as inflation or a property market boom, which may be driven by foreign investment or regional economic policy.

  • If the market is dependent on a foreign currency or is impacted by the dynamics of foreign investment, the increase in the exchange rate may also have an impact on rent and home prices.

  • The trend of searches for “Arabic Citizenship” does not appear to be directly correlated with economic variables such as the exchange rate or the cost of homes and rent. This tendency may have broader political or social roots, and it may be a reflection of interest in citizenship as a result of Istanbul’s economic potential.
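Because several of the series compared above rise together over the sample, a raw correlation between their levels can be high even when the series are unrelated. The sketch below illustrates this on two simulated random walks with drift (not the project data) by comparing the correlation of levels with the correlation of month-to-month changes. All object names are illustrative.

```r
# Two independent random walks with upward drift: unrelated by construction.
set.seed(4)
n <- 48
x <- cumsum(1 + rnorm(n))
y <- cumsum(1 + rnorm(n))

# Correlation of the levels is dominated by the shared upward drift.
cor_levels <- cor(x, y)

# Correlation of first differences removes the drift and compares
# the month-to-month changes instead.
cor_changes <- cor(diff(x), diff(y))

print(c(levels = cor_levels, changes = cor_changes))
```

Applied to the real data, the same comparison (for example on `diff()` of the exchange rate and of real rent) could help separate a shared trend from a genuine month-to-month relationship.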

Regression Models

1. Regression Model on Real Rent

1.1. Raw Model

data0 <- read_csv("/Users/ilyada/Desktop/1/Data1_Gen.csv")
## Rows: 48 Columns: 12
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (9): Trend, Consumer_Confidence_Index, Households_Fin_Sit, Turkey_House_Sales, Istanbul_House_Sales, Specific_Visitors_Number, Dolar_A...
## num (2): Real_Rent, Istanbul_House_Prices
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Ensure the Date column is in the appropriate Date format
data0$Date <- as.Date(paste0(data0$Date, "-01"))


# Log-transform the response to stabilize its variance
l_fit1.1 = lm(log(Real_Rent) ~ ., data = data0)
l_fit1.1
## 
## Call:
## lm(formula = log(Real_Rent) ~ ., data = data0)
## 
## Coefficients:
##               (Intercept)                       Date                      Trend  Consumer_Confidence_Index         Households_Fin_Sit  
##                -2.505e+02                  1.400e-02                 -4.119e-01                 -1.969e-02                  2.362e-02  
##        Turkey_House_Sales       Istanbul_House_Sales   Specific_Visitors_Number                 Dolar_Alis         Arabic_Citizenship  
##                -3.937e-06                 -2.192e-06                 -4.933e-09                 -9.478e-03                 -7.699e-05  
##     Istanbul_House_Prices        `log(House_Prices)`  
##                 1.622e-05                  3.571e-01
summary(l_fit1.1)
## 
## Call:
## lm(formula = log(Real_Rent) ~ ., data = data0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.105398 -0.024328  0.004076  0.036496  0.078021 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -2.505e+02  2.531e+02  -0.990   0.3289    
## Date                       1.400e-02  1.387e-02   1.009   0.3196    
## Trend                     -4.119e-01  4.228e-01  -0.974   0.3364    
## Consumer_Confidence_Index -1.969e-02  3.885e-03  -5.067 1.22e-05 ***
## Households_Fin_Sit         2.362e-02  4.383e-03   5.390 4.55e-06 ***
## Turkey_House_Sales        -3.937e-06  2.640e-06  -1.491   0.1446    
## Istanbul_House_Sales      -2.192e-06  1.278e-06  -1.715   0.0949 .  
## Specific_Visitors_Number  -4.933e-09  2.664e-08  -0.185   0.8542    
## Dolar_Alis                -9.478e-03  8.030e-03  -1.180   0.2456    
## Arabic_Citizenship        -7.699e-05  2.374e-04  -0.324   0.7476    
## Istanbul_House_Prices      1.622e-05  4.721e-06   3.437   0.0015 ** 
## `log(House_Prices)`        3.571e-01  2.436e-01   1.466   0.1514    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05614 on 36 degrees of freedom
## Multiple R-squared:  0.9838, Adjusted R-squared:  0.9788 
## F-statistic: 198.3 on 11 and 36 DF,  p-value: < 2.2e-16
checkresiduals(l_fit1.1,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 35.102, df = 12, p-value = 0.0004511

Comments

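  • Turkey_House_Sales, Istanbul_House_Sales, Specific_Visitors_Number, Dolar_Alis, Arabic_Citizenship: These variables are not statistically significant at the 0.05 level. Istanbul_House_Sales is only marginally significant at the 0.10 level (p = 0.0949). Date, Trend, and log(House_Prices) are likewise not significant.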

  • Consumer_Confidence_Index: This predictor is highly significant (p-value < 0.001) and negative, indicating a strong negative association with the dependent variable.

  • Households_Fin_Sit: Also highly significant (p-value < 0.001) with a positive coefficient, indicating a strong positive association with the dependent variable.

  • Istanbul_House_Prices: Statistically significant at the 0.01 level, with a positive association with the dependent variable.

The overall fit of the model seems to be very good, with a multiple R-squared of 0.9838, indicating that about 98.38% of the variability in the dependent variable is explained by the model. The adjusted R-squared, which penalizes additional predictors, is also very high (0.9788), suggesting that the good fit is not merely a by-product of the number of predictors.

The F-statistic is very large (198.3), and with a p-value practically at zero, it indicates that the overall model is statistically significant, and there is a relationship between the predictors and the dependent variable.
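The adjusted R-squared and the F-statistic can be reproduced by hand from the multiple R-squared, the sample size (n = 48), and the number of predictors (p = 11). The sketch below does this for the raw model; the helper names adj_r2 and f_stat are ours and are not part of the original analysis, and because the reported R-squared is rounded to four digits, the F value only approximately matches the summary output.

```r
# Hypothetical helpers (not part of the original analysis)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

# Values taken from the summary of the raw model (48 rows, 11 predictors)
r2 <- 0.9838
n  <- 48
p  <- 11

adj <- adj_r2(r2, n, p)   # close to the reported 0.9788
f   <- f_stat(r2, n, p)   # close to the reported 198.3 (R-squared is rounded)

print(c(adjusted_r2 = adj, f_statistic = f))
```

The residual degrees of freedom, n - p - 1 = 36, match the value shown in the summary.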

  1. Residuals Plot: Ideally this plot shows no clear pattern, since patterns can indicate non-linearity, autocorrelation, or other violations of the regression assumptions. For the raw model with all variables included, the plot shows some patterns, which suggests non-linear relationships that the model does not capture.

  2. ACF Plot: The plot shows significant autocorrelation at several lags, as indicated by the bars extending beyond the blue dashed significance bounds. This suggests that the residuals are not independent of one another, which violates one of the key assumptions of linear regression.

  3. Histogram of Residuals: The residuals are approximately normally distributed, with a slight deviation from normality: the histogram appears somewhat left-skewed. The spikes at the far ends of the histogram point to longer tails, which suggests potential outliers or heavy tails not captured by the normal distribution.

  4. Breusch-Godfrey Test: The test result indicates that there is significant autocorrelation in the residuals (p-value = 0.0004511), which means that the residuals are not independent across observations. This can occur in time-series data where subsequent values are correlated with past values.
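To make the Breusch-Godfrey statistic less of a black box, the sketch below computes an LM statistic of the same form by hand in base R: the OLS residuals are regressed on the original regressors plus their own lags, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. The data are simulated and the function name bg_by_hand is hypothetical; we assume, but have not verified, that this mirrors what checkresiduals does internally for lm fits.

```r
set.seed(42)

# Simulated regression with AR(1) errors (illustrative data only)
n <- 48
x <- rnorm(n)
u <- as.numeric(arima.sim(list(ar = 0.7), n = n))
y <- 1 + 0.5 * x + u
fit <- lm(y ~ x)

# Hypothetical helper: LM test for serial correlation up to a given order
bg_by_hand <- function(fit, order) {
  res <- residuals(fit)
  n   <- length(res)
  X   <- model.matrix(fit)
  # Lagged residuals; the unavailable leading values are set to zero
  Z <- sapply(seq_len(order), function(k) c(rep(0, k), res[seq_len(n - k)]))
  # Auxiliary regression of residuals on regressors and lagged residuals
  aux <- lm(res ~ X + Z - 1)
  r2  <- 1 - sum(residuals(aux)^2) / sum(res^2)
  lm_stat <- n * r2
  list(statistic = lm_stat,
       df        = order,
       p.value   = pchisq(lm_stat, df = order, lower.tail = FALSE))
}

bg <- bg_by_hand(fit, order = 12)
print(bg)
```

A small p-value leads to the same reading as in the list above: the residuals are serially correlated up to the chosen order.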

cor_matrix <- cor(data0[, sapply(data0, is.numeric)], use = "complete.obs", method = "pearson")

# Visualizing the correlation matrix
ggcorrplot(cor_matrix, hc.order = TRUE, type = "lower",
           lab = TRUE, lab_size = 2, tl.srt = 45, # Use tl.srt to rotate labels
           lab_col = "black", sig.level = 0.0)

It’s important to check whether the indicator variables are correlated among themselves. After fitting linear regression models, we’ll check them as well.
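Pairwise correlations do not reveal every form of collinearity, so a variance inflation factor (VIF) per predictor is a useful complement to the correlation plot. The base-R sketch below uses simulated data and a hypothetical helper, vif_base; it is an illustration of the idea rather than a step of the original analysis.

```r
set.seed(10)

# Simulated predictors: x2 is almost a copy of x1, x3 is unrelated
n  <- 48
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
preds <- data.frame(x1 = x1, x2 = x2, x3 = x3)

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all remaining predictors
vif_base <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

vif <- vif_base(preds)
print(round(vif, 2))
```

Values near 1 indicate little overlap with the other predictors, while large values (a common rule of thumb is above 10) signal that a coefficient's standard error is inflated by collinearity.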

# Indicators irrelevant to the research question are discarded.
# Other economic-situation indicators are discarded because Households_Fin_Sit is a better indicator of real rent levels (asked Vedat Akgiray).
l_fit1.2 = lm(log(Real_Rent) ~ . - Consumer_Confidence_Index - Specific_Visitors_Number - Arabic_Citizenship,data=data0)
l_fit1.2
## 
## Call:
## lm(formula = log(Real_Rent) ~ . - Consumer_Confidence_Index - 
##     Specific_Visitors_Number - Arabic_Citizenship, data = data0)
## 
## Coefficients:
##           (Intercept)                   Date                  Trend     Households_Fin_Sit     Turkey_House_Sales   Istanbul_House_Sales  
##            -2.620e+02              1.461e-02             -4.420e-01              4.601e-03             -3.242e-06             -1.520e-06  
##            Dolar_Alis  Istanbul_House_Prices    `log(House_Prices)`  
##             1.211e-02              1.794e-05              3.430e-01
summary(l_fit1.2)
## 
## Call:
## lm(formula = log(Real_Rent) ~ . - Consumer_Confidence_Index - 
##     Specific_Visitors_Number - Arabic_Citizenship, data = data0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.149217 -0.032709 -0.001028  0.049320  0.163641 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -2.620e+02  3.207e+02  -0.817   0.4190   
## Date                   1.461e-02  1.758e-02   0.831   0.4110   
## Trend                 -4.420e-01  5.359e-01  -0.825   0.4145   
## Households_Fin_Sit     4.601e-03  2.710e-03   1.698   0.0975 . 
## Turkey_House_Sales    -3.242e-06  3.349e-06  -0.968   0.3390   
## Istanbul_House_Sales  -1.520e-06  1.627e-06  -0.934   0.3560   
## Dolar_Alis             1.211e-02  9.024e-03   1.342   0.1872   
## Istanbul_House_Prices  1.794e-05  5.321e-06   3.372   0.0017 **
## `log(House_Prices)`    3.430e-01  3.067e-01   1.118   0.2703   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07284 on 39 degrees of freedom
## Multiple R-squared:  0.9704, Adjusted R-squared:  0.9643 
## F-statistic: 159.8 on 8 and 39 DF,  p-value: < 2.2e-16
checkresiduals(l_fit1.2,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 43.048, df = 12, p-value = 2.216e-05
  • The Istanbul_House_Prices variable has a significant positive effect on log(Real_Rent) at the 0.01 level: because the dependent variable is in logs, each one-unit increase in Istanbul house prices is associated with an approximately 0.0018% increase in real rent, holding the other predictors fixed.

  • Households_Fin_Sit shows a positive coefficient, suggesting that an improvement in households’ financial situation is associated with higher real rent (about 0.46% per index point), but the p-value (0.0975) indicates only marginal significance.

  • Other variables such as Date, Trend, Turkey_House_Sales, Istanbul_House_Sales, Dolar_Alis, and log(House_Prices) are not statistically significant, which suggests that they may not have a strong linear effect on log(Real_Rent) or there might be collinearity issues that obscure their effects.

  • The adjusted R-squared is very high (0.9643), suggesting that the model explains a significant portion of the variability in log(Real_Rent).

  • The residual standard error is relatively low, and the distribution of residuals does not deviate significantly from normality, which is positive for the model assumptions.

  • However, the Breusch-Godfrey test indicates significant autocorrelation in the residuals, which violates the assumption of independence and suggests that the model might be improved by addressing this issue.
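Because the dependent variable is log(Real_Rent), a coefficient b translates into a percentage change of 100 * (exp(b * delta) - 1) in real rent for a change of delta in the predictor. The short calculation below applies this to the Istanbul_House_Prices estimate from the summary above; the choice of a 1000-unit change is ours, purely for illustration, and the units are whatever the data file uses.

```r
# Coefficient of Istanbul_House_Prices taken from the summary of l_fit1.2
b <- 1.794e-05

# Percentage change in Real_Rent implied by a change of `delta` in the predictor
pct_change <- function(b, delta) 100 * (exp(b * delta) - 1)

pct_per_unit <- pct_change(b, 1)      # roughly 0.0018 percent
pct_per_1000 <- pct_change(b, 1000)   # roughly 1.8 percent

print(c(per_unit = pct_per_unit, per_1000_units = pct_per_1000))
```

For small coefficients the shortcut 100 * b gives almost the same number, which is why the per-unit effect is often read directly off the summary table.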

The residual analysis shows an approximate normal distribution with constant variance, which supports some of the linear regression assumptions. However, the presence of autocorrelation, as evidenced by the ACF plot and confirmed by the Breusch-Godfrey test, indicates that the model may benefit from incorporating time series elements or lagged variables to account for this temporal dependency.
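As a minimal sketch of what incorporating lagged variables involves, the example below builds lagged columns in base R and shows how many rows survive in the fit. The helper lag_vec and the toy series are hypothetical; lag_vec shifts values and pads with NA, which is the behaviour the analysis relies on when it creates lag columns.

```r
# Hypothetical helper: shift a vector down by k positions, padding with NA
lag_vec <- function(x, k) {
  stopifnot(k >= 1, k < length(x))
  c(rep(NA, k), head(x, -k))
}

set.seed(7)
n <- 48

# Toy upward-trending series standing in for a rent index (simulated)
toy <- data.frame(rent = 100 + cumsum(rnorm(n, mean = 1)))
toy$rent_lag1  <- lag_vec(toy$rent, 1)
toy$rent_lag11 <- lag_vec(toy$rent, 11)

# lm() silently drops rows with NA, so the longest lag sets the sample size
fit_lag <- lm(log(rent) ~ rent_lag1 + rent_lag11, data = toy)
print(nobs(fit_lag))
```

With 48 monthly observations, an 11-period lag already removes almost a quarter of the sample, which is worth keeping in mind when reading the fit statistics of lagged models.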

#Category variable selection
selected_corr <- cor_matrix[
  c("Trend","Turkey_House_Sales", "Households_Fin_Sit", "Istanbul_House_Sales", "Istanbul_House_Prices", "Dolar_Alis"),
  c("Trend","Turkey_House_Sales", "Households_Fin_Sit","Istanbul_House_Sales", "Istanbul_House_Prices", "Dolar_Alis")]

ggcorrplot(selected_corr, 
           hc.order = TRUE, 
           type = "lower",
           lab = TRUE)

The linear regression model “l_fit1.3”, fitted below, identifies Istanbul_House_Prices as a key factor influencing the log-transformed real rent, with a coefficient that is significant at the 0.05 level (p = 0.0235). Of the lagged variables, Real_Rent_lag1 is highly significant and Real_Rent_lag11 is significant at the 0.05 level, revealing the temporal dynamics within the real estate market.

Despite the high significance of the lagged variables, the presence of autocorrelation, as revealed by the Breusch-Godfrey test, suggests that additional temporal structure may need to be accounted for in the model. This could include exploring further lags or considering time-series modeling approaches that can better capture the correlation between observations over time.
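One time-series approach of the kind mentioned above is a regression with autoregressive errors, which base R provides through arima() with an xreg argument. The sketch below uses simulated data and an AR(1) error structure chosen for illustration; it is not a model fitted in this report.

```r
set.seed(123)

# Simulated regression with AR(1) errors (illustrative data only)
n <- 120
x <- rnorm(n)
u <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 2 + 0.8 * x + u

# Regression of y on x with AR(1) errors, estimated by maximum likelihood
fit_ar <- arima(y, order = c(1, 0, 0), xreg = cbind(x = x), method = "ML")
print(coef(fit_ar))

# Ljung-Box test on the remaining residuals (fitdf = 1 for the AR term)
lb <- Box.test(residuals(fit_ar), lag = 12, type = "Ljung-Box", fitdf = 1)
print(lb$p.value)
```

If the error structure is adequate, the Ljung-Box p-value should no longer point to leftover serial correlation; otherwise further AR or seasonal terms can be tried.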

Note: The model exhibits an extremely high fit, with an Adjusted R-squared of 0.9965, indicating that nearly all the variance in the log of real rent is explained by the predictors in the model. However, the exceptionally high R-squared value should be approached with caution, as it may indicate overfitting, especially considering the small sample size (33 observations) after the deletion of observations due to lag inclusion.

  • The model predicts the log of Real_Rent, with Istanbul_House_Prices significant at the 0.05 level (p = 0.0235), suggesting an important relationship with the log of real rent.

  • Real_Rent_lag1 (p < 0.001) and Real_Rent_lag11 (p = 0.0404) are included as lagged independent variables and are both significant, indicating that past values of real rent have a strong influence on the current value, which may capture autoregressive behavior in the series.

  • The residuals plot shows no clear patterns, suggesting that the model does not suffer from obvious non-linearity or heteroscedasticity.

  • The ACF plot still shows significant autocorrelation at certain lags, as indicated by the bars that cross the blue dashed significance bounds, even though lags 1 and 11 have been accounted for through the lagged regressors.

  • The histogram of residuals seems to suggest a fairly normal distribution, although with a sharp peak, which could imply some kurtosis in the distribution of residuals.

  • The test for serial correlation up to lag 12 is significant (p-value = 0.01186), indicating that autocorrelation is still present in the residuals and should be addressed.

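# lag() must shift the values and pad with NA (as dplyr::lag does); stats::lag would not shift a plain vector.
data0$Real_Rent_lag1 <- lag(data0$Real_Rent, 1)
data0$Real_Rent_lag11 <- lag(data0$Real_Rent, 11)
data0$Real_Rent_lag15 <- lag(data0$Real_Rent, 15)

# Real_Rent_lag1 appears twice in the formula (harmless). Real_Rent_lag15 is removed from the terms,
# but it still enters the model frame through '.', so its 15 leading NAs drop 15 rows instead of 11.
l_fit1.3 = lm(log(Real_Rent) ~ .-Households_Fin_Sit - Consumer_Confidence_Index - Specific_Visitors_Number- Dolar_Alis - Arabic_Citizenship + Real_Rent_lag1 + Real_Rent_lag11 - Real_Rent_lag15 + Real_Rent_lag1,data=data0)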
l_fit1.3
## 
## Call:
## lm(formula = log(Real_Rent) ~ . - Households_Fin_Sit - Consumer_Confidence_Index - 
##     Specific_Visitors_Number - Dolar_Alis - Arabic_Citizenship + 
##     Real_Rent_lag1 + Real_Rent_lag11 - Real_Rent_lag15 + Real_Rent_lag1, 
##     data = data0)
## 
## Coefficients:
##           (Intercept)                   Date                  Trend     Turkey_House_Sales   Istanbul_House_Sales  Istanbul_House_Prices  
##            -2.771e+01              1.907e-03             -5.391e-02              2.449e-06             -2.832e-07              8.743e-06  
##   `log(House_Prices)`         Real_Rent_lag1        Real_Rent_lag11  
##            -2.069e-01              9.181e-04             -8.868e-04
summary(l_fit1.3)
## 
## Call:
## lm(formula = log(Real_Rent) ~ . - Households_Fin_Sit - Consumer_Confidence_Index - 
##     Specific_Visitors_Number - Dolar_Alis - Arabic_Citizenship + 
##     Real_Rent_lag1 + Real_Rent_lag11 - Real_Rent_lag15 + Real_Rent_lag1, 
##     data = data0)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.039169 -0.010872  0.000783  0.007759  0.065993 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -2.771e+01  1.406e+02  -0.197   0.8454    
## Date                   1.907e-03  7.702e-03   0.248   0.8065    
## Trend                 -5.391e-02  2.368e-01  -0.228   0.8218    
## Turkey_House_Sales     2.449e-06  1.330e-06   1.841   0.0780 .  
## Istanbul_House_Sales  -2.832e-07  6.067e-07  -0.467   0.6449    
## Istanbul_House_Prices  8.743e-06  3.614e-06   2.419   0.0235 *  
## `log(House_Prices)`   -2.069e-01  1.196e-01  -1.730   0.0965 .  
## Real_Rent_lag1         9.181e-04  1.582e-04   5.804 5.53e-06 ***
## Real_Rent_lag11       -8.868e-04  4.092e-04  -2.167   0.0404 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0226 on 24 degrees of freedom
##   (15 observations deleted due to missingness)
## Multiple R-squared:  0.9973, Adjusted R-squared:  0.9965 
## F-statistic:  1126 on 8 and 24 DF,  p-value: < 2.2e-16
checkresiduals(l_fit1.3,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 25.694, df = 12, p-value = 0.01186
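
The summary above reports 15 observations deleted due to missingness, although the longest lag that actually enters the model is 11. The toy example below, with simulated data and hypothetical column names, illustrates the likely reason: a column removed with "-" after "." no longer appears among the coefficients, but its missing values still remove rows from the model frame.

```r
set.seed(1)
n <- 20

# Toy data: x_lag5 stands in for an unused lag column with 5 leading NAs
toy <- data.frame(y = rnorm(n), x1 = rnorm(n))
toy$x_lag5 <- c(rep(NA, 5), rnorm(n - 5))

# Column removed from the terms only; its NAs still drop rows
fit_minus <- lm(y ~ . - x_lag5, data = toy)

# Column never mentioned; all rows are used
fit_explicit <- lm(y ~ x1, data = toy)

print(c(minus = nobs(fit_minus), explicit = nobs(fit_explicit)))
print(names(coef(fit_minus)))
```

Writing the predictors out explicitly, or dropping unused columns from the data frame before fitting, keeps the extra rows in the sample.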

Istanbul_House_Prices plays a significant role in predicting the variations in the real rent levels. The model results and diagnostics highlight the importance of this variable, reflecting the economic intuition that housing prices in a major market like Istanbul would indeed be indicative of rent levels. Given Istanbul’s substantial impact on national housing market trends, it is logical to observe its house prices as a determinant of rent prices.

However, while Istanbul_House_Prices is an influential factor, the model also hints at the complexity of the real estate market, as indicated by the necessity to incorporate lagged variables to capture the dynamic nature of rent prices. This inclusion of temporal elements suggests that past rent prices exert a lasting influence on current rents, a reflection of potential inertia or trends in the housing market.

The exclusion of variables such as Households_Fin_Sit in the presence of high multicollinearity allows for a more stable model where Istanbul_House_Prices becomes a standout predictor. This decision underscores a methodological consideration in regression modeling—balancing the inclusion of diverse factors with the need to minimize statistical distortions that can arise from closely interrelated variables.

Overall, the regression model points to Istanbul_House_Prices as a critical predictor of real rent. This is consistent with market observations: Istanbul’s housing prices are a major contributor to the overall housing price index in Turkey, so property sale prices in the city act as a bellwether for fluctuations in real rent levels.

Next, let’s try to find out which drivers are in play for house prices.

Model 2

2.1 Raw Model

data0 <- read_csv("/Users/ilyada/Desktop/1/Data1_Gen.csv")
## Rows: 48 Columns: 12
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (9): Trend, Consumer_Confidence_Index, Households_Fin_Sit, Turkey_House_Sales, Istanbul_House_Sales, Specific_Visitors_Number, Dolar_A...
## num (2): Real_Rent, Istanbul_House_Prices
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Ensure the Date column is in the appropriate Date format
data0$Date <- as.Date(paste0(data0$Date, "-01"))


# Log-transform used to stabilize variance
l_fit2.1 = lm(log(Istanbul_House_Prices) ~ .,data=data0)
l_fit2.1
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ ., data = data0)
## 
## Coefficients:
##               (Intercept)                       Date                      Trend  Consumer_Confidence_Index         Households_Fin_Sit  
##                 8.590e+01                 -4.467e-03                  1.686e-01                 -6.364e-03                  1.770e-02  
##        Turkey_House_Sales       Istanbul_House_Sales   Specific_Visitors_Number                  Real_Rent                 Dolar_Alis  
##                -9.119e-06                 -2.519e-06                 -7.589e-08                 -8.962e-04                  7.637e-02  
##        Arabic_Citizenship        `log(House_Prices)`  
##                 9.317e-04                  7.970e-01
summary(l_fit2.1)
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ ., data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27771 -0.06156  0.01861  0.06144  0.23403 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                8.590e+01  5.238e+02   0.164 0.870637    
## Date                      -4.467e-03  2.870e-02  -0.156 0.877190    
## Trend                      1.686e-01  8.739e-01   0.193 0.848144    
## Consumer_Confidence_Index -6.364e-03  1.121e-02  -0.568 0.573848    
## Households_Fin_Sit         1.770e-02  1.278e-02   1.384 0.174740    
## Turkey_House_Sales        -9.119e-06  5.782e-06  -1.577 0.123541    
## Istanbul_House_Sales      -2.519e-06  2.775e-06  -0.908 0.370038    
## Specific_Visitors_Number  -7.589e-08  5.489e-08  -1.383 0.175302    
## Real_Rent                 -8.962e-04  2.212e-04  -4.052 0.000259 ***
## Dolar_Alis                 7.637e-02  1.499e-02   5.096 1.12e-05 ***
## Arabic_Citizenship         9.317e-04  4.329e-04   2.152 0.038143 *  
## `log(House_Prices)`        7.970e-01  5.330e-01   1.495 0.143537    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1167 on 36 degrees of freedom
## Multiple R-squared:  0.9837, Adjusted R-squared:  0.9787 
## F-statistic: 197.6 on 11 and 36 DF,  p-value: < 2.2e-16
checkresiduals(l_fit2.1,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 30.64, df = 12, p-value = 0.002235
  • The significant coefficient of “Dolar_Alis” in the regression model predicting log-transformed Istanbul house prices underscores the impact of exchange rates on the housing market. We assume this reflects the dependence of construction costs on imported materials, which are affected by exchange rate fluctuations.

  • Past house prices are also likely to be predictive of current ones, reflecting the continuity and perhaps speculative trends in the real estate market; lagged Istanbul house prices are added in Version 2.2 to capture this.

  • The high R-squared value suggests that the model explains a large proportion of the variance in house prices, with the exchange rate being a key driver.

It’s clear that “Istanbul_House_Prices” and “Real_Rent” are strongly related. Among the other variables, “Dolar_Alis” and “Arabic_Citizenship” also come into play. Let’s look at their correlations.

#Category variable selection
selected_corr2 <- cor_matrix[
  c("Real_Rent","Dolar_Alis", "Arabic_Citizenship"),
  c("Real_Rent","Dolar_Alis", "Arabic_Citizenship")]

ggcorrplot(selected_corr2, 
           hc.order = TRUE, 
           type = "lower",
           lab = TRUE)

  • According to economic literature, the cost of building materials, often influenced by exchange rates, can significantly affect both house prices and rental levels. The results from the regression analysis align with this theory, as changes in the “Dolar_Alis” appear to have a direct and substantial impact on “Istanbul_House_Prices.”

  • The financial indicators such as “Households_Fin_Sit” and “Consumer_Confidence_Index” likely influence the exchange rate, which in turn, creates a cascading effect impacting real rent levels and housing prices. This relationship illustrates how macroeconomic factors are interlinked with the real estate market, suggesting that the exchange rate serves as a transmission mechanism through which broader economic conditions are reflected in the housing sector.

  • But what about “Arabic_Citizenship”? (Check out 2.3 Version)

2.2 Version

# Lagged versions of the dependent variable are created because the ACF indicates significant dependence.
data0$Istanbul_House_Prices_lag1 <- lag(data0$Istanbul_House_Prices, 1)
data0$Istanbul_House_Prices_lag2 <- lag(data0$Istanbul_House_Prices, 2)
data0$Istanbul_House_Prices_lag3 <- lag(data0$Istanbul_House_Prices, 3)

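# Note: the doubled '+' before Istanbul_House_Prices_lag3 is parsed as a unary plus and does not change the fit.
l_fit2.2 = lm(log(Istanbul_House_Prices) ~ Dolar_Alis+Istanbul_House_Prices_lag1+Istanbul_House_Prices_lag2++Istanbul_House_Prices_lag3,data=data0)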

l_fit2.2
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Dolar_Alis + Istanbul_House_Prices_lag1 + 
##     Istanbul_House_Prices_lag2 + +Istanbul_House_Prices_lag3, 
##     data = data0)
## 
## Coefficients:
##                (Intercept)                  Dolar_Alis  Istanbul_House_Prices_lag1  Istanbul_House_Prices_lag2  Istanbul_House_Prices_lag3  
##                  8.324e+00                   3.518e-02                   1.644e-04                  -4.558e-05                  -8.953e-05
summary(l_fit2.2)
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Dolar_Alis + Istanbul_House_Prices_lag1 + 
##     Istanbul_House_Prices_lag2 + +Istanbul_House_Prices_lag3, 
##     data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33627 -0.07462 -0.01147  0.07892  0.35866 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 8.324e+00  8.998e-02  92.510  < 2e-16 ***
## Dolar_Alis                  3.518e-02  1.602e-02   2.195  0.03399 *  
## Istanbul_House_Prices_lag1  1.644e-04  4.825e-05   3.408  0.00151 ** 
## Istanbul_House_Prices_lag2 -4.558e-05  8.621e-05  -0.529  0.59996    
## Istanbul_House_Prices_lag3 -8.953e-05  4.995e-05  -1.792  0.08063 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1494 on 40 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.9671, Adjusted R-squared:  0.9638 
## F-statistic: 294.2 on 4 and 40 DF,  p-value: < 2.2e-16
checkresiduals(l_fit2.2,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 41.206, df = 12, p-value = 4.527e-05

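The analysis indicates that the exchange rate, denoted as “Dolar_Alis,” is closely linked to Istanbul’s real estate market: it remains significant at the 0.05 level (p = 0.034) even after three lags of house prices are included. This connection is consistent with economic theories which propose that construction costs, strongly influenced by the exchange rate, play a critical role in determining house prices and rental rates. As the exchange rate fluctuates, it affects the cost of imported construction materials, thereby influencing the pricing trends in Istanbul’s housing market. The Breusch-Godfrey test still rejects the absence of serial correlation (p-value = 4.527e-05), so the lagged terms do not remove the autocorrelation.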

The financial climate in Turkey, as captured by indicators such as “Households_Fin_Sit” and “Consumer_Confidence_Index,” may in turn influence the exchange rate, although the regressions above do not test that link directly. If so, economic conditions would feed through the exchange rate into both the cost of housing and the level of real rents, which would underscore the interconnectedness of macroeconomic variables with the housing sector and the mediating role of the exchange rate.

2.3 Version

  • What about “Arabic_Citizenship”?
# Lagged versions of Istanbul_House_Prices are recreated here because the ACF indicates significant dependence; note that l_fit2.3 below does not use them.
data0$Istanbul_House_Prices_lag1 <- lag(data0$Istanbul_House_Prices, 1) # a lag of one period uses 1, not -1
data0$Istanbul_House_Prices_lag2 <- lag(data0$Istanbul_House_Prices, 2)
data0$Istanbul_House_Prices_lag3 <- lag(data0$Istanbul_House_Prices, 3)

l_fit2.3 = lm(log(Istanbul_House_Prices) ~ Arabic_Citizenship,data=data0)

l_fit2.3
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Arabic_Citizenship, 
##     data = data0)
## 
## Coefficients:
##        (Intercept)  Arabic_Citizenship  
##            8.60037             0.00637
summary(l_fit2.3)
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Arabic_Citizenship, 
##     data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92639 -0.17433 -0.06025  0.21003  0.81107 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        8.6003736  0.0668445   128.7   <2e-16 ***
## Arabic_Citizenship 0.0063699  0.0003662    17.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2938 on 46 degrees of freedom
## Multiple R-squared:  0.8681, Adjusted R-squared:  0.8652 
## F-statistic: 302.6 on 1 and 46 DF,  p-value: < 2.2e-16
checkresiduals(l_fit2.3,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 27.043, df = 12, p-value = 0.007619
  • The residuals plot shows variability around zero but no clear trend or pattern, which might suggest that the model does capture much of the systematic structure in the data.

  • The ACF plot shows evidence of autocorrelation at several lags, as indicated by bars extending beyond the blue dashed significance bounds, which is concerning for a regression model’s residuals.

  • The histogram of the residuals indicates a distribution that has a peak close to the center and a spread that suggests moderate variability, with a possible slight skewness, although not excessively pronounced.

  • The significant result from the Breusch-Godfrey test (p-value = 0.007619) further corroborates the presence of autocorrelation within the residuals, indicating that there might be a temporal dependency that the current model does not account for.

Given these results, while the model identifies a clear statistical relationship between Arabic citizenship-related Google searches (Arabic_Citizenship) and house prices in Istanbul, the data suggests that other dynamic factors affecting house prices over time are not fully captured by this model. The presence of autocorrelation hints that house prices in Istanbul might be influenced by past prices or other time-dependent variables not included in the model. Further investigation using time series analysis might be warranted to adequately model these dynamics.

This analysis should be considered in the broader context of Istanbul’s real estate market, where multiple economic and social factors interact complexly to influence house prices, beyond the scope of a single variable like Arabic_Citizenship.

While a statistical correlation between Arabic citizenship-related Google searches and the increase in house prices in Istanbul is observed, it is crucial to recognize that correlation does not imply causation. The underlying economic conditions within Turkey play a more substantial role in influencing the housing market dynamics. The observed correlation may suggest that, following an economic downturn, while house prices have risen, the relative stability of foreign currencies like the dollar may make property investment more attractive to foreign buyers. Consequently, there could be an uptick in searches related to obtaining citizenship, which may facilitate or be associated with investment behaviors, rather than being driven by a primary desire to acquire Turkish citizenship. This pattern reflects a strategic response to economic conditions, where foreign investors capitalize on the opportunity presented by a devalued local currency to make real estate investments.
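
The caution about correlation is especially relevant for trending series. The simulated sketch below, which is not based on the project data, generates two series that drift upward independently of each other: regressed in levels they produce a high R-squared, while the same regression in first differences shows essentially no relationship.

```r
set.seed(2024)
n <- 48

# Two independent random walks with upward drift (simulated, unrelated by construction)
x <- cumsum(1 + rnorm(n))
y <- cumsum(1 + rnorm(n))

# Regression in levels: the shared upward trend inflates the fit
r2_levels <- summary(lm(y ~ x))$r.squared

# Regression in first differences: the common trend is removed
r2_diff <- summary(lm(diff(y) ~ diff(x)))$r.squared

print(c(levels = r2_levels, differences = r2_diff))
```

A comparable check on differenced (or log-differenced) house prices and search interest would show how much of the observed association survives once the common trend is removed.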

Model 3

  1. Is there a statistical connection between the rise in home sales and rental rates in Turkey and the quantity of new citizenships awarded to foreign people, particularly those who come from Iraq, Syria, Iran, Russia, and Afghanistan besides economic effects?
data0 <- read_csv("/Users/ilyada/Desktop/1/Data1_Gen.csv")
## Rows: 48 Columns: 12
## ── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (9): Trend, Consumer_Confidence_Index, Households_Fin_Sit, Turkey_House_Sales, Istanbul_House_Sales, Specific_Visitors_Number, Dolar_A...
## num (2): Real_Rent, Istanbul_House_Prices
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Ensure the Date column is in the appropriate Date format
data0$Date <- as.Date(paste0(data0$Date, "-01"))

# Indicators irrelevant to the research question are discarded
l_fit3.1 = lm(log(Istanbul_House_Prices) ~ Specific_Visitors_Number+Arabic_Citizenship,data=data0) # log-transform used to stabilize variance
l_fit3.1
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Specific_Visitors_Number + 
##     Arabic_Citizenship, data = data0)
## 
## Coefficients:
##              (Intercept)  Specific_Visitors_Number        Arabic_Citizenship  
##                8.606e+00                -2.066e-08                 6.422e-03
summary(l_fit3.1)
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Specific_Visitors_Number + 
##     Arabic_Citizenship, data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92556 -0.17629 -0.05841  0.21360  0.80872 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               8.606e+00  7.563e-02  113.78   <2e-16 ***
## Specific_Visitors_Number -2.066e-08  1.291e-07   -0.16    0.874    
## Arabic_Citizenship        6.422e-03  4.912e-04   13.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2969 on 45 degrees of freedom
## Multiple R-squared:  0.8681, Adjusted R-squared:  0.8623 
## F-statistic: 148.1 on 2 and 45 DF,  p-value: < 2.2e-16
checkresiduals(l_fit3.1,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 26.986, df = 12, p-value = 0.007763

The regression analysis indicates that Arabic_Citizenship is statistically significant and positively related to Istanbul_House_Prices. This could reflect the impact of foreign investment or demand on the housing market. The variable Specific_Visitors_Number does not show a significant association with house prices, suggesting that visitor numbers may not be a determining factor in this context or that the effect is masked by other unaccounted factors.

The model’s R-squared is quite high, indicating a good fit to the data. However, the residual diagnostics suggest that there might be additional complexity in the relationship between the predictors and the housing prices that has not been fully accounted for. The presence of autocorrelation in the residuals, confirmed by the Breusch-Godfrey test, points to the need for further investigation, potentially through the inclusion of additional lagged variables, model refinement, or consideration of different modeling techniques that can account for serial correlation in time series data.
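
One of the techniques that account for serial correlation is a Cochrane-Orcutt style correction, sketched below in base R on simulated data. This is a single-pass version under an assumed AR(1) error structure, with all variable names chosen for illustration; it is not a model fitted in this report.

```r
set.seed(99)

# Simulated regression with AR(1) errors (illustrative data only)
n <- 100
x <- rnorm(n)
u <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 1 + 2 * x + u

# Step 1: ordinary least squares and its residuals
ols <- lm(y ~ x)
e   <- residuals(ols)

# Step 2: estimate rho by regressing e_t on e_{t-1} without an intercept
rho <- unname(coef(lm(e[-1] ~ 0 + e[-n])))

# Step 3: quasi-difference both sides and refit
y_star <- y[-1] - rho * y[-n]
x_star <- x[-1] - rho * x[-n]
co <- lm(y_star ~ x_star)

# The transformed intercept equals alpha * (1 - rho)
alpha_hat <- unname(coef(co)[1] / (1 - rho))
beta_hat  <- unname(coef(co)["x_star"])

print(c(rho = rho, alpha = alpha_hat, beta = beta_hat))
```

In practice the three steps are repeated until rho stabilizes, and the residuals of the transformed regression are checked again for remaining autocorrelation.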

#Category variable selection
selected_corr <- cor_matrix[
  c("Specific_Visitors_Number", "Dolar_Alis", "Arabic_Citizenship", "Istanbul_House_Prices"),
  c("Specific_Visitors_Number", "Dolar_Alis", "Arabic_Citizenship", "Istanbul_House_Prices")]

ggcorrplot(selected_corr, 
           hc.order = TRUE, 
           type = "lower",
           lab = TRUE)

3.2 Version

data0$Istanbul_House_Prices_lag1 <- lag(data0$Istanbul_House_Prices, 1)
data0$Istanbul_House_Prices_lag2 <- lag(data0$Istanbul_House_Prices, 2)
data0$Istanbul_House_Prices_lag3 <- lag(data0$Istanbul_House_Prices, 3)

# Indicators irrelevant to the research question are discarded (the lagged prices created above are not used in this fit)
# Log-transform used to stabilize variance
l_fit3.2 = lm(log(Istanbul_House_Prices) ~ Arabic_Citizenship,data=data0) 
l_fit3.2
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Arabic_Citizenship, 
##     data = data0)
## 
## Coefficients:
##        (Intercept)  Arabic_Citizenship  
##            8.60037             0.00637
summary(l_fit3.2)
## 
## Call:
## lm(formula = log(Istanbul_House_Prices) ~ Arabic_Citizenship, 
##     data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.92639 -0.17433 -0.06025  0.21003  0.81107 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        8.6003736  0.0668445   128.7   <2e-16 ***
## Arabic_Citizenship 0.0063699  0.0003662    17.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2938 on 46 degrees of freedom
## Multiple R-squared:  0.8681, Adjusted R-squared:  0.8652 
## F-statistic: 302.6 on 1 and 46 DF,  p-value: < 2.2e-16
checkresiduals(l_fit3.2,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 27.043, df = 12, p-value = 0.007619
  • The regression output indicates that Arabic_Citizenship is a significant predictor of the logarithm of Istanbul_House_Prices (p < 2e-16). The positive coefficient (0.00637) means that each one-unit increase in Arabic_Citizenship is associated with roughly a 0.64% higher house price in Istanbul. The Intercept (8.60) is the expected value of log(Istanbul_House_Prices) when Arabic_Citizenship is zero, and it is significantly different from zero.

  • There seems to be a pattern in the residuals, which might suggest that the model does not capture all the predictive structure in the data.

  • Several bars in the ACF plot extend beyond the blue dashed significance bounds, indicating autocorrelation in the residuals at various lags. This is a sign of temporal structure in the data that the model has not accounted for. However, when lagged terms were added to the model to absorb this autocorrelation, the residual diagnostics worsened, so further calibration is required.

  • The histogram of the residuals with the overlaid normal density curve indicates a departure from normality with potential outliers, as seen by the tails.

  • The Breusch-Godfrey test indicates the presence of autocorrelation in the residuals (p-value = 0.007619), which is consistent with the patterns seen in the ACF plot. This suggests that a simple linear model may not be sufficient to model the data and that time series analysis may be required.

The regression model identifies a significant positive association between Arabic_Citizenship and the logarithm of house prices in Istanbul. Because both series are observed over time, this association alone does not establish that one drives the other. The pattern in the residuals and the ACF plot suggest the need for a more sophisticated time-series model to capture the inherent autocorrelation. The outliers and the potential departure from normality in the residuals could indicate extreme values or non-linearity in the relationship that the current model does not address. Further investigation, possibly with additional variables or transformations, is recommended to improve the model’s performance and address the autocorrelation observed in the residuals.
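Because the response is log-transformed, the slope is easier to read as a percentage change. The short calculation below uses only the estimate, standard error, and degrees of freedom printed in the `summary(l_fit3.2)` output above; the variable names are chosen here for illustration.

```r
# Values copied from the summary(l_fit3.2) output above.
beta <- 0.0063699   # slope on Arabic_Citizenship
se   <- 0.0003662   # its standard error
df   <- 46          # residual degrees of freedom

# In a log-linear model a one-unit rise in the predictor multiplies the
# response by exp(beta); the percentage change is 100 * (exp(beta) - 1).
pct_per_unit <- 100 * (exp(beta) - 1)

# Conventional 95% interval for the slope, mapped to the percentage scale.
ci_beta <- beta + c(-1, 1) * qt(0.975, df) * se
ci_pct  <- 100 * (exp(ci_beta) - 1)

# Implied change for a 10-unit rise in the predictor.
pct_per_10 <- 100 * (exp(10 * beta) - 1)

print(round(c(per_unit = pct_per_unit, lower = ci_pct[1], upper = ci_pct[2],
              per_10_units = pct_per_10), 2))
```

The per-unit effect comes out at about 0.64%. Since the residuals are autocorrelated, the reported standard error is probably too small, so the interval should be read as optimistic rather than exact.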

Model 4 - Bonus

# Log transform used to stabilize variance
l_fit4.1 <- lm(log(Istanbul_House_Sales) ~ Arabic_Citizenship, data = data0)
l_fit4.1
## 
## Call:
## lm(formula = log(Istanbul_House_Sales) ~ Arabic_Citizenship, 
##     data = data0)
## 
## Coefficients:
##        (Intercept)  Arabic_Citizenship  
##          9.8434549           0.0002157
summary(l_fit4.1)
## 
## Call:
## lm(formula = log(Istanbul_House_Sales) ~ Arabic_Citizenship, 
##     data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.20594 -0.21813 -0.01275  0.22860  0.70718 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        9.8434549  0.0843617 116.682   <2e-16 ***
## Arabic_Citizenship 0.0002157  0.0004621   0.467    0.643    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3708 on 46 degrees of freedom
## Multiple R-squared:  0.004712,   Adjusted R-squared:  -0.01692 
## F-statistic: 0.2178 on 1 and 46 DF,  p-value: 0.6429
checkresiduals(l_fit4.1,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 22.17, df = 12, p-value = 0.03566

In this regression of Istanbul’s housing market, Arabic_Citizenship does not significantly predict the log of house sales (p = 0.643), indicating that other factors determine sales dynamics. Although the Intercept is significant, the model’s explanatory power is minimal: the R-squared is only 0.0047.

The presence of autocorrelation in the residuals, as detected by the Breusch-Godfrey test, suggests that house sales are influenced by more complex temporal dependencies than are captured in the current model. This warrants further exploration of time-series models or the inclusion of additional explanatory variables that could better account for the trends and cycles in the data. The residual analysis indicates potential model misspecification or omitted variable bias, highlighting the need for a more nuanced approach to understanding the factors driving house sales in Istanbul.
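One way to account for trends and cycles with additional explanatory variables is to add the previous period's value and month-of-year indicators to the regression. The block below is a sketch on simulated monthly data, not a refit of Model 4: the series, the persistence of 0.6, and the seasonal amplitude are all invented. The Ljung-Box test is used only because it ships with base R; the Breusch-Godfrey test used in this report is the better choice when a lagged response is among the regressors.

```r
# Minimal sketch on simulated monthly data (names and values are illustrative).
set.seed(42)
n      <- 96
month  <- factor(rep(1:12, length.out = n))
x      <- rnorm(n, mean = 100, sd = 20)             # hypothetical predictor
season <- 0.3 * sin(2 * pi * as.integer(month) / 12)
log_sales <- numeric(n)
log_sales[1] <- 10
for (t in 2:n) {
  # persistence (0.6) plus a seasonal cycle plus noise; x has no true effect
  log_sales[t] <- 4 + 0.6 * log_sales[t - 1] + season[t] + rnorm(1, sd = 0.1)
}

# Build the lagged response by hand so the shift does not depend on which
# lag() function happens to be attached.
sim <- data.frame(log_sales, x, month,
                  log_sales_lag1 = c(NA, head(log_sales, -1)))
sim <- na.omit(sim)   # both models are fitted on the same rows

simple_fit <- lm(log_sales ~ x, data = sim)
rich_fit   <- lm(log_sales ~ x + log_sales_lag1 + month, data = sim)

# Ljung-Box p-values for residual autocorrelation up to lag 12.
p_simple <- Box.test(residuals(simple_fit), lag = 12, type = "Ljung-Box")$p.value
p_rich   <- Box.test(residuals(rich_fit),   lag = 12, type = "Ljung-Box")$p.value

r2_simple <- summary(simple_fit)$r.squared
r2_rich   <- summary(rich_fit)$r.squared
print(c(r2_simple = r2_simple, r2_rich = r2_rich))
print(c(p_simple = p_simple, p_rich = p_rich))
```

With only 48 observations in the real data, eleven month indicators plus a lag would use a large share of the available degrees of freedom, so a smaller seasonal specification may be preferable there.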

Model 5

data0$Arabic_Citizenship_lag1 <- lag(data0$Arabic_Citizenship, 1)
data0$Arabic_Citizenship_lag4 <- lag(data0$Arabic_Citizenship, 4)

# Log transform used to stabilize variance; "." uses every other column of data0 as a predictor
l_fit5.1 <- lm(log(Arabic_Citizenship) ~ ., data = data0)
l_fit5.1
## 
## Call:
## lm(formula = log(Arabic_Citizenship) ~ ., data = data0)
## 
## Coefficients:
##                (Intercept)                        Date                       Trend   Consumer_Confidence_Index          Households_Fin_Sit  
##                 -6.787e+02                   3.706e-02                  -1.057e+00                  -2.839e-02                   1.914e-02  
##         Turkey_House_Sales        Istanbul_House_Sales    Specific_Visitors_Number                   Real_Rent                  Dolar_Alis  
##                 -1.857e-05                  -7.732e-06                   1.423e-07                  -1.211e-03                  -3.010e-03  
##      Istanbul_House_Prices         `log(House_Prices)`  Istanbul_House_Prices_lag1  Istanbul_House_Prices_lag2  Istanbul_House_Prices_lag3  
##                  5.229e-05                   1.801e+00                   4.200e-05                  -1.859e-04                   1.403e-04  
##    Arabic_Citizenship_lag1     Arabic_Citizenship_lag4  
##                  1.152e-03                  -3.362e-03
summary(l_fit5.1)
## 
## Call:
## lm(formula = log(Arabic_Citizenship) ~ ., data = data0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63088 -0.10664  0.00537  0.10627  0.28305 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)                -6.787e+02  1.157e+03  -0.587   0.5624  
## Date                        3.706e-02  6.340e-02   0.584   0.5638  
## Trend                      -1.057e+00  1.931e+00  -0.548   0.5885  
## Consumer_Confidence_Index  -2.839e-02  2.614e-02  -1.086   0.2871  
## Households_Fin_Sit          1.914e-02  3.105e-02   0.617   0.5427  
## Turkey_House_Sales         -1.857e-05  1.312e-05  -1.416   0.1683  
## Istanbul_House_Sales       -7.732e-06  6.267e-06  -1.234   0.2279  
## Specific_Visitors_Number    1.423e-07  1.110e-07   1.282   0.2107  
## Real_Rent                  -1.211e-03  5.625e-04  -2.154   0.0404 *
## Dolar_Alis                 -3.010e-03  3.601e-02  -0.084   0.9340  
## Istanbul_House_Prices       5.229e-05  8.575e-05   0.610   0.5471  
## `log(House_Prices)`         1.801e+00  1.226e+00   1.470   0.1531  
## Istanbul_House_Prices_lag1  4.200e-05  1.589e-04   0.264   0.7935  
## Istanbul_House_Prices_lag2 -1.859e-04  1.665e-04  -1.117   0.2740  
## Istanbul_House_Prices_lag3  1.403e-04  9.049e-05   1.551   0.1326  
## Arabic_Citizenship_lag1     1.152e-03  1.271e-03   0.906   0.3729  
## Arabic_Citizenship_lag4    -3.362e-03  1.535e-03  -2.190   0.0373 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2273 on 27 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.9602, Adjusted R-squared:  0.9366 
## F-statistic: 40.71 on 16 and 27 DF,  p-value: 9.177e-15
checkresiduals(l_fit5.1,12)

## 
##  Breusch-Godfrey test for serial correlation of order up to 12
## 
## data:  Residuals
## LM test = 34.567, df = 12, p-value = 0.0005487

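The regression results do not single out Istanbul_House_Prices as a driver of Arabic_Citizenship: neither its level (p = 0.547) nor any of its three lags is individually significant. Only Real_Rent (p = 0.040) and Arabic_Citizenship_lag4 (p = 0.037) are significant at the 5% level, and both coefficients are negative. The negative coefficient on Arabic_Citizenship_lag4 means that higher values four periods earlier are associated with lower current values, holding the other predictors fixed; this could reflect economic or policy changes affecting migration or investment patterns. Because the model includes many closely related predictors (Date and Trend, and house prices in levels, logs, and lags), individual coefficients should be read with caution.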

Despite the high R-squared (0.96; adjusted 0.94), the autocorrelation revealed by the Breusch-Godfrey test (p = 0.0005) indicates that a more refined model, possibly a time series model, would be more appropriate for capturing the dynamics of Arabic_Citizenship. The significance of Arabic_Citizenship_lag4 underlines the importance of considering historical context when evaluating these changes.
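Model 5 combines Date with Trend and uses house prices in levels, logs, and lags, so its predictors are likely to be strongly correlated with one another, which can leave individual coefficients insignificant even when the overall fit is high. A variance inflation factor (VIF) check is one way to examine this. The sketch below computes VIFs with base R on simulated data; the data frame `sim`, its columns, and the helper `vif_base()` are hypothetical and only mimic the kind of overlap seen in Model 5.

```r
# Minimal sketch on simulated data (names and values are illustrative).
set.seed(7)
n     <- 48
trend <- 1:n
sim <- data.frame(
  trend    = trend,
  date_num = 19000 + 30.4 * trend + rnorm(n),          # almost a linear function of trend
  price    = 5000 + 400 * trend + rnorm(n, sd = 800),  # trending price level
  rent     = rnorm(n, mean = 100, sd = 10)             # unrelated predictor
)
sim$log_price <- log(sim$price)                        # level and log of the same series

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all the other predictors.
vif_base <- function(data) {
  sapply(names(data), function(v) {
    others <- setdiff(names(data), v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vif <- vif_base(sim)
print(round(vif, 1))
```

Values far above the common rule-of-thumb threshold of 10 flag predictors whose coefficients cannot be estimated precisely; dropping one variable from each overlapping pair is a simple remedy to try.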

Conclusion

The initial argument that certain visitors greatly push up property prices and rental rates in Istanbul does not find strong support in the data, even after examining multiple regression models and their diagnostics. Although house prices and sales vary, they do not appear to be statistically significantly associated with the number of tourists arriving from particular nations.

Rather, the economic data indicate that the dynamics of the housing market are mostly driven by internal economic considerations. It seems that the cost of raw materials—which is probably impacted by currency rates and general economic conditions—has a greater bearing on home prices, which in turn affect rents. The property market in Turkey is exhibiting inconsistent patterns, which could lead to unforeseen variations in pricing and sales due to the country’s declining economic welfare.

The data do not support the claim that a yearly increase in tourists from nations such as Iraq, Iran, Syria, and Afghanistan is associated with rising property prices. Nonetheless, there is a noticeable rise in volume, indicating that although the rate of inbound tourists stays constant, more people from these nations are entering. This pattern suggests that the housing market’s fluctuations reflect a number of economic factors in addition to the rising number of tourists.

In conclusion, it is more likely that Turkey’s internal economic prospects and problems are responsible for the rise in property purchases and interest in rental properties among tourists from these regions. The increase in citizenship-related inquiries may reflect the simplicity of the investment process more than a primary goal of obtaining Turkish citizenship. The research indicates that, rather than having an innate desire to settle in Turkey, these tourists are likely looking to make sound investments, possibly motivated by attractive conditions for international buyers.

Note: Basic code faults and cohesiveness of comments are handled by Chat-GPT4.0.