US Real Estate Rental Price Analysis
For our final project, we sought to better understand the drivers of rental price for various real estate properties across the US. To that end, we selected five datasets.
-real_estate_dataset_df: comprises variables such as rental price, geographic location, and the number of bedrooms and bathrooms, among other attributes of rental listings.
-crime_data_df: contains crime rate statistics across all U.S. states and territories, serving as a proxy for regional safety.
-state_gdp_df: contains the Gross Domestic Product (GDP) per capita for each state, which we use as an indicator of local economic status.
-weather_score_df: quantifies climatic livability, particularly temperature.
-spending_merged_df: reports government expenditures across various sectors within each state for the year 2022.
By aggregating and pre-processing these datasets and merging them with the rental listings, we are able to develop a holistic understanding of both the intrinsic factors (e.g., room type, available facilities) and the external factors (e.g., crime levels, weather conditions) that drive rental price.
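The merging step described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the join key `state`, the columns `price`, `beds`, `gdp_per_capita`, and `crime_rate`, the helper `merge_state_features`, and all values are hypothetical placeholders.

```python
import pandas as pd

# Toy stand-ins for the real tables; column names and values are hypothetical.
real_estate_dataset_df = pd.DataFrame({
    "state": ["CA", "CA", "TX", "NY"],
    "price": [2500, 3100, 1400, 2900],
    "beds": [2, 3, 2, 1],
})
state_gdp_df = pd.DataFrame({
    "state": ["CA", "TX", "NY"],
    "gdp_per_capita": [85000, 70000, 90000],
})
crime_data_df = pd.DataFrame({
    "state": ["CA", "TX"],  # NY deliberately missing to show how gaps surface
    "crime_rate": [4.4, 4.1],
})


def merge_state_features(listings, state_tables, key="state"):
    """Left-join each state-level table onto the listings (one row per listing)."""
    merged = listings.copy()
    for table in state_tables:
        # validate="many_to_one" raises if a state appears more than once in
        # `table`, which would otherwise silently duplicate listings.
        merged = merged.merge(table, on=key, how="left", validate="many_to_one")
    return merged


merged_df = merge_state_features(
    real_estate_dataset_df, [state_gdp_df, crime_data_df]
)
print(merged_df)
```

A left join keeps every listing even when a state-level table has no matching row; those gaps appear as NaN and can be handled explicitly during pre-processing.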
These results are detailed below in our notebook. Above or beneath each relevant visualization or finding, we explain the motivation for the analysis, the key takeaways, and how the findings inform our understanding of the relationships between the various internal and external factors and listing price.
We hope you find our findings insightful, and we are happy to answer any questions you may have.
You can run the .ipynb file directly in Google Colab without downloading any extra datasets. If you want to check out the datasets, please refer to the Datasets folder.
-Spending on Police
-GDP
-bath
-laundry_options
-Spending on Police
-sqfeet
-lat
-laundry_options
-parking_options
-bath
-Linear Regression:
Our datasets contain multifaceted relationships, with many variables influencing rental prices (such as crime rates, GDP, and weather conditions). Linear Regression's assumption of linearity limits its ability to capture these complex interactions, which is probably why it yields a low R^2 score.
-RandomForestRegressor:
This model excels at handling diverse datasets with complex, non-linear relationships, such as those in our real estate data, which spans economic, geographic, and socio-demographic factors. Its ensemble approach effectively captures the multifaceted nature of such data.
-XGBoost Regressor:
Also particularly suited for datasets with mixed types of variables and intricate patterns, XGBoost efficiently manages the diverse range of features from real estate, economic, and environmental datasets, leveraging its gradient boosting mechanism to improve predictive accuracy.
-Neural Network Model:
While theoretically capable of modeling complex, non-linear relationships in multifaceted datasets, a neural network's performance relies heavily on an appropriate network architecture and hyperparameter tuning. As future work, we could try different architectures and tuning to see whether they improve results.
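To illustrate why a linear model can score a low R^2 where tree ensembles do well, here is a small sketch on synthetic data. It does not use our rental data or reproduce the notebook's results: the target function is invented for illustration, the helper `compare_models` is hypothetical, and scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so that the example needs only scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): the target depends on a
# squared term and an interaction, neither of which a straight line can fit.
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.uniform(-3, 3, size=(n_samples, 2))
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(0, 0.3, n_samples)


def compare_models(X, y, seed=0):
    """Fit each model on a train split and return its R^2 on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        # Stand-in for XGBoost: same gradient-boosting idea, different library.
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

On data like this, the linear model's held-out R^2 should sit well below that of the two ensembles, which mirrors the qualitative pattern described above.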