Because Inside Airbnb has already cleaned and integrated the dataset, and supplies a data dictionary with detailed descriptions of each variable, the dataset is straightforward to read and understand. However, because the data come directly from Airbnb listings, some listing names contain special characters, emojis, or text in languages other than English. We therefore identified all listings with non-English characters in their names and created a subset of the original dataset that excludes them, in case later research focuses on the information contained in the listing names. We also kept a subset containing only the listings with special characters, along with the original dataset, for analyses that need all of the information the dataset provides.
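The split described above can be sketched as follows. This is an illustrative Python example, not the code used for this project. It assumes that "non-English" means "contains any non-ASCII character", and the rows and the column names `id` and `name` are hypothetical.

```python
# Hypothetical sample rows; the real listings file has many more columns.
listings = [
    {"id": "1", "name": "Cozy studio near the beach"},
    {"id": "2", "name": "Sunny loft \u2600 downtown"},         # contains an emoji-like symbol
    {"id": "3", "name": "\u597d\u83b1\u575e\u516c\u5bd3"},     # name written in Chinese
    {"id": "4", "name": "Caf\u00e9-style bungalow"},           # accented Latin letter
]


def has_non_ascii(text):
    """Return True if any character in text falls outside the ASCII range."""
    return not text.isascii()


# Assumption: a name counts as "English only" when every character is ASCII.
english_only = [row for row in listings if not has_non_ascii(row["name"])]
special_names = [row for row in listings if has_non_ascii(row["name"])]

print("ASCII-only names:", [row["id"] for row in english_only])
print("Names with other characters:", [row["id"] for row in special_names])
```

Under this assumption, an otherwise English name with an accented letter (listing 4 in the sample) also lands in the special-character subset, so a stricter or looser rule may be preferable depending on the analysis.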
Using R, we combined the listings file with the reviews file by the Airbnb id present in both datasets. The reviews dataset is simply a record of the Airbnb id and the date of each review, so each observation corresponds to one review made on that day. Counting the observations for each Airbnb id therefore gives the number of reviews recorded for each listing. We then used the merge function to add the review count to the original listings dataset.
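The count-then-merge step can be illustrated with a short Python sketch. It is not the R code described above. The sample rows and the column names `listing_id`, `id`, and `review_count` are hypothetical, and giving listings without reviews a count of 0 is an assumption.

```python
from collections import Counter

# Hypothetical sample of the reviews file: one row per review.
reviews = [
    {"listing_id": "1", "date": "2019-01-05"},
    {"listing_id": "1", "date": "2019-02-11"},
    {"listing_id": "3", "date": "2019-03-02"},
]

# Hypothetical sample of the listings file.
listings = [
    {"id": "1", "name": "Listing one"},
    {"id": "2", "name": "Listing two"},
    {"id": "3", "name": "Listing three"},
]

# Count how many review rows exist for each listing id.
review_counts = Counter(row["listing_id"] for row in reviews)

# Attach the count to every listing (a left join on the id).
# Assumption: a listing with no review rows gets a count of 0.
listings_with_counts = [
    {**row, "review_count": review_counts.get(row["id"], 0)}
    for row in listings
]

for row in listings_with_counts:
    print(row["id"], row["review_count"])
```

The sketch keeps every listing. In R, `merge` returns only rows with a match in both tables unless `all.x = TRUE` is set, so listings without reviews would otherwise be dropped from the merged table.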
We also joined an updated neighborhood dataset to the listings dataset. To build the updated neighborhood dataset, we took the median income of each neighborhood from the Los Angeles Times and combined it with the original neighborhood dataset by neighborhood name. We then used the neighborhood name again to join the updated neighborhood dataset to our listings dataset, so that all of our data sit in a single table.
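A join on neighborhood names can be sketched as below. This is an illustrative Python example, not the code used for this project. The neighborhood names and income figures are placeholders, not real data, and the column names `neighbourhood` and `median_income` are assumptions. Normalizing names before matching is an added precaution, since joins on free-text names fail silently when spelling or spacing differs.

```python
# Placeholder income table; the names and values are made up for illustration.
median_income = {
    "Neighborhood A": 50000,
    "Neighborhood B": 72000,
}

# Hypothetical listings; the "neighbourhood" column name is an assumption.
listings = [
    {"id": "1", "neighbourhood": "Neighborhood A"},
    {"id": "2", "neighbourhood": "neighborhood b "},  # differs in case and spacing
    {"id": "3", "neighbourhood": "Neighborhood C"},   # absent from the income table
]


def normalize(name):
    """Trim whitespace and ignore case so near-identical names match."""
    return name.strip().casefold()


income_by_key = {normalize(name): income for name, income in median_income.items()}

merged = []
unmatched = set()
for row in listings:
    income = income_by_key.get(normalize(row["neighbourhood"]))
    if income is None:
        # Record names that found no match so they can be reviewed by hand.
        unmatched.add(row["neighbourhood"])
    merged.append({**row, "median_income": income})

print(merged)
print("Unmatched neighborhoods:", sorted(unmatched))
```

Listing the unmatched names makes it easy to see which neighborhoods need a manual correction before the combined table is used.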
We mainly used Tableau and R for data visualization, so we converted all subsets of the original dataset into .csv files for convenience.