Practice Problem 6

This practice problem gives you practice exploring data and fitting a linear regression on your own with statistical software. You may use any statistical software you wish, and you may work in groups of up to 3. If you work in a group, please have every member of the group complete the ICON survey.

Instructions

What to turn in

Please turn in a document that includes any relevant statistics and figures you created. Upload the final document to ICON and complete the graded ICON survey that accompanies this practice problem.

Due Date

Due around April 22, 2024. There is no penalty for late submissions as long as the work is submitted by May 9, 2024.

Data

The data for this activity are San Francisco rental listings, which originally come from Tidy Tuesday. I have processed the data to drop some missing values and remove some attributes from the larger data set for our use.

Note: Use the data linked here or posted to the IDAS. The data are available in CSV format. A short description of each attribute follows.

Attribute Name   Description
post_id          unique listing ID
date             listing date
year             year of the listing
nhood            neighborhood
city             city
county           county
price            rental price in USD
beds             number of bedrooms
sqft             square footage of the rental
room_in_apt      indicator for whether the rental is a room in an apartment
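If you work in Python, the following is a minimal sketch for reading the CSV with the standard-library csv module. It assumes the header row uses exactly the attribute names in the table above. The file name sf_rent.csv is a placeholder for whichever file you download, and the two demo rows are made up, not real listings.

```python
import csv
import io
from typing import TextIO

# Columns used in the models are converted to numbers; the rest stay as text.
NUMERIC_COLUMNS = ("year", "price", "beds", "sqft")


def load_rentals(handle: TextIO) -> list[dict]:
    """Read one dict per rental from an open CSV file that has a header row."""
    rows = []
    for record in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            record[column] = float(record[column])
        rows.append(record)
    return rows


# With the downloaded file (placeholder file name):
# with open("sf_rent.csv", newline="") as f:
#     rentals = load_rentals(f)

# Self-contained demo with two made-up rows, not real listings:
demo_csv = io.StringIO(
    "post_id,date,year,nhood,city,county,price,beds,sqft,room_in_apt\n"
    "a1,20120101,2012,mission,san francisco,san francisco,2500,1,650,0\n"
    "a2,20130101,2013,sunset,san francisco,san francisco,3200,2,900,0\n"
)
rentals = load_rentals(demo_csv)
print(len(rentals), rentals[0]["price"])  # prints: 2 2500.0
```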

Guiding Question

  1. Do the number of bedrooms and the year explain variation in the price of San Francisco rentals?
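Written out, the model in Question 1 below turns the guiding question into an equation. Each non-baseline level of beds and year gets its own 0/1 indicator. Here $b_0$ and $y_0$ denote the baseline levels (R's factor() uses the lowest level by default), and the symbols are our notation rather than the course's:

$$
\text{price}_i = \beta_0 + \sum_{b \ne b_0} \beta_b\,\mathbf{1}[\text{beds}_i = b] + \sum_{y \ne y_0} \gamma_y\,\mathbf{1}[\text{year}_i = y] + \varepsilon_i
$$

Under this model, asking whether bedrooms and year explain variation in price amounts to testing $H_0$: all $\beta_b = 0$ and all $\gamma_y = 0$ (the overall F-test). The model's $R^2$ summarizes how much of the variation in price they explain together.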

Questions

  1. Fit a linear regression to answer guiding question 1 above, using this model equation: price ~ factor(beds) + factor(year).

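  2. Evaluate the data conditions for this model (for example, independence of the observations, constant variance of the residuals across groups, and approximate normality of the residuals).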

  3. Do the associations of price with the number of bedrooms and year persist after controlling for square footage? This model equation, which centers square footage at its mean, can help answer this question: price ~ I(sqft - mean(sqft)) + factor(beds) + factor(year).

  4. Is there evidence of an interaction between square footage and the number of bedrooms? This model equation includes that interaction: price ~ I(sqft - mean(sqft)) * factor(beds) + factor(year).
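Building the three model matrices by hand can make the formula notation concrete. In this sketch, factor() becomes 0/1 indicator columns with the lowest level as the baseline, I(sqft - mean(sqft)) becomes a centered numeric column, and * adds centered-sqft-by-bedroom product columns. It assumes Python with NumPy and uses simulated data, not the rental data. It also omits standard errors and hypothesis tests. The helper names (indicator_columns, build_design, fit_ols) are hypothetical.

```python
import numpy as np


def indicator_columns(values, levels):
    """Treatment (dummy) coding: one 0/1 column per level except the first,
    which acts as the baseline (R's factor() default uses the lowest level)."""
    values = np.asarray(values)
    return np.column_stack([(values == lvl).astype(float) for lvl in levels[1:]])


def build_design(beds, year, sqft=None, interact=False):
    """Design matrix (intercept first) for the models in Questions 1, 3 and 4."""
    beds_d = indicator_columns(beds, sorted(set(beds)))
    year_d = indicator_columns(year, sorted(set(year)))
    cols = [np.ones(len(beds)), beds_d, year_d]
    if sqft is not None:
        sqft_c = np.asarray(sqft, dtype=float) - np.mean(sqft)  # I(sqft - mean(sqft))
        cols.insert(1, sqft_c)
        if interact:
            # One slope adjustment per non-baseline bedroom level.
            cols.append(beds_d * sqft_c[:, None])
    return np.column_stack(cols)


def fit_ols(X, y):
    """Ordinary least squares: coefficients, residuals and R^2."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, resid, r2


# Simulated toy data (NOT the San Francisco rentals), only to exercise the code.
rng = np.random.default_rng(0)
n = 300
beds = rng.integers(1, 4, size=n)  # 1, 2 or 3 bedrooms
year = rng.choice([2012, 2014, 2016], size=n)
sqft = 400 + 350 * beds + rng.normal(0, 80, size=n)
price = 800 + 2.0 * sqft + 150 * (year - 2012) + rng.normal(0, 200, size=n)

models = {
    "Q1: factor(beds) + factor(year)": build_design(beds, year),
    "Q3: + centered sqft": build_design(beds, year, sqft),
    "Q4: centered sqft * factor(beds)": build_design(beds, year, sqft, interact=True),
}
results = {name: fit_ols(X, price) for name, X in models.items()}
for name, (coef, resid, r2) in results.items():
    print(f"{name}: {len(coef)} coefficients, R^2 = {r2:.3f}")

# A numeric companion to residual plots for checking conditions (Question 2):
# similar residual spread across bedroom groups is consistent with constant variance.
_, resid_q1, _ = results["Q1: factor(beds) + factor(year)"]
for level in sorted(set(beds)):
    print(f"beds={level}: residual SD = {resid_q1[beds == level].std(ddof=1):.1f}")
```

To use this with the actual data, replace the simulated arrays with the price, beds, year, and sqft columns. In R, lm() with the formulas in the questions builds equivalent design matrices automatically, and summary() and anova() report the coefficient tests and model comparisons this sketch leaves out. In the Question 3 model, centering only shifts the intercept. The bedroom and year coefficients match those from raw square footage, and the intercept becomes the expected price of a baseline-level rental of average size.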
