R Assignment Help | Econ 321 Homework 3 combined

This R assignment covers importing the “nal_cloud_compressed.dta” and “srpp_cloud_compressed.dta” datasets, merging them on the key, and naming the resulting data frame “merged_house”.

 

One individual from your group will submit two deliverables for this project:

  1. An R file of your code.
     1. Code must be commented so that what you are doing is clear. Use the commented question lines as a starting framework. The code should be written so that once I change the file paths to import the datasets, everything should work out of the box.
  2. Answers to specific questions should be entered via Canvas Quizzes.
     1. All uploaded figures should be professionally labeled and titled.

Import the “nal_cloud_compressed.dta” and “srpp_cloud_compressed.dta” datasets. Then merge them together using the key and call the resulting data frame “merged_house”. You may now remove NAL and SRPP from your R environment if your computer has limited RAM.
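The import-and-merge step can be sketched as follows. This is a minimal illustration, not the expected solution: the key column name `parcel_id` and the helper `merge_on_key` are hypothetical (the assignment only says "the key"), the toy values are made up, and the commented lines assume the `haven` package is installed for reading Stata `.dta` files.

```r
# Merge two data frames on a shared key column.
# ASSUMPTION: "parcel_id" is a placeholder; use the real key in the datasets.
merge_on_key <- function(nal, srpp, key = "parcel_id") {
  stopifnot(key %in% names(nal), key %in% names(srpp))
  merge(nal, srpp, by = key)  # inner join: keeps keys present in both
}

# Toy stand-ins for the two datasets (made-up values, illustration only).
nal_toy  <- data.frame(parcel_id = c(1, 2, 3),
                       acres = c(0.25, 0.40, 1.10))
srpp_toy <- data.frame(parcel_id = c(2, 3, 4),
                       ownersaleprice = c(150000, 310000, 95000))

merged_toy <- merge_on_key(nal_toy, srpp_toy)
print(merged_toy)

# With the real files (paths are placeholders), the same step would read:
# nal  <- haven::read_dta("path/to/nal_cloud_compressed.dta")
# srpp <- haven::read_dta("path/to/srpp_cloud_compressed.dta")
# merged_house <- merge_on_key(nal, srpp, key = "parcel_id")
# rm(nal, srpp)  # free memory once the merge has succeeded
```

Comparing `nrow()` before and after the merge is a quick way to see how many records failed to match on the key.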

  1. Restrict your sample to properties that meet the following criteria and call the resulting data frame “sfh” (for single family home). How many observations are in sfh?
     1. Residential class code (classcodes == “R”)
     2. Land used for single family home (see “landuse”)
     3. Sold in 1980 or later (see “owner_dateacquired”)
  2. Using sfh, calculate the mean sale price and mean acres of residential properties by “addresscity” for every “addresscity” that had at least 100 property sales during the sample period.
     1. Visualize via table/figure the mean sale price by “addresscity”.
     2. Which “addresscity” has the highest mean sale price and what is the mean value? Which “addresscity” has the highest mean property “acres” and what is the mean value?
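A rough sketch of the filtering and group-mean steps, using base R on made-up toy data. Several things here are assumptions: the `landuse` label `"SINGLE FAMILY"` is a guess at how the category is coded, `owner_dateacquired` is assumed to be a `Date`, and the sales threshold is lowered from 100 to 2 so the toy data produce output.

```r
# Toy data with the assignment's column names (all values are made up).
toy <- data.frame(
  classcodes         = c("R", "R", "R", "C", "R", "R"),
  landuse            = c("SINGLE FAMILY", "SINGLE FAMILY", "DUPLEX",
                         "SINGLE FAMILY", "SINGLE FAMILY", "SINGLE FAMILY"),
  owner_dateacquired = as.Date(c("1985-06-01", "1979-12-31", "1990-01-15",
                                 "2001-03-20", "2010-07-04", "1980-01-01")),
  addresscity        = c("NASHVILLE", "NASHVILLE", "NASHVILLE",
                         "BRENTWOOD", "BRENTWOOD", "NASHVILLE"),
  ownersaleprice     = c(100000, 50000, 120000, 400000, 500000, 200000),
  acres              = c(0.2, 0.3, 0.1, 1.0, 2.0, 0.4)
)

# Keep residential, single-family properties sold in 1980 or later.
# ASSUMPTION: "SINGLE FAMILY" is a hypothetical label; check unique(landuse).
sfh <- subset(
  toy,
  classcodes == "R" &
    landuse == "SINGLE FAMILY" &
    owner_dateacquired >= as.Date("1980-01-01")
)
nrow(sfh)  # number of observations in the restricted sample

# Keep only cities with enough sales, then average price and acres by city.
min_sales <- 2  # the assignment uses 100; lowered for the toy data
n_by_city <- table(sfh$addresscity)
keep      <- names(n_by_city)[n_by_city >= min_sales]

city_means <- aggregate(
  cbind(ownersaleprice, acres) ~ addresscity,
  data = sfh[sfh$addresscity %in% keep, ],
  FUN  = mean
)

# Sort so the city with the highest mean sale price comes first.
city_means <- city_means[order(-city_means$ownersaleprice), ]
print(city_means)
```

The sorted `city_means` table can be fed to `barplot()` or a plotting package to produce the labeled figure.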
  3. Suppose we are interested in predicting property sale price. Estimate a regression with “ownersaleprice” as the dependent variable and include “addresscity” as a categorical variable (factor variable in R).
     1. Interpret the intercept. Which city is acting as the reference group?
     2. Interpret the point estimate on Brentwood.
     3. When comparing the reference group to other cities, do you see anything problematic that might point toward a data entry error?
  4. Add “finishedsqft” to the specification above and interpret the point estimate on “finishedsqft”.
  5. Suppose that you think the relationship between finished square footage and sale price is nonlinear. Alter the specification above to capture this nonlinearity by adding a squared term. Do the results imply a nonlinear relationship, and can you back this up with a figure?
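The three regression specifications above can be sketched on simulated data. Every number below is invented for illustration, and the city names other than Brentwood are placeholders. The sketch shows two mechanics: `lm()` uses the first factor level as the reference group, and a squared term must be wrapped in `I()` so that `^` is treated as arithmetic.

```r
set.seed(321)  # simulated data; none of these numbers come from the assignment
n <- 300
sim <- data.frame(
  addresscity  = factor(sample(c("ANTIOCH", "BRENTWOOD", "NASHVILLE"),
                               n, replace = TRUE)),
  finishedsqft = runif(n, 800, 4000)
)
city_effect <- c(ANTIOCH = 0, BRENTWOOD = 150000, NASHVILLE = 40000)
sim$ownersaleprice <- 50000 +
  unname(city_effect[as.character(sim$addresscity)]) +
  120 * sim$finishedsqft - 0.01 * sim$finishedsqft^2 +
  rnorm(n, sd = 20000)

# City dummies only: the intercept is the mean price in the reference city,
# and each city coefficient is a difference from that reference mean.
m_city <- lm(ownersaleprice ~ addresscity, data = sim)
levels(sim$addresscity)[1]  # reference group = first factor level

# Add square footage, then a squared term wrapped in I().
m_lin  <- lm(ownersaleprice ~ addresscity + finishedsqft, data = sim)
m_quad <- lm(ownersaleprice ~ addresscity + finishedsqft + I(finishedsqft^2),
             data = sim)
print(coef(summary(m_quad)))

# Marginal effect of one more square foot at a given size: b1 + 2 * b2 * sqft.
b <- coef(m_quad)
marginal_at <- function(sqft) {
  unname(b["finishedsqft"] + 2 * b["I(finishedsqft^2)"] * sqft)
}
marginal_at(c(1000, 3000))
```

If the squared term is statistically distinguishable from zero, plotting `predict(m_quad, ...)` over a grid of `finishedsqft` values for a fixed city gives a figure of the curvature. `relevel()` changes the reference city if a different baseline is wanted.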
  6. Estimate the relationship between sale price and “addresscity”, “finishedsqft”, “acres”, and year. Control for year as a factor/categorical variable.
     1. Interpret the point estimate on acres. Does it make sense? Why or why not?
     2. Interpret the point estimate on 2019.
  7. Model the relationship between sale price and finishedsqft, acres, year (of sale), and addressfullstreet. Treat year as a factor/categorical variable. There are over 10,000 street names in Nashville, which means that the traditional “lm” command would create over 10,000 binary variables to add as controls. If you run this model the way you did above with “addresscity”, you might as well take a vacation, because it could take hours, days, or weeks to run.

“Big data” problems like this are common, but fortunately econometricians have created R packages that estimate such models significantly faster than traditional commands. Use the “fixest” package and “feols” instead of “lm” to estimate this model (see the fixest documentation).
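A small sketch of the `lm` versus `feols` comparison on simulated data, assuming the `fixest` package is installed (the block skips that part otherwise). The data-generating numbers and street labels are invented. Year is placed after the `|` along with the street, which treats it as categorical by absorbing it as a fixed effect. That is one reasonable reading of "treat year as a factor", not necessarily the intended one.

```r
set.seed(321)  # simulated data; 200 fake streets instead of 10,000+
n <- 2000
streets <- sprintf("STREET %03d", 1:200)
sim <- data.frame(
  addressfullstreet = sample(streets, n, replace = TRUE),
  year              = sample(2010:2019, n, replace = TRUE),
  finishedsqft      = runif(n, 800, 4000),
  acres             = runif(n, 0.1, 2)
)
street_effect <- setNames(rnorm(200, sd = 30000), streets)
sim$ownersaleprice <- 50000 + 100 * sim$finishedsqft + 20000 * sim$acres +
  3000 * (sim$year - 2010) +
  unname(street_effect[sim$addressfullstreet]) +
  rnorm(n, sd = 15000)

# Dummy-variable version: one column per street and per year.
# Manageable with 200 streets, very slow with more than 10,000.
m_lm <- lm(ownersaleprice ~ finishedsqft + acres +
             factor(year) + factor(addressfullstreet), data = sim)

if (requireNamespace("fixest", quietly = TRUE)) {
  # Variables after "|" are absorbed as fixed effects, not expanded to dummies.
  m_fe <- fixest::feols(
    ownersaleprice ~ finishedsqft + acres | year + addressfullstreet,
    data = sim
  )
  print(coef(m_fe))                                  # slopes from feols
  print(coef(m_lm)[c("finishedsqft", "acres")])      # same slopes from lm
  print(fixest::r2(m_fe, c("ar2", "war2")))          # overall and within adj. R2
}
```

The two approaches give the same slope estimates up to numerical tolerance. The difference is that `feols` never builds the large dummy matrix.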

     1. What happened to the adjusted R-squared?
     2. What does controlling for addressfullstreet do (statistically speaking), and do you see any issues with controlling for the name of the street?
     3. Using the above model, predict the sale price of a residential single-family home sold in 2019, with 1218 square feet, 0.27 acres, built in 1946, and located on McClellan Avenue. *** You will need to play around with addressfullstreet and “McClellan Avenue” because of the quirky way in which Nashville analysts designed the variable.
     4. Suppose that prior to selling, the seller had the option of enclosing a mudroom that would have added 200 square feet to the finished square footage of the house. If finishing the mudroom would have cost $6,000, would our model predict this to be an economically intelligent decision? Why or why not?
     5. Because of reasons discussed in class, you decide to use a log-linear model where you take a natural log transformation of sale price. Use a log-linear model with the same explanatory variables as in part (c) to predict the sale price of a residential single-family home sold in 2019, with 1218 square feet, 0.27 acres, built in 1946, and located on McClellan Avenue. What is your prediction?
     6. Build the “best” model you can to predict the sale price of a residential single-family home (withheld from this dataset) sold in 2019, with 1218 finished square feet, 0.268 acres, built in 1946, located on McClellan Avenue, 2 bedrooms, 1 bath, 0 half baths, 0 basement area, crawl space foundation, wood frame exterior wall, 1 story building. Think about nonlinearity, interactions, and trade-offs between linear/log-linear/log-log models. The group with the prediction closest to the true sale price will automatically receive a HW 3 group grade no lower than an A- (it is not advantageous to share your group’s prediction with other groups).
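The prediction and mudroom questions share the same mechanics, sketched here on simulated data with base `lm()`. This is only an illustration: the data are invented, the street fixed effect is left out to keep the example small, and year built is not used because it is not among the listed regressors. The house characteristics (2019, 1218 square feet, 0.27 acres) and the $6,000 cost are taken from the questions.

```r
set.seed(321)  # simulated data; coefficients below are made up
n <- 500
sim <- data.frame(
  finishedsqft = runif(n, 800, 4000),
  acres        = runif(n, 0.1, 2),
  year         = factor(sample(2015:2019, n, replace = TRUE))
)
yr <- as.integer(as.character(sim$year))
sim$ownersaleprice <- exp(11 + 0.0004 * sim$finishedsqft + 0.10 * sim$acres +
                            0.03 * (yr - 2015) + rnorm(n, sd = 0.2))

m_level <- lm(ownersaleprice ~ finishedsqft + acres + year, data = sim)
m_log   <- lm(log(ownersaleprice) ~ finishedsqft + acres + year, data = sim)

# The house to predict; the factor must share the levels of the fitted data.
house  <- data.frame(finishedsqft = 1218, acres = 0.27,
                     year = factor("2019", levels = levels(sim$year)))
bigger <- transform(house, finishedsqft = finishedsqft + 200)  # with mudroom

pred_level <- unname(predict(m_level, newdata = house))
pred_log   <- unname(exp(predict(m_log, newdata = house)))  # back to dollars

# Predicted gain from 200 extra square feet under each model.
gain_level <- unname(predict(m_level, newdata = bigger)) - pred_level
gain_log   <- unname(exp(predict(m_log, newdata = bigger))) - pred_log

cost <- 6000
print(c(pred_level = pred_level, pred_log = pred_log))
print(c(gain_level = gain_level, gain_log = gain_log))
print(c(level_says_worth_it = gain_level > cost,
        log_says_worth_it   = gain_log > cost))
```

In the level model the gain is just 200 times the `finishedsqft` coefficient, while in the log-linear model it scales with the predicted price. Exponentiating a log prediction targets something closer to the median than the mean sale price, so a retransformation correction may be worth considering.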