# R assignment: importing the “nal_cloud_compressed.dta” and “srpp_cloud_compressed.dta” datasets, merging them on the key, and naming the resulting data frame “merged_house”

One individual from your group will submit **two** deliverables for this project:

- An R file of your code.
- Code **must be commented so that what you are doing is clear**. Use the commented question lines as a starting framework. The code should be written so that once I change the file paths to import the datasets, everything works out of the box.

- Answers to specific questions should be entered via Canvas Quizzes.
- All uploaded figures should be professionally labeled and titled.

Import the “nal_cloud_compressed.dta” and “srpp_cloud_compressed.dta” datasets. Then merge them together using the key and call the resulting data frame “merged_house”. You may now remove NAL and SRPP from your R environment if your computer has limited RAM.
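The import-and-merge step might look roughly like the sketch below. It assumes the `haven` package as the `.dta` reader and a key column called `parcel_id`; the assignment only says "the key", so that name and the toy rows are hypothetical stand-ins.

```r
# With the real files (paths are placeholders; haven is an assumed choice of reader):
# library(haven)
# NAL  <- read_dta("path/to/nal_cloud_compressed.dta")
# SRPP <- read_dta("path/to/srpp_cloud_compressed.dta")

# Toy stand-ins so the merge logic can be run without the data.
# "parcel_id" is a hypothetical key name, not taken from the datasets.
NAL <- data.frame(parcel_id   = c(1, 2, 3),
                  addresscity = c("NASHVILLE", "BRENTWOOD", "ANTIOCH"))
SRPP <- data.frame(parcel_id      = c(1, 2, 4),
                   ownersaleprice = c(250000, 610000, 180000))

# Inner join on the key: only parcels present in both tables are kept.
merged_house <- merge(NAL, SRPP, by = "parcel_id")

# Free memory once the merged frame exists.
rm(NAL, SRPP)

nrow(merged_house)
```

`merge()` defaults to an inner join; whether unmatched parcels should be kept (`all.x = TRUE` or `all = TRUE`) depends on how the two files relate, which the assignment does not state.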

- Restrict your sample to properties that meet the following criteria and call the resulting data frame “sfh” (for single family home). How many observations are in sfh?

- Residential class code (classcodes == “R”)
- Land used for single family home (see “landuse”)
- Sold on or after 1980 (see “owner_dateacquired”)
- Using sfh, calculate the mean sale price and acres of residential properties *by “addresscity”* for every “addresscity” that had at least 100 property sales during the sample period.

- Visualize via table/figure the mean sale price by “addresscity”.
- Which “*addresscity*” has the highest mean sale price and what is the mean value? Which “*addresscity*” has the highest mean property “acres” and what is the mean value?
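The sample restriction and the by-city means above can be sketched in base R as follows. Column names come from the assignment, but every value is made up: the `"SINGLE FAMILY"` label for `landuse` is an assumption (check the actual coding), and the sales threshold is lowered from 100 to 2 so the toy data has something to show.

```r
# Toy data; column names follow the assignment, values are invented.
merged_house <- data.frame(
  classcodes = c("R", "R", "R", "C", "R", "R"),
  landuse = c("SINGLE FAMILY", "SINGLE FAMILY", "DUPLEX",
              "SINGLE FAMILY", "SINGLE FAMILY", "SINGLE FAMILY"),
  owner_dateacquired = as.Date(c("1995-06-01", "1975-01-15", "2001-03-20",
                                 "2010-09-09", "2018-11-30", "1980-01-01")),
  addresscity = c("NASHVILLE", "NASHVILLE", "NASHVILLE",
                  "BRENTWOOD", "NASHVILLE", "BRENTWOOD"),
  ownersaleprice = c(200000, 50000, 150000, 900000, 300000, 400000),
  acres = c(0.25, 0.30, 0.20, 1.50, 0.35, 1.00)
)

# Keep residential, single-family parcels sold on or after 1980-01-01.
sfh <- subset(merged_house,
              classcodes == "R" &
                landuse == "SINGLE FAMILY" &   # assumed label for this land use
                owner_dateacquired >= as.Date("1980-01-01"))
nrow(sfh)

# Cities with enough sales (the assignment uses 100; 2 here for the toy data).
min_sales <- 2
sales_per_city <- table(sfh$addresscity)
busy_cities <- names(sales_per_city[sales_per_city >= min_sales])

# Mean sale price and mean acres per qualifying city, highest price first.
city_means <- aggregate(cbind(ownersaleprice, acres) ~ addresscity,
                        data = sfh[sfh$addresscity %in% busy_cities, ],
                        FUN = mean)
city_means[order(-city_means$ownersaleprice), ]
```

The resulting `city_means` frame can be passed directly to a table or bar chart for the visualization question.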

- Suppose we are interested in predicting property sale price. Estimate a regression with “ownersaleprice” as the dependent variable and include “addresscity” as a categorical variable (factor variable in R).

- Interpret the intercept. Which city is acting as the reference group?
- Interpret the point estimate on Brentwood.
- When comparing the reference group to other cities, do you see anything problematic that might point toward a data entry error?
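For the reference-group questions, it helps to recall how `lm` treats a factor: the first level (alphabetical by default) is dropped and absorbed into the intercept, and each remaining coefficient is a difference from that level. A small sketch on invented prices:

```r
# Toy data: two sales per city, values invented for illustration.
sfh <- data.frame(
  addresscity = c("ANTIOCH", "ANTIOCH", "BRENTWOOD",
                  "BRENTWOOD", "NASHVILLE", "NASHVILLE"),
  ownersaleprice = c(180000, 220000, 580000, 620000, 290000, 310000)
)
sfh$addresscity <- factor(sfh$addresscity)

# The first factor level is the reference group.
levels(sfh$addresscity)[1]

# Intercept = mean price in the reference city;
# other coefficients = that city's mean minus the reference mean.
fit_city <- lm(ownersaleprice ~ addresscity, data = sfh)
coef(fit_city)

# Optional: choose a different reference group and refit.
sfh$addresscity <- relevel(sfh$addresscity, ref = "NASHVILLE")
fit_nash <- lm(ownersaleprice ~ addresscity, data = sfh)
coef(fit_nash)
```

In the real data, inspecting `levels(sfh$addresscity)` and the number of sales per level before fitting is a quick way to spot an odd or sparsely populated reference city.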

- Add “finishedsqft” to the specification above and interpret the point estimate on “finishedsqft.”
- Suppose that you think the relationship between finished square footage and sale price is nonlinear. Alter the specification above to capture this nonlinearity via using a squared term. Do results imply a nonlinear relationship and can you back this up with a figure?
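One way to add the squared term is `I(finishedsqft^2)` inside the formula. The sketch below uses simulated prices with a known concave shape and leaves out `addresscity` for brevity, so it illustrates the mechanics rather than the assignment's exact specification.

```r
# Simulated data with diminishing returns to square footage (values invented).
sqft <- seq(800, 4000, by = 200)
sfh <- data.frame(
  finishedsqft = sqft,
  ownersaleprice = 50000 + 200 * sqft - 0.02 * sqft^2 +
    1000 * sin(seq_along(sqft))          # small deterministic "noise"
)

# I() protects ^ so it means "square", not a formula operator.
fit_sq <- lm(ownersaleprice ~ finishedsqft + I(finishedsqft^2), data = sfh)
coef(fit_sq)

# Marginal effect of one more square foot: b1 + 2 * b2 * sqft.
b <- coef(fit_sq)
marginal_at <- function(x) {
  unname(b["finishedsqft"] + 2 * b["I(finishedsqft^2)"] * x)
}
marginal_at(1000)
marginal_at(3000)

# Fitted curve over a grid, for the supporting figure.
grid <- data.frame(finishedsqft = seq(800, 4000, length.out = 100))
grid$fitted <- predict(fit_sq, newdata = grid)
if (interactive()) {
  plot(sfh$finishedsqft, sfh$ownersaleprice,
       xlab = "Finished square feet", ylab = "Sale price ($)",
       main = "Quadratic fit of sale price on finished square footage")
  lines(grid$finishedsqft, grid$fitted)
}
```

A negative, statistically significant coefficient on the squared term indicates that each additional square foot adds less to price as homes get larger; the plotted curve makes the same point visually.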

- Estimate the relationship between sale price and address city, finishedsqft, acres, and year.

*Control for year as a factor/categorical variable.*

- Interpret the point estimate on acres. Does it make sense? Why or why not?
- Interpret the point estimate on 2019.
- Model the relationship between sale price and finishedsqft, acres, year (of sale), and addressfullstreet. *Treat year as a factor/categorical variable.* There are over 10,000 street names in Nashville, which means that the traditional “lm” command would create over 10,000 binary variables to add as controls. If you run a model like this as you’ve done above with “addresscity”, you might as well take a vacation because it could take hours/days/weeks to run.

“Big data” problems like this are common, but fortunately econometricians have created R packages that run such models significantly faster than traditional programs. Use the “fixest” package and its “feols” function instead of “lm” to estimate this model (documentation here).

- What happened to the adjusted R-squared?
- What does controlling for addressfullstreet do (statistically speaking) and do you see any issues controlling for the name of the street?
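Statistically, controlling for street name is equivalent to demeaning every variable within each street and regressing on the demeaned values, so only within-street variation identifies the slopes. The sketch below checks that equivalence in base R on invented data, then shows the corresponding `feols` call, which runs only if `fixest` is installed. It uses a single regressor; the assignment's model would also include `acres` and `factor(year)` to the left of the `|`.

```r
# Toy data: three streets with different price levels (all values invented).
sfh <- data.frame(
  addressfullstreet = rep(c("A ST", "B AVE", "C RD"), each = 4),
  finishedsqft = c(1000, 1200, 1500, 1800,
                    900, 1100, 1600, 2000,
                   1300, 1400, 1700, 2200)
)
street_effect <- c("A ST" = 100000, "B AVE" = 300000, "C RD" = 200000)
sfh$ownersaleprice <- unname(street_effect[as.character(sfh$addressfullstreet)]) +
  100 * sfh$finishedsqft + 2000 * sin(seq_len(nrow(sfh)))

# (1) Dummy-variable approach: one binary variable per street.
fit_dummies <- lm(ownersaleprice ~ finishedsqft + factor(addressfullstreet),
                  data = sfh)

# (2) Within transformation: subtract street means, then regress.
sfh$price_dm <- sfh$ownersaleprice - ave(sfh$ownersaleprice, sfh$addressfullstreet)
sfh$sqft_dm  <- sfh$finishedsqft  - ave(sfh$finishedsqft,  sfh$addressfullstreet)
fit_within <- lm(price_dm ~ 0 + sqft_dm, data = sfh)

# Both approaches give the same slope on finishedsqft.
coef(fit_dummies)["finishedsqft"]
coef(fit_within)["sqft_dm"]

# (3) fixest absorbs the street effects instead of building dummies.
if (requireNamespace("fixest", quietly = TRUE)) {
  fit_fe <- fixest::feols(ownersaleprice ~ finishedsqft | addressfullstreet,
                          data = sfh)
  print(coef(fit_fe))
  # Overall adjusted R2 versus the within-street R2.
  print(fixest::r2(fit_fe, type = c("ar2", "wr2")))
}
```

Comparing the overall and within R-squared is one way to see how much of the fit comes from the street effects themselves rather than from the housing characteristics.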

- Using the above model, predict the sale price of a residential single-family home sold in 2019, with 1218 square feet, .27 acres, built in 1946, and located on McClellan Avenue.
*You will need to play around with addressfullstreet and “McClellan Avenue” because of the quirky way in which Nashville analysts designed the variable.*

- Suppose that prior to selling, the seller had the option of enclosing a mudroom that would have added 200 square feet to the finished square footage of the house. If finishing the mudroom would have cost $6,000, would our model predict this to be an economically intelligent decision? Why or why not?
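The mudroom question comes down to comparing two predictions that differ only in `finishedsqft`. A minimal sketch on invented data, using a plain `lm` with two regressors in place of the full street-and-year model (with the real model, the `newdata` rows would also need the sale year and the street value):

```r
# Toy data (all values invented).
sqft  <- seq(800, 3000, by = 200)
acres <- c(0.20, 0.50, 0.30, 0.80, 0.25, 0.60,
           0.35, 1.00, 0.40, 0.70, 0.45, 0.90)
sfh <- data.frame(
  finishedsqft = sqft,
  acres = acres,
  ownersaleprice = 40000 + 90 * sqft + 50000 * acres +
    500 * cos(seq_along(sqft))           # small deterministic "noise"
)

fit <- lm(ownersaleprice ~ finishedsqft + acres, data = sfh)

# The same house with and without the extra 200 finished square feet.
base_home    <- data.frame(finishedsqft = 1218, acres = 0.27)
with_mudroom <- transform(base_home, finishedsqft = finishedsqft + 200)

# Predicted gain in sale price from finishing the mudroom.
gain <- unname(predict(fit, newdata = with_mudroom) -
               predict(fit, newdata = base_home))
gain

# Compare against the $6,000 cost stated in the question.
gain > 6000
```

In a model that is linear in `finishedsqft`, the gain is simply 200 times the square-footage coefficient; with a squared term it would depend on the starting size of the house.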

- Because of reasons discussed in class, you decide to use a log-linear model where you take a natural log transformation of sale price. Use a log-linear model with the same explanatory variables in part (c) to predict the sale price of a residential single-family home sold in 2019, with 1218 square feet, .27 acres, built in 1946, and located on McClellan Avenue. What is your prediction?
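With a log-linear model, `predict()` returns a value on the log scale that has to be converted back to dollars. The sketch below uses invented data and only two regressors, and applies the simple `exp()` back-transform; whether a retransformation correction is expected is not stated in the assignment.

```r
# Toy data generated on the log scale (all values invented).
sqft  <- seq(800, 3000, by = 200)
acres <- c(0.20, 0.50, 0.30, 0.80, 0.25, 0.60,
           0.35, 1.00, 0.40, 0.70, 0.45, 0.90)
sfh <- data.frame(
  finishedsqft = sqft,
  acres = acres,
  ownersaleprice = exp(11 + 0.0005 * sqft + 0.3 * acres +
                         0.01 * sin(seq_along(sqft)))
)

# Log-linear model: log of price on the left, levels on the right.
fit_log <- lm(log(ownersaleprice) ~ finishedsqft + acres, data = sfh)

# The prediction is in log dollars ...
new_home <- data.frame(finishedsqft = 1218, acres = 0.27)
log_pred <- unname(predict(fit_log, newdata = new_home))

# ... so exponentiate to get back to dollars (simple back-transform).
price_pred <- exp(log_pred)
price_pred

# Coefficients read as approximate proportional changes:
# 100 * b is roughly the percent change in price per extra square foot.
100 * coef(fit_log)["finishedsqft"]
```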

- Build the “best” model you can to predict the sale price of a residential single-family home (withheld from this dataset) sold in 2019, with 1218 finished square feet, .268 acres, built in 1946, located on McClellan Avenue, 2 bedrooms, 1 bath, 0 half baths, 0 basement area, crawl space foundation, wood frame exterior wall, 1 story building. Think about nonlinearity, interactions, and trade-offs between linear/log-linear/log-log models. The group with the prediction closest to the true sale price will automatically receive a HW 3 group grade no lower than an A- (it is not advantageous to share your group’s prediction with other groups).
