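R Assignment Writing | Econ 321 Homework 3 combined
This is an R assignment on importing the “nal_cloud_compressed.dta” and “srpp_cloud_compressed.dta” datasets, merging them using the key, and naming the resulting data frame “merged_house”.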
One individual from your group will submit two deliverables for this project:
- An R file of your code.
- Code must be commented so that what you are doing is clear. Use the commented question lines as a starting framework. The code should be written so that once I change the file paths to import the datasets, everything should work out of the box.
- Answers to specific questions should be entered via Canvas Quizzes.
- All uploaded figures should be professionally labeled and titled.
Import the “nal_cloud_compressed.dta” and “srpp_cloud_compressed.dta” datasets. Then merge them together using the key and call the resulting data frame “merged_house”. You may now remove NAL and SRPP from your R environment if your computer has limited RAM.
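A minimal sketch of the import-and-merge step, run on toy data frames so that it executes without the course files. With the real data, `.dta` files are commonly read with `haven::read_dta()`. The key column name `parcel_id` is hypothetical, since the assignment does not name the key, and the toy values are made up.

```r
# Toy stand-ins for the two datasets. With the real files one would typically
# read them with something like haven::read_dta("<path>/nal_cloud_compressed.dta").
# "parcel_id" is a hypothetical key name; substitute the actual key column.
nal <- data.frame(
  parcel_id = c(1, 2, 3, 4),
  addresscity = c("NASHVILLE", "BRENTWOOD", "NASHVILLE", "ANTIOCH"),
  stringsAsFactors = FALSE
)
srpp <- data.frame(
  parcel_id = c(1, 2, 3, 5),
  finishedsqft = c(1200, 3400, 1800, 2000)
)

# Inner join on the key: keeps only parcels that appear in both tables.
merged_house <- merge(nal, srpp, by = "parcel_id")

# Free memory once the merged data frame exists.
rm(nal, srpp)

nrow(merged_house)
```

`merge()` defaults to an inner join; passing `all.x = TRUE` or `all = TRUE` would keep unmatched rows instead, which changes the observation count.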
- Restrict your sample to properties that meet the following criteria and call the resulting data frame “sfh” (for single family home). How many observations are in sfh?
- Residential class code (classcodes == “R”)
- Land used for single family home (see “landuse”)
- Sold on or after 1980 (see “owner_dateacquired”)
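The three criteria above can be sketched as a single `subset()` call. The column names come from the assignment, but the label `"SINGLE FAMILY"` and the storage of `owner_dateacquired` as a `Date` are assumptions; the real data may code them differently.

```r
# Toy data; the landuse label and the Date storage format are assumptions.
merged_house <- data.frame(
  classcodes = c("R", "R", "C", "R", "R"),
  landuse = c("SINGLE FAMILY", "DUPLEX", "SINGLE FAMILY",
              "SINGLE FAMILY", "SINGLE FAMILY"),
  owner_dateacquired = as.Date(c("1995-06-01", "2001-03-15", "2010-01-01",
                                 "1975-12-31", "1980-01-01")),
  stringsAsFactors = FALSE
)

# Keep residential, single-family properties acquired in 1980 or later.
# subset() also drops rows where the condition evaluates to NA.
sfh <- subset(
  merged_house,
  classcodes == "R" &
    landuse == "SINGLE FAMILY" &
    owner_dateacquired >= as.Date("1980-01-01")
)

nrow(sfh)
```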
- Using sfh, calculate the mean sale price and mean acres of residential properties by “addresscity”, restricted to each “addresscity” that had at least 100 property sales during the sample period.
- Visualize via table/figure the mean sale price by “addresscity”.
- Which “addresscity” has the highest mean sale price and what is the mean value? Which “addresscity” has the highest mean property “acres” and what is the mean value?
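One possible way to compute the grouped means in base R is shown below on made-up data. The threshold variable `min_sales` is a hypothetical name and is lowered from 100 to 3 so that the toy example has something to filter.

```r
# Made-up data: city A has 5 sales, B has 3, C has 1.
sfh <- data.frame(
  addresscity = c(rep("A", 5), rep("B", 3), rep("C", 1)),
  ownersaleprice = c(100, 200, 300, 400, 500, 900, 1000, 1100, 50),
  acres = c(0.2, 0.3, 0.2, 0.3, 0.5, 1.0, 2.0, 3.0, 9.0)
)
min_sales <- 3  # the assignment uses 100; lowered here for the toy data

# Count sales per city and keep only cities that meet the threshold.
n_sales <- table(sfh$addresscity)
keep <- names(n_sales)[n_sales >= min_sales]

# Mean sale price and mean acres for each retained city.
city_means <- aggregate(
  cbind(ownersaleprice, acres) ~ addresscity,
  data = sfh[sfh$addresscity %in% keep, ],
  FUN = mean
)

# Cities with the highest mean price and the highest mean acreage.
top_price <- city_means[which.max(city_means$ownersaleprice), ]
top_acres <- city_means[which.max(city_means$acres), ]
city_means
```

Note that city C, with a single very large lot, is excluded before the means are compared, which is the purpose of the minimum-sales restriction.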
- Suppose we are interested in predicting property sale price. Estimate a regression with “ownersaleprice” as the dependent variable and include “addresscity” as a categorical variable (factor variable in R).
- Interpret the intercept. Which city is acting as the reference group?
- Interpret the point estimate on Brentwood.
- When comparing the reference group to other cities, do you see anything problematic that might point toward a data entry error?
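To illustrate how the intercept and the city coefficients relate to the reference group, here is a small sketch with invented prices. With a single factor regressor, the intercept equals the mean of the reference level and each coefficient is that city's mean minus the reference mean.

```r
# Invented prices for three cities, two sales each.
sfh <- data.frame(
  addresscity = c("ANTIOCH", "ANTIOCH", "BRENTWOOD", "BRENTWOOD",
                  "NASHVILLE", "NASHVILLE"),
  ownersaleprice = c(150000, 170000, 600000, 640000, 300000, 340000)
)
sfh$addresscity <- factor(sfh$addresscity)

# By default the first level (alphabetical order) is the reference group.
reference_city <- levels(sfh$addresscity)[1]

fit <- lm(ownersaleprice ~ addresscity, data = sfh)
coefs <- coef(fit)
coefs

# A different reference group can be chosen before fitting, for example:
# sfh$addresscity <- relevel(sfh$addresscity, ref = "NASHVILLE")
```

If the reference level turns out to be a blank or misspelled city name, every other coefficient is measured relative to that odd group, which is one way a data entry problem can surface.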
- Add “finishedsqft” to the specification above and interpret the point estimate on “finishedsqft.”
- Suppose that you think the relationship between finished square footage and sale price is nonlinear. Alter the specification above to capture this nonlinearity via using a squared term. Do results imply a nonlinear relationship and can you back this up with a figure?
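A squared term can be added with `I()` inside the formula. The sketch below uses simulated data with a known concave relationship (the coefficients in the simulation are invented), and `marginal_effect` is a hypothetical helper showing that the effect of one more square foot is b1 + 2 * b2 * sqft rather than a single number.

```r
# Simulated data with a known concave price/size relationship.
set.seed(321)
finishedsqft <- runif(500, 600, 5000)
ownersaleprice <- 50000 + 150 * finishedsqft - 0.01 * finishedsqft^2 +
  rnorm(500, sd = 20000)
sfh <- data.frame(finishedsqft, ownersaleprice)

# I() protects the arithmetic so that ^2 means "squared" inside the formula.
fit_sq <- lm(ownersaleprice ~ finishedsqft + I(finishedsqft^2), data = sfh)
b <- coef(fit_sq)

# Marginal effect of one more square foot at a given size: b1 + 2 * b2 * sqft.
marginal_effect <- function(sqft) {
  unname(b["finishedsqft"] + 2 * b["I(finishedsqft^2)"] * sqft)
}

marginal_effect(c(1000, 4000))
```

A negative coefficient on the squared term implies diminishing returns to size; plotting fitted values against `finishedsqft` would show the curvature.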
- Estimate the relationship between sale price and address city, finishedsqft, acres, and year. Control for year as a factor/categorical variable.
- Interpret the point estimate on acres. Does it make sense? Why or why not?
- Interpret the point estimate on 2019.
- Model the relationship between sale price and finishedsqft, acres, year (of sale), and addressfullstreet. Treat year as a factor/categorical variable. There are over 10,000 street names in Nashville, which means that the traditional “lm” command would create over 10,000 binary variables to add as controls. If you run a model like this as you’ve done above with “addresscity”, you might as well take a vacation because it could take hours/days/weeks to run.
“Big data” problems like this are common, but fortunately econometricians have created R packages that run models significantly faster than traditional programs. Use the “fixest” package and use “feols” instead of “lm” to estimate this model (see the fixest package documentation).
- What happened to the adjusted r-squared?
- What does controlling for addressfullstreet do (statistically speaking) and do you see any issues controlling for the name of the street?
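As a rough illustration of what a street fixed effect does statistically, the base R sketch below shows that including one dummy per street gives the same slope as demeaning the variables within each street (the within transformation). In `fixest`, fixed effects are placed after a `|` in the formula, e.g. `feols(ownersaleprice ~ finishedsqft + acres | year + addressfullstreet, data = sfh)`, where a `year` column is assumed to exist. The simulated data and street names here are invented, and only one fixed effect is used to keep the equivalence exact.

```r
# Simulated sales on 50 invented streets, each with its own price level.
set.seed(42)
n_streets <- 50
street_names <- paste0("street_", seq_len(n_streets))
street <- factor(sample(street_names, 1000, replace = TRUE))
street_effect <- setNames(rnorm(n_streets, sd = 50000), street_names)

finishedsqft <- runif(1000, 800, 4000)
ownersaleprice <- 100 * finishedsqft +
  unname(street_effect[as.character(street)]) +
  rnorm(1000, sd = 10000)

# (a) Dummy-variable approach: lm() builds one indicator per street.
b_dummy <- unname(coef(lm(ownersaleprice ~ finishedsqft + street))["finishedsqft"])

# (b) Within transformation: subtract street-level means from y and x.
y_dm <- ownersaleprice - ave(ownersaleprice, street)
x_dm <- finishedsqft - ave(finishedsqft, street)
b_within <- unname(coef(lm(y_dm ~ x_dm - 1))["x_dm"])

c(dummy = b_dummy, within = b_within)
```

Both estimates use only variation among homes on the same street, so anything constant within a street is absorbed, and streets with a single sale contribute nothing to the slope.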
- Using the above model, predict the sale price of a residential single-family home sold in 2019, with 1218 square feet, .27 acres, built in 1946, and located on McClellan Avenue. *** You will need to play around with addressfullstreet and “McClellan Avenue” because of the quirky way in which Nashville analysts designed the variable.
- Suppose that prior to selling, the seller had the option of enclosing a mudroom that would have added 200 square feet to the finished square footage of the house. If finishing the mudroom would have cost $6,000, would our model predict this to be an economically intelligent decision? Why or why not?
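The mudroom comparison reduces to simple arithmetic once a coefficient on “finishedsqft” is available. In the sketch below `beta_sqft` is a hypothetical value chosen only for illustration; it is not an estimate from the data.

```r
beta_sqft <- 85     # hypothetical coefficient: dollars per additional square foot
added_sqft <- 200   # size of the enclosed mudroom
cost <- 6000        # cost of finishing the mudroom

# In a model that is linear in finishedsqft, the predicted price change
# depends only on the coefficient and the added area.
predicted_gain <- beta_sqft * added_sqft
net_benefit <- predicted_gain - cost

# Smallest coefficient at which the project breaks even.
break_even_beta <- cost / added_sqft

c(predicted_gain = predicted_gain, net_benefit = net_benefit,
  break_even_beta = break_even_beta)
```

Any estimated coefficient above the break-even value of $30 per square foot makes the predicted gain exceed the cost, although the regression coefficient is an association and need not equal the causal return to adding space.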
- Because of reasons discussed in class, you decide to use a log-linear model where you take a natural log transformation of sale price. Use a log-linear model with the same explanatory variables in part (c) to predict the sale price of a residential single-family home sold in 2019, with 1218 square feet, .27 acres, built in 1946, and located on McClellan Avenue. What is your prediction?
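When the dependent variable is log price, a prediction has to be transformed back to dollars. The sketch below, on simulated data with invented parameters and only one regressor, shows the naive back-transformation `exp()` alongside Duan's smearing adjustment, which is one common correction and not necessarily the approach expected in class.

```r
# Simulated data where log price is linear in square footage.
set.seed(7)
finishedsqft <- runif(400, 800, 4000)
ownersaleprice <- exp(11 + 0.0004 * finishedsqft + rnorm(400, sd = 0.3))
sfh <- data.frame(finishedsqft, ownersaleprice)

fit_log <- lm(log(ownersaleprice) ~ finishedsqft, data = sfh)
new_home <- data.frame(finishedsqft = 1218)

# Prediction on the log scale, then back to dollars.
pred_log <- unname(predict(fit_log, newdata = new_home))
naive <- exp(pred_log)  # tends to understate the mean price

# Duan's smearing factor: the average of the exponentiated residuals.
smear <- mean(exp(residuals(fit_log)))
pred_price <- naive * smear

c(naive = naive, smeared = pred_price)
```

Because the exponential function is convex, `exp()` of the predicted log price falls below the expected price; the smearing factor is greater than one and scales the prediction up accordingly.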
- Build the “best” model you can to predict the sale price of a residential single-family home (withheld from this dataset) sold in 2019, with 1218 finished square feet, .268 acres, built in 1946, located on McClellan Avenue, 2 bedrooms, 1 bath, 0 half baths, 0 basement area, crawl space foundation, wood frame exterior wall, 1 story building. Think about nonlinearity, interactions, and trade-offs between linear/log-linear/log-log models. The group with the prediction closest to the true sale price will automatically receive a HW 3 group grade no lower than an A- (it is not advantageous to share your group’s prediction with other groups).