DS 310 Machine Learning
Project #4: Drug User Prediction
Task: This is a multi-class classification task on the homeless youth dataset. The goal is to
predict which kind of drug a homeless youth uses, or whether they do not use any drug.
There is no test set for this dataset, so your target is to maximize the 10-fold cross-validation
AUC.
Just a reminder: you are required to predict the behavior of homeless youth, so the first
thing you should do is remove the records of those who are not homeless.
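As a rough illustration of the evaluation target, the sketch below computes a 10-fold cross-validated AUC for a multi-class classifier with scikit-learn. Everything in it is an assumption made for illustration: the data is synthetic (generated with make_classification, not the homeless youth dataset), the logistic regression pipeline is just a placeholder model, and the one-vs-rest macro-averaged AUC ("roc_auc_ovr") is one common way to extend AUC to several classes; the handout does not say which averaging scheme is used for grading.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned dataset (assumption: 4 classes, 12 features).
X, y = make_classification(
    n_samples=600,
    n_features=12,
    n_informative=6,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Placeholder model: scaling followed by multinomial logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep every class present in each of the 10 validation folds,
# which the multi-class AUC computation needs.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# "roc_auc_ovr" scores predicted probabilities with one-vs-rest, macro-averaged AUC.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc_ovr")
mean_auc = scores.mean()
print(f"10-fold OvR AUC: {mean_auc:.3f} +/- {scores.std():.3f}")
```

Putting preprocessing inside the pipeline means it is refit on the training portion of each fold, so the validation fold does not leak into the fitted transformers.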
Dataset: Homeless youth dataset (we have seen it before). You can download it from Canvas.
Note that all datapoints with missing labels can be discarded. NaN or any other kind of
spurious feature value, on the other hand, should be replaced (and missing values should be
filled) using some reasonable approach. Alternatively, you can simply discard a feature that
contains NaN or other spurious values altogether. Please make sure you
have enough data for training.
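The cleaning steps described above, together with the homeless-only filter from the Task paragraph, can be sketched with pandas as follows. This is a minimal sketch under stated assumptions: the column names ("homeless", "age", "gender", "drug_label") and the label values are hypothetical and will differ in the real dataset, and median/mode filling is only one reasonable imputation choice among several.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; all column names and values are hypothetical.
df = pd.DataFrame({
    "homeless":   [1, 1, 0, 1, 1, 1],
    "age":        [19.0, np.nan, 22.0, 20.0, 18.0, np.nan],
    "gender":     ["F", "M", "M", None, "F", "M"],
    "drug_label": ["none", "marijuana", "none", "meth", None, "none"],
})

# 1. Keep only the homeless youth.
df = df[df["homeless"] == 1]

# 2. Discard datapoints whose label is missing.
df = df.dropna(subset=["drug_label"])

# 3. Separate features from labels, then fill missing feature values.
features = df.drop(columns=["homeless", "drug_label"]).copy()
labels = df["drug_label"]

num_cols = features.select_dtypes(include="number").columns
cat_cols = features.columns.difference(num_cols)

# Numeric columns: fill with the column median.
features[num_cols] = features[num_cols].fillna(features[num_cols].median())

# Categorical columns: fill with the most frequent value.
for col in cat_cols:
    features[col] = features[col].fillna(features[col].mode().iloc[0])

print(features)
print(f"{len(features)} rows remain for training")
```

Checking the remaining row count after each step is a quick way to confirm that enough data is left for training. If imputation statistics are computed on the full dataset as here, they include the validation folds; fitting the imputer inside each cross-validation fold avoids that.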
Deliverables: Each team needs to accomplish the following tasks:
1. Measure Score: At the end, we will evaluate the AUC of the team's final submission.
2. Jupyter notebook: Each team posts its final model in a Jupyter notebook, named
submission.ipynb.
3. Project Report: A project report in PDF format that contains all the details of the major
steps of the project.
Grading Rubric (100 points)
• 80 points: performance (65 points) and ranking (15 points), measured in AUC. Of the 65
performance points, 50 are for baseline performance and 15 are for better performance.
• 10 points: submitted Jupyter notebook, submission.ipynb.
• 10 points: quality of report.