INF553 Foundations and Applications of Data Mining
In this assignment, we generated the review data from the original Yelp datasets by applying filters, such
as the condition “state” == “CA”. We randomly took 80% of the data for training, 10% for
testing, and 10% as the blind dataset. We do not share the blind dataset.
You can access the files (a-e) under the fixed directory on Vocareum:
b. user.json – user metadata
c. business.json – business metadata, including locations, attributes, and categories
d. user_avg.json – containing the average stars for the users in the train dataset
e. business_avg.json – containing the average stars for the businesses in the train dataset
In addition, the Google Drive provides the above files (a-e) and the following testing files (f and g):
f. test_review.json – containing only the target user and business pairs for the prediction task
g. test_review_ratings.json – containing the ground truth rating for the testing pairs
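As a rough illustration of how the test pairs (f) and the ground-truth file (g) fit together, the sketch below assumes both files are line-delimited JSON (one object per line) with `user_id` and `business_id` fields, plus a `stars` field in the ground-truth file. These field names follow the original Yelp review schema and are assumptions here, so check the actual files first. The helper names `parse_json_lines` and `join_truth` are hypothetical.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of JSON-lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def join_truth(pairs, truth):
    """Attach the ground-truth stars to each (user, business) test pair.

    Assumes each record has "user_id" and "business_id" keys and that the
    ground-truth records also carry "stars" (assumed field names).
    Pairs with no matching ground-truth record get None.
    """
    stars = {(r["user_id"], r["business_id"]): r["stars"] for r in truth}
    return [
        (p["user_id"], p["business_id"], stars.get((p["user_id"], p["business_id"])))
        for p in pairs
    ]
```

A file object can be passed directly, for example `with open("test_review.json") as f: pairs = parse_json_lines(f)`, since iterating over a file yields one line at a time.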
4. Task (5 points)
You need to submit the following files on Vocareum (all lowercase):
a. [REQUIRED] Two Python scripts: train.py, predict.py
b. [REQUIRED] Model files/folders (you can name them yourself)
c. [REQUIRED] One PDF file: model.pdf (describing your model in 200 words)
d. You can include other Python scripts to support your programs (e.g., callable functions).
4.1 Task description
In the competition project, you will build a recommendation system with the datasets provided on
Vocareum and use the model(s) to predict the rating for a given user and business pair.
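To make the prediction task concrete, here is a minimal baseline sketch, not the expected solution. It assumes that user_avg.json and business_avg.json each load into a dictionary mapping an ID to its average stars (an assumption about the file layout). The function name `predict_rating` and the fallback value `default=3.0` are hypothetical. The result is clamped to the 1 to 5 star range used by Yelp.

```python
def predict_rating(user_id, business_id, user_avg, business_avg, default=3.0):
    """Baseline prediction: mean of whichever averages are available.

    user_avg / business_avg: dicts mapping ID -> average stars (assumed layout).
    default: hypothetical fallback when neither the user nor the business is known.
    """
    known = []
    if user_id in user_avg:
        known.append(user_avg[user_id])
    if business_id in business_avg:
        known.append(business_avg[business_id])
    if not known:
        return default
    rating = sum(known) / len(known)
    # Keep the prediction inside the valid star range.
    return min(5.0, max(1.0, rating))
```

A trained model would replace this function in predict.py. A baseline like this is mainly useful as a fallback for users or businesses that never appear in the training data.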