COMP7810 Assignment 1
Question 1. Design a star schema for the sales information (20 points)
Consider a real estate agency whose database is composed of the following
tables:
OWNER (IDOwner, Name, Surnames, Address, City, Phone)
ESTATE (IDEstate, Category, Area, City, Province, Rooms, Bedrooms, Garage, Meters)
CUSTOMER (IDCust, Name, Surname, Budget, Address, City, Phone)
AGENT (IDAgent, Name, Surname, Office, Address, City, Phone)
SALE (IDEstate, IDAgent, IDCust, IDOwner, Time, OfferedPrice, Status)
TIME (TimeID, Date, Month, Year)
Hint: include the fact table and the dimension tables.
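
To illustrate what "fact table and dimension tables" means in practice, the sketch below lays out one possible arrangement in an in-memory SQLite database. It is only an illustration, not a model answer: the table names (dim_owner, dim_estate, dim_customer, dim_agent, dim_time, fact_sale), the column types, and the sample rows are all assumptions, and the SALE.Time column is assumed to reference TIME.TimeID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per entity.
CREATE TABLE dim_owner    (IDOwner  INTEGER PRIMARY KEY, Name TEXT, Surnames TEXT,
                           Address TEXT, City TEXT, Phone TEXT);
CREATE TABLE dim_estate   (IDEstate INTEGER PRIMARY KEY, Category TEXT, Area TEXT,
                           City TEXT, Province TEXT, Rooms INTEGER,
                           Bedrooms INTEGER, Garage INTEGER, Meters REAL);
CREATE TABLE dim_customer (IDCust   INTEGER PRIMARY KEY, Name TEXT, Surname TEXT,
                           Budget REAL, Address TEXT, City TEXT, Phone TEXT);
CREATE TABLE dim_agent    (IDAgent  INTEGER PRIMARY KEY, Name TEXT, Surname TEXT,
                           Office TEXT, Address TEXT, City TEXT, Phone TEXT);
CREATE TABLE dim_time     (TimeID   INTEGER PRIMARY KEY, Date TEXT,
                           Month INTEGER, Year INTEGER);

-- Fact table: one row per sale, holding foreign keys and the measure.
CREATE TABLE fact_sale (
    IDEstate     INTEGER REFERENCES dim_estate(IDEstate),
    IDAgent      INTEGER REFERENCES dim_agent(IDAgent),
    IDCust       INTEGER REFERENCES dim_customer(IDCust),
    IDOwner      INTEGER REFERENCES dim_owner(IDOwner),
    TimeID       INTEGER REFERENCES dim_time(TimeID),  -- assumed: SALE.Time
    OfferedPrice REAL,
    Status       TEXT
);
""")

# Made-up rows, only to show a query. SQLite does not enforce foreign keys
# unless PRAGMA foreign_keys is switched on, so the unused dimensions stay empty.
conn.execute("INSERT INTO dim_agent VALUES (1, 'Ann', 'Lee', 'Central', "
             "'1 Main St', 'Springfield', '555-0100')")
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                 [(1, "2023-05-01", 5, 2023), (2, "2024-01-15", 1, 2024)])
conn.executemany("INSERT INTO fact_sale VALUES (?, ?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 1, 300000, "sold"),
                  (2, 1, 1, 1, 1, 200000, "sold"),
                  (3, 1, 1, 1, 2, 150000, "open")])

# Typical star-schema query: aggregate the measure, grouped by dimension attributes.
rows = conn.execute("""
    SELECT t.Year, a.Office, SUM(f.OfferedPrice)
    FROM fact_sale AS f
    JOIN dim_time  AS t ON f.TimeID  = t.TimeID
    JOIN dim_agent AS a ON f.IDAgent = a.IDAgent
    GROUP BY t.Year, a.Office
    ORDER BY t.Year
""").fetchall()
print(rows)
```

The fact table carries only keys and numeric measures, while every descriptive attribute lives in a dimension, which is why the query joins outward from fact_sale.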
Question 2. Find a dataset that is personally interesting to you. It may be a publicly available
dataset, or a dataset that you have permission to use and share results from. There are many
places to find publicly available datasets, and simply searching Google for your preferred
topic plus “public dataset” may return many hits. Here are some additional resources to get you
started:
Kaggle Datasets (https://www.kaggle.com/datasets)
US Government datasets (https://catalog.data.gov/dataset)
Centers for Disease Control and Prevention (CDC) data (https://data.cdc.gov)
NASA datasets (https://nssdc.gsfc.nasa.gov)
World Bank Open Data (https://data.worldbank.org)
This should not be the dataset you will use for your group project. It requires your
independent work.
Perform data cleaning and basic data analysis on the dataset, using at least two
techniques learned in lectures 3 and 4. You can use any tool (e.g., Excel) or write your own
code (e.g., in Python).
Describe your key findings from the dataset. Make sure you cite the source of the data! (80
points)
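
As a rough illustration of the data cleaning step, the sketch below fills missing values with the median and then flags outliers using the 1.5 × IQR rule that a boxplot draws. The values in `raw` are invented, and median imputation and the IQR rule are just two common choices, not necessarily the techniques covered in the lectures.

```python
import statistics

# Invented measurements; None marks a missing value.
raw = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0, None, 14.5]

# Step 1: handle missing values by imputing the median of the observed ones.
observed = [v for v in raw if v is not None]
median = statistics.median(observed)
filled = [median if v is None else v for v in raw]

# Step 2: flag outliers with the boxplot rule (beyond 1.5 * IQR from the quartiles).
q1, _, q3 = statistics.quantiles(filled, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in filled if v < low or v > high]
cleaned = [v for v in filled if low <= v <= high]

print("imputed median:", median)
print("outliers:", outliers)
print("rows kept:", len(cleaned), "of", len(filled))
```

Whether a flagged value is noise or a genuine extreme observation still needs a judgment call, so it is worth reporting what was removed and why.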
Assessment:
You don’t need to submit your code. You just need to:
a) briefly describe how you analysed the dataset (e.g., number of samples and features,
feature types, descriptive summary, boxplot, measures of dispersion, correlation,
regression) (30 points)
b) briefly describe any data cleaning methods you applied to the dataset (e.g., for
handling missing values or removing noise). If you think your dataset does not need data
cleaning, please describe how you determined that it is already clean (e.g., a boxplot shows no
outliers). (30 points)
c) summarize your findings (e.g., two features are related to the dependent variable) (20
points)
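
The measures named in item a) can be sketched with Python's standard library alone. The example below computes a descriptive summary, a measure of dispersion, the Pearson correlation, and a simple least-squares regression line. The `area` and `price` lists are hypothetical data, and the helper names (`describe`, `pearson`, `least_squares`) are made up for this illustration.

```python
import statistics

# Hypothetical sample: floor area (square metres) and price (thousands).
area = [50, 60, 80, 100, 120]
price = [155, 175, 245, 300, 365]

def describe(values):
    """Descriptive summary with two measures of dispersion (stdev and IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

summary = describe(price)
r = pearson(area, price)
slope, intercept = least_squares(area, price)

print(summary)
print(f"r = {r:.3f}; price ~ {slope:.2f} * area + {intercept:.2f}")
```

A correlation close to 1 together with a positive slope would support a finding of the kind mentioned in item c), namely that a feature is related to the dependent variable.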