KNIME Data Mining Assignment
This assignment is a practical data analytics project that follows on from the data exploration you did in Assignment 2.
You will act as a data scientist at a consulting company and need to make predictions on a dataset. The dataset can be found below.
You need to build classifiers using the techniques covered in the lectures to predict the class attribute. At the very minimum, you need to produce one classifier for each method we have covered. However, you should be able to get a better mark if you explore the problem thoroughly (as you should in industry): preprocessing the data, trying different methods, choosing their best parameter settings and identifying the best classifier in a principled and explainable way. If you choose to use KNIME and show ‘expert’ use (i.e. exploring multiple classifiers with different settings, choosing the best in a principled way and being able to explain why you built the model the way you did), this will attract a better mark. If you choose to use R or Python to build, optimise and test different models, this will also attract a better mark.
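One way to compare methods and parameter settings in a principled way is cross-validated grid search. The sketch below is only an illustration using scikit-learn: the data is synthetic (generated with `make_classification` as a stand-in for the weather data), and the three candidate models and their parameter grids are assumptions, not the list of methods covered in the lectures.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data (assumption: the real weather
# data would be loaded and preprocessed here instead).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)

# Hypothetical candidates and parameter grids, chosen only for illustration.
candidates = {
    "decision_tree": (
        Pipeline([("model", DecisionTreeClassifier(random_state=0))]),
        {"model__max_depth": [3, 5, 10]},
    ),
    "knn": (
        Pipeline([("scale", StandardScaler()), ("model", KNeighborsClassifier())]),
        {"model__n_neighbors": [3, 7, 15]},
    ),
    "logistic_regression": (
        Pipeline(
            [("scale", StandardScaler()), ("model", LogisticRegression(max_iter=1000))]
        ),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
}

# The same stratified folds are used for every model so scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def compare_models(X, y):
    """Return {model name: (best mean CV accuracy, best parameters)}."""
    results = {}
    for name, (pipeline, grid) in candidates.items():
        search = GridSearchCV(pipeline, grid, cv=cv, scoring="accuracy")
        search.fit(X, y)
        results[name] = (search.best_score_, search.best_params_)
    return results


results = compare_models(X, y)
for name, (score, params) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: mean CV accuracy = {score:.3f}, best params = {params}")

best_name = max(results, key=lambda name: results[name][0])
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so no information from the held-out fold leaks into preprocessing. Accuracy is used here only as an example metric; a different score may suit the problem better.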
You need to write a short report describing how you solved the problem and the results you found. See below for requirements.
You also need to attend a short oral defence of your classifier of around 5 minutes where you show the classifier (e.g. using the KNIME workflow or Python/R code) and answer some questions about it. Details about oral defences will be given by email and in class.
Below you will find three files: a weather dataset to build and optimise your model (it contains the target values), an “unknown” dataset for the final model assessment (it does not have the target values; you need to predict them) and a submission sample which shows what the file submitted to Kaggle should look like. In particular, you will need to set the column names in your submission file correctly, that is, “Row ID” and “Predict_RainTomorrow”.
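As a minimal sketch of producing a file with those two column names, the Python snippet below writes a CSV using only the standard library. The row identifiers ("Row0", "Row1", ...) and the 0/1 prediction values are hypothetical placeholders; the submission sample is the authority on the actual identifier and label format.

```python
import csv
import io


def write_submission(row_ids, predictions, out):
    """Write a two-column submission CSV to the file-like object `out`."""
    if len(row_ids) != len(predictions):
        raise ValueError("row_ids and predictions must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    # Column names exactly as required by the assignment.
    writer.writerow(["Row ID", "Predict_RainTomorrow"])
    for row_id, prediction in zip(row_ids, predictions):
        writer.writerow([row_id, prediction])


# Hypothetical identifiers and predictions, for illustration only.
buffer = io.StringIO()
write_submission(["Row0", "Row1", "Row2"], [0, 1, 0], buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

To write a real file, pass an object opened with `open("submission.csv", "w", newline="")` instead of the `StringIO` buffer (the file name here is also just an example).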
For this dataset, you only have the attribute headings and a brief description of what they mean, which you can find here: Assignment3-Attribute-Description.pdf
Build a classifier that predicts the “RainTomorrow” attribute.
You may apply data pre-processing and transformations (e.g. grouping values of attributes, converting them to binary, etc.), but explain why you chose each one. You may need to split the training set into training, validation and test sets to set the parameters accurately and evaluate the quality of the classifier.
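A three-way split can be sketched with two calls to scikit-learn's `train_test_split`. The 60/20/20 proportions, the synthetic data and the class imbalance below are assumptions chosen for illustration; stratifying on the target keeps the class ratio similar in each part.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the labelled weather data (assumption).
X, y = make_classification(
    n_samples=1000, n_features=6, weights=[0.8, 0.2], random_state=0
)

# First hold out 20% as the final test set.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Then take 25% of the remaining 80% as validation (20% of the total),
# leaving 60% of the total for training.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

The validation set is used for choosing parameters, and the test set is touched only once at the end to estimate how the chosen model performs on unseen data.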
You can use KNIME to build classifiers. Feel free to use any other tool such as R, Weka, Python, Orange, scikit-learn or other software. If you do, explain your classifier in more detail and make sure that you are producing valid results. You are not limited to the classifiers we used in class, but you must describe any other classifiers you use in your report.
A hint: good results usually come not from having a ‘better’ classifier but from identifying or generating good features that can be used to solve the problem.
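To illustrate what generating features can look like, here is a small pandas sketch. The column names (`MinTemp`, `MaxTemp`, `Rainfall`, `WindDir`), the sample values and the 1.0 rainfall threshold are all hypothetical; the real attribute names and sensible thresholds should come from the attribute description and your own data exploration.

```python
import pandas as pd

# Hypothetical columns and values, for illustration only.
df = pd.DataFrame(
    {
        "MinTemp": [12.0, 18.5, 7.5],
        "MaxTemp": [24.0, 21.0, 15.0],
        "Rainfall": [0.0, 3.2, 0.4],
        "WindDir": ["N", "SSW", "ENE"],
    }
)


def add_features(frame):
    """Return a copy of `frame` with three derived columns added."""
    out = frame.copy()
    # Numeric combination: daily temperature range.
    out["TempRange"] = out["MaxTemp"] - out["MinTemp"]
    # Conversion to binary: did it rain more than an assumed threshold?
    out["RainedToday"] = (out["Rainfall"] > 1.0).astype(int)
    # Grouping values: collapse 16-point compass directions to their first letter.
    out["WindSector"] = out["WindDir"].str[0]
    return out


features = add_features(df)
print(features)
```

Whether any of these derived columns actually helps is an empirical question, so each candidate feature should be checked against validation performance rather than assumed useful.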