Predicting Credit Card Default

Practical Application of Machine Learning to a Real World Problem

BENG0095 Assignment 2 


Assignment Release Date: Friday 2nd November, 2018
Assignment Submission Date: Thursday 8th January, 2019
Weighting: 50% of module total
Final Submission Format: Jupyter Notebook file

This is a group assignment: As a first step, please arrange yourselves into groups of three and then register your group on Moodle under the ‘Group Assignment:Group Choices’ section under the ‘Assessment’ tab.

Assignment Description

This assignment will get you to examine one of the oldest and still current use cases of machine learning in finance: credit card default prediction.

You are provided with data from the ‘Default of Credit Card Clients’ Data Set in the UCI Machine Learning Repository (Lichman, M. (2013). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science). The data is split into a training set and a test set. Your task is to use the training data to build a machine learning model that predicts credit card default on the test data.

You will need to build a model that predicts the value of the default_payment_next_month variable, which takes values in {0, 1} (0 = no default, 1 = default). You can use the training data along with a suitable evaluation method (i.e. splitting the training data into training, validation, and test sets) to train and validate your model.
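One way to carry out the split described above is sketched here. It is only an illustration: the data is synthetic, and the 60/20/20 proportions, the random seed, and the variable names are assumptions rather than requirements of the assignment. Stratifying on the label keeps the proportion of defaults roughly equal across the three subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the training file: 1000 rows, 5 features,
# and a binary label with roughly 20% positives (all values invented).
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.2).astype(int)

# Step 1: hold out 20% of the training data as an internal test set.
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: split the remainder 75/25 into training and validation sets,
# giving 60/20/20 overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

print(len(X_train), len(X_val), len(X_holdout))
```

Cross-validation on the training portion is an equally reasonable choice; the point is only that model selection should never touch the data used for the final estimate.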

The primary purpose of this assignment is not the final predictive accuracy you obtain, but rather your approach when attempting this problem and the level of understanding that you show. Be creative and ask questions of this data, engineer your own features as an input to your classifier, and think hard about what constitutes success. There are many existing approaches to credit card default prediction so do a search of the literature to find inspiration. Most of all, please reflect upon what you have learned during the term, and seek to apply machine learning in a way that is appropriate for this task.
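As a small illustration of why "what constitutes success" deserves thought, the sketch below scores a trivial classifier that never predicts default. The labels are invented, and the assumption that defaults are the minority class should be checked against the real training data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Invented labels: 100 clients, 20 of whom default (1 = default).
y_true = np.array([1] * 20 + [0] * 80)

# A trivial "model" that predicts no default for every client.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)  # fraction of actual defaults that were caught

print(f"accuracy={accuracy:.2f}, recall on defaults={recall:.2f}")
```

The classifier looks respectable on accuracy while catching none of the defaults, so accuracy alone may not be the right measure for this task.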

The assignment submission will take the form of a Jupyter notebook (containing the source code of your approach as well as in-line documentation forming the write-up). The notebook should contain an analysis of the performance of your classifier on the data from the test set file.

Data Description

The data is available online via the course’s Moodle page. There are two files on the course webpage: CreditCard_train.csv and CreditCard_test.csv. Each is described below.

CreditCard_train.csv: This file contains the data that you are to train and evaluate your model on. It consists of input and output data for the task. The features are as follows:

ID: The client ID.

X1: Amount of the given credit (NT dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.

X2: Gender (1 = male; 2 = female).

X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).

X4: Marital status (1 = married; 2 = single; 3 = others).

X5: Age (year).
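The codes for X2, X3 and X4 are category labels rather than quantities, so a model that reads them as plain numbers would treat, say, "others" (4) as twice "university" (2). One common option is one-hot encoding, sketched below on a few invented rows. Whether it helps depends on the model you choose, and the column names are assumed to match the feature names listed above.

```python
import pandas as pd

# Toy rows using the feature names and codes listed above (values invented).
df = pd.DataFrame({
    "X1": [20000, 120000, 90000, 50000],  # credit amount (NT dollar)
    "X2": [2, 2, 1, 1],                   # gender code
    "X3": [2, 1, 3, 4],                   # education code
    "X4": [1, 2, 2, 3],                   # marital status code
    "X5": [24, 26, 34, 37],               # age in years
})

# Replace each categorical column with one 0/1 indicator column per code.
categorical = ["X2", "X3", "X4"]
encoded = pd.get_dummies(df, columns=categorical, prefix=categorical, dtype=int)

print(encoded.columns.tolist())
```

The numeric columns X1 and X5 pass through unchanged; only the three coded columns are expanded.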

CreditCard_test.csv: This file contains the data that will be used to perform the final predictions which form part of your submission analysis.

Getting Started

Some points to help you get started:

There are many papers available online that detail different approaches to this problem. It is worth spending some time at the start of the project doing background research and getting a feel for the data, but do reference any work that you have taken inspiration from.

Also, please note that while this is a well-explored data set in the literature, you will receive a portion of your marks for the novelty with which you approach the problem. If you merely seek to re-implement an existing solution you should not expect high marks.

You can use existing libraries such as scikit-learn to provide implementations of key algorithms. I do not expect you to write your own versions of individual algorithms.

All source code should be written in Python.

Notebook Submission Format and Structure

Your notebook should have the following structure:

1. Introduction: A brief description of your approach to the problem and the results that you have obtained on the training data.

2. Data Import: This section shows how you import the data into the notebook. It should be written in such a way that I can modify it to run on my own machine by simply changing the location of the training data and any additional data sources that you have used.

3. Data Transformation and Exploration: Any transformations that you apply to the data prior to training. Also, any exploration of the data that you performed, such as visualization, feature selection, etc.

4. Methodology Overview: Start by describing your methodology in broad terms. Include any background reading you may have done and a step-by-step description of how you have trained and evaluated your model. Describe any feature engineering that you have applied. If you attempted different approaches before settling on your final methodology, then describe those approaches here.

5. Model Training/Validation: This contains a breakdown of how your model was trained and evaluated.

6. Results: Here you show the results that you obtain using your model on the training data. If you have multiple variations or approaches, this is where you compare them.

7. Final Predictions on Test Set: This is the section where you perform your final predictions on the test set using the model that you have trained in the previous section.

Keep in mind that your notebook should be written in such a way that I can modify the location of the data and then step through your notebook to obtain the same results as you have submitted.
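One possible way to meet this requirement is to keep the data location in a single variable at the top of the notebook, as in the sketch below. The name DATA_DIR, its default value, and the helper load_credit_data are hypothetical. The brief does not describe the header layout of the CSV files, so the read_csv calls may need extra arguments for the real data.

```python
from pathlib import Path

import pandas as pd

# The only line a marker should need to edit (hypothetical default location).
DATA_DIR = Path("data")


def load_credit_data(data_dir=DATA_DIR):
    """Read the training and test files from data_dir as two DataFrames."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "CreditCard_train.csv")
    test = pd.read_csv(data_dir / "CreditCard_test.csv")
    return train, test
```

Calling `train_df, test_df = load_credit_data()` once in the Data Import section keeps every later cell independent of the file location. Passing a fixed `random_state` to any scikit-learn routine that involves randomness helps the notebook reproduce the same results when it is re-run.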

