25 November 2023

Python assignment writing | 60711-Cwk3-S-Third

This Python assignment involves answering questions on clustering and data mining.

60711-Cwk3-S-Third

2 Question 1: Clustering (16 marks)
The following sub-questions are about clustering. In general, the topics covered are as follows:
* Question 1.1 focuses on the behaviour of clustering algorithms and their sensitivity to data.
* Question 1.2 focuses on methods for estimating the number of clusters.
* Question 1.3 uses a large real-world dataset to look at how clustering can be used for
knowledge discovery.
The following reading is likely to be of use for these questions. Note that only certain sections
may be useful; you are not expected to read it all!
For this question, we will use multiple datasets, which can all be found on Blackboard. To load
these datasets in, we can use the following code. Note that you may need to adjust the path to the
datasets, depending on where they are located on your system.
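The loading code itself is not reproduced on this page. As a stand-in, here is a minimal sketch using only the Python standard library. The `DATA_DIR` location and the `load_csv` helper are assumptions made for illustration and are not part of the assignment.

```python
import csv
from pathlib import Path

# Hypothetical location of the datasets; adjust to match your system.
DATA_DIR = Path("data")


def load_csv(name, data_dir=DATA_DIR):
    """Read a CSV file into a list of dicts, one dict per row, keyed by column name.

    All values are returned as strings; convert numeric columns as needed.
    """
    path = Path(data_dir) / name
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `load_csv("simple.csv")` would read `data/simple.csv` if the file has been placed in that (assumed) directory.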
2.1 Q1.1 (6 marks)
2.1.1 Q1.1a (2 marks)
Using simple.csv, run the K-Means and single-linkage algorithms (available in scikit-learn) on
the dataset, using the true number of clusters (K = 5). Produce a graph (e.g. bar chart) showing
the performance (measured using the adjusted Rand index) across 10 independent runs. Discuss the
results obtained, using your knowledge of how the algorithms work to explain why the observed
behaviour occurred.
Hints:
* This question is more difficult without the use of error bars!
* For single-linkage, use AgglomerativeClustering(n_clusters=5, linkage="single").
* For KMeans, use KMeans(n_clusters=5, init="random", n_init=1) as arguments.
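To make the performance measure concrete, the sketch below computes the adjusted Rand index from its pair-counting definition in plain Python, plus the mean and sample standard deviation that a bar chart with error bars would display. It is an illustration only and not a replacement for the scikit-learn implementation; the helper names `adjusted_rand_index` and `summarise_runs` are invented here.

```python
from collections import Counter
from math import comb
from statistics import mean, stdev


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: nothing to disagree on

    # Contingency counts: points sharing each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # value expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # both labelings are trivial (e.g. one cluster each)
    return (index - expected) / (max_index - expected)


def summarise_runs(scores):
    """Return (mean, sample standard deviation) of scores from repeated runs."""
    return mean(scores), stdev(scores)


# The index ignores label names: this is the same partition relabelled.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Each of the 10 runs would contribute one score (true labels against that run's predicted labels); `summarise_runs` then gives the bar height and the error-bar size for each algorithm.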
2.3 Q1.3 (5 marks)
For this question, we will use the online_retail_full.csv dataset, which is a real-world dataset
of transactions for an online retail store. Full information about the dataset can be found here.
Here, we do not have true labels and need to explore the data instead. This is a common scenario
in practice, and will require you to explore the data and use clustering (likely requiring multiple
iterations and tweaks) to try to find patterns.
We’re going to investigate whether there are groups of customers, how they are similar, and what
they may represent. For simplicity, we will start by using KMeans as our model, and we’ll remove
some of the columns from our input data. Use a range of K values and whichever techniques in
Q1.2 are useful to propose interesting K value(s). Comment on the clusters that are produced in
terms of the context of the data.
Hints:
* As this dataset has no ground truth, there is a lot of scope in this question. Remember to have
some justification for why you have taken the steps you have.
* The quality of your final clusters is not important for marks, as long as you have taken
reasonable steps.
* The overall aim is to try to find patterns in the data. KMeans is suggested as a starting point,
but it is not always the best algorithm to use, as we have seen in previous questions.
* You can create features from the existing ones. For example, the quantity and price can be
multiplied to get a total amount (thus simplifying the data). Other features may require
transformation before they can be used.
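The feature-creation hint can be sketched as follows: multiply quantity by price for each transaction row, then sum per customer so that each customer becomes one data point. The rows below are toy values, and the column names `customer_id`, `quantity` and `price` are assumptions that may not match the headers in online_retail_full.csv.

```python
import csv
import io
from collections import defaultdict

# Toy rows for illustration only; not taken from the real dataset.
SAMPLE = """customer_id,quantity,price
C1,2,3.50
C1,1,10.00
C2,5,1.25
"""


def total_spend_per_customer(rows, id_col="customer_id",
                             qty_col="quantity", price_col="price"):
    """Sum quantity * price over all rows belonging to each customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[id_col]] += float(row[qty_col]) * float(row[price_col])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = total_spend_per_customer(rows)
print(totals)  # {'C1': 17.0, 'C2': 6.25}
```

Aggregating to one row per customer is only one possible design choice; clustering individual transactions or invoices instead would answer a different question about the data.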
3 Question 2: Itemset Rule Mining (4 marks)
For this question, we will be using a real-world dataset which gives the votes of 435 U.S. congressmen
on 16 key issues gathered in the mid-1980s, and also includes their party affiliation as a binary
attribute. This is a purely nominal dataset with some missing values (corresponding to abstentions).
It is normally treated as a classification problem, the task being to predict party affiliation based
on voting patterns. However, association-rule mining can also be applied to this data to seek
interesting associations.
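As a reminder of what an association rule measures, the sketch below computes support and confidence on a few made-up records. The item names (`issue_a=y`, `party=P1`, and so on) are hypothetical and are not taken from the voting dataset, and the code is not how Weka implements rule mining.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Toy records: each is a set of attribute=value items. A missing value
# (an abstention) is represented by simply leaving the item out.
records = [
    {"issue_a=y", "issue_b=n", "party=P1"},
    {"issue_a=y", "issue_b=n", "party=P1"},
    {"issue_a=y", "issue_b=y", "party=P2"},
    {"issue_a=n", "issue_b=y", "party=P2"},
]

print(support(records, {"issue_a=y"}))                   # 0.75
print(confidence(records, {"issue_b=n"}, {"party=P1"}))  # 1.0
```

Here the rule issue_b=n => party=P1 holds in every record where the antecedent appears, so its confidence is 1.0, while its support is only 0.5 because half of the records contain both sides.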
We will be using Weka, both for its utility for itemset rule mining, and to use a different approach
for exploring data. You should have some experience using Weka from the first (non-assessed)
week.
You may need to take screenshots of Weka and include them in your answer below, or copy & paste
the relevant rules. Please ensure that your answer and rules are clearly legible.
