Assignment 2 – Latent Variables and Neural Networks

This is an assignment on latent variable models and neural networks; an excerpt of the assignment content follows:

Objectives

This assignment consists of three parts, which cover latent variable models and neural networks (Modules 4 and 5). The assessment is worth 100 marks in total and contributes 16% to your final score. In your Jupyter notebook, you may use any import from the libraries numpy, scipy, matplotlib, and scikit-learn to solve the tasks.

1 Document Clustering

In this part, you will solve a document clustering problem using unsupervised learning algorithms, namely soft and hard Expectation Maximization (EM).

Question 1 [EM for Document Clustering, 40 Marks]

I Derive the Expectation and Maximization steps of the hard-EM algorithm for document clustering, and show your work in your submitted PDF report. In particular, include all model parameters that should be learnt and the exact expressions (using the same mathematical conventions as in Module 4) used to update these parameters during learning (i.e., the E step, the M step, and the cluster assignments).

II Load the necessary packages.

III Load the Task2A.txt file (if needed, perform text preprocessing similar to what we did in Activity 4.2).

IV Implement the hard-EM (which you derived above) and soft-EM (derived in Chapter 5 of Module 4). Please provide enough comments in your submitted code. Hint: if it helps, feel free to base your code on the provided code for the EM algorithm for GMM in Activity 4.1.
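As a starting point, here is a minimal sketch of both variants, assuming the Module-4 style mixture-of-multinomials model and a document-term count matrix `X` of shape (documents × vocabulary). The function names and the smoothing constant are illustrative choices, not part of the assignment specification:

```python
import numpy as np
from scipy.special import logsumexp

def soft_em(X, K=4, iters=50, seed=0):
    """Soft-EM for a mixture of multinomials on a doc-term count matrix X (N x W)."""
    rng = np.random.default_rng(seed)
    N, W = X.shape
    pi = np.full(K, 1.0 / K)                    # mixing weights
    mu = rng.dirichlet(np.ones(W), size=K)      # per-cluster word distributions
    for _ in range(iters):
        # E-step: responsibilities, computed in log space for numerical stability
        log_r = np.log(pi) + X @ np.log(mu).T   # shape (N, K)
        log_r -= logsumexp(log_r, axis=1, keepdims=True)
        r = np.exp(log_r)
        # M-step: re-estimate pi and mu from the soft counts
        pi = r.mean(axis=0)
        mu = r.T @ X + 1e-10                    # small smoothing avoids log(0)
        mu /= mu.sum(axis=1, keepdims=True)
    return r.argmax(axis=1), pi, mu

def hard_em(X, K=4, iters=50, seed=0):
    """Hard-EM: identical, except responsibilities are one-hot argmax assignments."""
    rng = np.random.default_rng(seed)
    N, W = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.dirichlet(np.ones(W), size=K)
    z = np.zeros(N, dtype=int)
    for _ in range(iters):
        log_r = np.log(pi) + X @ np.log(mu).T
        z = log_r.argmax(axis=1)                # hard assignment (E-step)
        r = np.eye(K)[z]                        # one-hot responsibilities
        pi = r.mean(axis=0) + 1e-10
        pi /= pi.sum()
        mu = r.T @ X + 1e-10
        mu /= mu.sum(axis=1, keepdims=True)
    return z, pi, mu
```

Note the only structural difference: hard-EM replaces the posterior responsibilities with their argmax before the M step, which is what your derivation in Step I should make explicit.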

V Set the number of clusters K=4, and run both the soft-EM and hard-EM algorithms on the provided data.

VI Perform a PCA on the clusterings that you obtain from hard-EM and soft-EM, in the same way we did in Activity 4.2. Then visualize the obtained clusters in different colors, where the x and y axes are the first two principal components (similar to Activity 4.2). Attach the plots to your PDF report, and explain in the report how and why hard-EM and soft-EM differ, based on your plots.
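The visualization step can be sketched as follows; this is a minimal example assuming you already have the document-term matrix and one label vector per algorithm (the function name and file names are placeholders of our choosing):

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend, so figures go to files
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

def plot_clusters(X, labels, title, fname):
    """Project X onto its first two principal components and colour
    each point by its cluster label."""
    coords = PCA(n_components=2).fit_transform(X)
    plt.figure()
    for k in np.unique(labels):
        m = labels == k
        plt.scatter(coords[m, 0], coords[m, 1], s=10, label=f"cluster {k}")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.title(title)
    plt.legend()
    plt.savefig(fname, dpi=150)
    plt.close()
```

Calling this once with the hard-EM labels and once with the soft-EM labels gives the two plots to attach and compare.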

2 Perceptron vs. Neural Network [30 Marks]

In this part, you will be working on a binary classification task on a given synthetic dataset. You will use machine learning tools, namely a Perceptron and a 3-layer Neural Network, to solve the task. Here, we are looking for meaningful observations and discussion of the differences between the Perceptron and the Neural Network.

Question 2 [Neural Network’s Decision Boundary, 30 Marks]

I Load the Task2B_train.csv and Task2B_test.csv datasets and plot the training and testing data separately in two plots, marking the data points of different labels in different colors. Attach the plots to your PDF report submission.
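A minimal sketch of this step is shown below. It assumes each CSV has a header row and three columns (the two features followed by a 0/1 label); check the actual files and adjust the column indices if the layout differs:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend, so figures go to files
import matplotlib.pyplot as plt
import numpy as np

def plot_labelled(data, title, fname):
    """Scatter a 2-D dataset, colouring the two classes differently.
    `data` is an (N, 3) array whose columns are x1, x2 and the 0/1 label."""
    plt.figure()
    for lab, colour in [(0, "tab:blue"), (1, "tab:red")]:
        m = data[:, 2] == lab
        plt.scatter(data[m, 0], data[m, 1], c=colour, s=10, label=f"class {lab}")
    plt.title(title)
    plt.legend()
    plt.savefig(fname, dpi=150)
    plt.close()

# Usage (assuming a one-line header in each file):
# train = np.loadtxt("Task2B_train.csv", delimiter=",", skiprows=1)
# test = np.loadtxt("Task2B_test.csv", delimiter=",", skiprows=1)
# plot_labelled(train, "training data", "train.png")
# plot_labelled(test, "testing data", "test.png")
```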

II Train two Perceptron models on the loaded training data by setting the learning rate η to 0.1 and 1.0 respectively. Calculate the test errors of the two models, find the best η and its corresponding model, then plot the decision boundary and the test data in one plot; attach the plot to your PDF report. Hint: we expect the decision boundary of your Perceptron to be a linear function that separates the testing data into two parts. You may also choose to change the labels from {0, 1} to {−1, +1} for your convenience.
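One way to structure the model-selection part of this step is sketched below, using scikit-learn's `Perceptron` (where `eta0` is the learning rate). This is an assumption about tooling: the assignment may instead expect the Perceptron implementation from the module activities, in which case only the fitting call changes:

```python
import numpy as np
from sklearn.linear_model import Perceptron

def best_perceptron(Xtr, ytr, Xte, yte, etas=(0.1, 1.0)):
    """Train one Perceptron per learning rate and keep the one with the
    lowest test error. Returns (test error, best eta, fitted model)."""
    best = None
    for eta in etas:
        clf = Perceptron(eta0=eta, random_state=0).fit(Xtr, ytr)
        err = 1.0 - clf.score(Xte, yte)     # test error = 1 - accuracy
        if best is None or err < best[0]:
            best = (err, eta, clf)
    return best
```

For the decision-boundary plot, evaluating the fitted model on a dense grid of points and drawing a contour at 0.5 (e.g., with `plt.contourf`) is a common approach.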

III For each combination of K (i.e., the number of units in the hidden layer) in 5, 10, 15, …, 40 (i.e., from 5 to 40 with a step size of 5) and η (i.e., the learning rate) in 0.01, 0.001, run the 3-layer Neural Network and record the test error for each combination. Plot the effect of the different K values on the accuracy on the testing data and attach the plot to your PDF report. Based on this plot, find the best combination of K and η to obtain your best model, then plot the decision boundary and the test data in one plot; attach the plot to your PDF report.
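The grid over (K, η) can be sketched as below, here with scikit-learn's `MLPClassifier` standing in for the 3-layer network (one hidden layer of K units). As in Step II, this choice of library is an assumption; if the module provides its own network code, substitute its training call:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def grid_search_mlp(Xtr, ytr, Xte, yte,
                    Ks=range(5, 45, 5), etas=(0.01, 0.001)):
    """Train one network per (K, eta) pair and record each test error.
    Returns the error dictionary and the best (K, eta) combination."""
    results = {}
    for K in Ks:
        for eta in etas:
            net = MLPClassifier(hidden_layer_sizes=(K,),
                                learning_rate_init=eta,
                                max_iter=2000, random_state=0)
            net.fit(Xtr, ytr)
            results[(K, eta)] = 1.0 - net.score(Xte, yte)
    best_K, best_eta = min(results, key=results.get)
    return results, best_K, best_eta
```

Plotting `results[(K, eta)]` against K, one line per η, gives the accuracy-vs-K figure the step asks for.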

IV In your PDF report, explain the reason(s) for the difference between the Perceptron and the 3-layer Neural Network by comparing the plots you generated in Steps II and III. Hint: look at the plots and think about the model assumptions.

……
