# Homework Assignment 2

Part 1: Deep Learning 1 & 2
1. Prove that the softmax and sigmoid functions are equivalent when the number of possible
labels is two (i.e., for a binary classification problem).
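A quick numerical sanity check (not a proof) can help before writing the derivation. With two logits z1 and z2, the sketch below compares the first output of a 2-way softmax with the sigmoid of the logit difference, sigmoid(z1 - z2). It assumes NumPy is available; the random logits are arbitrary test values.

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
for _ in range(5):
    z = rng.normal(size=2)             # z[0]: logit of class 1, z[1]: logit of class 2
    p_softmax = softmax(z)[0]          # P(class 1) from the 2-way softmax
    p_sigmoid = sigmoid(z[0] - z[1])   # sigmoid applied to the logit difference
    print(f"softmax: {p_softmax:.6f}  sigmoid: {p_sigmoid:.6f}")
```

If the two columns agree for every draw, the algebraic proof should show why they must.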
2. Based on what you have learned about the backpropagation (BP) algorithm, including the
feedforward and backward propagation processes of a neural network, answer Question 1
in the slide below by proving whether each of the 6 equations holds:
3. Answer Question 2 in the “Homework” slide above (“implement an NN …”), using the
3-layer NN code I gave you as a reference example.
Requirements: Implement the NN algorithm in Python. Submit the code and make sure that
our TAs or instructors can test it with a simple “click” (you may submit a README file
if needed). Note that the architecture used in the given code (2-3-2) differs from the
one in Question 2 of this assignment (2-2-2), and the activation functions differ as well.
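One possible starting point is sketched below: a minimal 2-2-2 fully connected network trained with backpropagation on the loss ½Σ(y − ŷ)². It assumes sigmoid activations in both layers because the slide's activation functions are not reproduced here; replace `sigmoid` and `sigmoid_prime` with the ones from the slide. The class name `TwoLayerNet` and the AND-style toy data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(a):
    # Derivative of the sigmoid written in terms of its output a = sigmoid(x).
    return a * (1.0 - a)

class TwoLayerNet:
    """2-2-2 network: input -> hidden -> output (sigmoid assumed in both layers)."""

    def __init__(self, n_in=2, n_hidden=2, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.x = x
        self.h = sigmoid(self.W1 @ x + self.b1)            # hidden activations
        self.y_hat = sigmoid(self.W2 @ self.h + self.b2)   # output activations
        return self.y_hat

    def loss(self, y, y_hat):
        return 0.5 * np.sum((y - y_hat) ** 2)

    def backward(self, y, lr):
        # Output-layer error term: dL/dz2 = (y_hat - y) * sigmoid'(z2).
        delta2 = (self.y_hat - y) * sigmoid_prime(self.y_hat)
        # Hidden-layer error term, propagated back through the current (not yet updated) W2.
        delta1 = (self.W2.T @ delta2) * sigmoid_prime(self.h)
        # Gradient-descent updates.
        self.W2 -= lr * np.outer(delta2, self.h)
        self.b2 -= lr * delta2
        self.W1 -= lr * np.outer(delta1, self.x)
        self.b1 -= lr * delta1

    def train(self, X, Y, lr=0.5, epochs=1000):
        for _ in range(epochs):
            for x, y in zip(X, Y):   # one update per sample (stochastic gradient descent)
                self.forward(x)
                self.backward(y, lr)
        return sum(self.loss(y, self.forward(x)) for x, y in zip(X, Y))

# Hypothetical toy data: logical AND encoded as two one-hot classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
net = TwoLayerNet()
before = sum(net.loss(y, net.forward(x)) for x, y in zip(X, Y))
after = net.train(X, Y, lr=0.5, epochs=2000)
print(f"total loss before: {before:.4f}, after: {after:.4f}")
```

Keeping `forward`, `backward`, and `train` as separate methods makes it easy to print the intermediate values that Question 4 asks for.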
4. Given the network architecture shown in Question 2 of the “Homework” slide above, and
assuming the loss function $L(y, \hat{y}) = \frac{1}{2}\sum_{n=1}^{N}(y_n - \hat{y}_n)^2$ and the learning rate $\alpha = 0.5$,
answer the questions below (show the details step by step):
a. Using the forward propagation process, calculate the output: ŷ = ?
b. Calculate the loss (error): L(y, ŷ) = ?
c. Using the BP algorithm, calculate the updated values of the weights W1
after one iteration.
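The three steps can be checked numerically with a sketch like the one below. It assumes a 2-2-2 network with sigmoid activations and per-layer biases; all inputs, targets, weights, and biases are hypothetical placeholders, not the slide's values, so substitute the numbers from the slide (and its activation functions) before using the output. Only W1 is updated because that is what part (c) asks for.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical placeholder values: replace with the ones given in the "Homework" slide.
x = np.array([0.05, 0.10])          # inputs
y = np.array([0.01, 0.99])          # targets
W1 = np.array([[0.15, 0.20],        # row j: weights into hidden unit j
               [0.25, 0.30]])
b1 = np.array([0.35, 0.35])
W2 = np.array([[0.40, 0.45],        # row k: weights into output unit k
               [0.50, 0.55]])
b2 = np.array([0.60, 0.60])
alpha = 0.5                         # learning rate from the question

# (a) Forward propagation
h = sigmoid(W1 @ x + b1)
y_hat = sigmoid(W2 @ h + b2)

# (b) Loss L = 1/2 * sum((y - y_hat)^2)
L = 0.5 * np.sum((y - y_hat) ** 2)

# (c) Backpropagation; the sigmoid derivative is a * (1 - a) for an output a
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # error term at the output layer
delta1 = (W2.T @ delta2) * h * (1 - h)       # error term at the hidden layer (uses the old W2)
grad_W1 = np.outer(delta1, x)                # dL/dW1
W1_new = W1 - alpha * grad_W1

print("y_hat =", y_hat)
print("L =", L)
print("updated W1 =\n", W1_new)
```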
5. Given the same dataset as Question 5 in HA1:
• “on the street” 50 times;
• “on the table” 1000 times;
• “on the computer” 2000 times;
• “standing on the street” 40 times;
• “standing on the computer” 1 time;
• “standing on the table” 100 times;
• “on the sky” 5 times;
• “on the water” 10 times;
• “on the cloud” 2 times;
• “it is magic that the guy smiles standing on the sky” 1 time;
• “it is magic that the guy smiles standing on the water” 2 times;
• “it is magic that the guy smiles standing on the cloud” 3 times;
a. Use an RNN to perform the same task (build a language model) and predict the
next word in: “I notice three guys standing on the ( ).”
b. Compare this with the results obtained by the probabilistic language model in HA1.
c. Use a feedforward neural network for the same task and compare the result
with that of the RNN.
d. If we use the extended corpus (“corpus plus”, provided in “HA1-Question 5 corpus plus.xlsx”),
compare the results obtained by the language model, the feedforward neural network, and the RNN.
Explain what you find.
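A count-based baseline for part (b), and training data for parts (a) and (c), can both be derived from the phrase frequencies above. The sketch below is one way to do it, assuming a simple n-gram model without smoothing or backoff (the HA1 model may differ). Because “I notice three guys” never occurs in the corpus, it compares the last two words (“on the”) with the last three (“standing on the”) as conditioning context. `next_word_counts` and `training_pairs` are hypothetical helper names.

```python
from collections import Counter

# Phrase frequencies copied from the corpus listed in question 5.
corpus = {
    "on the street": 50,
    "on the table": 1000,
    "on the computer": 2000,
    "standing on the street": 40,
    "standing on the computer": 1,
    "standing on the table": 100,
    "on the sky": 5,
    "on the water": 10,
    "on the cloud": 2,
    "it is magic that the guy smiles standing on the sky": 1,
    "it is magic that the guy smiles standing on the water": 2,
    "it is magic that the guy smiles standing on the cloud": 3,
}

def next_word_counts(corpus, context):
    """Count the words that follow `context` (a tuple of words), weighted by phrase frequency."""
    n = len(context)
    counts = Counter()
    for phrase, freq in corpus.items():
        words = phrase.split()
        for i in range(len(words) - n):
            if tuple(words[i:i + n]) == context:
                counts[words[i + n]] += freq
    return counts

def training_pairs(corpus):
    """Expand the corpus into (prefix, next word) examples for an FNN or RNN."""
    pairs = []
    for phrase, freq in corpus.items():
        words = phrase.split()
        for i in range(1, len(words)):
            pairs.extend([(tuple(words[:i]), words[i])] * freq)
    return pairs

for context in [("on", "the"), ("standing", "on", "the")]:
    counts = next_word_counts(corpus, context)
    total = sum(counts.values())
    best, best_count = counts.most_common(1)[0]
    print(" ".join(context), "->", best, f"({best_count}/{total})")
```

The longer context changes the most likely next word, which is the kind of effect worth discussing when comparing the n-gram model with the FNN (fixed window) and the RNN (whole prefix).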
6. Based on what you have learned about the RNN algorithm, do the homework in the slide
below. Compare the results obtained by a feedforward neural network and an RNN for this
task, and explain what you find. (You can use a large dataset, such as “shakespeare.txt”,
to further support your findings or to explain why an RNN is better (or worse) than a
feedforward NN for sequence analysis.)
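The structural difference between the two models can be illustrated with untrained networks. In the sketch below (random weights, no training, so the probabilities themselves are meaningless), a feedforward NN sees only a fixed window of the last two words, while an Elman RNN folds every word of the prefix into its hidden state. The vocabulary, the `<pad>` token, the hidden size, and the window size are assumptions of this sketch; training with backpropagation through time is not shown.

```python
import numpy as np

# Vocabulary: words from the question 5 corpus plus a padding token (assumed).
vocab = ["<pad>", "it", "is", "magic", "that", "the", "guy", "smiles", "standing",
         "on", "street", "table", "computer", "sky", "water", "cloud"]
idx = {w: i for i, w in enumerate(vocab)}
V, H, WINDOW = len(vocab), 8, 2   # vocabulary size, hidden size, FNN context window

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_hot(word):
    v = np.zeros(V)
    v[idx[word]] = 1.0
    return v

# Feedforward NN: the input is the concatenation of the last WINDOW words only.
W_f = rng.normal(scale=0.5, size=(H, WINDOW * V))
U_f = rng.normal(scale=0.5, size=(V, H))

def fnn_next_word_probs(words):
    context = (["<pad>"] * WINDOW + list(words))[-WINDOW:]
    x = np.concatenate([one_hot(w) for w in context])
    return softmax(U_f @ np.tanh(W_f @ x))

# Elman RNN: the hidden state is updated once per word, so it depends on the whole prefix.
W_x = rng.normal(scale=0.5, size=(H, V))
W_h = rng.normal(scale=0.5, size=(H, H))
U_r = rng.normal(scale=0.5, size=(V, H))

def rnn_next_word_probs(words):
    h = np.zeros(H)
    for w in words:
        h = np.tanh(W_x @ one_hot(w) + W_h @ h)   # h_t = tanh(W_x x_t + W_h h_{t-1})
    return softmax(U_r @ h)

long_prefix = "it is magic that the guy smiles standing on the".split()
short_prefix = "standing on the".split()
# Same last two words, so the FNN outputs are identical; the RNN outputs differ.
print(np.allclose(fnn_next_word_probs(long_prefix), fnn_next_word_probs(short_prefix)))
print(np.allclose(rnn_next_word_probs(long_prefix), rnn_next_word_probs(short_prefix)))
```

A wider FNN window narrows this gap but fixes the input size in advance; the RNN accepts prefixes of any length.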
7. (Optional question) For a three-layer feedforward neural network, how do you
determine the number of nodes/neurons in the hidden layer? For the dataset given below,
how many hidden-layer neurons is the best choice? (Hint: in general, the following 3
equations can be used:
$$\sum_{i=0}^{n} C_{n_h}^{i} > k \qquad (1)$$

where $k$ is the number of samples, $n_h$ is the number of nodes/neurons in the hidden layer, $n$ is
the number of inputs, $i \in [0, n]$, and $C_{n_h}^{i}$ is the binomial coefficient, taken as 0 when $i > n_h$.

$$n_h = \sqrt{n + m} + a \qquad (2)$$

where $n_h$ is the number of nodes/neurons in the hidden layer, $n$ is the number of input units,
$m$ is the number of output units, and $a \in [1, 10]$.

$$n_h = \log_2 n \qquad (3)$$

where $n_h$ is the number of nodes/neurons in the hidden layer and $n$ is the number of input units.)
Table 1: Training dataset

| No. | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | y |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -1.7817 | -0.2786 | -0.2954 | -0.2394 | -0.1842 | -0.1572 | -0.1584 | -0.1998 | 1 |
| 2 | -1.8710 | -0.2957 | -0.3494 | -0.2904 | -0.1460 | -0.1387 | -0.1492 | -0.2228 | 1 |
| 3 | -1.8347 | -0.2817 | -0.3566 | -0.3476 | -0.1820 | -0.1435 | -0.1778 | -0.1849 | 1 |
| 4 | -1.8807 | -0.2467 | -0.2316 | -0.2419 | -0.1938 | -0.2103 | -0.2010 | -0.2533 | 1 |
| 5 | -1.4151 | -0.2282 | -0.2124 | -0.2147 | -0.1271 | -0.0680 | -0.0872 | -0.1684 | 2 |
| 6 | -1.2879 | -0.2252 | -0.2012 | -0.1298 | -0.0245 | -0.0390 | -0.0762 | -0.1672 | 2 |
| 7 | -1.5239 | -0.1979 | -0.1094 | -0.1402 | -0.0094 | -0.1394 | -0.1673 | -0.2810 | 2 |
| 8 | -1.6781 | -0.2047 | -0.1180 | -0.1532 | -0.1732 | -0.1716 | -0.1851 | -0.2006 | 2 |
| 9 | 0.1605 | -0.0920 | -0.0160 | 0.1246 | 0.1802 | 0.2087 | 0.2234 | 0.1003 | 3 |
| 10 | 0.2045 | 0.1078 | 0.2246 | 0.2031 | 0.2428 | 0.2050 | 0.0704 | 0.0403 | 3 |
| 11 | -1.0242 | -0.1461 | -0.1018 | -0.0778 | -0.0363 | -0.0476 | -0.0160 | -0.0253 | 3 |
| 12 | -0.7915 | -0.1018 | -0.0737 | -0.0945 | -0.0955 | 0.0044 | 0.0467 | 0.0719 | 3 |

Table 2: Test dataset

| No. | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | y |
|---|---|---|---|---|---|---|---|---|---|
| 13 | -1.4736 | -0.2845 | -3.0724 | -0.2108 | -0.1904 | -0.1467 | -0.1696 | -0.2001 | 1 |
| 14 | -1.6002 | -0.2011 | -0.1021 | -0.1394 | -0.1001 | -0.1572 | -0.1584 | -0.2790 | 2 |
| 15 | -1.0314 | -0.1521 | -0.1101 | -0.0801 | -0.0347 | -0.0482 | -0.0158 | -0.0301 | 3 |
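The three heuristics can be evaluated for this dataset with a short script. The sketch below takes n = 8 inputs (x1 to x8) and k = 12 training samples from Table 1, and assumes m = 1 output unit for y (use m = 3 if the classes are one-hot encoded). The heuristics only produce candidates; the final choice would normally be made by comparing test or validation error for each candidate.

```python
import math

def smallest_nh_eq1(n, k, max_nh=1000):
    """Smallest n_h with sum_{i=0}^{n} C(n_h, i) > k; math.comb returns 0 when i > n_h."""
    for nh in range(1, max_nh + 1):
        if sum(math.comb(nh, i) for i in range(n + 1)) > k:
            return nh
    raise ValueError("no n_h found up to max_nh")

def nh_range_eq2(n, m):
    """Lower and upper values of sqrt(n + m) + a for a in [1, 10]."""
    base = math.sqrt(n + m)
    return base + 1, base + 10

def nh_eq3(n):
    return math.log2(n)

n, k = 8, 12   # 8 input features and 12 training samples (Table 1)
m = 1          # assumption: one output unit for the class label y
print("Eq. (1): n_h >=", smallest_nh_eq1(n, k))
print("Eq. (2): n_h in", nh_range_eq2(n, m))
print("Eq. (3): n_h =", nh_eq3(n))
```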
Part 2: Sequence Labelling and POS
Application of POS Tagging to Sentiment Analysis
POS tagging is a necessary step in sentiment analysis, because a word’s part of speech has
a strong impact on its sentiment polarity. Design an algorithm in which different parts of
speech are assigned different sentiment weights. For example, we assume that adjectives
convey stronger sentiment information than verbs and nouns, so we assign larger sentiment
weights to adjectives. Verbs and nouns may also convey sentiment information from time
to time; for example, the verb “love” and the noun “congratulations” are often associated with
positive sentiment. However, we believe adjectives play a much more dominant role than verbs
and nouns in expressing sentiment, so we assign smaller sentiment weights to verbs and nouns
than to adjectives. Similarly, we should assign small or zero sentiment weights to
determiners, prepositions, ….
A small dataset “amazon_cells_labelled.csv” can be used as a case study for your homework.
1. Based on what we learned in class and what is described above, improve the Naïve
Bayes method for sentiment classification and implement it (Python code is preferred).
This is an application of POS tagging to improve the Naïve Bayes method for sentiment
classification.
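One possible design is sketched below: a multinomial Naïve Bayes classifier in which each token contributes its POS weight instead of a count of 1, both when counting during training and when scoring log-probabilities at prediction time. It assumes the sentences are already tagged with Penn Treebank tags (for example, NLTK's `pos_tag` returns `(word, tag)` pairs in that tag set). The weight values, the class name `POSWeightedNB`, and the toy sentences are hypothetical; the weights should be tuned on held-out data.

```python
import math
from collections import defaultdict

# Hypothetical POS weights keyed by Penn Treebank tag prefix; tune on held-out data.
POS_WEIGHTS = {"JJ": 1.0, "RB": 0.6, "VB": 0.5, "NN": 0.4, "DT": 0.0, "IN": 0.0}
DEFAULT_WEIGHT = 0.2

def pos_weight(tag):
    # Tags such as JJR/JJS or VBD/VBZ share a two-letter prefix with JJ or VB.
    return POS_WEIGHTS.get(tag[:2], DEFAULT_WEIGHT)

class POSWeightedNB:
    """Multinomial Naive Bayes where each token adds its POS weight instead of 1."""

    def fit(self, tagged_docs, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: defaultdict(float) for c in self.classes}
        self.vocab = set()
        for doc, c in zip(tagged_docs, labels):
            for word, tag in doc:
                self.counts[c][word.lower()] += pos_weight(tag)
                self.vocab.add(word.lower())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tagged_doc):
        V = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.prior[c]
            for word, tag in tagged_doc:
                w = word.lower()
                if w not in self.vocab:
                    continue   # skip words never seen in training
                # Laplace-smoothed weighted likelihood, scaled by the token's POS weight.
                p = (self.counts[c][w] + self.alpha) / (self.totals[c] + self.alpha * V)
                score += pos_weight(tag) * math.log(p)
            scores[c] = score
        return max(scores, key=scores.get)

# Hypothetical hand-tagged examples (1 = positive, 0 = negative).
train = [
    [("great", "JJ"), ("phone", "NN")],
    [("love", "VBP"), ("the", "DT"), ("excellent", "JJ"), ("battery", "NN")],
    [("terrible", "JJ"), ("phone", "NN")],
    [("the", "DT"), ("battery", "NN"), ("is", "VBZ"), ("bad", "JJ")],
]
labels = [1, 1, 0, 0]
model = POSWeightedNB().fit(train, labels)
print(model.predict([("excellent", "JJ"), ("battery", "NN")]))
```

Setting every weight to 1.0 recovers plain multinomial Naïve Bayes, which gives the baseline needed for part 3a.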
2. Application of POS tagging to knowledge/lexicon extraction: design a new method or
re-implement the method described in the paper “Lexicon Knowledge Extraction
with Sentiment Polarity Computation” (Wang Z, Tong VJ, Ruan P, Li F. Lexicon knowledge
extraction with sentiment polarity computation. In 2016 IEEE 16th International Conference on Data
Mining Workshops (ICDMW), 2016 Dec 12, pp. 978-983, Barcelona, Spain). Use Python, and
English-language text only.
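As a simple baseline for the “design a new method” option, the sketch below scores each adjective by the smoothed log-odds of it appearing in positive versus negative sentences. This is a generic corpus-statistics technique, not the method of Wang et al. (2016). It assumes Penn Treebank tags (adjectives tagged JJ, JJR, or JJS) and 0/1 labels as in “amazon_cells_labelled.csv”; the function name and toy data are hypothetical.

```python
import math
from collections import Counter

def adjective_polarity(tagged_docs, labels, smoothing=1.0):
    """Map each adjective to log(P(word | positive) / P(word | negative)) with add-one smoothing."""
    pos, neg = Counter(), Counter()
    for doc, label in zip(tagged_docs, labels):
        target = pos if label == 1 else neg
        for word, tag in doc:
            if tag.startswith("JJ"):   # JJ, JJR, JJS in the Penn Treebank tag set
                target[word.lower()] += 1
    vocab = set(pos) | set(neg)
    n_pos, n_neg, V = sum(pos.values()), sum(neg.values()), len(vocab)
    return {
        w: math.log(((pos[w] + smoothing) / (n_pos + smoothing * V))
                    / ((neg[w] + smoothing) / (n_neg + smoothing * V)))
        for w in vocab
    }

# Hypothetical hand-tagged examples (1 = positive, 0 = negative).
docs = [
    [("great", "JJ"), ("phone", "NN")],
    [("excellent", "JJ"), ("battery", "NN")],
    [("terrible", "JJ"), ("phone", "NN")],
    [("the", "DT"), ("battery", "NN"), ("is", "VBZ"), ("bad", "JJ")],
]
lexicon = adjective_polarity(docs, [1, 1, 0, 0])
for word, score in sorted(lexicon.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {score:+.3f}")
```

Scores above zero lean positive and scores below zero lean negative; the resulting lexicon could then be used as an extra feature for the classifier in part 1.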
3. Explain your findings. For example:
a. The performance of basic Naïve Bayes (e.g., Accuracy: x0%)
b. The performance of improved Naïve Bayes with POS tags as extracted features (e.g.,
Accuracy: x1%)
c. The performance of the improved method with sentiment polarity as an extracted feature