Homework Assignment 2
Part 1: Deep Learning 1 & 2
1. Prove that the softmax and sigmoid functions are equivalent when the number of possible
labels is two (i.e., for a binary classification problem).
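As a starting point, the two-class case can be sketched as follows (a hint only, not the full proof):

```latex
% Softmax over two logits z_1, z_2; probability of class 1:
\mathrm{softmax}(z)_1
  = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}
  = \frac{1}{1 + e^{-(z_1 - z_2)}}
  = \sigma(z_1 - z_2)
% i.e., the two-class softmax equals the sigmoid applied to the
% difference of the two logits, z = z_1 - z_2.
```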
2. Based on what you have learned about the BP algorithm, including the feed-forward and
backward propagation processes of a neural network, answer Question 1 in the slide below:
prove whether each of the 6 equations holds or does not hold.
3. Answer Question 2 in the slide “Homework” above to “implement an NN …”, using the
3-layer NN code I gave you as a reference example (Python code is provided:
“A_3_layer_NN_3.py”; the input and the activation function in that code may differ
from those of the example in the lecture slides).
Requirements: Implement the NN algorithm in Python. Submit the code and make sure
that it can be run by our TAs or instructors with a simple “click”; you may submit a
readme file if needed. Note that the architecture (2 – 3 – 2) used in the given code is
different from the one (2 – 2 – 2) in Question 2 of this assignment, and that the
activation functions differ as well.
4. Given the network architecture shown in the slide “Homework” above in Question 2
(assume the loss function is 𝐿(𝑦, ŷ) = ∑(𝑦 − ŷ)² and the learning rate is α = 0.5),
answer the questions below, showing the details step by step:
a. According to the forward-propagation process, calculate the output: ŷ = ?
b. Calculate the loss (error): 𝐿(𝑦, ŷ) = ?
c. According to the BP algorithm, calculate the updated values of the weights W1
after 1 iteration.
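The three steps can be sketched in Python as follows. The weights, inputs, and targets below are placeholders (the actual values are in the slide, which is not reproduced here), biases are omitted for brevity, and sigmoid activations are assumed:

```python
import numpy as np

# Sketch of one forward pass and one gradient step for a 2-2-2 network
# with sigmoid activations and loss L = sum((y - y_hat)^2).
# All numeric values below are PLACEHOLDERS, not the slide's values.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])        # placeholder input
y = np.array([0.01, 0.99])        # placeholder target
W1 = np.array([[0.15, 0.20],      # input -> hidden weights (placeholder)
               [0.25, 0.30]])
W2 = np.array([[0.40, 0.45],      # hidden -> output weights (placeholder)
               [0.50, 0.55]])
alpha = 0.5                       # learning rate from the assignment

# (a) forward propagation
h = sigmoid(W1 @ x)               # hidden-layer activations
y_hat = sigmoid(W2 @ h)           # network output

# (b) loss
L = np.sum((y - y_hat) ** 2)

# (c) backpropagate to W1 and take one update step
delta_out = 2 * (y_hat - y) * y_hat * (1 - y_hat)   # dL/dz at output layer
delta_hid = (W2.T @ delta_out) * h * (1 - h)        # dL/dz at hidden layer
W1_new = W1 - alpha * np.outer(delta_hid, x)        # updated W1 after 1 step
```

Substituting the slide's actual weights and inputs into the arrays above reproduces the hand calculation step by step.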
5. Given the same dataset as Question 5 in HA1:
• “on the street” 50 times;
• “on the table” 1000 times;
• “on the computer” 2000 times;
• “standing on the street” 40 times;
• “standing on the computer” 1 time;
• “standing on the table” 100 times;
• “on the sky” 5 times;
• “on the water” 10 times;
• “on the cloud” 2 times;
• “it is magic that the guy smiles standing on the sky” 1 time;
• “it is magic that the guy smiles standing on the water” 2 times;
• “it is magic that the guy smiles standing on the cloud” 3 times;
a. Leverage an RNN to perform the same task (build the language model) to guess the
next word: I notice three guys standing on the ( ).
b. Compare this with the results obtained by the probabilistic language model in HA1.
c. Leverage a feedforward neural network for the same task, and compare the result
with that of the RNN.
d. If we use the extended corpus (shown in “HA1-Question 5 corpus plus.xlsx”), compare
the results obtained by the probabilistic language model, the feedforward neural
network, and the RNN. Explain what you find.
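The RNN side of the task can be sketched as below. The vocabulary is taken from the corpus above, but the hidden size is illustrative and the weights are randomly initialized (no training loop is shown); a full answer would train these weights on the weighted sentences:

```python
import numpy as np

# Sketch of a word-level RNN forward pass for next-word prediction.
# Weights are random placeholders: a real answer must train them on
# the corpus counts listed above before the prediction is meaningful.

rng = np.random.default_rng(0)
corpus_words = ("it is magic that the guy smiles standing on "
                "the street table computer sky water cloud").split()
vocab = sorted(set(corpus_words))
V = len(vocab)
idx = {w: i for i, w in enumerate(vocab)}

H = 8                                  # hidden size (illustrative choice)
Wxh = rng.normal(0, 0.1, (H, V))       # input -> hidden
Whh = rng.normal(0, 0.1, (H, H))       # hidden -> hidden (the recurrence)
Why = rng.normal(0, 0.1, (V, H))       # hidden -> output

def step(h, word):
    x = np.zeros(V)
    x[idx[word]] = 1.0                 # one-hot encode the current word
    return np.tanh(Wxh @ x + Whh @ h)  # new hidden state carries context

h = np.zeros(H)
for w in ["standing", "on", "the"]:    # feed the context left to right
    h = step(h, w)

logits = Why @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax over the vocabulary
prediction = vocab[int(np.argmax(probs))]
```

Note the contrast with a feedforward NN: the hidden state `h` is reused across time steps, so words before the fixed context window (e.g. “standing”) can still influence the prediction.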
6. Based on what you have learned about the RNN algorithm, do the homework in the slide
below. Compare the results obtained by a feedforward neural network and an RNN for this
task, and explain what you find. (You can leverage a large dataset, such as “shakespeare.txt”,
to further support your findings or to explain why an RNN is better (or worse) than a
feedforward NN for sequence analysis.)
7. (Optional question) For a three-layer feedforward neural network, how do you
determine the number of nodes/neurons in the hidden layer? For the given dataset below,
how many neurons in the hidden layer is the best choice? (Hint: in general, the following 3
equations can be used:
∑ᵢ₌₀ⁿ C(𝑛ℎ, 𝑖) > 𝑘 (1)
where k is the number of samples, 𝑛ℎ is number of nodes/neurons in the hidden layer; 𝑛 is
the number of inputs; 𝑖 ∈ [0, 𝑛].
𝑛ℎ = √(𝑛 + 𝑚) + 𝑎 (2)
where 𝑛ℎ is number of nodes/neurons in the hidden layer; 𝑛 is the number of input units,
𝑚 is the number of output units; 𝑎 ∈ [1, 10].
𝑛ℎ = log2 𝑛 (3)
where 𝑛ℎ is number of nodes/neurons in the hidden layer; 𝑛 is the number of input units.
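The three rules can be evaluated directly for this dataset. The snippet below assumes k = 12 training samples, n = 8 inputs, and m = 3 output units (one per class label, which is an assumption about the output encoding); note that equation (2) yields a range rather than a single value because a ∈ [1, 10]:

```python
from math import comb, sqrt, log2

k, n, m = 12, 8, 3   # samples, inputs, outputs (3 classes: an assumption)

# Eq. (1): smallest n_h such that sum_{i=0}^{n} C(n_h, i) > k
# (math.comb returns 0 when i > n_h, so the sum is well defined)
nh1 = next(nh for nh in range(1, 100)
           if sum(comb(nh, i) for i in range(n + 1)) > k)

# Eq. (2): n_h = sqrt(n + m) + a, with a in [1, 10] -> a range of candidates
nh2_range = (round(sqrt(n + m) + 1), round(sqrt(n + m) + 10))

# Eq. (3): n_h = log2(n)
nh3 = log2(n)

print(nh1, nh2_range, nh3)   # -> 4 (4, 13) 3.0
```

The three rules disagree (roughly 3 to 13 neurons), which is the point of the question: they only bound the search, and the best value within that range should be chosen by validating on the test set.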
Table 1: Training dataset:
No. x1 x2 x3 x4 x5 x6 x7 x8 y
1 -1.7817 -0.2786 -0.2954 -0.2394 -0.1842 -0.1572 -0.1584 -0.1998 1
2 -1.8710 -0.2957 -0.3494 -0.2904 -0.1460 -0.1387 -0.1492 -0.2228 1
3 -1.8347 -0.2817 -0.3566 -0.3476 -0.1820 -0.1435 -0.1778 -0.1849 1
4 -1.8807 -0.2467 -0.2316 -0.2419 -0.1938 -0.2103 -0.2010 -0.2533 1
5 -1.4151 -0.2282 -0.2124 -0.2147 -0.1271 -0.0680 -0.0872 -0.1684 2
6 -1.2879 -0.2252 -0.2012 -0.1298 -0.0245 -0.0390 -0.0762 -0.1672 2
7 -1.5239 -0.1979 -0.1094 -0.1402 -0.0094 -0.1394 -0.1673 -0.2810 2
8 -1.6781 -0.2047 -0.1180 -0.1532 -0.1732 -0.1716 -0.1851 -0.2006 2
9 0.1605 -0.0920 -0.0160 0.1246 0.1802 0.2087 0.2234 0.1003 3
10 0.2045 0.1078 0.2246 0.2031 0.2428 0.2050 0.0704 0.0403 3
11 -1.0242 -0.1461 -0.1018 -0.0778 -0.0363 -0.0476 -0.0160 -0.0253 3
12 -0.7915 -0.1018 -0.0737 -0.0945 -0.0955 0.0044 0.0467 0.0719 3
Table 2: Test dataset:
13 -1.4736 -0.2845 -3.0724 -0.2108 -0.1904 -0.1467 -0.1696 -0.2001 1
14 -1.6002 -0.2011 -0.1021 -0.1394 -0.1001 -0.1572 -0.1584 -0.2790 2
15 -1.0314 -0.1521 -0.1101 -0.0801 -0.0347 -0.0482 -0.0158 -0.0301 3
Part 2: Sequence Labelling and POS
Application of POS Tagging to Sentiment Analysis
POS tagging is a necessary step for performing sentiment analysis, as the part of speech has
a great impact on a word’s sentiment polarity. Design an algorithm in which different parts of
speech are assigned different sentiment weights. For example, we assume that adjectives
convey stronger sentiment information than verbs and nouns, so we assign larger sentiment
weights to adjectives. Verbs and nouns may also convey sentiment information from time
to time; for example, the verb “love” and the noun “congratulations” are often associated with
positive sentiment. However, we believe adjectives play a much more dominant role than
verbs and nouns in expressing sentiment, so we assign smaller sentiment weights to verbs
and nouns than to adjectives. Similarly, we should assign smaller or zero sentiment weights
to determiners, prepositions, and so on.
A small dataset “amazon_cells_labelled.csv” can be used as a case study for your homework.
1. Based on what we learned in class and what is described above, improve the Naïve
Bayes method for sentiment classification and implement it (Python code is preferred).
This is an application of POS tagging to improve the Naïve Bayes method for sentiment
classification.
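One way to realize the weighting idea is to scale each token’s count by its POS weight inside Naïve Bayes. The sketch below uses a tiny hand-made corpus, a toy dictionary tagger, and invented weight values, all of which are illustrative assumptions; a real solution would tag the “amazon_cells_labelled.csv” reviews with a proper POS tagger (e.g. NLTK’s `pos_tag`):

```python
import math
from collections import defaultdict

# POS-weighted Naive Bayes sketch: each token's count is scaled by a
# sentiment weight derived from its part of speech. The corpus, the
# toy tagger, and the weight values are illustrative assumptions.

POS_WEIGHT = {"ADJ": 2.0, "VERB": 1.0, "NOUN": 1.0, "OTHER": 0.1}
TOY_TAGS = {"great": "ADJ", "terrible": "ADJ", "love": "VERB",
            "phone": "NOUN", "the": "OTHER", "is": "OTHER"}

train = [("the phone is great", 1), ("love the phone", 1),
         ("the phone is terrible", 0)]

counts = {0: defaultdict(float), 1: defaultdict(float)}
totals = {0: 0.0, 1: 0.0}
for text, label in train:
    for tok in text.split():
        w = POS_WEIGHT[TOY_TAGS.get(tok, "OTHER")]
        counts[label][tok] += w          # weighted count instead of +1
        totals[label] += w

vocab = {tok for text, _ in train for tok in text.split()}

def predict(text):
    scores = {}
    for label in (0, 1):
        # log prior from class frequencies
        s = math.log(sum(1 for _, l in train if l == label) / len(train))
        for tok in text.split():
            # Laplace-smoothed, POS-weighted likelihood
            s += math.log((counts[label][tok] + 1)
                          / (totals[label] + len(vocab)))
        scores[label] = s
    return max(scores, key=scores.get)
```

Because adjectives carry weight 2.0 while function words carry 0.1, the adjective effectively dominates each review’s score, which is exactly the intuition described above.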
2. Application of POS tagging for knowledge/lexicon extraction: design a new method or
re-implement the method described in the paper “Lexicon Knowledge Extraction
with Sentiment Polarity Computation” (Wang Z, Tong VJ, Ruan P, Li F. In: 2016 IEEE 16th
International Conference on Data Mining Workshops (ICDMW), Dec 2016, Barcelona, Spain,
pp. 978–983). Python code and the English language only.
3. Explain your findings. For example:
a. The performance of basic Naïve Bayes (e.g. Accuracy: x0%)
b. The performance of improved Naïve Bayes with POS tags as extracted features (e.g.
Accuracy: x1%)
c. The performance of the improved method with sentiment polarity as an extracted
feature (e.g. Accuracy: x2%)
Hints: Pseudocode of the method described in the paper is provided here (also in the paper “Lexicon
Knowledge Extraction with Sentiment Polarity Computation”, which you can search for and download).
The pseudo-code in the following box summarizes the key steps to extract the specific lexicons from a given
dataset. Chinese is the case-study language in the paper; the English language is required for your homework.
For each data item in the selected text dataset:
    Preprocess the data to remove noise
    Reduce each word to its original form
    Segment the words if necessary (not needed for the English language)
    Tag the part of speech
    Perform language detection; if Chinese or another Asian language, perform segmentation
For each word w that occurs in the text:
    Count the number of positive text items that contain w
    Count the number of negative text items that contain w
    Determine the sentiment weight (sw) based on w’s POS
    Calculate w’s polarity score ps according to formula (1) in the paper mentioned above
    Insert the word into the word list l with its polarity value
Sort the word list l by the words’ polarity values
Select the top positive and negative sets of words in list l
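The steps above can be sketched on toy data as follows. The tagger, the sentiment weights, and the polarity score are illustrative stand-ins (the real score is formula (1) in the paper, which is not reproduced here):

```python
from collections import defaultdict

# Sketch of the lexicon-extraction pseudocode on toy English data.
# POS_WEIGHT, TOY_TAGS, and the polarity formula are stand-in
# assumptions; the paper's formula (1) should replace the score below.

POS_WEIGHT = {"ADJ": 2.0, "VERB": 1.0, "NOUN": 1.0, "OTHER": 0.1}
TOY_TAGS = {"great": "ADJ", "awful": "ADJ", "love": "VERB",
            "battery": "NOUN"}

dataset = [("great battery", "pos"), ("love it", "pos"),
           ("awful battery", "neg"), ("awful service", "neg")]

# Count positive / negative documents containing each word
pos_count = defaultdict(int)
neg_count = defaultdict(int)
for text, label in dataset:
    for w in set(text.lower().split()):       # preprocessing kept minimal
        (pos_count if label == "pos" else neg_count)[w] += 1

lexicon = []
for w in set(pos_count) | set(neg_count):
    sw = POS_WEIGHT[TOY_TAGS.get(w, "OTHER")]  # weight from the word's POS
    # stand-in polarity: weighted, smoothed pos/neg balance in [-sw, sw]
    ps = sw * (pos_count[w] - neg_count[w]) / (pos_count[w] + neg_count[w] + 1)
    lexicon.append((w, ps))

lexicon.sort(key=lambda item: item[1], reverse=True)  # most positive first
top_positive, top_negative = lexicon[0][0], lexicon[-1][0]
```

Because the POS weight multiplies the score, the adjectives “great” and “awful” end up at the extremes of the sorted list, ahead of nouns and function words with similar counts.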