Machine Learning Assignment Help | COMS 4732: Computer Vision 2

This US course assignment is mainly machine-learning related.

Hidden Layer
$u = xW + c$

$h = \max(0, u)$

Output Layer

$z = hw + b$
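The forward pass above maps directly onto a few lines of NumPy. The sketch below is purely illustrative: the batch size N, input dimension D, hidden width H, number of classes C, and the random initialization are all assumptions, not part of the original notes.

```python
import numpy as np

# Assumed shapes for illustration: N examples, D input features, H hidden units, C classes.
N, D, H, C = 4, 5, 10, 3
rng = np.random.default_rng(0)

x = rng.standard_normal((N, D))          # input batch
W = 0.01 * rng.standard_normal((D, H))   # hidden-layer weights
c = np.zeros(H)                          # hidden-layer bias
w = 0.01 * rng.standard_normal((H, C))   # output-layer weights
b = np.zeros(C)                          # output-layer bias

u = x @ W + c           # hidden pre-activation:  u = xW + c
h = np.maximum(0, u)    # ReLU activation:        h = max(0, u)
z = h @ w + b           # class scores:           z = hw + b
```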
Softmax Cross-Entropy Loss Layer
$$L_i = -\log\left(\frac{e^{z_{y_i}}}{\sum_j e^{z_j}}\right)$$

Note that $y_i$ is the true class label.
$$L = \underbrace{\frac{1}{N}\sum_i L_i}_{\text{data loss}} + \underbrace{\frac{1}{2}\sum_k \sum_l W_{k,l}^2}_{\text{regularization loss}}$$
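Continuing the sketch above, the loss might be computed as follows. The label array `y` is an assumption for illustration, and here both weight matrices are regularized so that the gradient expressions derived later (which add $w$ and $W$ respectively) stay consistent.

```python
# Continuing the previous sketch: y holds the true class label of each example.
y = rng.integers(0, C, size=N)

# Softmax probabilities (scores shifted by their row max for numerical stability).
exp_z = np.exp(z - z.max(axis=1, keepdims=True))
p = exp_z / exp_z.sum(axis=1, keepdims=True)

data_loss = -np.log(p[np.arange(N), y]).mean()       # (1/N) * sum_i L_i
reg_loss = 0.5 * (np.sum(W ** 2) + np.sum(w ** 2))   # 0.5 * sum of squared weights
loss = data_loss + reg_loss
```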
Backpropagation: Multi-layer Perceptron
Loss Layer
Denote the softmax probability for element $z_k$ as $p_k$.

$$p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$
Recall the Softmax Cross-Entropy Loss function.
$$\mathrm{CE}_{\mathrm{loss}} = -\sum_{j}^{C} y_j \log(p_j)$$
Simplifying, we get:
$$L_i = -\log(p_{y_i})$$

and

$$\frac{\partial L_i}{\partial z_k} = p_k - \mathbb{1}(y_i = k)$$

where $\mathbb{1}$ is the indicator function.
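As a sketch (reusing the `p` and `y` arrays from the snippets above), this gradient is just a few lines of NumPy: subtract 1 from the probability of the true class, then divide by $N$ to account for the $\frac{1}{N}$ in the data loss.

```python
# dL_i/dz_k = p_k - 1(y_i = k); the division by N comes from averaging the data loss.
dz = p.copy()
dz[np.arange(N), y] -= 1.0
dz /= N
```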
Output Layer
First, note that:
$$\frac{\partial z}{\partial h} = w^T$$

Thus,

$$\frac{\partial L}{\partial h} = \frac{\partial L}{\partial z} \, w^T$$
Hence, we can perform gradient descent using the following gradients. (Note: the extra $w$ term comes from taking the gradient of the regularization term $\frac{1}{2}w^2$ in our loss function.)

$$\frac{\partial L}{\partial w} = h^T \frac{\partial L}{\partial z} + w$$

$$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}$$
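Continuing the same sketch, the output-layer backward pass might look like this; in a batched implementation the bias gradient is summed over the examples, since $b$ is shared across them.

```python
dw = h.T @ dz + w       # dL/dw = h^T dL/dz + w  (the +w is the regularization term)
db = dz.sum(axis=0)     # dL/db = dL/dz, summed over the batch
dh = dz @ w.T           # dL/dh = dL/dz * w^T, passed back to the hidden layer
```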

Hidden Layer
Weight updates:
$$\frac{\partial L}{\partial W} = X^T \frac{\partial L}{\partial u} + W$$

$$\frac{\partial L}{\partial c} = \frac{\partial L}{\partial u}$$

But how do we obtain $\frac{\partial h}{\partial u}$, and by extension $\frac{\partial L}{\partial u}$?
Derivative of a vector with respect to another vector: using the Jacobian matrix to compute $\frac{\partial h}{\partial u}$
But, what is a Jacobian?
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a function that takes $x \in \mathbb{R}^n$ as input and produces the vector $f(x) \in \mathbb{R}^m$ as output. The Jacobian matrix of $f$ is then defined to be an $m \times n$ matrix, denoted by $J$, whose $(i, j)$-th entry is $J_{ij} = \frac{\partial f_i}{\partial x_j}$:

$$J = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^T f_1 \\ \vdots \\ \nabla^T f_m \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}$$

where $\nabla^T f_i$ (now a row vector) is the transpose of the gradient of the $i$-th component.
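As a quick concrete example (not from the original notes), take $f(x_1, x_2) = (x_1 x_2,\; x_1 + x_2)$, so $m = n = 2$:

$$J = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} \end{bmatrix} = \begin{bmatrix} x_2 & x_1 \\ 1 & 1 \end{bmatrix}$$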
In our case, the ReLU activation function serves as the function $f$, mapping $\mathbb{R}^2 \to \mathbb{R}^2$:

$$h = \max(0, u)$$

$$\frac{\partial h}{\partial u} = \begin{bmatrix} \dfrac{\partial h_1}{\partial u_1} & \dfrac{\partial h_1}{\partial u_2} \\ \dfrac{\partial h_2}{\partial u_1} & \dfrac{\partial h_2}{\partial u_2} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial h_1}{\partial u_1} & 0 \\ 0 & \dfrac{\partial h_2}{\partial u_2} \end{bmatrix}$$
We can write:

$$\frac{\partial L}{\partial u} = \left(\frac{\partial h}{\partial u}\right)^T \frac{\partial L}{\partial h}$$
However, since our activation function acts on each element individually, the partial derivatives with respect to the other dimensions are 0. Thus, the Jacobian is a diagonal matrix, and we can simplify this expression (making it easier to implement in our code) into an element-wise product as follows:

$$\frac{\partial L}{\partial u_k} = \frac{\partial L}{\partial h_k} \cdot \frac{\partial h_k}{\partial u_k} = \frac{\partial L}{\partial h_k} \cdot \mathbb{1}(u_k > 0)$$
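In code, that element-wise product is just a 0/1 mask on the upstream gradient. Continuing the earlier sketch (variable names and shapes are still assumptions), the full hidden-layer backward pass could look like:

```python
du = dh * (u > 0)       # dL/du = dL/dh ⊙ 1(u > 0): ReLU's diagonal Jacobian as a mask
dW = x.T @ du + W       # dL/dW = X^T dL/du + W  (regularization term)
dc = du.sum(axis=0)     # dL/dc = dL/du, summed over the batch
```

A gradient-descent step would then update each parameter in place, e.g. `W -= step_size * dW`, with the step size being another assumption.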