Machine Learning Assignment Help | COMS 4732: Computer Vision 2
This US assignment is primarily a machine-learning-related assignment.
Hidden Layer
$$u = xW + c$$
$$h = \max(0, u)$$
Output Layer
$$z = hw + b$$
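For concreteness, here is a minimal NumPy sketch of this forward pass. The variable names mirror the equations above; the shapes (a batch of $N$ inputs of dimension $D$, $H$ hidden units, $C$ classes) are assumptions rather than values given in the assignment.

```python
import numpy as np

def forward(x, W, c, w, b):
    """Forward pass of the two-layer MLP described above.

    x: (N, D) input batch
    W: (D, H) hidden-layer weights,  c: (H,) hidden-layer bias
    w: (H, C) output-layer weights,  b: (C,) output-layer bias
    """
    u = x @ W + c           # hidden pre-activation: u = xW + c
    h = np.maximum(0, u)    # ReLU: h = max(0, u)
    z = h @ w + b           # class scores: z = hw + b
    return u, h, z
```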
Softmax Cross-Entropy Loss Layer
$$L_i = -\log\!\left(\frac{e^{z_{y_i}}}{\sum_j e^{z_j}}\right)$$
Note that $y_i$ is the true class label.
$$L = \underbrace{\frac{1}{N}\sum_i L_i}_{\text{data loss}} \;+\; \underbrace{\frac{\lambda}{2}\sum_k \sum_l W_{k,l}^2}_{\text{regularization loss}}$$
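A hedged NumPy sketch of this loss, assuming z holds the scores from the forward pass, y the integer labels, and reg plays the role of $\lambda$. Both weight matrices are regularized here, which is an assumption made to match the $\lambda w$ and $\lambda W$ terms in the gradients derived below.

```python
import numpy as np

def softmax_cross_entropy_loss(z, y, W, w, reg):
    """Data loss plus L2 regularization on the weight matrices.

    z: (N, C) class scores,  y: (N,) integer labels in [0, C)
    reg: regularization strength (lambda in the equations above)
    """
    N = z.shape[0]
    # Shift scores for numerical stability before exponentiating.
    shifted = z - z.max(axis=1, keepdims=True)
    p = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    data_loss = -np.log(p[np.arange(N), y]).mean()
    reg_loss = 0.5 * reg * (np.sum(W * W) + np.sum(w * w))
    return data_loss + reg_loss, p
```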
Backpropagation: Multi-layer Perceptron
Loss Layer
Denote the softmax probability for element $z_k$ as $p_k$:
$$p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}$$
Recall the softmax cross-entropy loss function:
$$\mathrm{CE}_{\text{loss}} = -\sum_{j=1}^{C} y_j \log(p_j)$$
Simplifying, we get
$$L_i = -\log(p_{y_i})$$
and
$$\frac{\partial L_i}{\partial z_k} = p_k - \mathbb{1}(y_i = k),$$
where $\mathbb{1}$ is the indicator function.
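In code, this gradient is simply the probability matrix with 1 subtracted at each example's true class. A sketch, where the division by $N$ accounts for the $\frac{1}{N}$ averaging in the data loss:

```python
import numpy as np

def score_gradient(p, y):
    """Gradient of the averaged data loss with respect to the scores z.

    p: (N, C) softmax probabilities,  y: (N,) integer labels
    """
    N = p.shape[0]
    dz = p.copy()
    dz[np.arange(N), y] -= 1   # p_k - 1(y_i = k)
    dz /= N                    # 1/N from averaging the data loss
    return dz
```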
Output Layer
First, note that
$$\frac{\partial z}{\partial h} = w^T$$
Thus,
$$\frac{\partial L}{\partial h} = \frac{\partial L}{\partial z}\, w^T$$
Hence, we can perform gradient descent using the following gradients. (Note: the $\lambda w$ term is obtained by taking the gradient of the regularization term within our loss function, $\frac{\lambda}{2} w^2$.)
$$\frac{\partial L}{\partial w} = h^T \frac{\partial L}{\partial z} + \lambda w$$
$$\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}$$
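A minimal sketch of the output-layer backward pass, assuming dz is the score gradient from the loss layer and reg is $\lambda$:

```python
import numpy as np

def output_layer_backward(dz, h, w, reg):
    """Gradients through z = hw + b.

    dz: (N, C) gradient of the loss w.r.t. the scores
    h:  (N, H) hidden activations,  w: (H, C) output weights
    """
    dw = h.T @ dz + reg * w   # dL/dw = h^T dL/dz + lambda * w
    db = dz.sum(axis=0)       # dL/db = dL/dz, summed over the batch
    dh = dz @ w.T             # dL/dh = dL/dz w^T
    return dw, db, dh
```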
Hidden Layer
Weight updates:
$$\frac{\partial L}{\partial W} = X^T \frac{\partial L}{\partial u} + \lambda W$$
$$\frac{\partial L}{\partial c} = \frac{\partial L}{\partial u}$$
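Once $\frac{\partial L}{\partial u}$ is available (it is derived via the Jacobian below), the hidden-layer updates mirror the output layer. A sketch with assumed names du and reg:

```python
import numpy as np

def hidden_layer_backward(du, x, W, reg):
    """Gradients through u = xW + c, given du = dL/du.

    du: (N, H) gradient of the loss w.r.t. the hidden pre-activation
    x:  (N, D) inputs,  W: (D, H) hidden weights
    """
    dW = x.T @ du + reg * W   # dL/dW = X^T dL/du + lambda * W
    dc = du.sum(axis=0)       # dL/dc = dL/du, summed over the batch
    return dW, dc
```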
But how do we obtain $\frac{\partial h}{\partial u}$, and by extension $\frac{\partial L}{\partial u}$?
Derivative of a vector with respect to another vector: using the Jacobian matrix to compute $\frac{\partial h}{\partial u}$
But what is a Jacobian?
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a function that takes $x \in \mathbb{R}^n$ as input and produces the vector $f(x) \in \mathbb{R}^m$ as output. The Jacobian matrix of $f$ is then defined to be an $m \times n$ matrix, denoted by $J$, whose $(i, j)$-th entry is $J_{ij} = \frac{\partial f_i}{\partial x_j}$:
$$J = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \nabla^T f_1 \\ \vdots \\ \nabla^T f_m \end{bmatrix} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}$$
where $\nabla^T f_i$ (now a row vector) is the transpose of the gradient of the $i$-th component.
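As a sanity check, the Jacobian of any such function can be approximated numerically with finite differences. This is a generic helper (not part of the assignment); the commented call shows that the ReLU Jacobian discussed next is diagonal:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference approximation of the m x n Jacobian of f at x.

    f: maps a length-n vector to a length-m vector
    x: (n,) input point
    """
    m, n = f(x).shape[0], x.shape[0]
    J = np.zeros((m, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)   # J_ij = df_i / dx_j
    return J

# e.g. numerical_jacobian(lambda u: np.maximum(0, u), np.array([1.5, -0.3]))
# returns approximately [[1, 0], [0, 0]] -- a diagonal matrix.
```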
In our case, we have the ReLU activation function, mapping $\mathbb{R}^2 \to \mathbb{R}^2$, that serves as the function $f$:
$$h = \max(0, u)$$
$$\frac{\partial h}{\partial u} = \begin{bmatrix} \dfrac{\partial h_1}{\partial u_1} & \dfrac{\partial h_1}{\partial u_2} \\ \dfrac{\partial h_2}{\partial u_1} & \dfrac{\partial h_2}{\partial u_2} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial h_1}{\partial u_1} & 0 \\ 0 & \dfrac{\partial h_2}{\partial u_2} \end{bmatrix}$$
We can write
$$\frac{\partial L}{\partial u} = \left(\frac{\partial h}{\partial u}\right)^T \frac{\partial L}{\partial h}$$
However, since our activation function is a function of each individual element only, the partials with respect to the other dimensions are 0. Thus, the Jacobian is a diagonal matrix, and hence we can simplify this expression (making it easier to implement in our code) into an element-wise product:
$$\frac{\partial L}{\partial u} = \frac{\partial L}{\partial h} \odot \mathbb{1}(u > 0)$$
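In NumPy that element-wise simplification is a one-liner. A sketch assuming dh from the output layer and the cached pre-activation u from the forward pass:

```python
import numpy as np

def relu_backward(dh, u):
    """Backprop through h = max(0, u) using the element-wise form.

    dh: (N, H) gradient of the loss w.r.t. the hidden activations
    u:  (N, H) cached hidden pre-activations from the forward pass
    """
    return dh * (u > 0)   # dL/du = dL/dh masked elementwise by 1(u > 0)
```

Chaining this with the layer gradients sketched earlier completes one full backward pass through the network.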