CS165B: Intro. to Machine Learning
(either 0 or 1), and 8. Chance of Admit (ranging from 0 to 1). Roughly linear relations
can be expected between factors (1) to (7) and factor (8). Furthermore, some factors,
say GRE scores and GPA, are often highly correlated. Hence, it is reasonable to
attempt a multivariate linear regression with regularization for this problem. You can
use the pandas library to query dataset information.
The programming part of the assignment is for you to implement an iterative stochastic gradient descent (SGD) linear regressor with regularization. You must use Python for
this assignment (so the reader can grade your assignment automatically using Gradescope), and you must implement the iterative solver yourself (without calling an existing package). Your solver must support a 2D grid search, with one dimension being the learning rate
α and the other dimension being the regularization weight λ. The skeleton class (SGDSolver) will be provided for you (lambda is renamed lam because lambda is a keyword in Python):
class SGDSolver():
    def __init__(self, path):
    def training(self, alpha, lam, nepoch, epsilon):
    def testing(self, testX):
Critical variables and parameters are explained below:
• self.x: a numpy array of size n × k, with n being the number of training samples,
and k being the number of features per training sample. In our case k = 7.
• self.y: a numpy array of size n × 1 of the ground-truth regression values, with n being
the number of training samples. self.x and self.y are loaded from the file named by the
constructor's path parameter, which is a numpy file containing the training data.
• alpha: range of the learning rates in the grid search (a list of two floats [αmin, αmax])
• lam: range of the regularization weights in the grid search (a list of two floats [λmin, λmax])
• nepoch: maximum number of training epochs (an integer). Each epoch consists of
going through all training samples once in a random order. For SGD, the batch size
(how many samples are used together in one update) is set to 1, i.e., training
samples are used individually.
• epsilon: error bound (a float). Iteration stops early if the loss emse (see below)
falls below this bound.
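The handout gives only the two range endpoints for alpha and lam, so how the grid is sampled between them is left open. A minimal sketch, assuming geometrically spaced points and a hypothetical helper make_grid with a hypothetical steps parameter (neither is part of the provided skeleton):

```python
import itertools
import numpy as np

def make_grid(alpha, lam, steps=5):
    """Return all (alpha, lam) pairs on a steps x steps grid.

    Assumption: points are spaced geometrically between the bounds, which
    requires both lower bounds to be strictly positive.
    """
    alphas = np.geomspace(alpha[0], alpha[1], steps)
    lams = np.geomspace(lam[0], lam[1], steps)
    return list(itertools.product(alphas, lams))

# Example ranges are illustrative only, not values from the assignment.
grid = make_grid([1e-4, 1e-1], [1e-3, 1.0], steps=4)
```

Geometric spacing is a common choice when a range spans several orders of magnitude; if a lower bound of 0 must be supported, linear spacing (np.linspace) would be needed instead.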
Note that in your implementation, in addition to the weights w (one for each input feature),
you should also include the bias (b) as a class variable. However, regularization applies only
to w, not to b. The loss (error) in regression is defined as the mean squared error (MSE)
between the ground-truth values and your regression values. Denote the i-th component
of the y vector as $y_i$; then
$$e_{mse} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i^{GT} - y_i \right)^2 .$$
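The MSE loss and a single-sample SGD update might be sketched as follows. The handout does not state the form of the regularization term, so this sketch assumes an L2 (ridge) penalty λ‖w‖² added to each sample's squared error; the function names mse and sgd_step are hypothetical, and w is kept as a 1-D array of length k for simplicity.

```python
import numpy as np

def mse(x, y, w, b):
    """Mean squared error between ground truth y (n x 1) and predictions x @ w + b."""
    residual = y.ravel() - (x @ w + b)
    return float(np.mean(residual ** 2))

def sgd_step(xi, yi, w, b, alpha, lam):
    """One update from a single sample (batch size 1).

    Assumed per-sample objective: (yi - (xi @ w + b))**2 + lam * ||w||**2.
    The penalty acts on w only; the bias b is not regularized.
    """
    err = yi - (xi @ w + b)                    # scalar residual for this sample
    grad_w = -2.0 * err * xi + 2.0 * lam * w   # data term plus L2 penalty term
    grad_b = -2.0 * err                        # no penalty term for the bias
    return w - alpha * grad_w, b - alpha * grad_b

# Tiny worked example with k = 2 features and n = 2 samples.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[1.0], [2.0]])
w0, b0 = np.zeros(2), 0.0
loss0 = mse(x, y, w0, b0)
w1, b1 = sgd_step(x[0], y[0, 0], w0, b0, alpha=0.1, lam=0.0)
```

With zero initial parameters the residuals are 1 and 2, so loss0 is 2.5; the first update moves w to [0.2, 0.4] and b to 0.2.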
In this project, you will receive only 90% of the original dataset for training. The remaining
10% of the data will be held back for Gradescope to evaluate your performance. Your training and
validation code must rely entirely on the given data (90% of the original data) to train
your model. How you split the given data for training and validation, e.g., using n-fold
cross-validation, is up to you.
Your SGDSolver will be tested by three calls in succession: (1) initialization, (2) training,
and (3) testing.
• During the initialization, you will be given the file name containing the training
data. You are to populate self.x and self.y from the input file.
• During the training phase, your function training will be called with alpha, lam, nepoch
and epsilon given. If you need any other parameters for your code to work, it is
entirely up to you to define these parameters and provide some reasonable default
values for them. You can define as many auxiliary functions inside SGDSolver as you
see fit. The training phase must start from scratch with a randomized initial guess
for w and b. The function should store both the weights and the bias within the SGDSolver
class after training.
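The three-call sequence could look roughly like the sketch below. It is not the reference solution and rests on several assumptions that the handout does not confirm: the numpy file holds an n × (k+1) array with the target in the last column, the penalty is L2, the grid uses a hypothetical steps parameter with geometric spacing, the best grid point is chosen by training loss (a validation split would be a better criterion), and testing returns an n × 1 array of predictions.

```python
import os
import tempfile
import numpy as np

class SGDSolver:
    def __init__(self, path):
        # Assumption: n x (k+1) array, features first, target in the last column.
        data = np.load(path)
        self.x = data[:, :-1]
        self.y = data[:, -1:]
        self.w, self.b = None, 0.0
        self.rng = np.random.default_rng(0)

    def _fit(self, alpha, lam, nepoch, epsilon):
        """Hypothetical helper: run SGD for one (alpha, lam) pair from scratch."""
        n, k = self.x.shape
        w = self.rng.normal(scale=0.01, size=k)    # randomized initial guess
        b = float(self.rng.normal(scale=0.01))
        y = self.y.ravel()
        loss = np.inf
        for _ in range(nepoch):
            for i in self.rng.permutation(n):      # batch size 1, random order
                err = y[i] - (self.x[i] @ w + b)
                w = w - alpha * (-2.0 * err * self.x[i] + 2.0 * lam * w)
                b = b + alpha * 2.0 * err          # bias is not regularized
            loss = float(np.mean((y - (self.x @ w + b)) ** 2))
            if loss < epsilon:                     # early stop on the MSE bound
                break
        return w, b, loss

    def training(self, alpha, lam, nepoch, epsilon, steps=3):
        best = np.inf
        for a in np.geomspace(alpha[0], alpha[1], steps):
            for lm in np.geomspace(lam[0], lam[1], steps):
                w, b, loss = self._fit(a, lm, nepoch, epsilon)
                if loss < best:                    # keep the best grid point
                    best, self.w, self.b = loss, w, b

    def testing(self, testX):
        return (testX @ self.w + self.b).reshape(-1, 1)

# Synthetic, noise-free data for illustration only (not the assignment dataset).
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 7))
Y = X @ (np.arange(1, 8) / 10.0) + 0.05
path = os.path.join(tempfile.mkdtemp(), "train.npy")
np.save(path, np.column_stack([X, Y]))

solver = SGDSolver(path)                                              # (1) initialization
solver.training([1e-3, 1e-1], [1e-6, 1e-3], nepoch=50, epsilon=1e-6)  # (2) training
pred = solver.testing(X[:5])                                          # (3) testing
```

Each grid point restarts from a fresh random w and b, so results at different (alpha, lam) pairs are independent of one another.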