CSC 311 – Introduction to Machine Learning, Fall 2020
1. (15 points total) This simple warm-up question illustrates Numpy’s facilities for indexing and computing with arrays without using loops. In each question below, you
should use at most one assignment statement and one print statement (in addition
to printing the question number). In questions (h) to (o) you should use one print
statement and no assignment statements. Each question below has a simple solution.
Do not use any loops. (1 point each.)
Your code should look like this:
import numpy as np
import numpy.random as rnd
rnd.seed(3)
B = …
y = …
C = …
The statement rnd.seed(3) above initializes the seed of the random number generator
to 3. This ensures that everyone will get exactly the same “random” vectors and
matrices and exactly the same answers to all the questions below. For this to work as
intended, you should execute all your code for the questions below at once, in order,
and you should only execute the rand function twice, to answer parts (a) and (b). Any
additional executions of the rand function, between questions, will change the random
matrices that are returned, which will change all your answers.
If you have done everything correctly, your answers to parts (a) and (b) below should
be exactly the following:
[[0.5507979 0.70814782 0.29090474 0.51082761 0.89294695]
[0.89629309 0.12558531 0.20724288 0.0514672 0.44080984]
[0.02987621 0.45683322 0.64914405 0.27848728 0.6762549 ]
[0.59086282 0.02398188 0.55885409 0.25925245 0.4151012 ]]
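As a quick illustration of why the seed and the number of calls matter, the following sketch reseeds the generator and redraws a matrix. The variable names first, second and third are illustrative only and are not part of the assignment.

```python
import numpy as np
import numpy.random as rnd

rnd.seed(3)                # fix the generator state
first = rnd.rand(4, 5)     # one call to rand after seeding

rnd.seed(3)                # reset to the same state
second = rnd.rand(4, 5)    # the same call now returns the same values

third = rnd.rand(4, 5)     # an extra call advances the generator: new values

print(np.array_equal(first, second))   # the two reseeded draws agree
print(np.array_equal(second, third))   # the extra draw does not
```

This is why any additional call to rand between questions changes every later answer.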
(a) Construct a random 4 × 5 matrix. Call it B. The 4 rows are numbered 0,1,2,3,
and the 5 columns are numbered 0,1,2,3,4.
(b) Construct a random 4-dimensional column vector (that is, a 4 × 1 matrix). Call it y.
(c) Reshape B into a 2 × 10 matrix. Call the result C. B itself does not change. (The
first row of C should consist of the first row of B followed by the second row of B.)
(d) Subtract vector y from all the columns of matrix B. Call the resulting matrix D.
D has the same dimensions as B.
(e) Reshape y so that it is a 4-dimensional vector instead of a 4 × 1 matrix. That is,
change its shape from (4,1) to (4). Call the resulting vector z. y itself does not
change. (Note that z is neither a column vector nor a row vector. We say it has
rank 1, since it has 1 dimension; while y and B have rank 2, since they each have 2 dimensions.)
(f) Change column 3 of matrix B to have the same value as vector z. (Note that
column 3 is the 4th column, since column 0 is the first.)
(g) Add vector z to column 2 of matrix B and assign the result to column 0 of matrix
D. Only matrix D changes.
(h) Print the first three rows of matrix B as a single matrix.
(i) Print columns 1 and 3 of matrix B as a single matrix.
(j) Compute the natural logarithm of each element in matrix B. The result is a matrix
with the same dimensions as B.
(k) Compute the sum of all the elements in matrix B. The result is a single real number.
(l) Compute the maximum of each column of matrix B. The result is a 5-dimensional vector.
(m) Sum the elements in each row of matrix B and print the maximum sum. The
result is a single real number.
(n) Using matrix multiplication, compute $B^T D$, where $B^T$ is the transpose of matrix
B. The result is a 5 × 5 matrix.
(o) Compute $y^T D D^T y$. The result is a 1 × 1 matrix (which contains a single real number).
You should use the functions reshape, sum, max and matmul in numpy, as well as the
function random in numpy.random. The expression B.T computes the transpose of
matrix B (as does the Numpy function transpose). You may also find the Numpy
function shape useful. You will have to look these functions up in the Numpy manual
(simply google them) and read their specifications carefully.
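The distinction between a column vector of shape (n,1) and a rank-1 vector of shape (n), and the way NumPy broadcasts one against a matrix, can be seen on a small example. The sketch below uses made-up arrays M and v (not the B and y of this question) and only the functions mentioned above.

```python
import numpy as np

M = np.arange(12.0).reshape(3, 4)     # 3 x 4 matrix with entries 0..11
v = np.array([[1.0], [2.0], [3.0]])   # 3 x 1 column vector (rank 2)

# Broadcasting: v is stretched across the 4 columns, so it is
# subtracted from every column of M without a loop.
diff = M - v

# reshape returns an array with the new shape; v itself keeps shape (3, 1).
flat = np.reshape(v, 3)               # rank-1 vector of shape (3,)

# A rank-1 vector can be assigned to a single column of a matrix.
M2 = M.copy()
M2[:, 1] = flat

# Matrix products: (4 x 3)(3 x 4) is 4 x 4, and v^T (M M^T) v is 1 x 1.
gram = np.matmul(M.T, M)
quad = np.matmul(v.T, np.matmul(np.matmul(M, M.T), v))

print(diff.shape, flat.shape, gram.shape, quad.shape)
```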
2. (15 points) This simple warm-up question is meant to illustrate the vast difference
in execution speed between iteration in Python (which is slow) and vectorized code
(which is fast), using matrix operations in Numpy.
(a) Write a Python function matrix_poly(A) that computes $A + A^2 + A^3$, a simple polynomial of the square matrix A. Here, $A^3 = A*A*A$ and $A^2 = A*A$, where * denotes matrix
multiplication. Recall that matrix multiplication is defined as follows:
$$E_{ij} = \sum_k C_{ik} D_{kj} \qquad (1)$$
where C and D are matrices and E = CD is the matrix product of C and D.
However, do not implement your function by naively evaluating the polynomial
as written. Instead, implement it as A+A*(A+A*A), which is faster since it does
two matrix multiplications instead of three. Your program will need to use a
triply-nested Python loop.
Your program should not use any NumPy operations that operate on whole matrices or large chunks of matrices. These include matrix addition and multiplication.
Nor should you use fancy array indexing to operate on multiple array elements at
a time, as in Question 1. Instead, you should use loops to operate on matrices one
element at a time, using assignment statements such as C[i,j]=D[i,j]*E[i,j].
You may define subroutines that take matrices as input and return matrices as
output. You may also use the NumPy operations shape (to determine the dimensions of the input matrix) and zeros (to initialize your computations). You
should not use any NumPy operations other than these.
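To make the element-at-a-time style concrete, here is a minimal sketch of a single matrix product written with a triply-nested loop, using only shape and zeros from NumPy. The helper name matmul_loops is hypothetical, and this is only one possible building block, not a complete answer.

```python
import numpy as np

def matmul_loops(C, D):
    """Multiply two matrices one element at a time (hypothetical helper)."""
    n, m = np.shape(C)
    m2, p = np.shape(D)
    assert m == m2, "inner dimensions must agree"
    E = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # Definition of the product: E[i,j] is the sum over k
                # of C[i,k] * D[k,j].
                E[i, j] += C[i, k] * D[k, j]
    return E

C = np.array([[1.0, 2.0], [3.0, 4.0]])
D = np.array([[0.0, 1.0], [1.0, 0.0]])
print(matmul_loops(C, D))
```

Multiplying by this particular D swaps the two columns of C, which makes the result easy to check by hand.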
(b) Write a Python function timing(N) to measure execution speed. Specifically, the
function should do the following:
• Use the function random in numpy.random to create a random N × N matrix, A.
• Execute your function matrix_poly with matrix A as its argument. Call the
result B1. Use the function time.time to measure the execution time of this
step. Print out the execution time.
• Use the functions numpy.matmul and + to compute A+A*(A+A*A). Call the
result B2. Do not use any loops. This is vectorized code. Use the function
time.time to measure the execution time of this step. Print out the execution time.
• Compute and print out the magnitude of the difference matrix, B1-B2. There
are many ways to define the magnitude of a matrix, but for the purpose of this
question, we define it to be the square root of the sum of the squared values
of the matrix elements. That is, the magnitude of a matrix, A, is $\sqrt{\sum_{i,j} A_{ij}^2}$.
Do not use iteration to compute this magnitude. Instead, use only NumPy
operations. You can do this in one line of vectorized code.
If your function matrix_poly is working correctly, the last step should produce
a very small number (much less than $10^{-5}$), which is due to numerical error.
You should also find that the vectorized code is much faster than using your
matrix_poly function. In fact, when N is large, it can be thousands of times faster.
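The timing pattern and the one-line magnitude computation might look roughly like the sketch below. It times only the vectorized expression and compares it against the polynomial evaluated as written, since the loop-based function is left to you. The size N = 50 and the names B_ref, elapsed and magnitude are assumptions made for illustration.

```python
import time
import numpy as np
import numpy.random as rnd

N = 50                                       # small size, for illustration only
A = rnd.random((N, N))

start = time.time()
B2 = A + np.matmul(A, A + np.matmul(A, A))   # vectorized A + A*(A + A*A)
elapsed = time.time() - start
print('vectorized time (s):', elapsed)

# Reference value computed as written, A + A^2 + A^3, for comparison.
A2 = np.matmul(A, A)
B_ref = A + A2 + np.matmul(A2, A)

# Magnitude: square root of the sum of squared elements, in one line.
magnitude = np.sqrt(np.sum((B2 - B_ref) ** 2))
print('magnitude of difference:', magnitude)
```

The two evaluations agree only up to floating-point rounding, which is why the magnitude is tiny rather than exactly zero in general.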
(c) Execute timing(100), timing(300) and timing(1000), and hand in the printed
results. Be sure it is clear which measurement each printed value refers to. In
each case, how many floating-point multiplications does matrix_poly perform?¹
You should observe that the execution time of timing(N) increases rapidly with
N. This is because matrix multiplication is an $O(N^3)$ operation, so increasing N
by a factor of 10 will increase execution time by a factor of 1000. Depending
on your computer, timing(1000) could take 15-30 minutes to compute. If your
computer is very slow, you may want to let it run overnight.
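The cubic growth can be checked with a little arithmetic. A single product of two N × N matrices, computed element by element, uses N multiplications for each of its N² entries. The helper name below is hypothetical, and it counts one product only, not the whole polynomial.

```python
def mults_per_product(N):
    """Scalar multiplications in one naive N x N matrix product."""
    return N * N * N      # N multiplications for each of the N*N entries

for N in (100, 300, 1000):
    print(N, mults_per_product(N))

# Going from N = 100 to N = 1000 multiplies the count by 10**3.
ratio = mults_per_product(1000) // mults_per_product(100)
print(ratio)
```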
Because loops and iteration in Python are so slow, your programs in the rest of this
assignment should avoid using them to operate on large vectors and matrices. Instead,
you should vectorize your code and use NumPy operations whenever possible to speed them up.
3. Linear Least-Squares Regression. (?? points)
In this question, you will write a Python program to fit a simple linear function to
data using least-squares regression.
As described in class, the data for linear regression consists of a set of pairs, $(x^{(n)}, t^{(n)})$, where each $x^{(n)}$ is an input and each $t^{(n)}$ is a target value. Each pair $(x^{(n)}, t^{(n)})$ is called a data point. In general, $x^{(n)}$ can be a vector, but in this question, it
will simply be a real number. The function you will fit to the data takes a real-number,
x, as input and returns a real number, y(x), as output. It has the form
y(x) = ax + b (2)
Your job is to find values for a and b so that the function y(x) best fits the data. In
particular, you will minimize the loss function,
$$l(a, b) = \sum_n \left[ t^{(n)} - y(x^{(n)}) \right]^2 \qquad (3)$$
where the sum is over all training points, $(x^{(n)}, t^{(n)})$. Recall from Lecture 2 that the
values of a and b that minimize this loss are given by the following equation:
$$w = (X^T X)^{-1} X^T t \qquad (4)$$
where w = (b, a) is the weight vector, t is a column vector of the target values, and
X is the data matrix. In this case, because the input, x, is a single real number, not
a vector, X has only two columns: the first column is all 1’s, and the second column
consists of the input values $x^{(n)}$.
¹ A floating-point multiplication is a single multiplication of two floating-point (i.e., real) numbers.
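As an illustration of how the data matrix and the closed-form solution fit together, the sketch below applies the formula to a tiny noise-free data set generated from the line t = 2x + 1. The data values are made up for this example and are unrelated to the assignment data.

```python
import numpy as np
from numpy.linalg import inv

# Synthetic, noise-free data from the line t = 2x + 1 (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
t = 2.0 * x + 1.0

# Data matrix: first column all ones, second column the inputs.
X = np.stack([np.ones_like(x), x], axis=1)

# w = (X^T X)^{-1} X^T t, with w = (b, a).
w = inv(X.T @ X) @ X.T @ t
b, a = w
print(a, b)
```

Because the data lie exactly on a line, the recovered a and b match the generating slope and intercept up to rounding. An explicit inverse mirrors the formula; numpy.linalg.solve is a common, numerically more stable alternative.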
In addition to fitting a linear function to data, your program should also compute the
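mean squared training and test errors of the fitted function. These are given by the following equations:
$$\mathrm{err}_{\mathrm{train}} = \frac{1}{N_{\mathrm{train}}} \sum_n \left[ t^{(n)} - y(x^{(n)}) \right]^2$$
$$\mathrm{err}_{\mathrm{test}} = \frac{1}{N_{\mathrm{test}}} \sum_n \left[ t^{(n)} - y(x^{(n)}) \right]^2$$
where the two sums are over the training data and test data, respectively, and $N_{\mathrm{train}}$
and $N_{\mathrm{test}}$ are the number of training and test points, respectively.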
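A mean squared error of this form is a single vectorized expression. The following sketch evaluates it on three made-up points; the function name mean_squared_error and the data are assumptions for illustration.

```python
import numpy as np

def mean_squared_error(a, b, x, t):
    """Mean of squared residuals t - (a*x + b); hypothetical helper name."""
    return np.mean((t - (a * x + b)) ** 2)

x = np.array([0.0, 1.0, 2.0])
t = np.array([1.0, 3.0, 6.0])

# The line y = 2x + 1 predicts 1, 3, 5, so the residuals are 0, 0, 1.
mse = mean_squared_error(2.0, 1.0, x, t)
print(mse)
```

Here the squared residuals sum to 1 over 3 points, so the value is one third.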
The data you will use is in the file dataA1Q3.pickle.zip on the course web site.
Download and uncompress this file. (Your browser may uncompress it automatically.)
The file contains training and test data. Next, start the Python interpreter and import
the pickle module. You can then read the file with the following Python command:
import pickle
with open('dataA1Q3.pickle', 'rb') as f:
    dataTrain, dataTest = pickle.load(f)
The variable dataTrain will now contain the training data, and dataTest will contain
the test data. Specifically, dataTrain is a 2 × 30 Numpy array, where the first row
gives the input values, and the second row gives the target values. Likewise, dataTest
is an array containing 1000 test points. The training data is illustrated in the scatter
plot in Figure 1.
In answering the questions below, do not use any Python loops. Instead, all code
should be vectorized.
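Once loaded, the inputs and targets can be pulled out of each array by row indexing. The sketch below uses stand-in arrays with made-up values and an in-memory pickle round trip so that it runs without the data file. It also assumes that dataTest has the same two-row layout as dataTrain, which the text implies but does not state outright.

```python
import pickle
import numpy as np

# Stand-in arrays with the layout described above (row 0: inputs, row 1: targets).
# The values are made up; the real data comes from dataA1Q3.pickle.
fakeTrain = np.vstack([np.linspace(0.0, 1.0, 30), np.zeros(30)])
fakeTest = np.vstack([np.linspace(0.0, 1.0, 1000), np.zeros(1000)])

# Round-trip through pickle in memory, mimicking pickle.load on the file.
dataTrain, dataTest = pickle.loads(pickle.dumps((fakeTrain, fakeTest)))

xTrain, tTrain = dataTrain[0], dataTrain[1]   # rank-1 vectors of length 30
xTest, tTest = dataTest[0], dataTest[1]       # rank-1 vectors of length 1000
print(dataTrain.shape, xTrain.shape, xTest.shape)
```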
(a) Write a Python function least_squares(x,t) that returns the optimal values of
a and b. Here, x is a vector of input values, and t is a vector of target values.
They are the training data. Your program should construct the data matrix X
and use equation (4) to solve for a and b. You may find the function inv in numpy.linalg useful.
The entire function can be written in at most 7 lines of highly readable code, not
counting comment lines.
(b) Write a function plot_data(x,t) that takes training data as input and plots the
data and also plots the line that best fits the data in the least-squares sense. Here
x and t define the training data, and are vectors of inputs and target values, respectively. You should call the function least_squares from part (a) to compute
the values of a and b for the fitted line. Plot the training data as blue dots (as
in Figure 1), and plot the fitted line in red. To plot the fitted line, draw a line
segment between two points on the line. This line segment should extend from
the smallest value of x in the training set to the largest. You will need to compute
the value of y at these two endpoints. Use the functions scatter and plot in
matplotlib.pyplot to plot the training points and draw the fitted line segment,
respectively. Title the figure, Question 3(b): the fitted line. Finally, your function
should return the values of a and b.
The entire function can be written in at most 11 lines of highly readable code, not
counting comment lines.
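The endpoint computation for the line segment can be sketched without any plotting. The inputs and the values of a and b below are made up for illustration, and the plotting call is shown only as a comment so that the example runs without matplotlib.

```python
import numpy as np

# Illustrative inputs and line parameters (made up, not the assignment data).
x = np.array([0.5, -1.0, 2.0, 1.5])
a, b = 2.0, 1.0

# The segment runs from the smallest to the largest input value.
x_ends = np.array([np.min(x), np.max(x)])
y_ends = a * x_ends + b          # y(x) = ax + b at the two endpoints

print(x_ends, y_ends)
# With matplotlib.pyplot imported as plt, plt.plot(x_ends, y_ends, 'r')
# would draw this segment in red.
```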
(c) Write a function error(a,b,X,T) that measures how well a line fits a data set.
The data is defined by the vectors X and T, and the line is defined by the real numbers a and b. That is, X is a vector of input values, T is a vector of corresponding
target values, and the line is given by the equation y = ax + b. The function
should return the mean squared error of the line with the data. In particular, you
should be able to use this function to compute the training and test errors of your
fitted function. You may find the function numpy.mean useful.
This function can be written in at most three lines of highly readable code.
(d) Write and execute a simple Python script to test your functions above. The script
should do the following:
• Read the training and test data from the file dataA1Q3.pickle.
• Call the function plot_data to fit a line to the training data and plot the results.
• Print the values of a and b for the fitted line.
• Compute and print the training error.
• Compute and print the test error.
If you have done everything correctly, the training and test errors should both be
between 0.8 and 1.0, and the test error should be greater than the training error.
Hand in the plot and the printed values.