0.Warm-up (40 points)
Answer the following in your own words (read and understand the concepts, do not copy answers from the internet):
(a.) Explain why the depth-wise separable convolutions can achieve reasonable results while using fewer parameters compared to traditional convolutions. Also compare the representation power of a depth-wise separable convolution to that of a standard convolution if both had the same number of parameters.
(b.) Comment on the quantization bit-widths (for a deep learning model) required to achieve good performance for speech/audio inputs and compare it to that for image inputs. Which type of input requires higher number of bits to achieve reasonable performance, and why?
(c.) From an optimization perspective, why is it challenging to train a deep learning model with quantized weights and activations?
(d.) What is the impact of low-rank projections on the generalization ability of a deep learning model? Explain your reasoning.
(e.) What is the motive to deploy deep learning models on embedded/IoT devices? List the problems associated with sending the data from the IoT device to the cloud through wifi and performing the inference on the cloud.
(f.) In one or more sentences, and using sketches as appropriate, explain: SqueezeNet and MobileNet. What are the defining characteristics of these architectures? How do they improve upon their predecessors in terms of latency and memory consumption?
(g.) Compare and contrast the different pruning techniques discussed in the lecture. Which technique would produce the most optimal results? Which technique is the most practical? Explain your reasoning.
(h.) Explain how Scree plots are useful in resource allocation and determining each layer's target rank.
(i.) Compare and contrast the different quantization techniques discussed in the lectures, and comment on the strengths and limitations of each technique.

(a.) In a depth-wise separable convolution, each depth-wise kernel is responsible for only one channel of the input.
A point-wise convolution is then applied to combine the channels and expand the feature map; because its kernel is only 1×1, it adds relatively few parameters.
Therefore, depth-wise separable convolutions can achieve performance comparable to traditional convolutions while using far fewer parameters.
At the same time, for the same number of parameters, depth-wise separable convolutions allow a deeper network, which can give it greater representational power than a standard convolution.
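The parameter savings above can be verified directly in PyTorch. This is a minimal sketch with hypothetical layer sizes (64 → 128 channels, 3×3 kernel); the depth-wise step is expressed with `groups=in_ch` so each filter sees a single input channel:

```python
# Parameter counts: standard convolution vs. depth-wise separable convolution.
import torch.nn as nn

in_ch, out_ch, k = 64, 128, 3  # hypothetical sizes for illustration

standard = nn.Conv2d(in_ch, out_ch, k, padding=1)

separable = nn.Sequential(
    # depth-wise: one k x k filter per input channel (groups=in_ch)
    nn.Conv2d(in_ch, in_ch, k, padding=1, groups=in_ch),
    # point-wise: 1x1 convolution mixes channels and expands the feature map
    nn.Conv2d(in_ch, out_ch, 1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))   # k*k*in_ch*out_ch + out_ch = 73856
print(count(separable))  # (k*k*in_ch + in_ch) + (in_ch*out_ch + out_ch) = 8960
```

Here the separable version uses roughly 8× fewer parameters for the same input/output shape.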
(b.) This depends on the nature of the inputs in the two tasks.
In general, the input to a speech model can be a very high-dimensional, sparse representation, for example a one-hot vector over a vocabulary of tens of thousands of entries.
The input to an image model is a multidimensional array of pixels whose values are effectively continuous.
To run the model at a given quantization bit-width, every pixel (and every activation derived from it) must be represented at that bit-width.
For this reason, I think image inputs require a higher number of bits to achieve reasonable performance.
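The effect of bit-width on continuous-valued inputs can be illustrated with simple uniform quantization. This is an illustrative sketch only (random values standing in for normalized pixel or audio samples, a hypothetical `quantize` helper):

```python
# Sketch: uniform quantization to n bits; fewer bits mean coarser levels
# and a larger reconstruction error.
import numpy as np

def quantize(x, n_bits):
    """Uniformly quantize x (assumed in [0, 1]) to 2**n_bits levels."""
    levels = 2 ** n_bits - 1
    return np.round(x * levels) / levels

rng = np.random.default_rng(0)
x = rng.random(10_000)  # stand-in for normalized pixel/sample values

for n in (8, 4, 2):
    err = np.abs(x - quantize(x, n)).mean()
    print(f"{n}-bit mean abs error: {err:.5f}")
```

The mean error grows as the bit-width shrinks, which is why aggressive quantization of continuous-valued inputs degrades accuracy.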
(c.) Quantizing the weights and activations of a model requires a rounding function.
The gradient of the rounding function is zero almost everywhere, so backpropagation cannot be used directly to update the weights during training.
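A common workaround for this zero-gradient problem (not part of the answer above, but widely used in practice) is the straight-through estimator (STE): the forward pass rounds, while the backward pass pretends rounding was the identity so gradients can flow. A minimal PyTorch sketch:

```python
# Sketch: straight-through estimator (STE) for the rounding function.
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # identity gradient: pass the upstream gradient through unchanged
        return grad_output

w = torch.tensor([0.3, 1.7, -0.6], requires_grad=True)
y = RoundSTE.apply(w).sum()
y.backward()
print(w.grad)  # nonzero everywhere, unlike the true gradient of torch.round
```

With plain `torch.round` the gradient at `w` would be zero; the STE substitutes a usable (if biased) surrogate gradient.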
(d.) Low-rank projections can effectively improve the generalization ability of deep learning models, because projecting the weights onto a low-rank subspace removes unnecessary detail (noise) that the model would otherwise fit.
The model then performs well not only on the training set but also on the test set.
In other words, low-rank projections act as a regularizer that mitigates overfitting, thereby improving generalization.
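A low-rank projection of a weight matrix can be sketched with a truncated SVD. The sizes and the target rank `r` below are hypothetical, chosen only to show the parameter reduction:

```python
# Sketch: compressing a weight matrix via truncated SVD (a low-rank projection).
import torch

torch.manual_seed(0)
W = torch.randn(256, 512)           # hypothetical weight matrix
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

r = 32                              # hypothetical target rank
W_low = U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]

params_full = W.numel()                      # 256*512 = 131072
params_low = r * (W.shape[0] + W.shape[1])   # 32*(256+512) = 24576
print(params_full, params_low)
```

Storing the two rank-`r` factors instead of `W` cuts the parameter count by more than 5× here; the discarded small singular values carry the "unnecessary detail" mentioned above.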
(e.) There are several advantages to running deep learning inference directly on end devices such as embedded/IoT devices.
First, computation can be done directly on the embedded device without a network connection.
Second, computing on-device avoids the latency caused by transmitting data over the network and waiting for a response.
Third, keeping the data on the terminal resolves a large part of the privacy problem, since raw sensor data never leaves the device.
Typical applications of IoT include smart homes, energy, urban transportation, healthcare, industrial manufacturing, and more.
When data is instead sent from the IoT device to the cloud over wifi, the main problems are endpoint security risks and network transmission issues such as latency, limited bandwidth, and unreliable connectivity.
(f.) From the perspective of network-structure optimization, SqueezeNet uses the following three strategies to reduce network parameters while maintaining performance:
- Squeeze layer: a 1×1 convolution first reduces the channel dimension; the spatial size of the feature map is unchanged, and choosing s1 smaller than the number of input channels M achieves compression (s1 controls the number of channels).
- Expand layer: 1×1 and 3×3 convolutions are applied in parallel to obtain feature maps with different receptive fields, somewhat similar to an Inception module, achieving expansion.
- Concat layer: the two resulting feature maps are concatenated along the channel axis as the final output.
Based on these three points, SqueezeNet builds its basic building block, the Fire module.
MobileNet, in contrast, achieves its efficiency by replacing standard convolutions with the depth-wise separable convolutions described in (a.), reducing both latency and memory consumption relative to its predecessors.
The network structure of SqueezeNet is shown in the figure.
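The Fire module described above can be sketched in PyTorch as follows. The channel sizes (96 input channels, s1=16, 64+64 expand filters) match one of SqueezeNet's early Fire modules, but the class itself is a simplified illustration, not the reference implementation:

```python
# Sketch of SqueezeNet's Fire module: 1x1 squeeze, then parallel 1x1 and 3x3
# expand convolutions whose outputs are concatenated along the channel axis.
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, s1, e1, e3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, s1, kernel_size=1)        # s1 < in_ch
        self.expand1 = nn.Conv2d(s1, e1, kernel_size=1)
        self.expand3 = nn.Conv2d(s1, e3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat(
            [self.relu(self.expand1(x)), self.relu(self.expand3(x))], dim=1
        )

out = Fire(96, s1=16, e1=64, e3=64)(torch.randn(1, 96, 55, 55))
print(out.shape)  # channels: e1 + e3 = 128; spatial size unchanged
```

Note how the squeeze step (96 → 16 channels) keeps the expensive 3×3 convolution cheap, while the concatenation restores a wide output.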