Deep Learning Numericals and solutions PDF

Title: Deep Learning Numericals and solutions
Author: Saurabh Shastri
Course: Deep Learning
Institution: Birla Institute of Technology and Science, Pilani
Pages: 24

Summary

A CS230 (Deep Learning) midterm examination from Fall Quarter 2020 at Stanford University, with solutions: a 180-minute, 24-page exam covering multiple choice, short answers, convolutional architectures, a movie-posters problem, backpropagation, and NumPy coding, for a total of 111 + 3 bonus points.


Description

CS230: Deep Learning
Fall Quarter 2020
Stanford University

Midterm Examination
180 minutes

Problem                           Full Points        Your Score
1  Multiple Choice                16
2  Short Answers                  16
3  Convolutional Architectures    16
4  Movie Posters                  21 + 3 (bonus)
5  Backpropagation                28
6  Numpy Coding                   14
Total                             111 + 3 (bonus)

The exam contains 24 pages including this cover page.

• If you wish to complete the midterm in LaTeX, please download the project source's ZIP file here. (The Stanford Box link, in case you face issues with the hyperlink: https://stanford.box.com/s/gm5h2ovq5om637uwm0skov7p00ocwijp)
• This exam is open book, but collaboration with anyone else, either in person or online, is strictly forbidden pursuant to The Stanford Honor Code.
• In all cases, and especially if you're stuck or unsure of your answers, explain your work, including showing your calculations and derivations! We'll give partial credit for good explanations of what you were trying to do.

Name:

SUNETID:

@stanford.edu

The Stanford University Honor Code: I attest that I have not given or received aid in this examination, and that I have done my share and taken an active part in seeing to it that others as well as myself uphold the spirit and letter of the Honor Code.

Signature:

Question 1 (Multiple Choice Questions, 16 points)

For each of the following questions, circle the letter of your choice. Each question has AT LEAST one correct option unless explicitly mentioned. No explanation is required.

(a) (2 points) You are training a large feedforward neural network (100 layers) on a binary classification task, using a sigmoid activation in the final layer and a mixture of tanh and ReLU activations for all other layers. You notice that the weights of a subset of your layers stop updating after the first epoch of training, even though your network has not yet converged. Deeper analysis reveals that the gradients to these layers go completely, or almost completely, to zero very early in training. Which of the following fixes could help? (You also note that your loss is still within a reasonable order of magnitude.)

(i) Increase the size of your training set
(ii) Switch the ReLU activations with leaky ReLUs everywhere
(iii) Add Batch Normalization before every activation
(iv) Increase the learning rate

Solution: (ii), (iii). This is the classic vanishing gradient problem. Increasing the size of the training set (i) doesn't help, since the issue lies with the learning dynamics of the network. Varying the learning rate (iv) might help the network learn faster, but as the problem states, the gradients to specific layers almost completely go to zero, so the issue is localized to those layers. (ii) solves the problem of dying ReLUs by passing some gradient signal back through every ReLU layer. (iii) Adding BatchNorm before every activation ensures the tanh layers have inputs distributed closer to the linear region of the activation, so the elementwise derivative across the layer evaluates closer to 1. (A short sketch of both effects follows after this question.)

(b) (2 points) Which of the following would you consider to be valid activation functions (elementwise non-linearities) for training a neural net in practice?

(i) f(x) = −min(2, x)
(ii) f(x) = 0.9x + 1
(iii) f(x) = min(x, 0.1x) if x ≥ 0, and min(x, 0.1x) if x < 0
(iv) f(x) = max(x, 0.1x) if x ≥ 0, and min(x, 0.1x) if x < 0
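The following is a small illustrative NumPy sketch, not part of the exam, of why fixes (ii) and (iii) in part (a) help; the slope alpha = 0.1 and the sample inputs are arbitrary choices. A leaky ReLU's gradient is never exactly zero, and tanh's derivative is close to 1 only for inputs near zero, which is where BatchNorm pushes the pre-activations.

import numpy as np

# Illustrative sketch, not from the exam: gradients of the activations
# discussed in the solution to part (a).

def relu_grad(x):
    # Standard ReLU gradient: exactly 0 for negative inputs ("dying ReLU").
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.1):
    # Leaky ReLU gradient: small but nonzero for negative inputs,
    # so some gradient signal always flows back.
    return np.where(x > 0, 1.0, alpha)

def tanh_grad(x):
    # tanh'(x) = 1 - tanh(x)^2: close to 1 near x = 0, close to 0 once the
    # unit saturates, hence the benefit of normalizing pre-activations.
    return 1.0 - np.tanh(x) ** 2

x = np.array([-5.0, -0.5, 0.5, 5.0])
print(relu_grad(x))        # [0. 0. 1. 1.]
print(leaky_relu_grad(x))  # [0.1 0.1 1.  1. ]
print(tanh_grad(x))        # approx [0.0002 0.786 0.786 0.0002]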

[...]

Solution (from Question 6, Numpy Coding), worked examples for filling in T[i]:

Example 1: ... |12| > |−10| → T[i] = 12

Example 2: Let's say the 5 most positive values are [10, 10, 10, 10, 10], so their average is 10, and the 5 most negative values are [−10, −10, −10, −10, −20], so their average is −12. Since |10| < |−12|, T[i] = −12.
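As a quick check of the rule these examples illustrate, here is a minimal NumPy sketch; the array values are taken directly from Example 2 above.

import numpy as np

# Example 2 from the solution text: T[i] keeps whichever average
# (most-positive vs. most-negative changes) has the larger absolute value.
pos = np.array([10, 10, 10, 10, 10])       # 5 most positive luminance changes
neg = np.array([-10, -10, -10, -10, -20])  # 5 most negative luminance changes
p_avg, n_avg = pos.mean(), neg.mean()      # 10.0 and -12.0
t_i = p_avg if abs(p_avg) > abs(n_avg) else n_avg
print(t_i)                                 # -12.0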



4. Return the table T.

import numpy as np

## params: frames: np.array (F, 1280, 720, 3)
## returns: T: np.array (F - 1, 1)
def video_flashinator(frames):
    pass  # (answer space on the exam; see the solution below)



Solution:

import numpy as np

## params: frames: np.array (F, 1280, 720, 3)
## returns: T: np.array (F - 1, 1)
def video_flashinator(frames):
    # Per-pixel luminance of every frame: (F, 1280, 720)
    avg_lum_frames = 4 * (0.2 * frames.mean(axis=3) + 1)**2.2
    # Luminance change between consecutive frames: (F - 1, 1280, 720)
    change_lum = np.delete(avg_lum_frames, 0, 0) - np.delete(avg_lum_frames, -1, 0)
    # Positive changes, flattened per frame pair and sorted in descending order
    pos_lum = change_lum.copy().reshape(change_lum.shape[0], -1)
    pos_lum[pos_lum < 0] = 0
    pos_lum.sort(axis=1)
    pos_lum = np.flip(pos_lum, axis=1)
    # Magnitudes of negative changes, flattened and sorted in descending order
    neg_lum = -change_lum.copy().reshape(change_lum.shape[0], -1)
    neg_lum[neg_lum < 0] = 0
    neg_lum.sort(axis=1)
    neg_lum = np.flip(neg_lum, axis=1)
    # Average of the 10 largest positive / negative changes for each frame pair
    p_avgL = pos_lum[:, :10].mean(axis=1)
    n_avgL = neg_lum[:, :10].mean(axis=1)
    # Keep whichever average has the larger magnitude (sign preserved)
    T = p_avgL - n_avgL
    T[T > 0] = p_avgL[T > 0]
    T[T...
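For readers checking shapes, here is a small sketch of the first steps of the solution above; the tiny random array standing in for real video frames is my own choice, and the slicing form is a substitution for the np.delete pair.

import numpy as np

# Shape-check sketch: tiny random "frames" stand in for real video data.
frames = np.random.rand(4, 8, 6, 3)                    # F=4, H=8, W=6, RGB
avg_lum = 4 * (0.2 * frames.mean(axis=3) + 1) ** 2.2   # (4, 8, 6), same formula as above
change = avg_lum[1:] - avg_lum[:-1]                    # (3, 8, 6) = (F-1, H, W)
flat = change.reshape(change.shape[0], -1)             # (3, 48): one row per frame pair
print(flat.shape)

Note that np.delete(avg_lum_frames, 0, 0) - np.delete(avg_lum_frames, -1, 0) and avg_lum[1:] - avg_lum[:-1] compute the same consecutive-frame difference; the sliced form simply avoids the two intermediate copies that np.delete makes.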

