Machine Learning for Graphs and Sequential Data Retake Exam Solutions

Title Machine Learning for Graphs and Sequential Data Retake Exam Solutions
Course Machine Learning for Graphs and Sequential Data
Institution Technische Universität München
Pages 18



Description

Data Analytics and Machine Learning Group
Department of Informatics
Technical University of Munich

Sample Solution / Correction Notes (Ecorrection)

Exam: IN2323 / Retake (Machine Learning for Graphs and Sequential Data)
Examiner: Prof. Dr. Stephan Günnemann
Date: Wednesday 7th October, 2020
Time: 14:15 – 15:30

Note (place student sticker here):

• During the attendance check a sticker containing a unique code will be put on this exam.
• This code contains a unique number that associates this exam with your registration number.
• This number is printed both next to the code and to the signature field in the attendance check list.

Working instructions

• This exam consists of 18 pages with a total of 15 problems. Please make sure now that you received a complete copy of the exam.
• The total amount of achievable credits in this exam is 60 credits.
• Detaching pages from the exam is prohibited.
• Allowed resources:
  – all materials that you will use on your own (lecture slides, calculator etc.)
  – not allowed are any forms of collaboration between examinees and plagiarism
• You have to sign the code of conduct.
• Make sure that the QR codes are visible on every uploaded page. Otherwise, we cannot grade your exam.
• Only write on the provided sheets; submitting your own additional sheets is not possible.
• The last two pages can be used as scratch paper.
• All sheets (including scratch paper) have to be submitted to the upload queue. Missing pages will be considered empty.
• Only use a black or blue color (no red or green)!
• Write your answers only in the provided solution boxes or the scratch paper.
• For problems that say "Justify your answer" you only get points if you provide a valid explanation.
• For problems that say "Prove" you only get points if you provide a valid mathematical proof.
• If a problem does not say "Justify your answer" or "Prove", it is sufficient to only provide the correct answer.
• Instructor announcements and clarifications will be posted on Piazza with email notifications.
• Exam duration: 75 minutes.


Problem 1
Normalizing Flows: Version 1 (5 credits)

a) Let $k \in \mathbb{N}$. We consider the transformation $f(z) = (z - 10)^{2k}$ from $\mathbb{R}$ to $\mathbb{R}$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: No, $10 + z$ and $10 - z$ would lead to the same output.

b) We consider the transformation
$$f(z) = \begin{pmatrix} z_1 \\ \vdots \\ \sum_{k=1}^{i} z_k \\ \vdots \\ \sum_{k=1}^{n} z_k \end{pmatrix}$$
from $\mathbb{R}^n$ to $\mathbb{R}^n$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes, the Jacobian is triangular with ones on the diagonal. The determinant is therefore equal to 1. The inverse is
$$f^{-1}(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_i - x_{i-1} \\ \vdots \\ x_n - x_{n-1} \end{pmatrix}.$$

c) We consider the transformation
$$f(z) = \begin{pmatrix} z_1 + 2z_2 \\ 3z_1 + 4z_2 \\ z_3 - 2z_4 \\ 3z_3 - 4z_4 \end{pmatrix}$$
from $\mathbb{R}^4$ to $\mathbb{R}^4$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes. We can compute the determinant of the Jacobian per block of size 2 and compute the inverse per block as well. The Jacobian is
$$J = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 3 & -4 \end{pmatrix}.$$
Its determinant is $(1 \times 4 - 2 \times 3)(1 \times (-4) - (-2) \times 3) = -4$. The inverse is
$$f^{-1}(x) = \begin{pmatrix} -2x_1 + x_2 \\ \frac{3}{2} x_1 - \frac{1}{2} x_2 \\ -2x_3 + x_4 \\ -\frac{3}{2} x_3 + \frac{1}{2} x_4 \end{pmatrix}.$$
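The part (b) argument is easy to check numerically. A minimal sketch (the vector length and the random test point are arbitrary choices, not part of the exam):

```python
import numpy as np

# Part (b): f(z) = (z_1, z_1 + z_2, ..., sum_k z_k) is the cumulative sum.
rng = np.random.default_rng(0)
z = rng.normal(size=5)

x = np.cumsum(z)                  # forward transformation f(z)
z_rec = np.diff(x, prepend=0.0)   # inverse: (x_1, x_2 - x_1, ..., x_n - x_{n-1})
print(np.allclose(z_rec, z))      # True

J = np.tril(np.ones((5, 5)))      # Jacobian: lower-triangular with unit diagonal
print(np.linalg.det(J))           # 1.0
```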

Problem 2
Normalizing Flows: Version 2 (5 credits)

a) Let $k \in \mathbb{N}$. We consider the transformation $f(z) = (z - 10)^{2k}$ from $[10, +\infty[$ to $\mathbb{R}$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes, the determinant is $2k (z - 10)^{2k-1}$. The inverse is $f^{-1}(x) = x^{1/(2k)} + 10$.

b) We consider the transformation
$$f(z) = \begin{pmatrix} z_1 \\ \vdots \\ \sum_{k=1}^{i} z_k \\ \vdots \\ \sum_{k=1}^{n} z_k \end{pmatrix}$$
from $\mathbb{R}^n$ to $\mathbb{R}^n$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes, the Jacobian is triangular with ones on the diagonal. The determinant is therefore equal to 1. The inverse is
$$f^{-1}(x) = \begin{pmatrix} x_1 \\ \vdots \\ x_i - x_{i-1} \\ \vdots \\ x_n - x_{n-1} \end{pmatrix}.$$

c) We consider the transformation
$$f(z) = \begin{pmatrix} z_1 + 2z_2 \\ 3z_1 + 4z_2 \\ z_3 - 2z_4 \\ 3z_3 - 4z_4 \end{pmatrix}$$
from $\mathbb{R}^4$ to $\mathbb{R}^4$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes. We can compute the determinant of the Jacobian per block of size 2 and compute the inverse per block as well. The Jacobian is
$$J = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 3 & -4 \end{pmatrix}.$$
Its determinant is $(1 \times 4 - 2 \times 3)(1 \times (-4) - (-2) \times 3) = -4$. The inverse is
$$f^{-1}(x) = \begin{pmatrix} -2x_1 + x_2 \\ \frac{3}{2} x_1 - \frac{1}{2} x_2 \\ -2x_3 + x_4 \\ -\frac{3}{2} x_3 + \frac{1}{2} x_4 \end{pmatrix}.$$
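A minimal numerical sketch of part (a); the value $k = 3$ and the test points on $[10, +\infty[$ are arbitrary example choices:

```python
import numpy as np

# Part (a): on [10, +inf) the map f(z) = (z - 10)^(2k) is monotone, hence invertible.
k = 3
f = lambda z: (z - 10.0) ** (2 * k)
f_inv = lambda x: x ** (1.0 / (2 * k)) + 10.0

z = np.linspace(10.0, 15.0, 11)
print(np.allclose(f_inv(f(z)), z))                    # True: f_inv undoes f on [10, inf)

z0, h = 12.0, 1e-6                                    # finite-difference check of df/dz
numeric = (f(z0 + h) - f(z0 - h)) / (2 * h)
print(numeric, 2 * k * (z0 - 10.0) ** (2 * k - 1))    # both ~ 192
```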

Problem 3
Normalizing Flows: Version 3 (5 credits)

a) Let $k \in \mathbb{N}$. We consider the transformation $f(z) = (z - 10)^{2k}$ from $\mathbb{R}$ to $\mathbb{R}$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: No, $10 + z$ and $10 - z$ would lead to the same output.

b) We consider the transformation
$$f(z) = \begin{pmatrix} z_n \\ z_1 \\ \vdots \\ z_{n-1} \end{pmatrix}$$
from $\mathbb{R}^n$ to $\mathbb{R}^n$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.
Hint: The determinant of one elementary permutation (i.e. a permutation which interchanges any two rows) is $-1$.

Solution: Yes, it can be seen as $n - 1$ stacked elementary permutations. The determinant of the Jacobian is $(-1)^{n-1}$. The inverse is
$$f^{-1}(x) = \begin{pmatrix} x_2 \\ x_3 \\ \vdots \\ x_n \\ x_1 \end{pmatrix}.$$

c) We consider the transformation
$$f(z) = \begin{pmatrix} z_1 + 2z_2 \\ 3z_1 + 4z_2 \\ z_3 - 2z_4 \\ 3z_3 - 4z_4 \end{pmatrix}$$
from $\mathbb{R}^4$ to $\mathbb{R}^4$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes. We can compute the determinant of the Jacobian per block of size 2 and compute the inverse per block as well. The Jacobian is
$$J = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 3 & -4 \end{pmatrix}.$$
Its determinant is $(1 \times 4 - 2 \times 3)(1 \times (-4) - (-2) \times 3) = -4$. The inverse is
$$f^{-1}(x) = \begin{pmatrix} -2x_1 + x_2 \\ \frac{3}{2} x_1 - \frac{1}{2} x_2 \\ -2x_3 + x_4 \\ -\frac{3}{2} x_3 + \frac{1}{2} x_4 \end{pmatrix}.$$
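A minimal sketch of the part (b) claim, checking the sign $(-1)^{n-1}$ of the cyclic-shift Jacobian for a few values of $n$:

```python
import numpy as np

# Part (b): the Jacobian of the cyclic shift f(z) = (z_n, z_1, ..., z_{n-1})
# is a permutation matrix; its determinant is (-1)^(n-1).
for n in range(2, 7):
    P = np.roll(np.eye(n), 1, axis=0)                    # row i picks out z_{i-1 (mod n)}
    print(n, round(np.linalg.det(P)), (-1) ** (n - 1))   # the last two values agree
```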

Problem 4
Normalizing Flows: Version 4 (5 credits)

a) Let $k \in \mathbb{N}$. We consider the transformation $f(z) = (z - 10)^{2k}$ from $[10, +\infty[$ to $\mathbb{R}$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes, the determinant is $2k (z - 10)^{2k-1}$. The inverse is $f^{-1}(x) = x^{1/(2k)} + 10$.

b) We consider the transformation
$$f(z) = \begin{pmatrix} z_n \\ z_1 \\ \vdots \\ z_{n-1} \end{pmatrix}$$
from $\mathbb{R}^n$ to $\mathbb{R}^n$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.
Hint: The determinant of one elementary permutation (i.e. a permutation which interchanges any two rows) is $-1$.

Solution: Yes, it can be seen as $n - 1$ stacked elementary permutations. The determinant of the Jacobian is $(-1)^{n-1}$. The inverse is
$$f^{-1}(x) = \begin{pmatrix} x_2 \\ x_3 \\ \vdots \\ x_n \\ x_1 \end{pmatrix}.$$

c) We consider the transformation
$$f(z) = \begin{pmatrix} z_1 + 2z_2 \\ 3z_1 + 4z_2 \\ z_3 - 2z_4 \\ 3z_3 - 4z_4 \end{pmatrix}$$
from $\mathbb{R}^4$ to $\mathbb{R}^4$. Is this transformation invertible? Justify your answer. If yes, compute the determinant of its Jacobian and its inverse.

Solution: Yes. We can compute the determinant of the Jacobian per block of size 2 and compute the inverse per block as well. The Jacobian is
$$J = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 3 & 4 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 3 & -4 \end{pmatrix}.$$
Its determinant is $(1 \times 4 - 2 \times 3)(1 \times (-4) - (-2) \times 3) = -4$. The inverse is
$$f^{-1}(x) = \begin{pmatrix} -2x_1 + x_2 \\ \frac{3}{2} x_1 - \frac{1}{2} x_2 \\ -2x_3 + x_4 \\ -\frac{3}{2} x_3 + \frac{1}{2} x_4 \end{pmatrix}.$$
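A minimal numerical sketch of part (c), checking the block determinant and the closed-form inverse at an arbitrary test point:

```python
import numpy as np

# Part (c): the Jacobian is block-diagonal with 2x2 blocks, so the determinant
# is the product of the block determinants and the inverse is block-wise.
J = np.array([[1, 2, 0, 0],
              [3, 4, 0, 0],
              [0, 0, 1, -2],
              [0, 0, 3, -4]], dtype=float)
print(np.linalg.det(J))                       # ~ -4 = (1*4 - 2*3) * (1*(-4) - (-2)*3)

x = np.array([1.0, 2.0, 3.0, 4.0])            # arbitrary test point
z = np.linalg.solve(J, x)                     # f^{-1}(x), since f is linear
manual = np.array([-2 * x[0] + x[1],
                   1.5 * x[0] - 0.5 * x[1],
                   -2 * x[2] + x[3],
                   -1.5 * x[2] + 0.5 * x[3]])
print(np.allclose(z, manual))                 # True: matches the closed-form inverse
```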

Problem 5
Variational Inference (3 credits)

We are performing variational inference in some latent variable model $p_\theta(x, z)$ using the following family of variational distributions $\mathcal{Q}_1 = \{\mathcal{N}(z \mid \phi, 1) : \phi \in \mathbb{R}\}$. Assume that
$$p(z) = \mathcal{N}(z \mid 0, 1) \propto \exp\left(-\tfrac{1}{2} z^2\right)$$
$$p(x \mid z) = \mathrm{Bernoulli}(x \mid \sigma(z)) \propto \exp\left(xz - \log(1 + \exp(z))\right)$$
and $x$ is known and fixed and $\sigma(z) = \frac{\exp(z)}{1 + \exp(z)}$ is the sigmoid function. Does there exist a $q^\star \in \mathcal{Q}_1$ such that $\mathrm{KL}(q^\star(z) \,\|\, p(z \mid x)) = 0$? Justify your answer.

Solution: The KL divergence between two probability densities is equal to zero if and only if these densities are equal. In other words, $\mathrm{KL}(q^\star(z) \,\|\, p(z \mid x)) = 0$ only if $p(z \mid x) = q^\star(z)$ for some $q^\star \in \mathcal{Q}_1$. This means we need to check whether the posterior distribution $p(z \mid x)$ is a normal distribution with some mean $\phi$ and unit variance. By Bayes' rule,
$$p(z \mid x) = \frac{p(x \mid z) \, p(z)}{p(x)} \propto p(x \mid z) \cdot p(z) \propto \exp\left(-\tfrac{1}{2} z^2\right) \cdot \exp\left(xz - \log(1 + \exp(z))\right) = \exp\left(-\tfrac{1}{2} z^2 + xz - \log(1 + \exp(z))\right).$$
We can see that $p(z \mid x)$ is not a normal distribution because of the non-constant term $\log(1 + \exp(z))$ inside the exponent. Therefore, there does not exist a normal distribution in $\mathcal{Q}_1$ that has zero KL divergence to $p(z \mid x)$.
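The argument can also be checked numerically: for an assumed observation value (here $x = 1$, an arbitrary choice), the smallest KL divergence over the family $\mathcal{Q}_1$ stays strictly positive. A minimal grid-based sketch:

```python
import numpy as np

# Approximate the posterior on a grid and search over the mean phi of the
# unit-variance Gaussian family Q1 for the smallest KL(q || p(z|x)).
x = 1.0                                                  # assumed observation
z = np.linspace(-10, 10, 20001)
dz = z[1] - z[0]

log_post = -0.5 * z**2 + x * z - np.logaddexp(0.0, z)    # unnormalized log p(z|x)
post = np.exp(log_post - log_post.max())
post /= post.sum() * dz                                  # normalize on the grid

def kl_q_to_post(phi):
    log_q = -0.5 * (z - phi) ** 2 - 0.5 * np.log(2 * np.pi)
    q = np.exp(log_q)
    return np.sum(q * (log_q - np.log(post)) * dz)

phis = np.linspace(-3, 3, 601)
kls = np.array([kl_q_to_post(p) for p in phis])
print(phis[kls.argmin()], kls.min())   # best phi, minimal KL stays > 0
```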

Problem 6
Robustness of Machine Learning Models (3 credits)

Consider a trained binary logistic regression model with weight vector $w \in \mathbb{R}^d$ and bias $b \in \mathbb{R}$, where $d$ is the data dimensionality. That is, the predicted probability of a sample $x \in \mathbb{R}^d$ belonging to class 1 is
$$p(y = 1 \mid x) = \sigma(w^T x + b),$$
where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the logistic sigmoid function. An input sample is assigned to class 1 if $p(y = 1 \mid x) > 0.5$, else it is assigned to class 0. We would like to perform robustness certification of the logistic regression model using its Lipschitz constant. That is, we want to certify that the predicted class of the input sample does not change w.r.t. some radius in the input space.

a) Briefly explain why it is sufficient to consider $F(x) = w^T x + b$, i.e. the model without the sigmoid activation function $\sigma(\cdot)$, for this purpose.

Solution: Because we only need to know on which side of the decision boundary an input point is mapped, i.e. whether $w^T x + b > 0$; since $\sigma$ is strictly monotone, $\sigma(w^T x + b) > 0.5$ holds exactly when $w^T x + b > 0$.

b) Derive the (smallest possible) Lipschitz constant of $F(x) = w^T x + b$ w.r.t. the $L_2$ norm. Justify your answer.

Solution:
$$|w^T x - w^T \tilde{x}| \le L \, \|x - \tilde{x}\|$$
$$|w^T \Delta| \le L \, \|\Delta\|, \quad \text{where } \Delta = x - \tilde{x}$$
$$\frac{|w^T \Delta|}{\|\Delta\|} \le L \quad \text{(assuming } \|\Delta\| > 0\text{)}$$
$$\|w\| \cdot \frac{|w^T \Delta|}{\|w\| \, \|\Delta\|} \le L, \quad \text{where } \frac{|w^T \Delta|}{\|w\| \, \|\Delta\|} \le 1$$
by the Cauchy–Schwarz inequality, with equality when $\Delta$ is parallel to $w$. Therefore, the smallest possible Lipschitz constant is $L^\star = \|w\|$.

Remarks:
• It is also correct to derive the result using the operator norm ($\sup_{\Delta \ne 0} \frac{|w^T \Delta|}{\|\Delta\|} = \|w\|$).
• It is also correct to derive the result by showing that the norm of the gradient of $F(x)$ is upper-bounded by $\|w\|$.
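The resulting certificate is immediate: since $F$ is $\|w\|$-Lipschitz, $F$ cannot cross the decision boundary within an $L_2$ ball of radius $|F(x)| / \|w\|$ around $x$. A minimal sketch with arbitrary example values for $w$, $b$ and $x$ (these are assumptions, not values from the exam):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)     # example weights (arbitrary)
b = 0.3                    # example bias (arbitrary)
x = rng.normal(size=5)     # example input (arbitrary)

# F is ||w||-Lipschitz, so F cannot reach 0 (the decision boundary) within an
# L2 ball of radius |F(x)| / ||w|| around x.
margin = abs(w @ x + b)
certified_radius = margin / np.linalg.norm(w)
print(certified_radius)
```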

Problem 7
Hidden Markov Model (4 credits)

Consider an HMM where hidden variables are in $\{1, 2, 3\}$ and observed variables are in $\{a, b\}$. Let the model parameters be as follows:
$$A = \begin{pmatrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \end{pmatrix}, \quad \pi = \begin{pmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ \frac{2}{3} & \frac{1}{3} \\ 0 & 1 \end{pmatrix}$$
where $A_{ij} = p(Z_{t+1} = j \mid Z_t = i)$ and the columns of the emission matrix $B$ correspond to the observations $a$ and $b$. Assume that the sequence $X_{1:3} = [a, b, a]$ is observed. Find the most probable sequence $[Z_1, Z_2, Z_3]$ and compute its likelihood. Justify your answer.

Solution: We remark that it is impossible to be at state 3 at time $t = 1$, at state 1 at time $t = 2$, and at state 3 at time $t = 3$ (all paths containing these state/time pairs would have probability 0). We apply the Viterbi algorithm. The most probable paths reaching states 2 and 3 at time $t = 2$ are $[1, 2]$ and $[1, 3]$ with probabilities $\frac{1}{3} \times 1 \times \frac{1}{4} \times \frac{1}{3}$ and $\frac{1}{3} \times 1 \times \frac{1}{2} \times 1$. The most probable paths reaching states 1 and 2 at time $t = 3$ are $[1, 3, 1]$ and $[1, 3, 2]$ with probabilities $\frac{1}{3} \times 1 \times \frac{1}{2} \times 1 \times \frac{1}{4} \times 1$ and $\frac{1}{3} \times 1 \times \frac{1}{2} \times 1 \times \frac{1}{4} \times \frac{2}{3}$. Without using the Viterbi algorithm, note that an alternative valid (but less efficient) solution would be to compare all path probabilities. The most probable sequence of states is $[1, 3, 1]$ with probability $\frac{1}{3} \times 1 \times \frac{1}{2} \times 1 \times \frac{1}{4} \times 1 = \frac{1}{24}$.
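A minimal Viterbi sketch that reproduces this result; the matrices are taken as reconstructed above, and the decoding itself is the standard max-product recursion:

```python
import numpy as np

A = np.array([[1/4, 1/4, 1/2],
              [1/2, 1/4, 1/4],
              [1/4, 1/4, 1/2]])   # A[i, j] = P(Z_{t+1} = j+1 | Z_t = i+1)
B = np.array([[1.0, 0.0],
              [2/3, 1/3],
              [0.0, 1.0]])        # columns: observation 'a', 'b'
pi = np.array([1/3, 1/3, 1/3])
obs = [0, 1, 0]                   # X_{1:3} = [a, b, a]

delta = pi * B[:, obs[0]]         # Viterbi scores at t = 1
back = []
for o in obs[1:]:
    trans = delta[:, None] * A            # score of (previous state, next state)
    back.append(trans.argmax(axis=0))     # best predecessor for each next state
    delta = trans.max(axis=0) * B[:, o]

path = [int(delta.argmax())]
for ptr in reversed(back):                # backtrack through the pointers
    path.append(int(ptr[path[-1]]))
states = [s + 1 for s in reversed(path)]
print(states, delta.max())   # [1, 3, 1] with likelihood 1/24 ~ 0.0417
```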

Problem 8
Temporal Point Process (3 credits)

Consider a temporal point process defined on the interval $[0, 10\pi]$ with the conditional intensity function
$$\lambda^*(t) = \sin(t) + 2$$
What is the expected number of events that will be generated from this TPP on the interval $[0, 2\pi]$? Justify your answer.

Solution: The intensity function $\lambda^*(t)$ is independent of the history, so the above TPP is an inhomogeneous Poisson process. Therefore, the expected number of events in $[0, 2\pi]$ is equal to the total intensity on this interval:
$$\mathbb{E}[N([0, 2\pi])] = \int_0^{2\pi} \lambda^*(t) \, dt = \int_0^{2\pi} (\sin(t) + 2) \, dt = 4\pi$$
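A minimal numerical sketch of the integral (trapezoidal rule on a fine grid):

```python
import numpy as np

# Expected event count of an inhomogeneous Poisson process = integral of its intensity.
t = np.linspace(0.0, 2 * np.pi, 100001)
lam = np.sin(t) + 2.0
expected_events = np.sum(0.5 * (lam[:-1] + lam[1:]) * np.diff(t))   # trapezoidal rule
print(expected_events, 4 * np.pi)                                   # both ~ 12.566
```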

Problem 9
Neural Network approaches for temporal data (4 credits)

We trained the following models to produce latent representations:

• M1: Skip-gram word2vec model
• M2: LSTM
• M3: Self-attention without positional encoding
• M4: Self-attention with positional encoding
• M5: A convolutional NN which produces the latent representation at time t based on the input at time t and t − 1.

At inference time, we use the following 6 sentences as inputs:

• S1: "I left"
• S2: "They left"
• S3: "I left yesterday"
• S4: "I go left"
• S5: "left I go"
• S6: "go left"

For each model (i.e. M1, M2, M3, M4, M5), which sets of sentences (e.g. (S1, S2), (S3, S4, S6), ...) are guaranteed to produce the same latent representation for the word "left"? Note that the answer may be zero, one or more sets of sentences. Justify your answer.

Solution:

• M1, Skip-gram word2vec model: all six sentences (S1, ..., S6); after training, the embeddings are the same regardless of the context.
• M2, LSTM: (S1, S3); "I left" and "I left yesterday" produce the same embedding, because the word "left" has the same context on the left side.
• M3, Self-attention without positional encoding: (S4, S5); "I go left" and "left I go" produce the same embeddings. The context counts but not the order of the words.
• M4, Self-attention with positional encoding: no sets; the embeddings could all be different.
• M5, Convolutional NN with a window of size 2: (S1, S3) and (S4, S6); "I left" and "I left yesterday" produce the same embedding, and "I go left" and "go left" produce the same embedding. In each pair the word "left" has the same window of size 2.
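The M3 case can be illustrated with a toy single-head self-attention layer; the token embeddings and projection matrices below are random placeholders (not trained parameters), since only the permutation argument matters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
vocab = {"I": 0, "go": 1, "left": 2}
E = rng.normal(size=(3, d))                      # token embeddings, no positional encoding
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def self_attention(tokens):
    X = E[[vocab[t] for t in tokens]]            # (n, d) input embeddings
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V                           # (n, d) contextual representations

s4 = self_attention(["I", "go", "left"])         # S4
s5 = self_attention(["left", "I", "go"])         # S5
# "left" is token 2 in S4 and token 0 in S5; without positions the representations match.
print(np.allclose(s4[2], s5[0]))                 # True
```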

Problem 10
PageRank Lollipop (7 credits)

[Figure 10.1: A directed "lollipop" graph with a tail of length n. The figure shows the head nodes v_a, v_b, v_c, v_d, v_e together with v_1, and the tail nodes v_1, v_2, v_3, ..., v_n.]

Consider the directed, unweighted graph in Figure 10.1 with a "head" consisting of the six nodes v_1 and v_a through v_e. Its "tail" consists of n nodes v_1, ..., v_n, where n is a parameter. Note that we consider v_1 to be part of the head as well as the tail.

a) Which of the PageRank problems of dead end, spider trap, and periodic states apply here? Justify your answer. For each problem that applies, give a set of edges (at most 3 per problem) that would resolve that problem if they were inserted.

Solution: The head forms a spider trap because it is a group of connected nodes without links to nodes outside of the group. We solve this by adding an edge (v_e, v_n) that makes all other nodes reachable from within the previous spider trap. The head also has the problem of periodic states because a random walker returns to a state exactly every 6 steps. We make v_a aperiodic by introducing the self-loop (v_a, v_a), which lets us return to v_a at arbitrary times. This way we can also reach any other node in the head at any time because we can offset the period of 6 arbitrarily. (Credit was awarded for identifying the spider trap and giving an edge that connects a node from the spider trap with v_n.)
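A minimal power-iteration sketch of the spider-trap behaviour, under an assumed reading of Figure 10.1 (tail chain v_n -> ... -> v_2 -> v_1 and head cycle v_1 -> v_a -> v_b -> v_c -> v_d -> v_e -> v_1); the tail length below is an arbitrary example value:

```python
import numpy as np

n = 4                                        # assumed tail length (example value)
names = [f"v{i}" for i in range(1, n + 1)] + ["va", "vb", "vc", "vd", "ve"]
N = len(names)
M = np.zeros((N, N))                         # column-stochastic transition matrix

for i in range(1, n):                        # tail edges: v_{i+1} -> v_i
    M[i - 1, i] = 1.0
cycle = [0, n, n + 1, n + 2, n + 3, n + 4]   # head cycle: v_1, v_a, ..., v_e
for u, v in zip(cycle, cycle[1:] + cycle[:1]):
    M[v, u] = 1.0

r = np.full(N, 1.0 / N)
for _ in range(600):                         # power iteration without teleportation
    r = M @ r
print(r[1:n].sum(), r[cycle].sum())          # strict-tail mass -> 0, head keeps everything
```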

Similar Free PDFs