
Comp 480/580: Mid-Term Exam

1 Trivia (True or False) 15 points (3 points each)

Make sure to provide a short explanation for each answer.

1. Given a set S, our goal is to be able to compress S. We want to answer whether any given x is an element of S or not. We can use Bloom filters on S to compress it for this task.

2a. Given a random variable X, we know its mean E(X) and variance Var(X). Given another random variable Y, which is possibly correlated with X, we also know its mean E(Y) and variance Var(Y). The above information is sufficient to compute the mean of X + Y.

2b. The information in 2a is sufficient to compute the variance of X + Y.

3. We have two random variables X and Y (and they are not constants). It is possible to have situations where E(X/Y) = E(X)/E(Y).

4. Count-min sketch returns an underestimate.

5. Given any LSH function h, where Pr(h(x) = h(y)) = p_xy for any given x and y, we can always construct another LSH g with collision probability Pr(g(x) = g(y)) = (p_xy)^3.

2 Estimation and Minwise Hashing 10 points

Given two sets S1 and S2, minwise hashing given by h guarantees that the probability of hash agreement is the Jaccard similarity J, i.e.,

Pr(h(S1) = h(S2)) = |S1 ∩ S2| / |S1 ∪ S2| = J

In the class we showed that if we generate K independent minwise hashes of S1 and S2, then simply counting the number of hash collisions, call it Nc, out of K gives us an estimator of J given by Nc/K. In addition, let us say that the variance of this counting estimator is given by V = J(1 - J)/K.
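The counting estimator above can be sketched in a short simulation (all names here are illustrative; independent salted hash functions stand in for K independent perfect minwise hashes):

```python
import random

def minhash(s, seed):
    """Minwise hash: return the element of s with the smallest salted hash value."""
    salt = random.Random(seed).random()  # one independent 'hash function' per seed
    return min(s, key=lambda x: hash((salt, x)))

def estimate_jaccard(s1, s2, K):
    """Estimate J as Nc / K, where Nc counts minhash collisions over K hashes."""
    nc = sum(minhash(s1, k) == minhash(s2, k) for k in range(K))
    return nc / K

s1, s2 = set(range(0, 80)), set(range(40, 120))
true_j = len(s1 & s2) / len(s1 | s2)   # 40 / 120 = 1/3
est = estimate_jaccard(s1, s2, K=2000)
print(est)  # concentrates around 1/3, with spread roughly sqrt(J(1 - J)/K)
```

With K = 2000, the standard deviation sqrt(J(1 - J)/K) is about 0.01, so the printed estimate lands very close to the true Jaccard similarity.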

Questions:

1. Identify the random variable in the above problem. 1 point.

2. Given the values of the sizes of the sets, i.e., the values of |S1| and |S2| (assume these sizes to be known constants), can we get an estimator for |S1 \ S2| (the number of elements in S1 but not in S2)? (only in terms of Nc, K, |S1|, and |S2|) 5 points

3. What can we say about the variance of the estimator given in 2 above? (in terms of J, K, |S1|, and |S2|) 4 points

Known property: Variance(aX) = a^2 Variance(X), where a is a constant and X is any random variable.

3 Count-Min Sketch with weights 10 points

(Please see cheat-sheet for reference.) Let us assume that every item i has a weight associated with it, call it v_i. We modify the count-min sketch algorithm as follows:

• Addition: Instead of "Each counter is initialized with zero, and every update c_i^t (to item i at time t) is added for all d rows to the counters M(j, h_j(i)), where j ∈ {1, 2, ..., d}", we update by multiplying every update of item i with v_i, i.e., for update c_i^t we update the counters M(j, h_j(i)), where j ∈ {1, 2, ..., d}, with v_i · c_i^t.

• Querying: While estimating the count for element i, we use roughly the same estimation as count-min sketch, except that we divide the estimate by v_i. Essentially, a query for the count of item i reports the minimum of the corresponding d counters (after division by v_i), i.e., min over j ∈ {1, 2, ..., d} of M(j, h_j(i)) / v_i.

Note, this is a generalization of count-min sketch. We recover count-min sketch if we put v_i = 1 for all i.
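As a sanity check of the description above, here is a minimal sketch of the weighted variant (class and parameter names are our own; the d row hashes are simulated with salted Python hashing):

```python
import random

class WeightedCountMin:
    """Count-min sketch where each update c of item i is scaled by the item's weight v_i."""
    def __init__(self, d=4, w=256, seed=0):
        rng = random.Random(seed)
        self.salts = [rng.random() for _ in range(d)]   # one hash function per row
        self.M = [[0] * w for _ in range(d)]            # d x w counter matrix
        self.w = w

    def _h(self, j, i):
        return hash((self.salts[j], i)) % self.w

    def add(self, i, c, v):
        # Addition: add v_i * c to counter M(j, h_j(i)) in every row j.
        for j in range(len(self.M)):
            self.M[j][self._h(j, i)] += v * c

    def query(self, i, v):
        # Querying: minimum of the d corresponding counters, divided back by v_i.
        return min(self.M[j][self._h(j, i)] for j in range(len(self.M))) / v

cms = WeightedCountMin()
cms.add("a", 5, v=2.0)
cms.add("b", 3, v=0.5)
print(cms.query("a", v=2.0))  # at least 5; exactly 5 unless "b" collides in every row
```

Since counters only ever increase, the query never underestimates the true count, just as in the unweighted sketch.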

Questions:

1. Consider the element j that has the largest weight, i.e., v_j = max_i v_i. What can we say (more error or less error) about the estimate of this largest element compared with the estimate of the usual count-min sketch (where all v_i = 1)? 4 points.

Hint: Maybe just consider what happens with only 1 hash function (i.e., d = 1).

2. Consider the element j that has the smallest weight, i.e., v_j = min_i v_i. What can we say (in one line) about the estimate of this element compared with the estimate of the usual count-min sketch (where all v_i = 1)? 4 points

3. In light of 1. and 2. above, can you describe (in a few words) what you think is going on? What is the significance of the weights here? 2 points

4 Minwise Hashing (Advanced) 15 points

The proof of the fact

Pr(Minhash(S1) = Minhash(S2)) = |S1 ∩ S2| / |S1 ∪ S2| = J

follows from the following arguments.

1. Given any set S, apply the hash mapping h (assuming perfect hashing, i.e., two different elements x ≠ y implies h(x) ≠ h(y)) to every element of S. The element that gets the minimum value (call it Minhash(S)) is a perfectly random draw of an element from S.

2. Now consider the set S1 ∪ S2. Apply the hash mapping h to this union of the two sets. Thus, we expect Minhash(S1 ∪ S2) to give a random draw from S1 ∪ S2.

3. We have Minhash(S1) = Minhash(S2) = Minhash(S1 ∪ S2) if and only if the minimum of h(S1 ∪ S2) comes from a token belonging to S1 ∩ S2. Only then will the minimum values coming from the sets h(S1) and h(S2) be equal. Essentially, the collision event indicates a random draw from S1 ∩ S2. (A Venn diagram may help visualize this.)

4. Thus, the collision event is a special event where Minhash(S1 ∪ S2) satisfies Minhash(S1) = Minhash(S2) = Minhash(S1 ∪ S2). The event has |S1 ∩ S2| choices. The total number of choices for Minhash(S1 ∪ S2) is |S1 ∪ S2| (any random draw irrespective of the collision).

A good idea is to look at the standard Venn diagram in Figure 1.
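The key step of the argument can be checked empirically: a collision happens exactly when the minimizing element of S1 ∪ S2 falls in S1 ∩ S2 (salted Python hashing stands in for a perfect hash; the function name is our own):

```python
import random

def check_collision_argument(s1, s2, trials=500):
    # For each trial, draw a fresh random ordering (via a salted hash) and verify
    # step 3: Minhash(S1) == Minhash(S2) iff the minimum of h(S1 ∪ S2) lies in S1 ∩ S2.
    for t in range(trials):
        salt = random.Random(t).random()
        h = lambda x: hash((salt, x))
        m1 = min(s1, key=h)
        m2 = min(s2, key=h)
        m_union = min(s1 | s2, key=h)
        assert (m1 == m2) == (m_union in (s1 & s2))
    return True

print(check_collision_argument(set(range(30)), set(range(15, 50))))  # True
```

The equivalence holds in every trial (assuming no hash ties), which is exactly why the collision probability equals |S1 ∩ S2| / |S1 ∪ S2|.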

Questions:

1. Consider the case Minhash(S1) = Minhash(S2). What is the probability of this event? 2 points.

2. Can we come up with an expression for Pr(Minhash(S1) > Minhash(S2)), in terms of |S1|, |S2|, |S1 ∪ S2|, and |S1 ∩ S2|? Can you instead use this to estimate the Jaccard similarity J? (Generate K independent minwise hashes of S1 and S2 and …, similar to Problem 2, define the estimator.) 8 points

3. Let's say we have three sets S1, S2, and S3. What is the expression for Pr(Minhash(S1) = Minhash(S2) = Minhash(S3)) (in terms of set definitions)? Can you generalize the expression if we instead have n sets S1, S2, …, Sn and we are looking for Pr(Minhash(S1) = Minhash(S2) = ... = Minhash(Sn))? (Hint: Draw a Venn diagram with three sets.) 5 points

4. (Keep thinking, no points) Which estimator for J is better? The one derived in Problem 2 previously, or the one derived in the second part of this problem?
