Python Assignment Help | CMPT 459 Assignment 3

This Python assignment covers probability-related algorithm problems.

CMPT 459 Assignment 3
Question 1 (10 points)
Given a transaction database T, let $x_1, x_2, \ldots, x_k$ be the k most frequent items. Prove, for any
length-k itemset Y, $\mathrm{sup}(Y) \le \min\{\mathrm{sup}(x_1), \mathrm{sup}(x_2), \ldots, \mathrm{sup}(x_k)\}$.
Question 2 (10 points)
Given a transaction database T, let X and Y be two itemsets such that $\mathrm{sup}(X) = \mathrm{sup}(Y)$ and
$X \cap Y \ne \emptyset$. For example, X = abc and Y = cde. Does $\mathrm{sup}(X) = \mathrm{sup}(Y) = \mathrm{sup}(X \cup Y)$ always hold?
If so, please give a mathematical proof; otherwise, give a counter example.
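Questions 1 and 2 both reason about the support of an itemset, that is, the number of transactions containing every item of the set. As a minimal sketch for experimenting with small cases, the helper below counts support over a toy database. The function name `support` and the database `toy_db` are illustrative assumptions, not part of the assignment.

```python
def support(transactions, itemset):
    """Count the transactions that contain every item in `itemset`."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= set(t))


# Hypothetical toy database: each string is one transaction of single-letter items.
toy_db = ["abc", "abd", "acd", "bcd", "ab"]

print(support(toy_db, "a"))    # transactions containing a
print(support(toy_db, "ab"))   # transactions containing both a and b
print(support(toy_db, "abc"))  # transactions containing a, b, and c
```

A helper like this is handy for checking a conjecture on small hand-made databases before attempting a proof or a counterexample.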
Questions 3 and 4 use the tweet data sets D1 and D2 formed in Assignment 1. You can build your
solutions to Questions 3 and 4 on top of any tools or code you find on the web, as long as you
cite them properly.
Question 3 (50 points)
In D1 and D2, treat each token as an item, and each tweet as a transaction. That is, we ignore
the order of tokens within a tweet. If a token appears multiple times in a tweet, keep only one
occurrence. Write a program to find the top 100 patterns of lengths 1, 2, 3, 4, and 5 for D1 and
D2, respectively. Here, a pattern of length k is a set of k tokens. Some patterns may have the same
length and the same support. Report all such ties, so that at least 100 patterns are reported for each length. More
specifically, a pattern X of length k is a top-100 pattern if there do not exist 100 other patterns
$Y_1, \ldots, Y_{100}$, each of length k, such that $\mathrm{sup}(Y_i) > \mathrm{sup}(X)$ for $1 \le i \le 100$.
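The tie rule above can be sketched as follows: a pattern is kept when its support is at least the support of the pattern ranked n-th. This is a brute-force illustration that enumerates every k-subset of each tweet, which is only practical for short tweets and small k; it is not a substitute for Apriori or FP-growth. The function name `top_patterns` and its parameters are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def top_patterns(tweets, k, n=100):
    """Return (pattern, support) pairs for the top-n length-k patterns, ties included."""
    counts = Counter()
    for tokens in tweets:
        # Treat the tweet as a set: drop repeated tokens and ignore their order.
        items = sorted(set(tokens))
        counts.update(combinations(items, k))
    ranked = counts.most_common()  # sorted by support, highest first
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # support of the n-th ranked pattern
    # Keep every pattern that fewer than n other patterns strictly beat.
    return [(pattern, sup) for pattern, sup in ranked if sup >= cutoff]


# Hypothetical tokenized tweets, for illustration only.
tweets = [["a", "b", "a"], ["a", "b", "c"], ["a", "c"], ["b"]]
print(top_patterns(tweets, k=2, n=1))
```

With n=1 the toy call returns two patterns rather than one, because two pairs share the highest support; the same effect is why the result for each length can exceed 100 patterns.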
1. Describe your algorithm and implementation. (20 points)
2. Submit those frequent patterns and their supports. (10 points)
3. Plot a figure where the x-axis is the length k and the y-axis is the support of the most
frequent pattern at length k. Do the curves of D1 and D2 follow a power-law distribution?
Try to estimate the parameters of the distribution. (20 points)
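One simple way to approach the parameter estimate in item 3 is to assume the form sup(k) = c * k^(-alpha) and fit a straight line to the points on a log-log scale. The sketch below does this with ordinary least squares in pure Python. The function name `fit_power_law` is an assumption, and the sample supports are synthetic numbers generated from an exact power law, not results from D1 or D2.

```python
import math


def fit_power_law(ks, supports):
    """Fit log(sup) = log(c) - alpha * log(k) by least squares; return (c, alpha)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(s) for s in supports]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic supports following 1000 * k^(-2) exactly, for illustration only.
ks = [1, 2, 3, 4, 5]
supports = [1000 * k ** -2 for k in ks]
c, alpha = fit_power_law(ks, supports)
print(c, alpha)
```

If the real points lie close to a straight line on log-log axes, the fitted slope gives the exponent; a visibly curved plot suggests the power-law assumption does not hold for that data set. With only five lengths, any such fit should be read cautiously.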
Question 4 (30 points)
For a pattern X, we are interested in the ratio $\frac{\mathrm{sup}_{D_1}(X)}{\mathrm{sup}_{D_2}(X)}$