Welcome to All Test Answers

PRACTICE MIDTERM-Big Data Analytics and Database Design




[Frequent Itemset Mining and Association Rules]

  • [10 marks] Compute the frequent itemsets for the baskets below using the A-Priori algorithm. Assume the support threshold is s = 3, i.e., an itemset is frequent if it appears in at least 3 baskets. Show all the steps of your computation.
    a) Bread, Coke, Milk, Pepsi
    b) Coke, Diaper, Milk
    c) Beer, Bread, Diaper, Milk
    d) Beer, Bread, Fanta, Diaper
    e) Beer, Coke, Diaper, Milk
  • [10 marks] Using the baskets of question 1, what are the confidence and interest of the following association rules?
    i. {Milk} → {Coke}
    ii. {Diaper, Milk} → {Beer}
    Which one would you rank higher (i.e., is more interesting)? Justify your answer.
  • [10 marks] Describe the PCY algorithm, supported by a diagram. What is stored in main memory in Pass 1 and in Pass 2? (Assume we are looking for frequent pairs of items.)
  • [10 marks] Describe how and why the bitmap is used in the PCY algorithm.
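
The first two questions can be checked mechanically. The following is a minimal sketch, not a model answer: it assumes an itemset is frequent when its support is at least 3, and it assumes the usual definitions confidence(I → j) = support(I ∪ {j}) / support(I) and interest(I → j) = confidence(I → j) minus the fraction of baskets that contain j. The function names (`support`, `apriori`, `confidence`, `interest`) are chosen here for illustration.

```python
from itertools import combinations

# Baskets a) to e) from question 1.
baskets = [
    {"Bread", "Coke", "Milk", "Pepsi"},
    {"Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Beer", "Bread", "Fanta", "Diaper"},
    {"Beer", "Coke", "Diaper", "Milk"},
]
SUPPORT_THRESHOLD = 3  # assumption: frequent means support >= 3


def support(itemset):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)


def apriori(threshold):
    """Return {frozenset: support} for all frequent itemsets."""
    items = sorted(set().union(*baskets))
    frequent = {}
    # Pass 1: frequent single items.
    current = [frozenset([i]) for i in items if support([i]) >= threshold]
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Candidate k-itemsets: unions of frequent (k-1)-itemsets, kept only
        # if every (k-1)-subset is itself frequent (monotonicity pruning).
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
        # Pass k: count the surviving candidates.
        current = [c for c in candidates if support(c) >= threshold]
    return frequent


def confidence(lhs, rhs):
    """support(lhs ∪ rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)


def interest(lhs, rhs):
    """Confidence minus the fraction of baskets containing rhs."""
    return confidence(lhs, rhs) - support(rhs) / len(baskets)


if __name__ == "__main__":
    for itemset, count in sorted(apriori(SUPPORT_THRESHOLD).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
        print(sorted(lhs), "->", sorted(rhs),
              round(confidence(lhs, rhs), 3), round(interest(lhs, rhs), 3))
```

For clarity the sketch recounts support by rescanning the baskets. A full A-Priori implementation would count all candidates of one size in a single pass over the data, which is what the "details of your computation" in question 1 should show.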

[Finding Similar Items]

  • [10 marks] Answer the following questions.
    a) Provide the definition of Jaccard Similarity (formula).
    b) What is the Jaccard similarity of the two sets S and T in the following figure?

  • [10 marks] Compute the signature matrix in a single pass over the rows, using the two provided hash functions.
    Note: the values of the hash functions are given to you for simplicity.

  • [10 marks] Answer the following questions.
    a) Based on the signature matrix that you computed in the previous question, estimate the Jaccard similarity of sets S1 and S4.
    b) What is the actual Jaccard similarity between S1 and S4?
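
The Jaccard similarity of two sets is J(S, T) = |S ∩ T| / |S ∪ T|. The sketch below shows one way to compute it, together with a single-pass minhash signature matrix and the similarity estimate derived from it. The exam's figure, characteristic matrix, and hash values are not reproduced on this page, so the sets `S1` to `S4` and the hash functions `h1` and `h2` in the sketch are made-up illustration data, not the exam's data.

```python
def jaccard(s, t):
    """|S ∩ T| / |S ∪ T|; treated as 1.0 here when both sets are empty."""
    s, t = set(s), set(t)
    union = s | t
    return len(s & t) / len(union) if union else 1.0


def minhash_signatures(columns, hash_funcs, n_rows):
    """One pass over the rows.

    columns[c] is the set of row numbers in which column c has a 1.
    sig[i][c] ends up as the minimum of h_i(r) over the rows r in column c.
    """
    sig = [[float("inf")] * len(columns) for _ in hash_funcs]
    for r in range(n_rows):
        hashes = [h(r) for h in hash_funcs]
        for c, col in enumerate(columns):
            if r in col:
                for i, hv in enumerate(hashes):
                    sig[i][c] = min(sig[i][c], hv)
    return sig


def estimated_similarity(sig, c1, c2):
    """Fraction of signature rows in which the two columns agree."""
    agree = sum(1 for row in sig if row[c1] == row[c2])
    return agree / len(sig)


# Hypothetical data: five rows (0..4) and four sets given by their row numbers.
S1, S2, S3, S4 = {0, 3}, {2}, {1, 3, 4}, {0, 2, 3}


def h1(x):
    # Hypothetical hash function.
    return (x + 1) % 5


def h2(x):
    # Hypothetical hash function.
    return (3 * x + 1) % 5


if __name__ == "__main__":
    sig = minhash_signatures([S1, S2, S3, S4], [h1, h2], n_rows=5)
    print(sig)
    print("estimated:", estimated_similarity(sig, 0, 3))
    print("actual:", jaccard(S1, S4))
```

With only two hash functions the estimate can take just three values (0, 0.5, or 1), so a visible gap between the estimated and the actual similarity is to be expected.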

[Clustering]

  • [10 marks] What is the advantage of the BFR algorithm over k-means for clustering?
  • [10 marks] Define centroid and clustroid. Provide supporting examples of computing a centroid and a clustroid.
  • [10 marks] Apply hierarchical clustering to the following data in a 2-dimensional Euclidean
    space. Assume the stopping point is k = 2 (k is the number of clusters). Provide all intermediate
    computations.
    Note: use Manhattan distance. Dist( (x1, x2), (y1, y2) ) = |x1 - y1| + |x2 - y2|
    For example, Dist( (4, 10), (3, 8) ) = |4 - 3| + |10 - 8| = 1 + 2 = 3
    a) (4, 10)
    b) (3, 8)
    c) (6, 10)
    d) (6, 8)
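
The merging procedure in the last question can be sketched as follows. The question does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the Manhattan distance between the closest pair of points, one from each cluster). A centroid-based rule is an equally plausible reading and may merge in a different order. The names `manhattan`, `cluster_distance`, and `agglomerate` are chosen here for illustration.

```python
from itertools import combinations

# Points a) to d) from the question.
points = {"a": (4, 10), "b": (3, 8), "c": (6, 10), "d": (6, 8)}


def manhattan(p, q):
    """Dist((x1, x2), (y1, y2)) = |x1 - y1| + |x2 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def cluster_distance(c1, c2):
    """Single linkage (an assumption): distance of the closest pair of points."""
    return min(manhattan(points[x], points[y]) for x in c1 for y in c2)


def agglomerate(k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [frozenset([name]) for name in sorted(points)]
    merges = []  # record of (cluster, cluster, distance) for each step
    while len(clusters) > k:
        # On ties, min() keeps the first pair in iteration order.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(c1), sorted(c2), cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters, merges


if __name__ == "__main__":
    clusters, merges = agglomerate(2)
    for step in merges:
        print("merge", step)
    print("clusters:", [sorted(c) for c in clusters])
```

The pairs (a, c) and (c, d) are both at distance 2, so the first merge depends on how ties are broken. Under single linkage both choices lead to the same two final clusters for this data.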
