November 25, 2023

Python Assignment Help | CMPT 459 Assignment 2

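This Python assignment is about defining a data warehouse on relational data and performing multidimensional analysis on some other kinds of data.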

CMPT 459 Assignment 2
Question 1 (30 points)
This question is about the concept of data warehousing and OLAP.
You learn how to define a data warehouse on relational data. Let us try to extend the idea of
multi-dimensional analysis to some other kinds of data.
1. Consider a data set containing tweets, and assume the text is properly pre-processed. If
we use each keyword (token) as a dimension, use each tweet as a record, and use COUNT()
as the aggregate function, can you give 3 aggregate queries Q1, Q2, and Q3, such that Q1
is a roll-up of Q2, and Q3 is a drill-down of Q1? If there are 1000 different keywords in the
data set, in total how many cuboids are there in the data cube? (15 points)
2. Consider a set of photos enhanced by some attributes, such as location and time. If you
have an AI tool that can identify people in photos, can you suggest an interesting OLAP
query using people as a dimension? (5 points)
3. Now, consider a data set of photos extracted from newspapers and magazines. Suppose
for each photo, the caption, which is a brief description of the photo, is also extracted,
processed and stored. Can you propose 3 interesting OLAP queries that roll up and drill
down using both image and text information? What are the dimensions here? (10 points)
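As a rough illustration of the roll-up and drill-down relationships asked about in part 1, the sketch below treats each keyword as a dimension with value 1 (present) or 0 (absent) and uses COUNT() as the aggregate. The sample tweets, the helper `count_by`, and the chosen keywords are hypothetical. The cuboid count assumes no concept hierarchy on any dimension, so each of the n dimensions is either kept or aggregated away.

```python
from collections import Counter

# Hypothetical pre-processed tweets; each tweet is the set of keywords it contains.
tweets = [
    {"data", "mining"},
    {"data", "cube"},
    {"mining"},
    {"data", "mining", "cube"},
]

def count_by(dims):
    """COUNT() of tweets grouped by presence (1) or absence (0) of each keyword in dims."""
    groups = Counter()
    for tweet in tweets:
        key = tuple(int(k in tweet) for k in dims)
        groups[key] += 1
    return dict(groups)

q2 = count_by(["data", "mining"])  # group by two keyword dimensions
q1 = count_by(["data"])            # roll-up of q2: the "mining" dimension is dropped
q3 = count_by(["data", "cube"])    # drill-down of q1: the "cube" dimension is added

# Assumption: no hierarchies, so n dimensions give 2**n cuboids.
num_cuboids = 2 ** 1000
```

Summing the cells of `q2` that share the same value for "data" reproduces the corresponding cell of `q1`, which is what makes `q1` a roll-up of `q2`.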
Question 2 (20 points)
This question is about bitmap index and bit-sliced index.
Canada has 13 provinces and territories. To support OLAP queries selecting all records in one or
a subset of provinces/territories, a straightforward way to build a bitmap index uses 13 bits per
record, one bit per province/territory. Can you design a way to use fewer bits? Describe your design
with two examples based on a table T(Province_Territory, Sales). In the first example, calculate the
total sales in BC. In the second example, calculate the total sales in BC, ON, and NT altogether.
How many bits per record does your index need?
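One possible design, sketched below under stated assumptions, is an encoded (bit-sliced) index: each province or territory gets a binary code, and one bit vector is stored per code bit, so 13 values need ceil(log2(13)) = 4 bits per record. The code assignment, the sample rows of T, and the helper names `select` and `total_sales` are hypothetical; this is not necessarily the design the assignment expects.

```python
from math import ceil, log2

# The 13 provinces and territories, each given an arbitrary (assumed) code 0..12.
REGIONS = ["AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU", "ON", "PE", "QC", "SK", "YT"]
CODE = {name: i for i, name in enumerate(REGIONS)}
BITS = ceil(log2(len(REGIONS)))  # 4 bits per record

# Hypothetical table T(Province_Territory, Sales).
T = [("BC", 100), ("ON", 250), ("NT", 40), ("QC", 300), ("BC", 60)]

# slices[j] is a bit vector stored as an int: bit r is set when bit j of record r's code is 1.
slices = [0] * BITS
for r, (region, _) in enumerate(T):
    for j in range(BITS):
        if (CODE[region] >> j) & 1:
            slices[j] |= 1 << r

ALL_ROWS = (1 << len(T)) - 1

def select(region):
    """Bit vector of records in one region: AND each slice or its complement."""
    result = ALL_ROWS
    for j in range(BITS):
        if (CODE[region] >> j) & 1:
            result &= slices[j]
        else:
            result &= ~slices[j] & ALL_ROWS
    return result

def total_sales(regions):
    """OR the per-region bit vectors, then sum Sales over the selected records."""
    mask = 0
    for region in regions:
        mask |= select(region)
    return sum(sales for r, (_, sales) in enumerate(T) if (mask >> r) & 1)

bc_total = total_sales(["BC"])
multi_total = total_sales(["BC", "ON", "NT"])
```

The trade-off is that selecting a single region now touches all four slices instead of one bitmap, in exchange for storing 4 bits per record rather than 13.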
Question 3 (20 points)
This question is about implementing data cubes in big data architecture.
Please study MapReduce techniques on your own. For example, the following Map and Reduce
procedures transform a set of documents into an inverted index of keywords. The example is
explained in Part 1 of Indexing and Ranking, CMPT 456 <https://youtu.be/8asVyc56ks4>.
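The Map and Reduce procedures themselves are not reproduced in this text. As a stand-in, here is a minimal single-process sketch of the general idea, not the version from the lecture: the mapper emits one (keyword, document ID) pair per distinct keyword, and the reducer gathers the document IDs for each keyword. The function names, the whitespace tokenization, and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def map_doc(doc_id, text):
    """Map: emit one (keyword, doc_id) pair per distinct keyword in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def reduce_keyword(keyword, doc_ids):
    """Reduce: build the sorted posting list for one keyword."""
    return keyword, sorted(doc_ids)

def run(docs):
    """Simulate the shuffle step by grouping mapper output by key, then reduce."""
    shuffled = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_doc(doc_id, text):
            shuffled[key].append(value)
    return dict(reduce_keyword(k, v) for k, v in shuffled.items())

# Hypothetical documents.
docs = {1: "data cube", 2: "data warehouse", 3: "cube index"}
index = run(docs)
```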
Consider a table T (D1, D2, D3, D4, M), where D1, D2, D3, and D4 are dimensions, and M is the
measure attribute.
1. Can you describe a way to compute the data cube using MapReduce? Please give the
Map procedure and the Reduce procedure. (15 points)
2. What is the communication cost of your method? That is, what is the total number of
key-value pairs the mappers emit? (5 points)
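A naive approach, sketched below as one possible answer rather than the expected one, has the mapper emit one key-value pair per cuboid for every tuple: each dimension value is either kept or replaced by a wildcard, giving 2^4 = 16 pairs per tuple. The reducer then aggregates the measures that share a key. SUM is assumed as the aggregate because the question does not name one. The wildcard `"*"`, the function names, and the sample rows are hypothetical. With no combiners, the mappers emit 16 * |T| pairs in total.

```python
from collections import defaultdict
from itertools import product

WILDCARD = "*"  # stands for "all values" of an aggregated-away dimension

def map_row(row):
    """Map: for one tuple (d1, d2, d3, d4, m), emit one pair per cuboid (2**4 = 16)."""
    *dims, m = row
    for keep in product([True, False], repeat=len(dims)):
        key = tuple(d if k else WILDCARD for d, k in zip(dims, keep))
        yield key, m

def reduce_cell(key, values):
    """Reduce: aggregate the measures of one cube cell (SUM is an assumption)."""
    return key, sum(values)

def compute_cube(table):
    """Simulate map, shuffle, and reduce in one process; also count emitted pairs."""
    shuffled = defaultdict(list)
    emitted = 0
    for row in table:
        for key, value in map_row(row):
            shuffled[key].append(value)
            emitted += 1
    cube = dict(reduce_cell(k, v) for k, v in shuffled.items())
    return cube, emitted

# Hypothetical rows of T(D1, D2, D3, D4, M).
T = [("a", "x", "p", "u", 10), ("a", "y", "p", "u", 5), ("b", "x", "q", "u", 7)]
cube, emitted = compute_cube(T)
```

Here `cube[("a", "*", "*", "*")]` holds the total of M for D1 = "a", and `emitted` equals 16 times the number of rows.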
Question 4 (30 points)
This question is about multidimensional analysis in practice.
Download the University Advancement, Donations, and Giving data set from
<https://public.tableau.com/s/sites/default/files/media/advancement_donations_and_giving_demo.xls>.
The data set is a table of donations made to universities in the United States. The
donation amounts and locations in this data set are not real as they are intended for training
purposes only. Ignore the attributes Gift Data and Prospect ID, use Gift Amount as the measure
attribute, and the other attributes as dimensions in this question. Use SUM() as the aggregate
function.
