MSCA 31013 Assignment 1 – Big Data Solution Design

This is an individual coding assignment. The objective is to implement the R-tree. Each submission
will be graded based on correctness and efficiency. The rest of the document explains the details.
How Your Submission Will Be Tested: You will be given a dataset that contains 2D points.
The dataset will be provided in a text file in the following format:
n
id_1 x_1 y_1
id_2 x_2 y_2
...
id_n x_n y_n
Specifically, the first line gives the number of points in the dataset. Then, every subsequent line
gives one point's id, x-coordinate, and y-coordinate.
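The format above can be parsed with a few lines of Python. This is only a minimal sketch: the function name `parse_points` is hypothetical, ids are kept as strings, and coordinates are assumed to be parseable as floating-point numbers, which the assignment text does not state.

```python
def parse_points(text):
    """Parse dataset text: the first token is n, followed by n (id, x, y) triples."""
    tokens = text.split()          # whitespace-separated, so line breaks do not matter
    n = int(tokens[0])             # first line: number of points
    body = tokens[1:]
    if len(body) != 3 * n:
        raise ValueError(f"expected {3 * n} values after the count, found {len(body)}")
    # Assumption: ids are opaque strings and coordinates are real numbers.
    return [(body[i], float(body[i + 1]), float(body[i + 2]))
            for i in range(0, 3 * n, 3)]
```

For a file on disk, the text could be obtained with `open(path).read()` before calling the function.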
Your program should build an R-tree in memory from the dataset. Then, we will measure its query
efficiency as follows.
First, your program should report the time taken to read the entire dataset once. This time
serves as the sequential-scan benchmark against which the cost of your R-tree query
algorithms will be compared.
[Range Query Testing] You will be given a set of 100 range queries in a text file whose format
is:
x_1 x'_1 y_1 y'_1
x_2 x'_2 y_2 y'_2
...
x_100 x'_100 y_100 y'_100
That is, each line specifies a query whose rectangle is [x, x'] × [y, y']. You should output:
- to a disk file, the number of points returned by each query (note: we need only the
  number of points retrieved, not the details of those points);
- the total running time of answering all 100 queries, and the average time of each
  query (i.e., the total running time divided by 100).
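The testing procedure above could be sketched in Python as follows. This is one possible approach, not the required one: it builds the tree with Sort-Tile-Recursive (STR) bulk loading rather than one-by-one insertion, and all names (`Node`, `build_rtree`, `range_count`, `time_sequential_scan`, `run_queries`), the fan-out of 16, and the `(id, x, y)` tuple layout are assumptions made for illustration. Query rectangles are treated as closed, so points on the boundary are counted.

```python
import math
import time


class Node:
    """R-tree node: a bounding box over points (leaf) or child nodes."""

    def __init__(self, entries, is_leaf):
        self.entries, self.is_leaf = entries, is_leaf
        if is_leaf:  # entries are (id, x, y) tuples
            boxes = [(x, x, y, y) for _, x, y in entries]
        else:        # entries are child Node objects
            boxes = [child.box for child in entries]
        # Box layout mirrors the query format: (x_min, x_max, y_min, y_max).
        self.box = (min(b[0] for b in boxes), max(b[1] for b in boxes),
                    min(b[2] for b in boxes), max(b[3] for b in boxes))
        # Points stored below this node; lets fully covered subtrees be counted at once.
        self.count = len(entries) if is_leaf else sum(c.count for c in entries)


def _pack(items, key_x, key_y, capacity, is_leaf):
    """One STR pass: sort by x, cut into vertical slices, sort each by y, group."""
    n_nodes = math.ceil(len(items) / capacity)
    slice_size = math.ceil(math.sqrt(n_nodes)) * capacity
    items = sorted(items, key=key_x)
    nodes = []
    for i in range(0, len(items), slice_size):
        strip = sorted(items[i:i + slice_size], key=key_y)
        for j in range(0, len(strip), capacity):
            nodes.append(Node(strip[j:j + capacity], is_leaf))
    return nodes


def build_rtree(points, capacity=16):
    """Bulk-load a non-empty list of (id, x, y) tuples; capacity is an assumed fan-out."""
    level = _pack(points, lambda p: p[1], lambda p: p[2], capacity, True)
    while len(level) > 1:  # pack each level into parents until one root remains
        level = _pack(level, lambda n: n.box[0] + n.box[1],
                      lambda n: n.box[2] + n.box[3], capacity, False)
    return level[0]


def range_count(node, x1, x2, y1, y2):
    """Count points inside the closed rectangle [x1, x2] x [y1, y2]."""
    bx1, bx2, by1, by2 = node.box
    if bx1 > x2 or bx2 < x1 or by1 > y2 or by2 < y1:
        return 0                      # box and query are disjoint: prune
    if x1 <= bx1 and bx2 <= x2 and y1 <= by1 and by2 <= y2:
        return node.count             # box lies fully inside the query
    if node.is_leaf:
        return sum(1 for _, x, y in node.entries
                   if x1 <= x <= x2 and y1 <= y <= y2)
    return sum(range_count(c, x1, x2, y1, y2) for c in node.entries)


def time_sequential_scan(path):
    """Time one pass over the dataset file (the sequential-scan benchmark)."""
    start = time.perf_counter()
    with open(path) as f:
        for _ in f:
            pass
    return time.perf_counter() - start


def run_queries(root, queries, out_path):
    """Answer (x, x', y, y') queries, write one count per line, report timings."""
    start = time.perf_counter()
    counts = [range_count(root, *q) for q in queries]
    total = time.perf_counter() - start
    with open(out_path, "w") as f:
        f.writelines(f"{c}\n" for c in counts)
    print(f"total: {total:.6f} s, average: {total / len(queries):.6f} s")
    return counts
```

Storing a point count in every node is a design choice suited to this task: since only the number of matching points is required, a subtree whose box lies entirely inside the query rectangle contributes its stored count without being visited further. A call such as `run_queries(build_rtree(points), queries, "counts.txt")` would tie the pieces together, where `points` and `queries` are lists of tuples already parsed from the two input files and the output file name is a placeholder.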