Java代写 | INFS3200 Data Pre-processing Assignment

本次澳洲代写主要为java python及数据库相关的assignment

INFS3200 Data Pre-processing Assignment

Preliminary: Dataset Description

In this assignment, we have four datasets about book information from four different sources.
The data schemas are listed below:

Book1
(id,title,authors,pubyear,pubmonth,pubday,edition,publisher,isbn13,language,series,pages)
Book2
(id,book_title,authors,publication_year,publication_month,publication_day,edition,publisher_n
ame,isbn13,language,series,pages)
Book3
(ID,Title,Author1,Author2,Author3,Publisher,ISBN13,Date,Pages,ProductDimensions,SalesRa
nk,RatingsCount,RatingValue,PaperbackPrice,HardcoverPrice,EbookPrice,AudiobookPrice)
Book4
(ID,Title,UsedPrice,NewPrice,Author,ISBN10,ISBN13,Publisher,Publication_Date,Pages,Dim
ensions)

Part 1: Data Schema Questions [6 marks]

Read the above schemas carefully and understand the meaning of each attribute. If you don’t
know the meaning of a certain attribute, check the data under it or Google its meaning (especially
for some abbreviations, like ISBN). Answer the following questions based on your
understanding.

Question 1: Given that all four book datasets were stored in different relational databases,
respectively:
(1) [1 marks] Can attribute authors be the primary key for Book1? Why?
(2) [1 marks] Given a query “Find the top 99 books which are sold the most, return their
ranks (sorted in ascending order), titles, PaperbackPrice, HardcoverPrice, EbookPrice,
AudiobookPrice of those books.”, which schema is capable of answering such query?
Write down the corresponding SQL query on that schema.
Question 2: Given that Book2 is stored in a distributed database A, and two queries that are
most frequently asked on A are:
• Find all books whose publisher name is “XXX” (or among multiple publishers),
return their book titles and author info.
• Find all books that are published in a given year, return their book IDs, languages
and number of pages.
Answer the following questions:
(1) [2 marks] If the goal of A is to handle each query by a dedicated local site, which
fragmentation strategy should be used to fragment Book2 table? If only two fragments
are generated, write their schemas (if vertically fragmented) or predicates (if horizontally
fragmented), respectively. (Note: there are lots of valid fragmentation solutions, just
provide one of them.)
(2) [2 marks] Assuming that we horizontally fragmented the table into three fragments
based on the following predicate:
Fragment 1: 1 ≤ publication_day ≤ 10
Fragment 2: 11 ≤ publication_day ≤ 20
Fragment 3: 21 ≤ publication_day ≤ 31
If we want to insert a new record into Book2, please explain the insert process in plain
English (you can use an example to demonstrate the process).