数据库系统代写|W4112 Database Systems Implementation Homework 4
这是一篇美国的数据库系统代写
- Consider the query from Homework 2:
SELECT R.A, S.B, T.C
FROM R, S, T
WHERE R.D=S.E and R.F=T.G AND T.H < 7
Suppose that each of the three tables is stored at a difffferent site in a distributed database.
(a) Show a distributed query plan that could potentially be used to answer this query. You can make (and should state) any assumptions about where the answer to the query is needed. Your plan should have at least one semijoin component.
(b) For your plan from part (a) above, estimate the total volume of the data that will be sent over the network. The size of the A, B and C columns are each 20 bytes, and the remaining columns are 4 bytes each. Each table has 108 rows. You will need to state any assumptions that you make about the selectivity of the selection condition, and the kind of join that’s involved (e.g., Is it one-to-one? Do all records have matches in the other tables?).
(c) Modify your plan above to replace one of the semijoins with a Bloom fifilter. Recalculate the cost taking into account the size of the Bloom fifilter and the false positive rate. What size and false positive rate minimizes the total network data volume?
- Consider the following alternative to the 2-phase commit (2PC) protocol. Explain its vulnerabilities.Is it correct? What kinds of failures could be problematic?
In this protocol, there is a coordinator site and many subordinate sites. At commit time, the coordinator transaction sends a prepare message to all subordinates telling them to prepare to commit, as in 2PC. If the coordinator receives an ack from all subordinates,it does nothing. If there is a missing ack after a certain time, say 10 seconds, an abort message is sent to all participants. Each subordinate that has entered the prepared state waits 20 seconds for an abort message. If an abort message arrives, the site locally aborts the transaction. If no abort message arrives, the site goes ahead and commits.
- FoundationDB insists that transactions take less than 5 seconds, and suggests that long transactions could be broken up into pieces that each take less than 5 seconds. Explain why, in the general case, breaking up transactions in this way would violate the ACID properties. Suggest a way to address these problems.Hint: Take a look at the fifirst three pages of this paper on Sagas:https://dl.acm.org/doi/10.1145/38713.38742.
- Consider a database for an on-line newspaper with four tables:
- Journalist(jid,name,department)
- Article(aid,title,words)
- Wrote(jid,aid,date)
- Clicks(aid,month,year,total)
The attributes have the obvious meanings, and the id attributes in each table form a (possibly composite) key for that table. Rows appear in the Clicks table only when the article has been read at least once, i.e., total will be positive. The only available indexes are on the keys of the table.
The following two views have been materialized, without any associated indexes:
V1 Select name, A.aid, words, date
From Journalist J, Article A, Wrote W
Where J.jid=W.jid And A.aid=W.aid
V2 Select jid, A.aid, C.month, C.year, sum(words*total) as words-served
From Article A, Wrote W, Clicks C
Where A.aid=W.aid and C.aid=A.aid
Group by jid, A.aid, C.month, C.year
For each of the following queries, express the query in SQL. Explain which (if any) of these views can be used (possibly in combination with the base tables) to answer the query. Explain how the view can be used, and whether using it would be likely to be more effiffifficient than the original query.
(a) Which articles did someone named “Mary Smith” write in April 2018?
(b) Give the names and addresses of journalists writing articles longer than 10,000 words.
(c) For each author, give the total words served in the fifirst complete calendar month after the article was written. (So if the article was written on March 23rd, the query is asking for words-served in the whole month of April of the same year. Don’t forget that December wraps to January of the next year.)
(d) List the aids of articles written in 2015 having fewer than 100 words.
(e) List the distinct names of all journalists who have written an article.
(f) List the names of all journalists who have written an article. In the event that n journalists happen to have the same name, the name should appear exactly n times in the answer.