Written Assignment #4
CAS CS 460: Introduction to Database Systems Spring 2021
Due: Tue, April 27, 2021, at 11:59PM, using Gradescope.
Problem 1. (Query Evaluation – Joins) ( 25 points)
Consider the join R ⋈R.a=S.b S and the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of
writing out the result should be uniformly ignored.
Relation R contains 20,000 tuples and has 10 tuples per page.
Relation S contains 5,000 tuples and also has 10 tuples per page.
Attribute b of relation S is the primary key for S, and every tuple in S matches 3 tuples in R There exists a unclustered B+-tree index on R.a with height 3
There exists a clustered B+-tree index on S.b with height 2
The main memory buffer can hold 5 pages
Answer the following questions:
- If the join is evaluated with a (BNL) block nested loop join, which relation should be the outer relation? Justify your answer. What is the cost of the join in number of I/O’s?
- If the join is evaluated with an index nested loop join, what will be the cost of the join in number of I/O’s? Which index you will use? Show your cost analysis and state any assumption that you may make.
- If the join is evaluated using sort-merge join what is the cost in number of I/Os?
- Estimate the cost of computing the join, in number of I/Os, using a hash join assuming:
i) The main memory buffer can hold 202 pages, ii) The main memory buffer can hold 4 pages.
Problem 2 (Query Optimization) (25 points)
Consider the following three relations: R(a,b,c), S(d,e), W(f,g,h). Assume that R has 1000 tuples, S has 10,000 tuples and W has 100 tuples. Also, assume that each page stores 10 tuples, so R has 100 pages, S has 1000 pages, and W has 10 pages. Assume a buffer of 5 pages.
Consider now the following SQL Query: SELECT *
FROM R, S, W
WHERE R.a = S.d AND R.c = W.h
- (i) Assume that all relations are stored in heap files, there are no indexes, only (BNL) block nested-loop joins can be used, and the selectivity of each join condition is 0.1%. That is, the join between two tables will produce 0.1% of the Cartesian product result (the maximum possible result between the two tables). Show the query plan selected by a System R based query optimizer. Use the number of disk IO operations as the cost function. Hints: 1) Remember that a System R optimizer will try to avoid Cartesian products in the plan. So do NOT consider such plans. 2) The System R optimizer will only consider left-deep plans.
- (ii) Consider the following small modification to the above query and consider that we add an unclustered B+ tree index on S.d as well as a clustered B+ tree index on R.b. Without re-computing the new best plan, explain how these changes affect the optimization process for this query.
FROM R, S, W
WHERE R.a = S.d AND R.c = W.h AND R.b > 100
Problem 3 (Transactions) (25 points)
1. For each of the following schedules, draw the precedence graph and argue if the schedule is conflict serializable. If the schedule is conflict serializable, give one possible equivalent serial schedule. (Ri means transaction i reads an item and Wi writes an item. Ci means transaction i commits.)
a) R1(A);R4(A);W1(A);W3(B);R2(A);R2(B);W2(C);R4(B);R4(C);R2(D);R3(E);C1;C2;C3; b) R1(A);W1(A);R2(A);R2(B);W3(B);W2(C);R4(A);R4(B);R2(D);R3(E);C1;C2;C3;
2. For the following two schedules, insert the appropriate locks (shared and exclusive) into the schedule following the Strict 2PL protocol. Also explain what happens as the scheduler executes each schedule.
Note that, if a transaction blocks because of an operation, the transaction with the next operation in the schedule will continue. If you have a deadlock, you need to chose a transaction to abort, release its locks and then the schedule will continue and the transaction will start again. When a transaction unblocks, it resumes its operations. Write the actual executed schedule.
Problem 4 (Recovery) (25 Points)
Consider the log shown below. LSN LOG
00 begin checkpoint
10 end checkpoint(EMPTY XACT TT TABLE AND DPT) 20 update: T1 writes P1(old x1: new y1), prev=NULL 30 update: T1 writes P2(old x2: new y2), prev=20
40 update: T2 writes P3(old x3: new y3), prev=NULL 50 update: T3 writes P4(old x4: new y4), prev=NULL 60 T1 commit, prev=30
70 update: T3 writes P2(old y2: new y5), prev=50
80 T1 end, prev=60
90 update: T2 writes P1(old y1: new y6), prev=40
100 T2 abort, prev=90
In this log, we store information about 3 transactions. After the log record with LSN 100, the system crashes and then we restart. We use the ARIES recovery algorithm. Based on that, answer the following questions:
- What is done during the Analysis phase?
- What is done during the Redo phase?
- What is done during the Undo phase?
- Show the log when recovery is complete, including all non-null prevLSN and
undonextLSN values in log records.
本网站支持淘宝 支付宝 微信支付 paypal等等交易。如果不放心可以用淘宝交易！
E-mail: firstname.lastname@example.org 微信:itcsdx