数据库代写|Database query processing and optimization Problem Set 4

本次美国代写是一个数据库查询处理及优化的Problem Set

Query Processing and Optimization

1. On your database write two queries in SQL of the following form using two different values of the
literal x. Pick two values of x, call them x1 and x2, for which the optimizer picks different plans.

SELECT R.a
FROM R, S
WHERE R.b = S.c AND S.d > x;

(a) Draw the two query plans (with method used for each operator) for x1 and x2.

(b) Experiment with different values between x1 and x2 and find the transition point (the value of x)
at which the optimizer changes the plan. Let’s call the crossover point x0. Explain in a sentence
or two why the query plan changes. Why does it decide to switch plans at that specific value?

Note, your tables need to be large enough for the optimizer to change plans and you may need
to create/drop indices. State all indices you used. Please be as quantitative as possible in your
explanation.

(c) Compare the actual running times corresponding to the two alternative query plans at the
crossover point. How much do they differ? Inside psql, you can measure the query time by
using the ntiming command to turn timing on for all queries. To get accurate timing, you may
also wish to redirect output to /dev/null, using the command no /dev/null; you can stop redi
rection just by issuing the no command without a file name. You may also wish to run each query
several times, throwing out the first time (which may be longer as data is loaded into the buffer
pool) and averaging the remaining runs.

(d) Based on your answers to the previous questions, is x0 actually the best place to switch plans, or
is it an overestimate/underestimate of the best crossover point? If x0 is not the actual crossover
point, can you estimate the actual best crossover point without running queries against the
database? State assumptions underlying your answer, if any.

2. Write a query that includes a three-way join (join of three tables).

(a) On a database with no indices examine the query plan that PostgreSQL uses to answer the query.

(Note you may need to drop constraints in order to drop an index.) Draw the query plan and
explain why PostgreSQL picks this plan. In particular, it is important to explain the join order
PostgreSQL has picked and why it is better than other join orders. Give the average runtime of
this query over a few runs (again discarding the first).

(b) Now add one index that you think will be most advantageous for this query and that leads to a
different query plan. Draw the query plan and explain why PostgreSQL picks this plan. Give the
average runtime of this query over a few runs (again discarding the first) and explain why it is
faster than the no index plan.