# 数据库代写｜W4112 Database Systems Implementation Homework 1

1. Find the technical specifications of:

• a Barracuda magnetic hard drive from Seagate (http://www.seagate.com).
• a Barracuda SSD from Seagate (http://www.seagate.com).
• an Intel Optane SSD

Compare the devices according to price, capacity, sequential I/O bandwidth, and random I/O band
width. You might need to calculate/approximate these numbers from other parameters in the product
specifications. If you can’t find exact pricing information from the manufacturer, estimate the price
based on information from retailers, resellers, reviewers, etc. for the same or similar models.

From these numbers, estimate the following derived metrics: (a) Cost per terabyte; (b) Cost per
MB/second for sequential I/O; (c) Cost per MB/second for random I/O. Based on your comparison,
describe what kind of customer would prefer each kind of drive.

2. Given a set S of items to insert into an extandable hash table, does the order of insertion matter? In
other words, could we end up with different hash table sizes (measured in total disk blocks used) if we
insert the items in S in different orders? Either explain why the structure will always have the same
size, or provide an example of two orderings of a set of insertions that result in different table sizes.

The internal organization of individual buckets is not important for this question.

(a) The paper describes a SIMD width of 128 bits. Since that paper was written, SIMD widths using
AVX512 have grown to 512 bits, which matches the size (64 bytes) of a cache line. How does this
change the structure of the FAST tree?

(b) Suppose that the next generation of SIMD instructions uses 1024 bits, but that machines still
have a cache line size of 64 bytes and there is no change to the maximum number of simultaneous
outstanding cache misses. Would it be a good idea to extend the cache-line blocking structure to 2
cache lines rather than one, to match the new SIMD width? What components of the index lookup
performance improve relative to a 512-bit SIMD scheme? In what ways might the performance
worsen?

4. The DASH Tree uses a fingerprinting technique (Section 4.2 of the paper in the additional readings)
to avoid some unnecessary work for nonmatching keys. Why do you think the authors used this data
structure rather than a Bloom Filter?

5. Table 2 of the Learned Indexes paper shows that the studied learned indexes are not only faster than
B-trees or FAST trees, but also significantly smaller. Explain at a high level where these space savings
come from. (2 paragraphs should be enough here.)

E-mail: itcsdx@outlook.com  微信:itcsdx