COMP 418 Assignment 1
Part 1. Concepts and principles
a. Briefly explain the three main alternatives for storing information in a data entry of an index.
b. Define clustered index, and discuss the relation between the three alternatives and clustered/unclustered indexes.
Consider the following file organizations: sorted files, heap files with an unclustered tree index on the search key, and heap files with an unclustered hash index. Briefly discuss the suitability of each of these file organizations to perform the following operations: file scans, range selections, inserts, and deletes.
a. Briefly describe the two internal organizations for heap files (using lists versus directory of pages).
b. Explain which organization you would choose if records are variable in length.
Compare ISAM and B+ Tree indexes. Explain briefly their differences in handling Search, Insert and Delete, and discuss when you would use ISAM and when you would use B+ Tree index.
Does the final structure of a B+ tree depend on the order in which the terms are added to it? Explain your answer using an illustration example.
Explain how extendible hashing uses a directory of buckets, and discuss the global depth of the index and local depth of a bucket.
Part 2. Design considerations for application scenarios
Consider the following relations:
Professor (profid: integer, name: varchar, salary: integer, age: integer, depid: integer)
Department (did: integer, budget: integer, location: varchar, mgr eid: integer)
Salaries range from $30,000 to $100,000, ages vary from 20 to 80, each department has about 20 employees on average, there are 10 locations, and budgets vary from $100,000 to $1 million. You can assume uniform distributions of values.
For each of the following queries, what index would you choose to speed up the query? If your database system does not consider index-only plans (i.e., data records are always retrieved even if enough information is available in the index entry), how would your answer change? Explain briefly.
a. Query1: Print name, age, and salary for all professors. (5 marks)
b. Query2: Find the dids of departments that are located in Edmonton and have a budget of more than $150,000. (5 marks)
The CVT Company is a leader in the manufacture of work clothes. You are hired as database administrator for the company and your IT supervisor asked you to solve a retrieval speed problem they used to have with a large file for item records. Your supervisor mentioned that they have sorted the file but the problem didn’t improve, so they need to create a B+ tree index to solve the problem. Your supervisor outlined the way to do it: “The best way to accomplish this task is to scan the file, record by record, inserting each one using the B+ tree insertion procedure.” Being a fresh graduate, you noticed that since the file is already sorted there is a better way to do it.
a. What performance and storage utilization problems are there with your supervisor’s approach? (6 marks)
b. Explain how the bulk-loading algorithm provides a better alternative than the proposed scheme. (6 marks)
Part 3. Implementation Case (30 marks)
Consider the following database schema with the following relations:
Student (SID, Name, Address, Telephone, Age)
Course (CourseNo, Title, Department, NumberOfCredits, CourseFees)
Registration (SID, CourseNo, startDate, CompleteDate, Grade)
Consider the following queries:
o List the student numbers and names of students who received a grade greater or equal to 70% in the course “COMP418,” sorted by age ascending.
o List the course numbers and titles of courses that have more than 10 students getting a grade lower than 50. [(Use group by courseNo and count(SID)].
o List the course numbers and titles of courses whose course fees are between 400 and 600 dollars.
o List all courses in the database.
o Update all the course fees by adding 6 dollars to each course.
Your task is to implement this database using PostgreSQL or any other DBMS of the list specified in the course (Oracle, MySQL, DB2, SQL server) then compare the performance of the system before creating the indexes and after creating the indexes. Make sure that you create indexes that support the queries.
o You should use test data to identify performance issues: the more data, the better. Make sure there is sufficient test data in your system to be able to run queries that can return at least a dozen rows of data even when using the queries. Unless there is a fair amount of test data, you will not be able to see much difference in query execution times.
本网站支持淘宝 支付宝 微信支付 paypal等等交易。如果不放心可以用淘宝交易！
E-mail: [email protected] 微信:itcsdx