机器学习代写 | COMP30027 Machine Learning

这是一篇来自美国的关于通过编程创建一些学校财政数据库的分析数据的SQL代写

 

The Project Background and Steps

This project utilizes much knowledge and programming skills (both SQL and SAS) covered in the class. You will create a database in Oracle, based on two big SAS datasets called School_Finance_2010. sas7bdat and School_Finance_2015. sas7bdat, and then use this database to analyze the data with sql*plus and SAS. The School_Finance_201X.sas7bdat data files contain real data about the finance of US public schools in the years of 2010 and 2015. For our international students who are not familiar with the US school systems, here is some short explanation. In each state, the public schools are divided into different school districts. A school district normally includes many different levels of schools (elementary schools, middle schools, and high schools in a city/town/area). A school district is normally funded by three sources: the local revenue, the state revenue and the federal revenue. The datasets describe these revenues and expenditures. You need to read Column Description.PDF so that you understand the datasets and the meaning of column names. Please strictly follow the following steps as you do the project and write your project report.

  1. In Oracle command line, log in as the Oracle SYSTEM user and create a new user called LastName_FirstName_STSCI5060FP and grant all the privileges to this user. If you encounter an error of invalid name, you may need to issue an “ALTER SESSION” command (see Lab 1 for details). Then, in ORACLE SQL DEVELOPER create a new connection to this new user with the same name, LastName_FirstName_STSCI5060FP.
  1. In SQL DEVELOPER, create a table, State_t, by importing a csv file called State_Code.csv. To import, right click “Tables (filtered)” and choose “Import Data …” and then follow the instructions. In Step 3, make sure that you only select the three required columns (StCode,StName and StAbb), and then choose Column Definition to continue. In the Column Definition window (in the 4th Step of Data Import Wizard), you need to delete the trailing blank(s) for all the variable names that are listed in the Source Data Columns window, in order to have valid column names. Use the default setting when you import the table, except setting the attributes of columns StCode, StName and StAbb to varchar2(2), varchar2(26) and char(2) respectively.

Attach the following screenshots of ORACLE SQL DEVELOPER: the Columns tab and Data tab of your State_T table (just showing a portion of the data of the table is OK).

  1. In Oracle, programmatically (i.e., you have to code it with SQL) update your State_t table by changing the single-digit values, 1-9, of state code to two-digit values, 01-09. This will make sure that these values are consistent with those in other tables. In this step, you are required to accomplish the task just with one update statement with the following requirements:

A.Use a WHERE clause to specify a condition.

B.Use the concatenation operator.

C.Use the SUBSTR() function.

D.Use the CAST function to explicitly convert the data type for your comparison in the WHERE clause (although in this case Oracle can do data type automatic conversion without using this function).

Query your updated State_t table to confirm that you have successfully made the changes. Only display the 9 rows whose Stcode values are less than 10.

  1. In SAS 9.4, create a libref called “Final” that references your file location of all your final project SAS files, and then connect SAS to the Oracle database user,

“LastName_FirstName_STSCI5060FP,” with the LIBNAME statement by creating a libref called myoracle. Create an Oracle database table, School_Finance_2010_t, using the myoracle libref and PROC SQL by querying the School_Finance_2010.sas7bdat dataset (which is in SAS format).

Right after you ran your SAS code, refresh the connections in your ORACLE SQL Developer. In Oracle, describe the School_ Finance_2010_t table and display the first 10 rows of the table (hint: use rownum<=10).

  1. Change table properties of the School_Finance_2010_t table:
  2. Change the property of IDCENSUS to varchar2(15).
  3. Change the property of NAME to varchar2(60).
  4. Change column names: rename the column “NAME” to “SD_NAME” and the column “State” to

“STCODE” in the School_Finance_2010_t table.

  1. The School_Finance_2010_t table is big table with many functional dependencies. You will create four tables from this big table. The first three tables are Fedrev_t, Strev_t and Locrev_t for the federal, state and local revenues respectively. The fourth table is called School_t.

A.Your Fedrev_t table should include the following columns: idcensus, stcode, and fed_rev, which is the sum of the values of columns c14, c15, c16, c17, c18, c19, b11, c20, c25, c36, b10, b12, and b13.

B.Your Strev_t table should include the following columns: idcensus, stcode, and st_rev, which is the sum of the values of columns c01, c04, c05, c06, c07, c08, c09, c10, c11, c12, c13, c24, c35, c38, and c39.

C.Your Locrev_t table should include the following columns: idcensus, stcode, and loc_rev,which is the sum of the values of columns t02, t06, t09, t15, t40, t99, d11, d23, a07, a08, a09, a11, a13, a15, a20, a40, u11, u22, u30, u50, and u97.

D.Your School_t table should include these columns: idcensus, stcode, and sd_name.

8.Programmatically set primary keys and foreign keys for the following tables:

A.Set the stcode column as the primary key of the State_t table.

B.Set the idcensus column in the Fedrev_t, Strev_t, Locrev_t, and School_t tables as the primary key.

C.Set the idcensus column of the Fedrev_t, Strev_t, Locrev_t tables as the foreign key that references the idcensus column of the School_t table.

D.Set the stcode column of the School_t table as the foreign key that references the stcode column of the State_t table.

[Hint: To set the foreign key, use the following syntax: ALTER TABLE table_name add constraint  foreign_key_name foreign KEY (column_name) references name_of_the_table_referenced (name_of_the_column_referenced);]

  1. Each of the table you have created is correspondent to an entity. As a result, you have five entities: STATE, SCHOOL, FEDREV, STREV and LOCREV. Draw an ERD (using the book’s HRT notation) to represent the relationships of these entities. You are required to list all the attributes. Then, convert your ERD to relations and use arrows to represent the primary key/foreign key constraints. If you draw the diagrams with hand, you may scan it and then insert it into the MS Word (or PDF) file.
  1. Based on Fedrev_t, list all the school districts that received more than $1,000,000 K from the federal source (note that the revenue values are expressed in thousands of dollars in the database tables). You should display three columns: idcensus, stcode and an alias, fed_revenue for the federal revenue. Do the same for the Strev_t and Locrev_t tables and find out all the school districts that received more than $1,000,000 K from the state or local sources. You should name the state and local total revenue aliases as st_revenue and loc_revenue. The revenue values you display should not be in scientific notation (do the same for the rest of the project).

Hint: use the function to_char(value, ‘999999999.9’) to achieve this; the numbers of the digit “9” represent the scale and precision of the number.

  1. Create a view called sd#_v to calculate the total number of school districts (SD#) in each state.

This view has two columns, SD# and stcode. Then,

A.find the state(s) that with the highest number of school districts by using sd#_v. In your output, list the state code, state name and the total number of school districts.

B.find the state(s) that with the lowest number of school districts by using sd#_v. List the state code, state name and the total number of school districts.