数据仓库代写 | COMP810 Data Warehousing and Big Data

本次CS代写的主要内容是使用Oracle建立数据仓库进行ETL和数据挖掘

 Assessment task
To design, implement and analyse a Data Warehouse (DW) for MarketExpress, one of the biggest

online sales websites.

 Project overview
MarketExpress is one of the biggest online websites, selling products from various suppliers. Suppliers’ products are kept in MarketExpress warehouses and orders (customer transactions) are fulfilled from warehouses located all over New Zealand. MarketExpress has thousands of customers and therefore it is important for the organisation to analyse the shopping behaviour of their customers. Based on such analysis the organisation can optimise their selling techniques e.g. by having relevant promotions on different products.

There is a need to 1. Build a DW so that shopping behaviour can be analysed. 2. Customer transactions from Data Sources (DSs) are required to be reflected in the DW on a daily basis. This process of reflecting the customer data into DW is called Data Integration (DI) as shown in Figure 1. To implement DI we usually need to perform ETL (Extraction, Transformation, and Loading). Since the data generated by customers is not in the format required by DW, it needs to be processed in the transformation layer of ETL. This processing will involve the enriching of transactions data with information from Master Data (MD) as shown in Figure 2. The ‘enriched data is then stored in the DW. Once the data is stored in the DW, you will be able to 3. perform analysis using multidimensional data analysis.

Assessment Part 1. Build a DW

You will build a DW using the star schema which is a data modelling technique that is used to map multidimensional decision support data into a relational DW. Star schema is an easily implemented model for multidimensional data analysis while still preserving the relational structures on which the operational database is built.

The star schema represents aggregated data for specific business activities. Using this schema, one can store aggregated data from multiple sources that will represent different aspects of business operations. For example, the aggregation may involve total sales by selected time periods, by products, by warehouses, and so on. Aggregated totals can be the total number of product sold, total sales values by products, and so on. The basic star schema has three main components: facts, dimensions, attributes. Usually in case of star-schema for sales the dimension tables are: product, date, warehouse, and supplier while the fact table is sales. However, to determine the right tables and attributes you will need to consider Figure 2, and build a DW with the appropriate fact table and dimension tables, each with appropriate attributes.

Data specifications

The assessment provides a script file named “DataStream_MasterData_Creator.sql”. By executing the script it will create two tables in your account. One is DATASTREAM table with 10,000 records populated in it. This data will be generated randomly based on 100 products, 50 customers, 10 warehouses, and one year time period as a date – from 01-Jan-19 to 31-Dec-19. The values for
the quantity attribute will be random between 1 and 10. The other is MASTERDATA table with 100 records in it. The structure of both tables with their attribute names and data types is given in Figure 4. The attributes DATASTREAM_ID and PRODUCT_ID are primary keys in DATASTREAM and MASTERDATA tables respectively.