This Python assignment involves completing a set of tasks on a given dataset.

COMP60711 – DATA ENGINEERING
Coursework Description

Task3 Choose a day of the week, e.g., Tuesday, and use bar plots to visualise the average
traffic volume for each hour of that day. To obtain an accurate average traffic volume
for the chosen weekday, for example Tuesday, use all Tuesday records in the file.
First consider all lanes associated with the North direction and then, separately,
all lanes associated with the South direction. You should generate a
separate bar plot for each traffic direction (North and South).
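The averaging step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not a reference solution: it assumes each record is one vehicle detection carrying a timestamp and a lane name, that the lane-name prefix identifies the direction ("NB" as in NB_MID; "SB" and the lane name SB_MID are hypothetical), and that "volume" means the number of vehicles counted in an hour. The function name and the sample records are made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def average_hourly_volume(records, weekday, lane_prefix):
    """Return 24 averages: mean vehicles per hour over all matching dates.

    records     -- iterable of (timestamp, lane_name) pairs, one per vehicle
    weekday     -- 0 = Monday ... 6 = Sunday (datetime.weekday() convention)
    lane_prefix -- e.g. "NB" to keep every lane of one direction (assumed naming)
    """
    counts = defaultdict(int)  # (date, hour) -> vehicles seen in that hour
    dates = set()              # distinct dates that fall on the chosen weekday
    for timestamp, lane in records:
        if timestamp.weekday() != weekday:
            continue
        dates.add(timestamp.date())
        if lane.startswith(lane_prefix):
            counts[(timestamp.date(), timestamp.hour)] += 1
    if not dates:
        return [0.0] * 24
    totals = [0] * 24
    for (_, hour), n in counts.items():
        totals[hour] += n
    # Divide by the number of dates so an hour with no traffic on a date counts as 0.
    return [total / len(dates) for total in totals]


# Made-up records; 06/02/2018 and 13/02/2018 are Tuesdays.
sample = [
    (datetime(2018, 2, 6, 10, 5), "NB_MID"),
    (datetime(2018, 2, 6, 10, 40), "NB_MID"),
    (datetime(2018, 2, 13, 10, 15), "NB_MID"),
    (datetime(2018, 2, 13, 10, 20), "SB_MID"),  # hypothetical South lane name
    (datetime(2018, 2, 7, 10, 0), "NB_MID"),    # a Wednesday, ignored
]
north = average_hourly_volume(sample, weekday=1, lane_prefix="NB")
south = average_hourly_volume(sample, weekday=1, lane_prefix="SB")
print(north[10], south[10])  # 1.5 0.5
```

Each returned list has one value per hour (0 to 23), so it can be handed directly to a plotting call, for instance matplotlib's `pyplot.bar(range(24), north)`, to produce one bar plot per direction.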
Output: Two bar plots, one for the North direction and one for South, for a weekday
of your choice (Tue or Fri). Each bar plot should show the average traffic volume for
each hour of the day. Provide a step-by-step description of the development of the
task, emphasising the features of each of the tools/languages that you used. In
other words, make sure you include (1) screenshots of intermediate steps you had
to carry out while using the technology you chose to use to develop the task
(especially where no coding was necessary), (2) the final recipe/result, (3) an
explanation of each step, and (4) an interpretation of the results.
Marking scheme:
8 marks for correct bar plots (4 marks for each).
27 marks for a clear, correct and complete step-by-step description, which should
include not only the complete code you wrote to prepare the data from its original
form to the point the analysis was made, but also an explanation of each step in text
and your interpretation of the obtained results/analysis.
Task4 Develop the same solution for Task 3 described above using a second technology
of your choice (a tool or PL), which should be different to the technology you used to
develop Task 3 the first time. You may reuse here, with simple adaptations where
needed, the comments and step-by-step description of the solution you
provided for Task 3.
Considering (i) Task 3 of this week, described above, which by now you should
have completed, and (ii) the two technologies you have used to develop this task,
compare the two technologies, discussing the advantages and disadvantages of each
for each task. Take into account not only any extra work (manual or not) you had to
do because a facility was missing or limited in the
given technology, but also the time it took you to perform a particular action.
Marking scheme:
35 marks for the second solution for Task 3, together with a clear and complete
assessment of the two technologies you used to develop the task. The assessment should
cover functionality (i.e., functions, facilities, features): similarities, differences,
advantages and limitations, relating each of these to the task you
performed using the technology. It should also include a direct comparison in which
the two technologies are contrasted.
Week3
Task5 This task is divided into two sub-tasks, described as follows:
Task 5.1: Using a Column Completeness approach, apply the formula below to (i)
make a data quality assessment of the level of completeness of the ‘Gap (s)’
column considering only Tuesdays between 7:00 and 19:00, and to (ii) make an
assessment of the level of completeness of the ‘Headway (s)’ column considering
only Tuesdays between 7:00 and 19:00.
Column_Completeness = (number_of_non-empty_cells x 100) / number_of_cells
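As a minimal sketch of the formula above, the function below computes the percentage for one column. It assumes the rows have already been filtered to Tuesdays between 7:00 and 19:00, and it treats None, NaN and blank strings as empty cells, which is an assumption about how missing values appear in the file. The function names and sample values are illustrative only.

```python
import math


def is_empty(cell):
    """Assumed definition of an empty cell: None, NaN, or a blank string."""
    if cell is None:
        return True
    if isinstance(cell, float) and math.isnan(cell):
        return True
    return str(cell).strip() == ""


def column_completeness(cells):
    """Column_Completeness = (number_of_non-empty_cells x 100) / number_of_cells."""
    cells = list(cells)
    if not cells:
        raise ValueError("completeness is undefined for a column with no cells")
    non_empty = sum(1 for cell in cells if not is_empty(cell))
    return non_empty * 100 / len(cells)


gap_column = ["1.2", "", "0.8", None, "2.5"]  # made-up values
print(column_completeness(gap_column))        # 60.0
```

The same call would be repeated for the 'Headway (s)' column to obtain the second value.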
Task 5.2: Fill the missing values of columns ‘Gap (s)’ and ‘Headway (s)’ for all
records associated with the NB_MID lane (North direction), considering any
Tuesday between 7:00 and 19:00 for which values for one or both of these columns
are missing. To fill the missing values, you should use the median, calculated for
the particular hour of the day when the missing value occurs, as replacement value.
For example, if missing values are found on Tuesday 06/02/2018 – 10:00 and
Tuesday 20/02/2018 – 15:00, then you should calculate the median of gap (or
headway) considering all Tuesdays at 10:00 and all Tuesdays at 15:00 to obtain two
values, median_at_10:00_allTuesdays and median_at_15:00_allTuesdays. These
are to be used as replacement values. To calculate these values you can do the
following:
● sort the values (gap or headway) inside each time interval, e.g. from 10:00
to 11:00; and
● get the value in the middle. If there are two values in the middle, then take
the average of both.
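The two steps above, followed by the replacement itself, might look like the sketch below. It assumes the rows have already been restricted to the NB_MID lane on Tuesdays between 7:00 and 19:00, that each row is a dictionary with an `hour` key (a hypothetical name) plus the column being filled, and that a missing value is stored as None. Medians are computed from the observed values only, before any filling takes place.

```python
from collections import defaultdict


def median(values):
    """Sort the values and take the middle one, or the mean of the two middle ones."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("median of an empty list is undefined")
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def fill_with_hourly_median(rows, column):
    """Fill None cells in place; return {hour: median used as replacement}."""
    by_hour = defaultdict(list)  # hour -> observed (non-missing) values
    for row in rows:
        if row[column] is not None:
            by_hour[row["hour"]].append(row[column])
    used = {}
    for row in rows:
        hour = row["hour"]
        # Hours with no observed values at all are left unfilled.
        if row[column] is None and hour in by_hour:
            used[hour] = median(by_hour[hour])
            row[column] = used[hour]
    return used


# Made-up rows: one missing gap at 10:00 and one at 15:00.
rows = [
    {"hour": 10, "Gap (s)": 2.0},
    {"hour": 10, "Gap (s)": None},
    {"hour": 10, "Gap (s)": 4.0},
    {"hour": 15, "Gap (s)": 1.0},
    {"hour": 15, "Gap (s)": 5.0},
    {"hour": 15, "Gap (s)": 8.0},
    {"hour": 15, "Gap (s)": None},
]
used = fill_with_hourly_median(rows, "Gap (s)")
print(used)  # {10: 3.0, 15: 5.0}
```

The returned dictionary holds only the medians that were actually needed, one per hour in which a value was missing, and the rows are updated in place.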
Output:
Task 5.1: Two numerical values, one for each of the two columns.
Task 5.2: X numerical values (X depends on the number of missing values and
associated day times) representing medians of the Gap and Headway columns for
all Tuesdays between 7:00 and 19:00. A screenshot of the updated dataset should
be included as well. Provide a step-by-step description of the development of the
task, emphasising the features of each of the tools/languages that you used. In
other words, make sure you include (1) screenshots of intermediate steps you had
to carry out while using the technology you chose to use to develop the task
(especially where no coding was necessary), (2) the final recipe/result, (3) an
explanation of each step, and (4) an interpretation of the results. For this task, you
should develop one solution, using the technology (tool or PL) of your choice.
Marking scheme:
Task 5.1:
1 mark for correct column completeness assessments (0.5 mark for each
assessment).
Task 5.2:
9 marks for correct median results and screenshot (6 marks for results and 3 marks
for the screenshot).
20 marks for a clear, correct and complete step-by-step description, which should
include not only the complete code you wrote to prepare the data from its original
form to the point the analysis was made, but also an explanation of each step in text
and your interpretation of the obtained results/analysis.