Python assignment help | COMP60711 – DATA ENGINEERING Coursework Description
This Python assignment asks you to complete a set of tasks on a given dataset.
COMP60711 – DATA ENGINEERING
 Coursework Description
Task 3: Choose a day of the week, e.g., Tuesday, and use bar plots to visualise the average
 traffic volume for each hour of the day. To obtain an accurate average traffic volume
 for the chosen weekday, for example Tuesday, consider all Tuesday records in the file,
 first for all lanes associated with the North direction and then, separately, for all
 lanes associated with the South direction. You should generate a separate bar plot
 for each traffic direction (North and South).
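
 The averaging step behind each bar plot can be sketched as follows. This is a minimal
 illustration, not the expected solution: it assumes each record is one detected vehicle
 with a timestamp and a direction label, and the names `average_hourly_volume`, `records`,
 "North" and "South" are hypothetical, since the brief does not list the file's columns.
 It also assumes the average is taken over every date of the chosen weekday that appears
 in the file, so an hour with no traffic on one of those dates counts as zero.

```python
from collections import defaultdict
from datetime import datetime


def average_hourly_volume(records, weekday, direction):
    """Average vehicles per hour of day for one weekday and one direction.

    records: iterable of (timestamp, direction) pairs, one per vehicle
             (hypothetical layout; the real file's columns may differ).
    weekday: 0 = Monday ... 6 = Sunday, so Tuesday is 1.
    """
    totals = defaultdict(int)  # hour of day -> vehicles summed over all dates
    dates = set()              # distinct dates that fall on the chosen weekday
    for ts, rec_direction in records:
        if ts.weekday() != weekday:
            continue
        dates.add(ts.date())
        if rec_direction == direction:
            totals[ts.hour] += 1
    if not dates:
        return {}
    # Divide each hourly total by the number of matching dates.
    return {hour: totals[hour] / len(dates) for hour in range(24)}


# Tiny made-up sample: two Tuesdays and one Wednesday.
records = [
    (datetime(2018, 2, 6, 8, 5), "North"),
    (datetime(2018, 2, 6, 8, 40), "North"),
    (datetime(2018, 2, 6, 8, 50), "South"),
    (datetime(2018, 2, 13, 8, 15), "North"),
    (datetime(2018, 2, 13, 9, 30), "North"),
    (datetime(2018, 2, 7, 8, 0), "North"),  # a Wednesday, ignored
]
north = average_hourly_volume(records, weekday=1, direction="North")
south = average_hourly_volume(records, weekday=1, direction="South")
print(north[8], north[9], south[8])
```

 Each returned dictionary maps an hour (0 to 23) to an average, which is the shape a
 bar-plot function such as matplotlib's `plt.bar` would take as x and height values.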
 Output: Two bar plots, one for the North direction and one for South, for a weekday
 of your choice (Tue or Fri). Each bar plot should show the average traffic volume for
 each hour of the day. Provide a step-by-step description of the development of the
 task, emphasising the features of each of the tools/languages that you used. In
 other words, make sure you include (1) screenshots of intermediate steps you had
 to carry out while using the technology you chose to use to develop the task
 (especially where no coding was necessary), (2) the final recipe/result, (3) an
 explanation of each step, and (4) an interpretation of the results.
 Marking scheme:
 8 marks for correct barplots (4 marks for each).
 27 marks for a clear, correct and complete step-by-step description, which should
 include not only the complete code you wrote to prepare the data from its original
 form to the point the analysis was made, but also an explanation of each step in text
 and your interpretation of the obtained results/analysis.
 Task 4: Develop the same solution for Task 3 described above using a second technology
 of your choice (a tool or PL), which should be different from the technology you used to
 develop Task 3 for the first time. You can copy and paste here and/or make simple
 adaptations to the comments and step-by-step description of the solution you
 provided for Task 3.
 Considering (i) Task 3 of this week, described above, which by now you should
 have done, and (ii) the two technologies you have used to develop this task,
 compare the two technologies, discussing the advantages and disadvantages of each
 for the task. Take into account not only any extra work (manual or not) you had to
 do because of missing facilities, or limitations of existing facilities, in the
 given technology, but also the time it took you to perform a particular action.
 Marking scheme:
 35 marks for the second solution for Task 3, as well as a clear and complete
 assessment of the two technologies you used to develop the task. The assessment
 should cover similarities, differences, advantages and limitations in functionality
 (i.e., functions, facilities, features), relating each of these to the task you have
 performed using the technology, and should also include a comparison in which the
 two technologies are contrasted.
 Week 3
 Task 5: This task is divided into two sub-tasks, described as follows:
 Task 5.1: Using a Column Completeness approach, apply the formula below to (i)
 make a data quality assessment of the level of completeness of the ‘Gap (s)’
 column considering only Tuesdays between 7:00 and 19:00, and to (ii) make an
 assessment of the level of completeness of the ‘Headway (s)’ column considering
 only Tuesdays between 7:00 and 19:00.
 Column_Completeness = (number_of_non-empty_cells x 100) / number_of_cells
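
 As a small worked illustration of the formula, the sketch below computes the
 percentage for a list of cell values. The function names and the rule for what counts
 as an empty cell (None, NaN or a blank string) are assumptions; the brief does not say
 how missing values are encoded in the file. The sample values are made up.

```python
def is_empty(value):
    # Assumption: a cell is empty if it is None, NaN, or a blank string.
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return True
    return isinstance(value, str) and value.strip() == ""


def column_completeness(values):
    """(number_of_non-empty_cells x 100) / number_of_cells, as a percentage."""
    cells = list(values)
    if not cells:
        raise ValueError("column has no cells")
    non_empty = sum(1 for v in cells if not is_empty(v))
    return non_empty * 100 / len(cells)


# Five cells, two of them empty.
gap_sample = [1.2, None, 3.4, "", 2.0]
print(column_completeness(gap_sample))
```

 The values passed in would first be restricted to the rows of interest, here Tuesdays
 between 7:00 and 19:00, before the formula is applied.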
 Task 5.2: Fill the missing values of columns ‘Gap (s)’ and ‘Headway (s)’ for all
 records associated with the NB_MID lane (North direction), considering any
 Tuesday between 7:00 and 19:00 for which values for one or both of these columns
 are missing. To fill the missing values, you should use the median, calculated for
 the particular hour of the day when the missing value occurs, as replacement value.
 For example, if missing values are found on Tuesday 06/02/2018 – 10:00 and
 Tuesday 20/02/2018 – 15:00, then you should calculate the median of gap (or
 headway) considering all Tuesdays at 10:00 and all Tuesdays at 15:00 to obtain two
 values, median_at_10:00_allTuesdays and median_at_15:00_allTuesdays. These
 are to be used as replacement values. To calculate these values you can do the
 following:
 ● sort the values (gap or headway) inside each time interval, e.g. from 10:00
 to 11:00; and
 ● get the value in the middle. If there are two values in the middle, then take
 the average of both.
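
 The two steps above, and the replacement of missing values, can be sketched as
 follows. This is an illustration under stated assumptions rather than the required
 solution: rows are assumed to be already filtered to the NB_MID lane, Tuesdays, 7:00 to
 19:00; each row is a dictionary with a hypothetical "hour" key; missing values are
 assumed to be None; and the sample numbers are made up.

```python
from collections import defaultdict


def median(values):
    """Sort the values and take the middle one (or the mean of the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def fill_with_hourly_median(rows, column):
    """Replace missing values in `column` with the median for the same hour.

    Returns a dict {hour: median used}. Rows are modified in place.
    """
    by_hour = defaultdict(list)  # hour -> observed (non-missing) values
    for row in rows:
        if row[column] is not None:
            by_hour[row["hour"]].append(row[column])
    used = {}
    for row in rows:
        observed = by_hour.get(row["hour"])
        if row[column] is None and observed:
            used[row["hour"]] = median(observed)
            row[column] = used[row["hour"]]
    return used


rows = [
    {"hour": 10, "Gap (s)": 2.0},
    {"hour": 10, "Gap (s)": None},
    {"hour": 10, "Gap (s)": 4.0},
    {"hour": 10, "Gap (s)": 3.0},
    {"hour": 15, "Gap (s)": 1.0},
    {"hour": 15, "Gap (s)": 6.0},
    {"hour": 15, "Gap (s)": None},
]
medians = fill_with_hourly_median(rows, "Gap (s)")
print(medians)
```

 The medians are computed from the observed values only, before any cell is filled, so
 a replacement value never influences another replacement. The same function could be
 called a second time with "Headway (s)" as the column.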
 Output:
 Task 5.1: Two numerical values, one for each of the two columns.
 Task 5.2: X numerical values (X depends on the number of missing values and
 associated day times) representing medians of the Gap and Headway columns for
 all Tuesdays between 7:00 and 19:00. A screenshot of the updated dataset should
 be included as well. Provide a step-by-step description of the development of the
 task, emphasising the features of each of the tools/languages that you used. In
 other words, make sure you include (1) screenshots of intermediate steps you had
 to carry out while using the technology you chose to use to develop the task
 (especially where no coding was necessary), (2) the final recipe/result, (3) an
 explanation of each step, and (4) an interpretation of the results. For this task, you
 should develop one solution, using the technology (tool or PL) of your choice.
 Marking scheme:
 Task 5.1:
 1 mark for correct column completeness assessments (0.5 mark for each
 assessment).
 Task 5.2:
 9 marks for correct median results and screenshot (6 marks for results and 3 marks
 for the screenshot).
 20 marks for a clear, correct and complete step-by-step description, which should
 include not only the complete code you wrote to prepare the data from its original
 form to the point the analysis was made, but also an explanation of each step in text
 and your interpretation of the obtained results/analysis.