This Python assignment asks you to complete a set of tasks using a given dataset.

COMP60711 – DATA ENGINEERING

Coursework Description

Task 3: Choose a day of the week, e.g., Tuesday, and use bar plots to visualise the average

traffic volume for each hour of the day. To obtain an accurate average traffic volume

for a given week day, for example Tuesday, consider all Tuesday records in the file

and consider all lanes associated with the North direction, and later (and separately)

consider all lanes associated with the South direction. You should generate a

separate bar plot for each traffic direction (North and South).
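
The averaging step described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the required solution: the keys "timestamp", "direction" and "volume", and the labels "North" and "South", are hypothetical, since the brief does not give the dataset's column names. It also assumes each lane has already been mapped to its direction, and that a record's volume is 1 if the file holds one row per vehicle.

```python
from collections import defaultdict
from datetime import datetime

TUESDAY = 1  # datetime.weekday() numbers Monday as 0


def hourly_average(records, direction, weekday=TUESDAY):
    """Average volume per hour of day for one direction on one weekday.

    `records` is an iterable of dicts with the hypothetical keys
    "timestamp" (datetime), "direction" (str) and "volume" (number).
    """
    totals = defaultdict(float)  # hour -> volume summed over all matching days
    days = set()                 # distinct dates that had a matching record
    for rec in records:
        ts = rec["timestamp"]
        if ts.weekday() != weekday or rec["direction"] != direction:
            continue
        totals[ts.hour] += rec["volume"]
        days.add(ts.date())
    if not days:
        return {}
    # Divide each hourly total by the number of observed days of that weekday.
    return {hour: totals.get(hour, 0.0) / len(days) for hour in range(24)}


def plot_hourly_average(averages, title):
    """Draw one bar per hour; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    hours = sorted(averages)
    plt.bar(hours, [averages[h] for h in hours])
    plt.xlabel("Hour of day")
    plt.ylabel("Average traffic volume")
    plt.title(title)
    plt.xticks(hours)
    plt.show()
```

Calling hourly_average once with "North" and once with "South", and passing each result to plot_hourly_average, would give the two separate plots. Note that a Tuesday with no records at all for a direction is not counted as an observed day in this sketch.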

Output: Two bar plots, one for the North direction and one for South, for a weekday

of your choice (Tue or Fri). Each bar plot should show the average traffic volume for

each hour of the day. Provide a step-by-step description of the development of the

task, emphasising the features of each of the tools/languages that you used. In

other words, make sure you include (1) screenshots of intermediate steps you had

to carry out while using the technology you chose to use to develop the task

(especially where no coding was necessary), (2) the final recipe/result, (3) an

explanation of each step, and (4) an interpretation of the results.

Marking scheme:

8 marks for correct barplots (5 marks for each).

27 marks for a clear, correct and complete step-by-step description, which should

include not only the complete code you wrote to prepare the data from its original

form to the point the analysis was made, but also an explanation of each step in text

and your interpretation of the obtained results/analysis.

Task 4: Develop the same solution for Task 3 described above using a second technology

of your choice (a tool or PL), which should be different to the technology you used to

develop Task 3 for the first time. You can copy and paste here and/or make simple

adaptations to the comments and step-by-step description of the solution you

provided for Task 3.

Considering (i) Task 3 of this week, described above, which by now you should

have done and (ii) the two technologies you have used to develop this task,

compare the two technologies, discussing advantages and disadvantages of each

for each task, taking into account not only any extra work (manual or not) you had to

do for absence of facilities, or limitations of existing facilities associated with the

given technology, but also the time it took for you to perform a particular action.

Marking scheme:

35 marks for the second solution for Task 3, as well as a clear and complete

assessment of the two technologies you used to develop the task, which should

include functionality (i.e., functions, facilities, features) similarities, differences,

advantages and limitations, relating each of these to the task you have

performed using the technology, but also a comparison between the two where they

are contrasted.

Week 3

Task 5: This task is divided into two sub-tasks, described as follows:

Task 5.1: Using a Column Completeness approach, apply the formula below to (i)

make a data quality assessment of the level of completeness of the ‘Gap (s)’

column considering only Tuesdays between 7:00 and 19:00, and to (ii) make an

assessment of the level of completeness of the ‘Headway (s)’ column considering

only Tuesdays between 7:00 and 19:00.

Column_Completeness = (number_of_non-empty_cells x 100) / number_of_cells
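
As a minimal sketch, the formula above might be coded in Python as shown here. Several points are assumptions rather than part of the brief: a cell counts as empty if it is None, NaN, or a blank string; the 7:00 to 19:00 window is read as half-open (19:00 itself is excluded); and the helper names is_empty, in_tuesday_window and column_completeness are made up for illustration.

```python
import math
from datetime import datetime


def is_empty(value):
    """Assumed definition of an empty cell: None, NaN, or a blank string."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def in_tuesday_window(ts, start_hour=7, end_hour=19):
    """True for Tuesday timestamps with start_hour <= hour < end_hour."""
    return ts.weekday() == 1 and start_hour <= ts.hour < end_hour


def column_completeness(values):
    """(number_of_non-empty_cells x 100) / number_of_cells, as a percentage."""
    cells = list(values)
    if not cells:
        raise ValueError("cannot assess an empty column")
    non_empty = sum(1 for v in cells if not is_empty(v))
    return non_empty * 100 / len(cells)
```

The function would be applied twice, once to the 'Gap (s)' values and once to the 'Headway (s)' values, after keeping only the rows whose timestamp passes in_tuesday_window.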

Task 5.2: Fill the missing values of columns ‘Gap (s)’ and ‘Headway (s)’ for all

records associated with the NB_MID lane (North direction), considering any

Tuesday between 7:00 and 19:00 for which values for one or both of these columns

are missing. To fill the missing values, you should use the median, calculated for

the particular hour of the day when the missing value occurs, as replacement value.

For example, if missing values are found on Tuesday 06/02/2018 – 10:00 and

Tuesday 20/02/2018 – 15:00, then you should calculate the median of gap (or

headway) considering all Tuesdays at 10:00 and all Tuesdays at 15:00 to obtain two

values, median_at_10:00_allTuesdays and median_at_15:00_allTuesdays. These

are to be used as replacement values. To calculate these values you can do the

following:

● sort the values (gap or headway) inside each time interval, e.g. from 10:00

to 11:00; and

● get the value in the middle. If there are two values in the middle, then take

the average of both.
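
The two steps above (sort, then take the middle value or the mean of the two middle values) are what Python's statistics.median does, so a hedged sketch of the fill could look like this. It assumes the rows have already been restricted to the NB_MID lane on Tuesdays between 7:00 and 19:00, that a missing value is stored as None, and that each row is a dict with a hypothetical "timestamp" key; the function names are illustrative only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def hourly_medians(rows, column):
    """Median of the non-missing values of `column`, grouped by hour of day."""
    by_hour = defaultdict(list)
    for row in rows:
        value = row[column]
        if value is not None:
            by_hour[row["timestamp"].hour].append(value)
    # statistics.median sorts the values and averages the two middle ones
    # when their count is even.
    return {hour: median(values) for hour, values in by_hour.items()}


def fill_missing(rows, column):
    """Return copies of `rows` with missing `column` values set to the hourly median."""
    medians = hourly_medians(rows, column)
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        hour = new_row["timestamp"].hour
        if new_row[column] is None and hour in medians:
            new_row[column] = medians[hour]
        filled.append(new_row)
    return filled


# Small made-up example: two Tuesdays, one missing gap in the 10:00 hour.
sample = [
    {"timestamp": datetime(2018, 2, 6, 10, 5), "Gap (s)": None},
    {"timestamp": datetime(2018, 2, 6, 10, 30), "Gap (s)": 2.0},
    {"timestamp": datetime(2018, 2, 20, 10, 45), "Gap (s)": 4.0},
]
filled_sample = fill_missing(sample, "Gap (s)")
```

Running fill_missing once for 'Gap (s)' and once for 'Headway (s)' covers both columns. An hour with no observed values at all has no median, so in this sketch its missing cells are left as None.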

Output:

Task 5.1: Two numerical values, one for each of the two columns.

Task 5.2: X numerical values (X depends on the number of missing values and

associated day times) representing medians of the Gap and Headway columns for

all Tuesdays between 7:00 and 19:00. A screenshot of the updated dataset should

be included as well. Provide a step-by-step description of the development of the

task, emphasising the features of each of the tools/languages that you used. In

other words, make sure you include (1) screenshots of intermediate steps you had

to carry out while using the technology you chose to use to develop the task

(especially where no coding was necessary), (2) the final recipe/result, (3) an

explanation of each step, and (4) an interpretation of the results. For this task, you

should develop one solution, using the technology (tool or PL) of your choice.

Marking scheme:

Task 5.1:

1 mark for correct column completeness assessments (0.5 mark for each

assessment).

Task 5.2:

9 marks for correct median results and screenshot (6 marks for results and 3 marks

for the screenshot).

20 marks for a clear, correct and complete step-by-step description, which should

include not only the complete code you wrote to prepare the data from its original

form to the point the analysis was made, but also an explanation of each step in text

and your interpretation of the obtained results/analysis.
