COMP6214 Open Data Innovation Coursework 1
Assignment: Open Data
This coursework has four parts:
i. the first requires you to clean the provided dataset,
ii. the second requires you to model the dataset in an Open Data format (RDF) and
populate the model using the data from the dataset,
iii. the third requires you to create a visualisation using a Linked Data visualisation
tool that we will look at in the module, and to host that visualisation,
iv. the fourth requires you to describe your work and submit a report on it.
You must use the provided dataset for both cleaning and generating your visualisation. The
dataset should not be considered “authentic” data. It has been heavily modified in order to
evaluate your ability to clean, manipulate and visualise such data. The data is provided in CSV
and can be found via the course website.
Part 1: Clean the dataset
You will be required to clean the dataset and perform simple manipulations such as formatting,
fixing errors etc. to prepare it for creating your visualisation. You will be assessed on your ability
to identify and handle a number of different types of errors in the dataset. These errors should be
accounted for through pre-processing (using tools such as OpenRefine or using your own scripts
or code). You must provide a written description of your data cleaning and manipulation methods
(for details of what to include, see Part 4, below, under “Reporting on the Open Data Cleaning and Modelling”).
There are several errors and error types in the dataset, and you should look for at least 6 errors.
It is not necessary to find and fix all of the errors in the CSV file to be awarded the full marks,
provided you have spotted, reported and outlined solutions for at least 6 and have provided
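The kind of pre-processing described in Part 1 can be sketched with a short script. The example below is a minimal illustration only: the column names (name, date, amount) and the error types it handles (stray whitespace, inconsistent capitalisation, mixed date formats, currency symbols, thousands separators and duplicate rows) are assumptions made for the example, not a description of the provided dataset.

```python
import csv
import io
from datetime import datetime

# Hypothetical date formats; extend to match what the real data contains.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")


def normalise_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or '' if it cannot be parsed."""
    value = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def normalise_amount(value):
    """Strip currency symbols and thousands separators; '' if not numeric."""
    cleaned = value.strip().replace(",", "").lstrip("£$")
    try:
        return str(float(cleaned))
    except ValueError:
        return ""


def clean_rows(rows):
    """Clean an iterable of dict rows and drop rows that become duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        record = {
            # Collapse repeated spaces and use consistent capitalisation.
            "name": " ".join(row["name"].split()).title(),
            "date": normalise_date(row["date"]),
            "amount": normalise_amount(row["amount"]),
        }
        key = tuple(record.values())
        if key not in seen:
            seen.add(key)
            cleaned.append(record)
    return cleaned


# Hypothetical raw data standing in for the provided CSV file.
RAW = """name,date,amount
  alice  smith ,03/02/2015,"1,200.50"
Alice Smith,2015-02-03,1200.50
bob jones,12-Mar-2016,£300
"""

rows = clean_rows(csv.DictReader(io.StringIO(RAW)))
```

Here the first two raw rows describe the same record in different formats, so only one of them survives cleaning. Whichever tool you use, record each error type and the transformation applied to it, since both must appear in your report.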
Part 2: Model your dataset and represent it in RDF
Any RDF serialisation (RDF/XML, JSON-LD, Turtle, etc.) is acceptable. Populate your
RDF model with the dataset (examples are given in the class, and will be uploaded on the
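As an illustration of populating an RDF model from cleaned CSV rows, the sketch below writes Turtle using only the Python standard library. Everything specific in it is an assumption made for the example: the http://example.org/ namespace, the ex:Project class, the ex:startDate and ex:cost properties, and the column names. In your own model you would substitute terms from the ontologies you choose, and an RDF library such as rdflib would handle serialisation and escaping more robustly than hand-built strings.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration; replace with your own and with
# terms from the ontologies you choose to reuse.
PREFIXES = (
    "@prefix ex: <http://example.org/schema/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def escape_literal(text):
    """Escape backslashes, quotes and line breaks for a Turtle string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_turtle(row):
    """Serialise one cleaned CSV row as a Turtle resource description."""
    # Percent-encode the identifier so it is safe inside an IRI.
    subject = f"<http://example.org/project/{quote(row['id'], safe='')}>"
    return (
        f"{subject} a ex:Project ;\n"
        f'    rdfs:label "{escape_literal(row["name"])}" ;\n'
        f'    ex:startDate "{row["date"]}"^^xsd:date ;\n'
        f'    ex:cost "{row["amount"]}"^^xsd:decimal .\n'
    )


# Hypothetical cleaned data standing in for your cleaned CSV file.
CLEANED = """id,name,date,amount
1,Road Repairs,2015-02-03,1200.5
2,School Meals,2016-03-12,300.0
"""

turtle = PREFIXES + "\n" + "\n".join(
    row_to_turtle(row) for row in csv.DictReader(io.StringIO(CLEANED))
)
```

Typed literals (xsd:date, xsd:decimal) are used here on the assumption that the cleaning step has already produced ISO dates and plain decimal numbers.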
Part 3: Create and Host your visualisation
Creating your visualisation
You must build a visualisation of the dataset. You must use one of the Open Data visualisation
libraries/tools we will cover in class for the task. Your visualisation should have suitable interactivity
that allows for manipulation, filtering, and detailed analysis of the data.
You should aim to develop a multidimensional (greater than 2 dimensions) visualisation that
enables rich exploration of the data. Note that “multidimensional” refers to the dimensions of the
data, not the visualisation, i.e. you are expected to use the values from at least 3 columns of the
provided dataset to create your visualisation (from one or more worksheets). The visualisation
should be appropriate to the dataset and appropriate for the target audience or use case of your
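To make the "at least 3 columns" requirement concrete, the sketch below filters rows on one column and totals a numeric column for each combination of two others, then exports the result as JSON that a web-based visualisation could load. The column names (region, category, year, amount) and the sample rows are hypothetical and serve only to illustrate the idea.

```python
import json
from collections import defaultdict


def summarise(rows, min_year):
    """Keep rows from min_year onwards, then total `amount` per (region, category)."""
    totals = defaultdict(float)
    for row in rows:
        if int(row["year"]) >= min_year:
            totals[(row["region"], row["category"])] += float(row["amount"])
    return [
        {"region": region, "category": category, "total": total}
        for (region, category), total in sorted(totals.items())
    ]


# Hypothetical cleaned rows, as csv.DictReader would yield them.
ROWS = [
    {"region": "North", "category": "Health", "year": "2014", "amount": "100"},
    {"region": "North", "category": "Health", "year": "2016", "amount": "250"},
    {"region": "North", "category": "Transport", "year": "2016", "amount": "80"},
    {"region": "South", "category": "Health", "year": "2017", "amount": "40.5"},
]

summary = summarise(ROWS, min_year=2015)
payload = json.dumps(summary, indent=2)  # text to save alongside the web page
```

In an interactive visualisation the year threshold would typically be driven by a filter control rather than fixed in code; the point here is only that three or more data dimensions feed the view.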
Hosting your visualisation
You must create a simple website or web page to host your visualisation. Most of the marks relate
to the data cleaning and the quality of the visualisation itself, so there is no need to produce a
complex website. You can use publicly available templates when creating your website/webpage
provided you reference the source.
Part 4: Communication of Your Work
Reporting on the Open Data Cleaning and Modelling
• A description of your cleaning and manipulation of the dataset used for your visualisation:
o The tool(s) used for data cleaning
o A list of the errors or error types you found in the dataset
o For each error type: solutions or transformations you have applied to clean the
• Description of your modelling:
o description of how you modelled your data,
o ontologies you chose and why you chose them
Reporting on the Open Data Visualisation
• Information describing your visualisation:
o An overview of the audience and use case for each visualisation and why your
visualisation is appropriate both to this audience and the data.
o A description of the interaction and functionality the visualisation provides and why
this interaction is appropriate both to the audience and the data. List the value
and/or benefits your visualisation offers to users.
o Any details about your visualisation that you’ve included to enhance it for your
target audience; this may include how you have highlighted interesting trends for
your identified audience or enriched the source data with another dataset.
NOTE1: In any of your writing for this assignment, all sources must be cited. This includes (1) any
code or templates you have used that you have not created yourself, and (2) any
sources/website/journal articles that you have used to justify certain aspects of your visualisation.
NOTE2: You should report your cleaning and modelling of the data on the website on which
you are hosting your visualisation.
Submit one zip file (.zip) to the C-BASS handin system (http://handin.ecs.soton.ac.uk), by the
submission deadline stated above for Assignment 1.
At a minimum your zip file should contain:
1. Your cleaned CSV files
2. Description of how you modelled the dataset, ontologies used, and file(s) containing the
Open Data (RDF) format of the cleaned CSV files.
3. The source code for your Linked Data visualisation and website, including any
4. A README text file, containing (a) instructions for how to run your code or open your
website and (b) the URL of your website (only if you have chosen to additionally host your
Your zip file will be submitted electronically via handin.ecs.soton.ac.uk. We recommend you
ensure your website is in the correct file format (e.g. index.html, folders for CSS, JS, etc.) such that
it can be locally hosted and runs without errors. You may also choose to host your solution
somewhere online. No extra marks are available for hosting online, but this can be a failsafe if
your zip file doesn’t extract properly.
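Before submitting, it may help to check that the archive extracts cleanly and contains the expected files. The sketch below is one possible check using Python's standard zipfile module. The file names in REQUIRED are assumptions for the example (the specification asks for a README text file and a website but does not fix their names), and the archive is built in memory purely so that the example is self-contained.

```python
import io
import zipfile

# Hypothetical required entries; adjust to the files you actually submit.
REQUIRED = ("README.txt", "index.html")


def check_submission(zip_source, required=REQUIRED):
    """Return a list of problems found in the archive (empty if none)."""
    problems = []
    with zipfile.ZipFile(zip_source) as archive:
        corrupt = archive.testzip()  # name of the first bad member, or None
        if corrupt is not None:
            problems.append(f"corrupt member: {corrupt}")
        names = archive.namelist()
        for wanted in required:
            # Accept the file at the top level or inside any folder.
            found = any(
                name == wanted or name.endswith("/" + wanted) for name in names
            )
            if not found:
                problems.append(f"missing: {wanted}")
    return problems


# Build a small in-memory archive that deliberately lacks a README.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("site/index.html", "<!doctype html><title>Demo</title>")
    archive.writestr("cleaned.csv", "id,name\n1,Demo\n")
buffer.seek(0)

problems = check_submission(buffer)
```

With a real submission you would pass the path of your .zip file to check_submission instead of the in-memory buffer. To confirm the site runs when locally hosted, you can extract the archive and run python -m http.server from the folder containing index.html.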
The standard ECS late penalties apply, as detailed in the regulations (para. 4.1 of
They are 10% per working day that a piece of work is overdue, up to a maximum of 5 days, after
which the mark becomes zero.
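As a worked illustration of the penalty arithmetic, the sketch below assumes that each 10% is deducted as a fraction of the mark you were awarded. That interpretation is an assumption made for the example; the regulations referenced above are authoritative on how the percentage is applied.

```python
def penalised_mark(raw_mark, working_days_late, rate=0.10, max_days=5):
    """Apply a per-working-day late penalty; zero once max_days is exceeded.

    Assumes the penalty is a fraction of the awarded mark (an assumption,
    not a statement of the regulations).
    """
    if working_days_late <= 0:
        return float(raw_mark)
    if working_days_late > max_days:
        return 0.0
    return round(raw_mark * (1 - rate * working_days_late), 2)


on_time = penalised_mark(70, 0)
two_days = penalised_mark(70, 2)
six_days = penalised_mark(70, 6)
```

Under this assumption, a piece of work awarded 70 would receive 56 if submitted two working days late, and zero if submitted six working days late.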
Relevant Learning Outcomes
1) Identify innovation opportunities for open data.
2) Be able to apply appropriate validation, cleaning and transformation to use, reuse and
combine a multitude of complex datasets.
3) Be able to model data sets in open data format (RDF) and populate these models with
data from the datasets
4) Critically evaluate a large range of infographics and interaction techniques suitable for
Criterion Description Outcomes Mark
The student has identified a number of errors or
different types of errors in the dataset. The
student has applied suitable techniques to fix
errors and manipulate the dataset ready to be
The student has modelled the concepts (and their
attributes) of the data sets and has populated
these models with real data (from data sets)
The implementation is functional and runs without
errors. The visualisation is hosted on a webpage
which opens without errors. Good use is made of
an appropriate library for presenting dynamically
1, 2, 4 10
The visualisation presents multi-dimensional data
that is interactive; i.e. it allows features such as
filtering, selection, zooming, and multi-view
capability to explore the dataset.
The choice of visualisation is appropriate to the
data and audience. The visualisation is innovative
and useful; it provides value to the intended
audience beyond that of the raw data or simple
2, 4 7
The visualisation (and the website it is hosted
upon) is aesthetically appealing, intuitive, easy to
navigate, and offers a good user experience. The
purpose, function and instructions for use of the
visualisation are well communicated.
Written work is free from grammatical errors,
offers a high level of readability, clarity of
expression and communication and good
• Information Visualization: Perception for Design, Colin Ware, Morgan Kaufmann, 2004
• Visualising Data, Ben Fry, O’Reilly Media, 2007