1. Overview and Pedagogical Goal
The goal of this assignment is to familiarize you with the complete process of extracting, refining and
delivering insights of financial value from unconventionally large volumes of unstructured data in company
reports. This is an individual assignment: you must work alone to extract insights from company financial
statements filed on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system of the U.S.
Securities and Exchange Commission (SEC). The assignment maps to qualification Level 7 and aims to
establish your ability to develop in-depth, original solutions to a domain-specific problem of high business
value.
The task is structured in three (3) parts. The first part (Part A) assesses your ability to construct and
demonstrate the handling of text data. It aims to familiarize you with the principles of text mining, the
bag-of-words model, the normalization and cleaning of textual corpora, and the development of metrics that
can be used to analyse structural elements of text. The core of this assignment is the translation of these
insights into actionable features that can be used to predict an outcome variable of financial interest: the
stock price. The second and third parts (Part B and Part C) are therefore concerned with the identification
of features, in particular (a) polarity, i.e. whether the text under consideration is positive or negative;
(b) sentiment, i.e. the extraction of affective states from the text; and (c) the evaluation and extraction of
important topics that are covered and elaborated in the quarterly (10-Q) and annual (10-K) financial reports,
together with the predictive value of these insights at company, sector and market level (Part C).
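As a rough illustration of the Part A concepts and of the polarity feature, the sketch below normalizes a short text excerpt, builds a bag-of-words count, and derives a simple dictionary-based polarity score. It is a minimal sketch under stated assumptions, not a prescribed solution: the stop-word list, the `POSITIVE` and `NEGATIVE` word lists, and the helper names `normalize`, `bag_of_words` and `polarity` are hypothetical and chosen only for illustration. The net-tone formula (positive minus negative hits, divided by their sum) is one common choice among several; a real analysis would use a curated finance-specific lexicon and the full text of the filings.

```python
import re
from collections import Counter

# Hypothetical miniature word lists, for illustration only.
# A real analysis would use a curated, finance-specific lexicon.
STOP_WORDS = {"the", "and", "of", "to", "in", "a", "an", "on", "our",
              "we", "was", "were", "by", "for", "but"}
POSITIVE = {"growth", "improved", "strong", "gain", "increase"}
NEGATIVE = {"loss", "decline", "impairment", "litigation", "weak"}


def normalize(text: str) -> list[str]:
    """Lower-case the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(text: str) -> Counter:
    """Bag-of-words representation: token counts with word order discarded."""
    return Counter(normalize(text))


def polarity(bow: Counter) -> float:
    """Net tone in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 if no lexicon hits."""
    pos = sum(c for w, c in bow.items() if w in POSITIVE)
    neg = sum(c for w, c in bow.items() if w in NEGATIVE)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


if __name__ == "__main__":
    excerpt = ("Revenue growth was strong and margins improved, "
               "although we recorded an impairment.")
    bow = bag_of_words(excerpt)
    print(bow.most_common(5))
    print(polarity(bow))
```

Because the bag-of-words model discards word order, a negated phrase such as "no growth" would still count as positive here; handling negation, or moving to the richer sentiment and topic features of Parts B and C, requires more than simple word counts.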
The report should be written from the perspective of an analyst who applies text mining methods, and it should
be a well-written piece of work. It should be both academic and practical, and it should consider possible
application scenarios for text mining (e.g., risk analysis).
2. Marking Criteria and Weights
The marking criteria for all parts of the assignment are as follows:
• Part A: 30% – Completeness of the solution, efficiency of the code, interpretation of the results.
• Part B: 25% – Completeness of the solution, efficiency of the code, interpretation of the results.
• Part C: 25% – Completeness of the solution, efficiency of the code, interpretation of the results.
The remaining 20% is reserved for the academic content of the report as a whole, distributed equally among
the motivation for the selection of the companies, the interpretation of the outcomes of the analysis, and
the persuasiveness of the argument used to present the results.