CS 435 Introduction to Big Data
Programming Assignment 1

1. Introduction
Social networks allow users to interact with their friends and acquaintances via posts. Current social networking
sites let users organize their networks and posts by categorizing their friends into social circles. For example, ‘circles’ on
Google+ and ‘lists’ on Facebook and Twitter have similar functionality. To capture “initiative” actions between
users, a social network is often represented as an ego network. An “ego” is an individual “focal” node. A network
has as many egos as it has nodes. Egos can be persons, groups, organizations, or whole societies.
Circles are user-specific, as each user organizes their personal network of friends
independently of all other users. In Figure 1, we are given a single user u and we
form a network encompassing the set V = {v1, v2, v3, v4, v5, v6 } of that user’s friends.
We refer to the user u as the ego and to the nodes vi in V as alters. Ego networks
consist of a focal node (“ego”), the nodes to which the ego is directly connected
(these are called “alters”), and the ties, if any, among the alters.
This assignment uses Google+ ego-networks. The Google+ dataset consists of
‘circles’ and was collected from users who had manually shared their circles using
the ‘share circle’ feature. Google+ circles are quite distinctive compared to
Facebook or Twitter: the creator of a circle chooses who their posts are released
to. As a result, the Google+ ego network is a directed network. As an
interesting example, one circle contains candidates from the 2012 primaries, who
presumably do not follow their followers, nor each other.

2. Programming Requirements
2.1 Programming Language
You are required to use Java (Version 1.8 or higher) for this assignment.
2.2 Installing and configuring Hadoop
For this assignment, you will be working on your own Hadoop cluster, which you should have set up as
part of PA0. The walkthrough guidelines are available at: [Link]
2.3 Dataset
The dataset is organized as tuples of [User A, User B], where user A invited user B to see his/her
posts. The file looks like:

3. Counting the number of edges
In this part, you should count the total number of edges in the graph using MapReduce. Your
software must return a single total count for this problem.
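
The counting logic can be sketched in plain Java, without Hadoop dependencies, so that it runs without a cluster. This is only a minimal sketch under stated assumptions: the class and method names are hypothetical, each input line is assumed to hold one edge as two whitespace-separated vertex IDs, and every well-formed line is counted as one edge (no de-duplication). In an actual job, the "map" step would live in a Hadoop Mapper that emits a 1 under a single shared key, and the "reduce" step in a Reducer that sums the values.

```java
import java.util.Arrays;
import java.util.List;

class EdgeCountSketch {
    // "Map" step: a line contributes 1 if it holds a well-formed edge, else 0.
    // Assumption: an edge is written as two whitespace-separated vertex IDs.
    static long mapLine(String line) {
        String[] parts = line.trim().split("\\s+");
        return parts.length == 2 ? 1L : 0L;
    }

    // "Reduce" step: sum every value emitted by the map step.
    static long countEdges(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            total += mapLine(line);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the real dataset.
        List<String> sample = Arrays.asList("A B", "B A", "A C");
        System.out.println(countEdges(sample)); // prints 3
    }
}
```

Because every mapper output shares one key, the final sum happens in a single reduce call; summing partial counts locally first (a combiner in Hadoop terms) would give the same result.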
4. In-degree and out-degree of distinct vertices
You should measure the in-degree and out-degree of each distinct vertex using MapReduce. Your software must
return two lists of distinct vertices with their associated counts, sorted by count in descending order: one for in-degree and
one for out-degree. For the submission, you must submit the first 100 vertices of each list. Vertices
that have the same degree must be sorted in alphabetical order (ascending).
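
The degree counting and the required ordering can be sketched in plain Java as follows. This is an illustration rather than a reference solution: the class and method names are hypothetical, the input is assumed to be one edge per line as two whitespace-separated vertex IDs (source first), and "alphabetical order" is interpreted as plain string comparison of the IDs. In a Hadoop job, the counting would be a map step that emits (vertex, 1) and a reduce step that sums, followed by a sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DegreeSketch {
    // Counts degrees per vertex. outDegree == true keys on the source vertex,
    // outDegree == false keys on the target vertex (in-degree).
    static Map<String, Integer> degrees(List<String> lines, boolean outDegree) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            String vertex = outDegree ? parts[0] : parts[1];
            counts.merge(vertex, 1, Integer::sum);
        }
        return counts;
    }

    // Sorts by count descending, breaks ties by vertex ID ascending,
    // and keeps at most n entries.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the real dataset.
        List<String> edges = Arrays.asList("A B", "A C", "B C", "C A");
        for (Map.Entry<String, Integer> e : top(degrees(edges, true), 100)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Note that string comparison places "10" before "2"; if the vertex IDs should be ordered numerically instead, the tie-break comparator would need to change.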
5. Extracting the “Friends” network
In this assignment, we define a “Friends” relationship as a pair of vertices that have edges in both directions.
Since this dataset represents only directional relations, if two distinct edges
(A, B) and (B, A) both exist for a pair of nodes A and B, we assert that there is a
“Friend” relationship between nodes A and B. In this context, if two users invited
each other to their circles, these users will be considered as “Friends”. In Figure 2,
user 7 has 6 alters, but has only one friend, user 1. Similarly, user 4 and user 3 are
friends in this network.
You should create a network with only “Friends” from the original Google+ network.
If there are nodes that do not have any “Friend”, please do not include those nodes
in your results.
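
One possible way to detect “Friends” is to group both directions of an edge under the same unordered-pair key and keep only the pairs for which both directions were seen. The plain-Java sketch below illustrates that idea without Hadoop dependencies. It is a sketch under assumptions, not a prescribed solution: the names are hypothetical, edges are assumed to be two whitespace-separated vertex IDs per line, and self-loops are ignored. The sample input is made up, apart from reusing the friend pairs (1, 7) and (3, 4) mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FriendsSketch {
    // Returns each friend pair once, as "smallerId largerId" (string order).
    static List<String> friendPairs(List<String> lines) {
        // "Map" + "shuffle": key = unordered pair, value = directions seen.
        Map<String, Set<Boolean>> directionsByPair = new TreeMap<>();
        for (String line : lines) {
            String[] p = line.trim().split("\\s+");
            if (p.length != 2 || p[0].equals(p[1])) {
                continue; // skip malformed lines and self-loops (assumption)
            }
            boolean forward = p[0].compareTo(p[1]) < 0;
            String key = forward ? p[0] + " " + p[1] : p[1] + " " + p[0];
            directionsByPair.computeIfAbsent(key, k -> new HashSet<>()).add(forward);
        }
        // "Reduce": a pair is a friendship only if both directions occurred.
        List<String> friends = new ArrayList<>();
        for (Map.Entry<String, Set<Boolean>> e : directionsByPair.entrySet()) {
            if (e.getValue().size() == 2) {
                friends.add(e.getKey());
            }
        }
        return friends;
    }

    public static void main(String[] args) {
        List<String> edges = Arrays.asList("1 7", "7 1", "7 2", "4 3", "3 4", "5 6");
        System.out.println(friendPairs(edges)); // prints [1 7, 3 4]
    }
}
```

Vertices without any friend never appear in the output, since only reciprocated pairs are emitted. Duplicate lines for the same directed edge do not create a false friendship, because the directions are stored in a set.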

