本次Java代写是用Hadoop完成一个大数据网络

CS 435 Introduction to Big Data

Programming Assignment 2

1. Introduction

Small-world networks are a class of networks that are “highly clustered, like regular lattices, yet have small

characteristic path lengths, like random graphs.

1”. This property of networks is closely related to the efficient

information transfer between and within regional clusters. Social networks are intuitive examples of the

Small-world networks. Cliques or clusters of friends are often interconnected while each member in a cluster is

only several friend relationships away from anyone else.

In this assignment, you should profile the Google+ ego networks using the “friend” network (gfn) that you have

created in the assignment 1. For students who could not extract the friend network correctly, we provide the

pre-processed friend network as a part of assignment 2. To analyze the Small-world networks property, you

should calculate the following measurements using Hadoop MapReduce:

(1) Requirement 1: Counting the numbers of edges and vertices of the Google+ friend networks

(2) Requirement 2: The average geodesic path length

(3) Requirement 3: The global cluster coefficient

Small-world networks tend to contain cliques, and near-cliques. This can be quantified with the clustering

coefficient. Also, most pairs of nodes will be connected by at least one geodesic path (shortest path). This

follows from the defining property that the mean-shortest path length be small. Several other properties are

often associated with small-world networks. Often there is a super hub in the network with a high number of

connections. These hubs serve as the common connections mediating the short pathlengths between other

edges.

In this assignment, we will focus on the two properties, the average characteristic path length (L-gfn) and the

cluster coefficient (C-gfn). You must measure the average geodesic path and the cluster coefficient of the

Google+ friends network using MapReduce.

3.1. The Average Geodesic Path Length (Requirement 1 and 2)

Small-world networks have short pathlengths as is commonly observed in random networks. A network (or graph)

consists of nodes (i.e., vertices) connected by edges. A path in a network is a sequence of alternating nodes and

edges that starts with a node and ends with a node. Nodes or edges can appear multiple times in the same

path, and the number of edges in a path is the length of the path. If a graph is connected, then any node can be

reached via a finite-length path starting from any other node. The shortest path between a pair of nodes is called

a geodesic path and there can be more than one such path. In a friend graph, the distance from a person to her

friends is 1, the distance from her to a friend of her friend who is not also her friend is 2, and so on.

In Figure 1, we are given a network encompassing the set of nodes, V = {v1

, v2

, v3

, v4

, v5

, v6

, v7

}. The geodesic path

length between v7 and v4

is 1 and the path is (v4

, v7

). Similarly, the geodesic path length between v2 and v4

is 3

and the path is (v2

, v1

, v7

, v4

). In this case, there is another geodesic path: (v2

, v1

, v5

, v4

).

Small values of shortest path length ensure that information or resources easily

spreads throughout the network. This property makes distributed information

processing possible on technological networks and supports the six degrees of

separation often reported in social networks.

You must calculate all of the geodesic paths between pairs of vertices in the

friend network and return the average geodesic distance using MapReduce.

To calculate the average length, you should use the following formula:

L =

2

v(v−1) ∑

if n, i≠j

di,j

, where v is the number of nodes in the Google+ friend network and d-i,j is the pathlength between the node i

and j. If there is a pair of nodes without existing path, please ignore those path lengths. This computation may

involve multiple MapReduce jobs. Consider only unique pairs of nodes. Since this graph is non-directed graph,

the pair (i, j) and (j, i) are considered as the same pair. Also, if there are multiple shortest paths between the

nodes, consider only one shortest path for measuring the average geodesic pathlengths.

To calculate the total shortest path lengths of this graph ( ∑ ) , you can count unique path with each path

if n, i≠j

di,j

length and add them up. For example,

∑

if n, i≠j

di,j = ∑

for di,j=1

di,j + ∑

for di,j=2

di,j + ∑

for di,j=3

di,j + … + ∑

for d =max_length i,j

di,j

Your software must include all of the paths with the pathlength of 1 to 10 (max_length). Please note that these

paths must include the “shortest” paths between given pair of nodes. If there are three paths, a-b, a-c-b, and

a-f-d-s-g-b, your calculation must consider only a-b. Also, do not count the duplicate path. The path a-b and b-a

should be considered as the same path.

Your submission must include the measurements as depicted in the following table.

Category ∑

for d =1 i,j

di,j ∑

for d =2 i,j

di,j ∑

for d =3 i,j

di,j … ∑

for di,j=10

di,j

Total shortest

pathlength

Total number

of vertices

The Average Geodesic Path Length

Measurement

Table 1. The submission requirement 1

3.2. Cluster Coefficient: Large degree of Transitivity (Requirement 3)

The overall clustering in a network can be determined by averaging the clustering across all individual nodes.

High clustering supports specialization as local collections of strongly interconnected nodes readily share

information or resources. Conceptually, clustering is quite straightforward to comprehend. In a real-world

analogy, clustering represents the probability that one’s friends are also friends of each other.

A cluster coefficient is a measure of the degree to which nodes in a graph tend to cluster together. A triplet is

three nodes that are connected by either two (open triplet) or three (closed triplet) undirected edges. Therefore,

a triangle (e.g. v1

, v5

, v6

in the Figure 1) involves three different triplets, ((v1

, v5

),(v5

,v6

)), ((v1

, v5

),(v1

,v6

)) and ((v1

,

v6

),(v5

,v6

)). In this assignment, you should use the following measure proposed by Luce and Perry

4

. A

global clustering coefficient is defined as the following:

C =

3×number of triangles

number of connected triples

, where a triangle consists of 3 nodes that are completely connected to each other (i.e., a 3-clique) and a

connected triple consists of three nodes { i, j, k} such that node i is connected to node j and node j is connected

to node k. The factor of 3 arises because each triangle gets counted 3 times in a connected triple. The clustering

coefficient C indicates how many triples are in fact triangles. A complete graph, in which every pair of nodes is

connected by an edge, with N ≥ 3 nodes yields the maximum possible value of C = 1, as all triples are also

triangles. The minimum value of the clustering coefficient is C = 0.

You must calculate the global cluster coefficient of the Google+ friend networks using MapReduce. Your

submission must contain the total number of connected triples and number of triangles. This computation may

involve multiple MapReduce jobs.

**程序代写代做C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB**

本网站支持淘宝 支付宝 微信支付 paypal等等交易。如果不放心可以用淘宝交易！

**E-mail:** [email protected] **微信:**itcsdx

如果您使用手机请先保存二维码，微信识别。如果用电脑，直接掏出手机果断扫描。