编程代写|Project #3: Your Choice! — MPCS 52060 – Parallel Programming documentation

这是一篇来自美国的关于在以下领域(数据科学、机器学习、计算机图形等)中实现一个并行系统的编程代写

 

Assignment

The !nal project gives you the opportunity to show me what you learned in this course and to build your own parallel system. In particular, you should think about implementing a parallel system in the domain you are most comfortable in (data science, machine learning, computer graphics, etc.). The system should solve a problem that can bene!t from some form of parallelization. Similar to how the performance of an image processing system bene!ts from parallel data decomposition of an image. If you are having trouble coming up with a problem for your system to solve then consider the following:

Embarrassingly Parallel Topics (https://en.wikipedia.org/wiki/Embarrassingly_parallel)

Parallel Algorithms (https://en.wikipedia.org/wiki/Parallel_computing#Algorithmic_methods)

Part 1: Minimum Requirements (85 points)

You are free to implement any parallel algorithm you like. However, you are required to at least have the following features in your parallel system:

An input/output component that allows the program to read in data or receive data in some way. The system will perform some computation(s) on this input and produce an output result.

A sequential implementation of the system. Make sure to provide a usage statement.

Parallel System Implementation: Implement the work-stealing and work-balancing algorithm using a unboundeddequeue implemented as linked-list (i.e., a chain of nodes similar to project #1). You must use the de!nitions de-

!ned in proj3/concurrent . Speci!cally, you must implement these functions that return an ExecutorService

// NewWorkStealingExecutor returns an ExecutorService that is implemented using the work-stealing algorithm.

//@param capacity – The number of goroutines in the pool

//@param threshold – The number of items that a goroutine in the pool can

// grab from the executor in one time period. For example, if threshold = 10

// this means that a goroutine can grab 10 items from the executor all at

// once to place into their local queue before grabbing more items. It’s

// not required that you use this parameter in your implementation.

func NewWorkStealingExecutor(capacity, threshold int) ExecutorService {

….

}

// NewWorkBalancingExecutor returns an ExecutorService that is implemented using the work-balancing algorithm.

//@param capacity – The number of goroutines in the pool

//@param threshold – The number of items that a goroutine in the pool can

//grab from the executor in one time period. For example, if threshold = 10

// this means that a goroutine can grab 10 items from the executor all at

// once to place into their local queue before grabbing more items. It’s

// not required that you use this parameter in your implementation.

//@param thresholdBalance – The threshold used to know when to perform

// balancing. Remember, if two local queues are to be balanced the

// difference in the sizes of the queue must be greater than or equal to

// thresholdBalance. You must use this parameter in your implementation.

func NewWorkBalancingExecutor(capacity, thresholdQueue, thresholdBalance int) ExecutorService

….

}

// NewUnBoundedDEQueue returns an empty UnBoundedDEQueue

func NewUnBoundedDEQueue() DEQueue {

}

You are not allowed to modify/add/delete anything from the interfaces/types provided in the “proj3/concurrent“ package. You cannot modify the type signatures in the above functions. Golang does not have the keyword volatile; therefore, you will need to either use sync.Mutex or sync.atomics or any other sync package object.

I would recommend using sync.Mutex . You will need to adapt the array based version shown in class to a linkedlist version. A few additional notes:

  1. To handle the ABA problem, make sure to de!ne an internal node structure that holds the actual item being held in the dequeue. Every time an item is inserted then a new node is created for that item. The ABA problem only occurs when reusing memory addresses so creating unique ones will stop this from happening.

Thus, you will not need a stamp mechanism in this implementation. However, it does come at the cost of cache performance.

  1. The internal implementations of both work-stealing and work-balancing must be generic (i.e., using interface {} ). You cannot use Golang generics in this project. This means in a few places you will need to use Type assertions (https://go.dev/tour/methods/15) .
  1. Inside the proj3/covid/covid.go , we provide some snippets of code that implements the Covid and Pi programs using the ExecutorService . This code is from a working implementation that uses ExecutorService . The programs show how to use Future , Callable and Runnable within your application portion of the assignment.
  1. You should be able to reuse code from both of these implementations. Make sure to think about this while implementing both solutions.
  1. Place your working-stealing implementation in the proj3/conurrent/stealing.go !le.
  2. Place your working-balancing implementation in the proj3/conurrent/balancing.go !le.
  3. Place your unbounded-dequeue implementation in the proj3/conurrent/unbounded.go !le. Remember

this can be a coarse-grain implementation if you wish since Go does not provide volatile.

Provide a detailed write-up and analysis of your system. For this assignment, this write-up is required to have more detail to explain your parallel implementations since we are not giving you a problem to solve. See the System

Write-up section for more details.

Provide all the dataset !les you used in your analysis portion of your write up. If these !les are to big then you need to provide us a link so we can easily download them from an external source.

These points also include design points. You should think about the modularity of the system you are creating.

Think about splitting your code into appropriate packages, when necessary.

You must provide a script or speci!c commands that shows/produces the results of your system. We need to be able to enter in a single command in the terminal window and it will run and produce the results of your system.

Failing to provide a straight-forward way of executing your system that produces its result will result in signi!cant deductions to your score. We prefer running a simple command line script (e.g., shell-script or python3 script).

However, providing a few example cases of possible execution runs will be su#cient enough.

We should also be able to run speci!c versions of the system. There should be a option (e.g. via command line argument) to run the sequential version, or the various parallel versions. Please make sure to document this in your report or via the printing of a usage statement.

You are free to use any additional standard/third-party libraries as you wish. However, all the parallel work is required to be implemented by you.

There is a directory called proj3 with a single go.mod !le inside your repositories. Place all your work for project

3 inside this directory.

System Write-up

In prior assignments, we provided you with the input !les or data to run experiments against a your system and provide an analysis of those experiments. For this project, you will do the same with the exception that you will produce the data needed for your experiments. In all, you should do the following for the writeup:

Run experiments with data you generate for both the sequential and parallel versions. As with the data provided by prior assignments, the data should vary the granularity of your parallel system. For the parallel version, make sure you are running your experiments with at least producing work for N threads, where N = {2,4,6,8,12} . You can go lower/larger than those numbers based on the machine you are running your system on. You are not required to run project 3 on the Peanut cluster. You can run it on your local machine and base your N threads on the number of logical cores you have on your local machine. If you choose to run your system on your local machine then please state that in your report and the your machine speci!cations as well.

Produce speedup graph(s) for those data sets. You should have one speedup graph per parallel implementation you de!ne in your system. This means either one or two speedup graphs.

Please submit a report (pdf document, text !le, etc.) summarizing your results from the experiments and the conclusions you draw from them. Your report should include your plot(s) as speci!ed above and a self-contained report. That is, somebody should be able to read the report alone and understand what code you developed, what experiments you ran and how the data supports the conclusions you draw. The report must also include the following:

Describe in detailed of your system and the problem it is trying to solve.

A description of how you implemented your parallel solutions.

Describe the challenges you faced while implementing the system. What aspects of the system might make it di#-cult to parallelize? In other words, what to you hope to learn by doing this assignment?

Speci!cations of the testing machine you ran your experiments on (i.e. Core Architecture (Intel/AMD), Number of cores, operating system, memory amount, etc.)

What are the hotspots (i.e., places where you can parallelize the algorithm) and bottlenecks (i.e., places where there is sequential code that cannot be parallelized) in your sequential program? Were you able to parallelize the hotspots and/or remove the bottlenecks in the parallel version?

What limited your speedup? Is it a lack of parallelism? (dependencies) Communication or synchronization overhead?

As you try and answer these questions, we strongly prefer that you provide data and measurements to support your conclusions.

Compare and contrast the two parallel implementations. Are there di”erences in their speedups?

Part 2: Advance Feature(s): MapReduce (15 points)

The above work guarantees a B for the assignment. If you want a higher grade then you need to go the extra mile that adds more to your project. You must implement a MapReduce framework inside your project. This can be related to the application you are building in Part 1 or this can be a totally separate application that uses MapReduce to implement the parallel component. You do not need a sequential version if you implement a new application using MapReduce. Please add a separate section in your report the explicitly explaining the advance feature you implemented and where in your code they are implemented. One to two paragraphs is enough for your explanation.

Your implementation won’t be generic because it’s impossible to implement MapReduce using just interface

{} ; therefore, you will need tightly couple the MapReduce framework we went over in class to the application using MapReduce.

You cannot use KMeans and WordCount as your applications since they are already implemented in the textbook.

Make sure to talk about how your advance feature a”ects the overall performance of the system.

Don’t know What to Implement?

If you are unsure what to implement then by default you can reimplement the image processing assignment using the work-balancing and work-stealing. For the advance feature, you need to be creative and think about how you could adapt the image processing to use a MapReduce paradigm. It could be the case you implement a slightly di”erent version of the image processing project just for MapReduce.

You cannot reimplement project 1 or other assignments.