
1 Introduction

This coursework exercise asks you to write code to create an MDP-solver to work in the Pacman environment that we used for the practical exercises.

Read all these instructions before starting.

This exercise will be assessed.

2 Getting started

You should download the coursework file from KEATS. It contains a familiar set of files that implement Pacman, including the version 6 file that defines the observability of the environment you will have to deal with, and the same non-deterministic motion model that the practicals used.

Version 6 further extends what Pacman can know about the world. In addition to knowing the location of all the objects in the world (walls, food, capsules, ghosts), Pacman can now see what state the ghosts are in, and so can decide whether they have to be avoided or not.

3 What you need to do

3.1 Write code

This coursework requires you to write code to control Pacman and win games using an MDP-solver. For each move, you will need to have the model of Pacman’s world, which consists of all the elements of a Markov Decision Process, namely:

• A finite set of states S;

• A finite set of actions A;

• A state-transition function P(s′|s,a);

• A reward function R;

• A discount factor γ ∈ [0,1];
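As an illustration of the state-transition function P(s′|s,a), here is a minimal sketch of a non-deterministic motion model on a grid. Everything in it is an assumption made for the example: the names `MOVES`, `PERPENDICULAR`, `step` and `transition` are hypothetical, states are taken to be (x, y) cells, and the 0.8 / 0.1 / 0.1 split between the intended direction and the two perpendicular ones is a common textbook choice, not a value stated in this document. Check the supplied code for the actual probabilities.

```python
# Hypothetical helper names; the probability split is an assumption.
MOVES = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}
PERPENDICULAR = {
    "North": ("East", "West"),
    "South": ("East", "West"),
    "East": ("North", "South"),
    "West": ("North", "South"),
}


def step(state, direction, walls):
    """Return the cell reached by one step, or the same cell if a wall blocks it."""
    dx, dy = MOVES[direction]
    target = (state[0] + dx, state[1] + dy)
    return state if target in walls else target


def transition(state, action, walls, p_intended=0.8):
    """Return P(s' | s, a) as a dict mapping next states to probabilities."""
    p_side = (1.0 - p_intended) / 2.0
    candidates = [(action, p_intended)]
    candidates += [(d, p_side) for d in PERPENDICULAR[action]]
    outcomes = {}
    for direction, prob in candidates:
        nxt = step(state, direction, walls)
        # Several directions can lead to the same cell, so accumulate.
        outcomes[nxt] = outcomes.get(nxt, 0.0) + prob
    return outcomes
```

Returning a dictionary of next states keeps the probabilities for each (s, a) pair in one place, so they can be checked to sum to 1.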

Following this, you can compute the action to take via Value Iteration, Policy Iteration or Modified Policy Iteration. You are expected to implement such a solver correctly and to optimize the choice of its parameters. There is a (rather familiar) skeleton piece of code in the supplied files to take as your starting point. This code defines the class MDPAgent.
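To make the Value Iteration option concrete, the following is a minimal sketch for a generic finite MDP. It is not the skeleton code and not a prescribed solution: the function name `value_iteration`, its parameters, and the choice of a reward that depends only on the current state are all assumptions made for illustration. The defaults for `gamma`, `tol` and `max_iters` are placeholders, since choosing such parameters is part of the exercise.

```python
def value_iteration(states, actions, transition, reward,
                    gamma=0.9, tol=1e-6, max_iters=1000):
    """Return (values, policy) for a finite MDP.

    states:     iterable of hashable states
    actions:    function state -> list of actions (empty for terminal states)
    transition: function (state, action) -> {next_state: probability}
    reward:     function state -> float (assumed to depend on the state only)
    """
    def expected_value(s, a, values):
        return sum(p * values[s2] for s2, p in transition(s, a).items())

    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        new_values = {}
        delta = 0.0
        for s in values:
            acts = actions(s)
            # Bellman update: U(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) U(s')
            best = max(expected_value(s, a, values) for a in acts) if acts else 0.0
            new_values[s] = reward(s) + gamma * best
            delta = max(delta, abs(new_values[s] - values[s]))
        values = new_values
        if delta < tol:  # stop once no utility changed by more than tol
            break

    # Greedy policy with respect to the converged utilities.
    policy = {}
    for s in values:
        acts = actions(s)
        if acts:
            policy[s] = max(acts, key=lambda a: expected_value(s, a, values))
    return values, policy
```

In a Pacman agent, a function like this would typically be re-run on each move with rewards rebuilt from the current positions of food and ghosts, and the action chosen would be the policy entry for Pacman's current cell.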

There are two main aims for your code:

1 Mallmann-Trenn / McBurney / 6ccs3ain-cw

(a) Win hard in smallGrid

(b) Win hard in mediumClassic

To win games, Pacman has to be able to eat all the food. In this coursework, for these objectives, “winning” just means getting the environment to report a win. Score is irrelevant.

3.1.1 Getting Excellence points

There is a difference between winning a lot and winning well. This is why completing aims (a) and (b) from the previous section allows you to collect up to 80 points in the coursework. The remaining 20 points are obtained by achieving a high Excellence Score Difference in the mediumClassic layout, a metric that follows directly from a high average winning score. Different strategies can raise this score, for example chasing edible ghosts.
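When testing, it can help to track the average winning score yourself. The sketch below computes only that average over the games that were won; it is not the Excellence Score Difference formula from the marksheet. The function name `average_winning_score` and the (won, score) input format are assumptions for the example.

```python
def average_winning_score(results):
    """Average score over won games only.

    results: list of (won, score) pairs, one per game (hypothetical format).
    Returns None when no game was won, since the average is undefined.
    """
    winning_scores = [score for won, score in results if won]
    if not winning_scores:
        return None
    return sum(winning_scores) / float(len(winning_scores))
```

Lost games are excluded from the average rather than counted as zero, in line with the text above, which speaks of the average winning score.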

A couple of things to note. Let W be the set of games won, i.e., |W| ∈ [0,25]. For any won game i ∈ W, define s_w(i) to be the score obtained in game/run i.

• ∆Se in the marksheet is the Excellence Score Difference. You can use the following formula to calculate it when you test your code and compare the result against the values in Table 3

