This coursework exercise asks you to write code to create an MDP-solver to work in the Pacman environment that we used for the practical exercises.
Read all these instructions before starting.
This exercise will be assessed.
2 Getting started
You should download the file pacman-cw.zip from KEATS. It contains a familiar set of files that implement Pacman, together with version 6 of api.py, which defines the observability of the environment that you will have to deal with and the same non-deterministic motion model that the practicals used.
Version 6 of api.py further extends what Pacman can know about the world. In addition to knowing the location of all the objects in the world (walls, food, capsules, ghosts), Pacman can now see what state the ghosts are in, and so can decide whether they have to be avoided or not.
3 What you need to do
3.1 Write code
This coursework requires you to write code to control Pacman and win games using an MDP-solver. For each move, you will need a model of Pacman’s world, which consists of all the elements of a Markov Decision Process, namely:
• A finite set of states S;
• A finite set of actions A;
• A state-transition function P(s′ | s, a);
• A reward function R;
• A discount factor γ ∈ [0,1].
Given this model, you can then compute the action to take using Value Iteration, Policy Iteration or Modified Policy Iteration. It is expected that you will correctly implement such a solver and optimize the choice of the parameters. There is a (rather familiar) skeleton piece of code to take as your starting point in the file mdpAgents.py. This code defines the class MDPAgent.
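To make the idea concrete, here is a minimal sketch of Value Iteration on a toy corridor. It is an illustration only and not the expected solution. Everything in it is an assumption made for the example: the names (ACTIONS, move, transitions, value_iteration, greedy_action), the reward values, the discount factor of 0.9, and in particular the motion model, which is assumed to move Pacman in the intended direction with probability 0.8 and in each perpendicular direction with probability 0.1. None of it uses api.py or the MDPAgent class; check api.py for the actual motion model.

```python
# Toy Value Iteration sketch. All names, rewards and probabilities here are
# assumptions for illustration; nothing below comes from api.py or mdpAgents.py.

ACTIONS = {"North": (0, 1), "South": (0, -1), "East": (1, 0), "West": (-1, 0)}
PERPENDICULAR = {"North": ("East", "West"), "South": ("East", "West"),
                 "East": ("North", "South"), "West": ("North", "South")}


def move(state, action, states):
    """Deterministic step; leaving the set of free cells means staying put."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    return nxt if nxt in states else state


def transitions(state, action, states, p_intended=0.8):
    """Return [(probability, next_state)], i.e. P(s' | s, a), under the
    ASSUMED motion model: p_intended forward, the rest split sideways."""
    p_side = (1.0 - p_intended) / 2.0
    side_a, side_b = PERPENDICULAR[action]
    return [(p_intended, move(state, action, states)),
            (p_side, move(state, side_a, states)),
            (p_side, move(state, side_b, states))]


def expected_utility(state, action, utilities, states):
    """Expected utility of taking `action` in `state`."""
    return sum(p * utilities[s2] for p, s2 in transitions(state, action, states))


def value_iteration(states, terminals, reward, gamma=0.9, tol=1e-6):
    """Repeat the Bellman update until the largest change falls below tol."""
    utilities = {s: 0.0 for s in states}
    while True:
        updated = {}
        delta = 0.0
        for s in states:
            if s in terminals:
                updated[s] = reward[s]  # terminal utility is just its reward
            else:
                best = max(expected_utility(s, a, utilities, states)
                           for a in ACTIONS)
                updated[s] = reward[s] + gamma * best
            delta = max(delta, abs(updated[s] - utilities[s]))
        utilities = updated
        if delta < tol:
            return utilities


def greedy_action(state, utilities, states):
    """Pick the action with the highest expected utility."""
    return max(ACTIONS, key=lambda a: expected_utility(state, a, utilities, states))


# A four-cell corridor: a ghost at the left end, food at the right end.
states = {(0, 0), (1, 0), (2, 0), (3, 0)}
terminals = {(0, 0), (3, 0)}
reward = {(0, 0): -10.0, (1, 0): -1.0, (2, 0): -1.0, (3, 0): 10.0}

utilities = value_iteration(states, terminals, reward)
best = greedy_action((1, 0), utilities, states)
```

In this toy world the greedy action from (1, 0) heads away from the ghost and towards the food. In the real agent, the states, rewards and terminal conditions would have to be rebuilt from what api.py reports on each move, and the reward values and γ are exactly the parameters you are expected to tune.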
There are two main aims for your code:
1 Mallmann-Trenn / McBurney / 6ccs3ain-cw
(a) Win hard in smallGrid
(b) Win hard in mediumClassic
To win games, Pacman has to be able to eat all the food. In this coursework, for these objectives, “winning” just means getting the environment to report a win. Score is irrelevant.
3.1.1 Getting Excellence points
There is a difference between winning a lot and winning well. This is why completing aims (a) and (b) from the previous section allows you to collect up to 80 points in the Coursework. The remaining 20 points are obtained by having a high Excellence Score Difference in the mediumClassic layout, a metric that comes directly from having a high average winning score. This can be done through different strategies, for example by chasing edible ghosts.
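When testing, it can help to track the average winning score yourself. The sketch below is a hypothetical helper, not the marksheet's Excellence Score Difference formula: it only averages the scores of the games that were won. The function name and the (won, score) input format are assumptions made for this example.

```python
def average_winning_score(results):
    """Average score over won games only.

    `results` is a list of (won, score) pairs, one per game. This input
    format is an assumption for the example, not something the framework
    provides in this shape.
    """
    winning_scores = [score for won, score in results if won]
    if not winning_scores:
        return None  # no games won, so the average is undefined
    return sum(winning_scores) / float(len(winning_scores))


# Made-up scores from four test games, three of them won.
results = [(True, 1200), (False, -350), (True, 1500), (True, 900)]
average = average_winning_score(results)
```

Lost games are excluded from the average, so they lower the number of wins without affecting this figure.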
A couple of things to note. Let W be the set of games won, i.e., |W| ∈ [0,25]. For any won game i ∈ W define s_w(i) to be the score obtained in game/run i.
• ∆S_e in the marksheet is the Excellence Score Difference. You can use the following formula to calculate it when you test your code and compare the result against the values in Table 3