CAP 6629: Reinforcement Learning Spring 2021
Course project 3
Due: 04/09/2021 (Friday), 11:59PM
Submission: a single PDF containing your code (any programming language), results, and analysis.
Please follow the project report guidelines and submit the report and code in a SINGLE PDF file.
In project 2, you may have noticed that with a large grid world maze, the agent takes a long time to learn a value table. One way to address this challenge is to use a neural network to approximate the value function, as discussed in lecture 10. Two options are provided below; you may choose either one to implement.
a. Based on your results in project 2, you can build a neural network that approximates the Q table you obtained. The network then generates the Q values that guide the agent toward the goal.
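As a sketch of option (a), the snippet below fits a small two-layer network (NumPy only, plain full-batch gradient descent) to a Q table for a hypothetical 5x5 grid world. The random `q_table`, the one-hot state encoding, and all sizes and learning rates are illustrative assumptions; in your project the targets would be the table learned in project 2.

```python
import numpy as np

# Hypothetical setup: 25 states (5x5 grid), 4 actions. The random q_table
# stands in for the Q table learned in project 2.
rng = np.random.default_rng(0)
n_states, n_actions, n_hidden = 25, 4, 32
q_table = rng.normal(size=(n_states, n_actions))

# One-hot encode each state as the network input.
X = np.eye(n_states)

# Two-layer network: one-hot state -> tanh hidden layer -> Q values for all actions.
W1 = rng.normal(scale=0.1, size=(n_states, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_actions))
b2 = np.zeros(n_actions)

mse_initial = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - q_table) ** 2)

lr = 0.05
for epoch in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    q_pred = h @ W2 + b2
    err = q_pred - q_table
    # Backward pass for the mean-squared-error loss
    grad_q = 2 * err / err.size
    gW2 = h.T @ grad_q
    gb2 = grad_q.sum(axis=0)
    grad_h = grad_q @ W2.T * (1 - h ** 2)
    gW1 = X.T @ grad_h
    gb1 = grad_h.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - q_table) ** 2)
print(f"MSE before: {mse_initial:.4f}, after: {mse:.4f}")
```

Once trained, `np.argmax` over the network's output for a given state replaces the table lookup when choosing the greedy action.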
b. You may choose to implement an actor-critic architecture (ADP) as discussed in lecture 10. In this case, you will need to build an action network and a critic network to learn the Q values from scratch. This may require more time, but I am happy to help with your project.
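A minimal sketch of the actor-critic idea in option (b), assuming a hypothetical 4x4 deterministic grid world (start at the top-left, goal at the bottom-right, reward -1 per step): a softmax action network over one-hot state features and a linear critic, both updated from the TD error. The grid size, rewards, and learning rates are assumptions, not the lecture's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
size, n_states, n_actions = 4, 16, 4
goal = n_states - 1
gamma, alpha_actor, alpha_critic = 0.95, 0.1, 0.2

# Action-network parameters (softmax policy) and critic weights (linear V).
theta = np.zeros((n_states, n_actions))
w = np.zeros(n_states)

def step(s, a):
    """Deterministic moves: 0=up, 1=down, 2=left, 3=right; walls block."""
    r, c = divmod(s, size)
    if a == 0: r = max(r - 1, 0)
    elif a == 1: r = min(r + 1, size - 1)
    elif a == 2: c = max(c - 1, 0)
    else: c = min(c + 1, size - 1)
    s2 = r * size + c
    return s2, (0.0 if s2 == goal else -1.0), s2 == goal

def policy(s):
    z = theta[s] - theta[s].max()
    p = np.exp(z)
    return p / p.sum()

lengths = []
for episode in range(3000):
    s, t = 0, 0
    while True:
        pi = policy(s)
        a = rng.choice(n_actions, p=pi)
        s2, r, done = step(s, a)
        # TD error from the critic: delta = r + gamma * V(s') - V(s)
        delta = r + (0.0 if done else gamma * w[s2]) - w[s]
        w[s] += alpha_critic * delta                 # critic update
        grad_log = -pi; grad_log[a] += 1.0           # d log pi(a|s) / d theta[s]
        theta[s] += alpha_actor * delta * grad_log   # actor update
        s, t = s2, t + 1
        if done or t >= 100:
            break
    lengths.append(t)

print("mean episode length, first 100 episodes:", np.mean(lengths[:100]))
print("mean episode length, last 100 episodes:", np.mean(lengths[-100:]))
```

Episode lengths shrinking toward the shortest path is one simple way to show that the two networks are learning together.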
1. Choose which option you are going to implement and provide the pseudocode.
2. Design your own grid world example.
3. Show the Q value table produced by the trained neural network and compare it with the table obtained in project 2.
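For step 3, one possible comparison (an illustrative sketch, not a required method) is to report the maximum absolute Q-value difference and the fraction of states whose greedy action agrees; `q_tab` and `q_nn` below are placeholders for your project 2 table and your network's output.

```python
import numpy as np

def compare_q(q_tab, q_nn):
    """Compare a tabular Q function with its network approximation.

    Both arguments are (n_states, n_actions) arrays. Returns the maximum
    absolute Q-value difference and the fraction of states whose greedy
    action is the same under both tables.
    """
    max_diff = np.max(np.abs(q_tab - q_nn))
    agreement = np.mean(np.argmax(q_tab, axis=1) == np.argmax(q_nn, axis=1))
    return max_diff, agreement

# Toy check with a near-identical pair of tables.
q_tab = np.arange(12.0).reshape(4, 3)
q_nn = q_tab + 0.01
max_diff, agreement = compare_q(q_tab, q_nn)
print(max_diff, agreement)
```

A high greedy-action agreement matters more than exact Q values, since the agent's behavior only depends on which action is ranked highest in each state.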