CAP 6629: Reinforcement Learning
Course project 1
Part 1: Read Chapter 2 and implement a multi-armed bandit problem in any programming language.
You may follow the algorithm pseudocode (page 8 of the lecture notes). The reward distributions are
provided on page 10; you need to estimate the mean value of each action yourself. Please show your
average reward curves for different \epsilon values (similar to the figures we studied in class).
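
As a starting point, the following is a minimal sketch of an \epsilon-greedy bandit with incremental sample-average estimates. It is not the lecture-note pseudocode itself. The reward distributions from page 10 are not reproduced here, so the sketch assumes Gaussian rewards with unit variance and made-up means (EXAMPLE_MEANS); the function names run_bandit and average_reward_curve are also illustrative only.

```python
import random

# Hypothetical true means; replace with the distributions from the lecture notes.
EXAMPLE_MEANS = [0.0, 0.5, 3.0]


def run_bandit(true_means, epsilon, steps, rng):
    """Run one epsilon-greedy episode; return estimates, pull counts, rewards."""
    k = len(true_means)
    q = [0.0] * k  # estimated mean reward of each action
    n = [0] * k    # number of times each action was taken
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniform random action
        else:
            best = max(q)              # exploit: greedy, ties broken at random
            action = rng.choice([i for i, v in enumerate(q) if v == best])
        # Assumption: Gaussian reward with unit variance around the true mean.
        reward = rng.gauss(true_means[action], 1.0)
        n[action] += 1
        q[action] += (reward - q[action]) / n[action]  # incremental average
        rewards.append(reward)
    return q, n, rewards


def average_reward_curve(true_means, epsilon, steps=1000, runs=200, seed=0):
    """Average the per-step reward over several independent runs."""
    rng = random.Random(seed)
    totals = [0.0] * steps
    for _ in range(runs):
        _, _, rewards = run_bandit(true_means, epsilon, steps, rng)
        for t, r in enumerate(rewards):
            totals[t] += r
    return [total / runs for total in totals]
```

Calling average_reward_curve once per \epsilon value (for example 0, 0.01, and 0.1) gives one list per curve, which can then be plotted against the step index with any plotting library.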
Part 2: Apply the algorithm from Part 1 to the dataset below. The full reward distributions are provided here:
Suppose an advertising company is running 10 different ads on a webpage, all targeted at a similar
population. Each column index represents a different ad. An entry is 1 if the ad was clicked by a user
and 0 if it was not. A sample from the original dataset is shown below:
Please provide the maximum reward you can achieve with this dataset.
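
One possible way to approach Part 2 is to replay the dataset row by row, treating each row as one user visit and each column as one arm. The sketch below assumes the data has already been loaded as a list of 0/1 rows; TOY_ROWS is invented for illustration (3 ads rather than 10) and is not taken from the real dataset. Since "maximum reward" can be read in more than one way, two reference totals are computed as well: the best single ad in hindsight, and a per-row oracle that always picks a clicked ad when one exists. All function names are hypothetical.

```python
import random

# Made-up sample: each row is one user, each column one ad (1 = clicked).
TOY_ROWS = [
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 1, 1],
]


def replay_epsilon_greedy(rows, epsilon, seed=0):
    """Show one ad per row with epsilon-greedy; return total clicks, q, n."""
    rng = random.Random(seed)
    k = len(rows[0])
    q = [0.0] * k  # estimated click rate of each ad
    n = [0] * k    # number of times each ad was shown
    total = 0
    for row in rows:
        if rng.random() < epsilon:
            ad = rng.randrange(k)  # explore
        else:
            best = max(q)          # exploit, ties broken at random
            ad = rng.choice([i for i, v in enumerate(q) if v == best])
        reward = row[ad]           # 1 if this user clicked the chosen ad
        n[ad] += 1
        q[ad] += (reward - q[ad]) / n[ad]
        total += reward
    return total, q, n


def best_single_ad_total(rows):
    """Clicks collected by always showing the best ad in hindsight."""
    return max(sum(column) for column in zip(*rows))


def oracle_total(rows):
    """Clicks collected if a clicked ad could be chosen on every row."""
    return sum(1 for row in rows if any(row))
```

The replayed total can never exceed oracle_total, so comparing the two (and best_single_ad_total) gives a sense of how close a given \epsilon gets to the achievable reward.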