Python Assignment Help | CAP 6629: Reinforcement Learning Course Project 1

This Python assignment uses reinforcement learning to implement the multi-armed bandit problem.

CAP 6629: Reinforcement Learning
Course project 1

Part 1: Read Chapter 2 and implement a multi-armed bandit problem in any programming language.
You may follow the algorithm pseudocode (page 8 of the lecture notes). The reward distributions are
provided on page 10; you need to estimate the mean value of each action yourself. Please show your
average reward curves for different ε values (similar to the figures we studied in class).
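As a rough illustration of Part 1, the following is a minimal sketch of an epsilon-greedy bandit with incremental sample-average estimates. The reward distributions from page 10 of the lecture notes are not reproduced here, so the values in `ASSUMED_MEANS` and the unit-variance Gaussian reward are placeholder assumptions, and the names `run_bandit` and `average_reward_curve` are hypothetical.

```python
import random

def run_bandit(true_means, epsilon, steps, seed=0):
    """Epsilon-greedy action selection with sample-average value estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated mean reward of each action
    n = [0] * k     # number of times each action was taken
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: pick a random action
        else:
            best = max(q)         # exploit: pick a greedy action, ties broken randomly
            a = rng.choice([i for i in range(k) if q[i] == best])
        # ASSUMPTION: rewards are Gaussian with unit variance around the true mean.
        r = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental update of the sample average
        rewards.append(r)
    return q, n, rewards

def average_reward_curve(true_means, epsilon, steps=1000, runs=100):
    """Average the reward at each time step over several independent runs."""
    totals = [0.0] * steps
    for run in range(runs):
        _, _, rewards = run_bandit(true_means, epsilon, steps, seed=run)
        for t, r in enumerate(rewards):
            totals[t] += r
    return [x / runs for x in totals]

# ASSUMPTION: placeholder means; substitute the distributions from the lecture notes.
ASSUMED_MEANS = [0.2, -0.8, 1.5, 0.4, 1.2, -1.5, -0.1, -1.0, 0.8, -0.5]
curves = {eps: average_reward_curve(ASSUMED_MEANS, eps) for eps in (0.0, 0.01, 0.1)}
```

Each entry of `curves` is a list of per-step average rewards that can be passed to any plotting library to draw one curve per epsilon value.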
Part 2: Apply the algorithm from Part 1 to the dataset below. The full reward distributions are provided here:
Suppose an advertising company is running 10 different ads targeted at a similar population
on a webpage. Each column index represents a different ad. An entry is 1 if the ad was clicked by a user
and 0 if it was not. A sample from the original dataset is shown below:
Please provide the maximum reward you can achieve with this dataset.
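One possible way to approach Part 2 is to replay the click table row by row, letting an epsilon-greedy agent choose one ad per row and collect that cell as its reward. The sketch below assumes this reading of the task. The real dataset is not included here, so `make_synthetic_clicks` and `ASSUMED_CLICK_PROBS` generate hypothetical stand-in data, and `replay_epsilon_greedy` is an illustrative name. It also computes two hindsight bounds that may help when interpreting "maximum reward": the best single ad, and a per-row oracle that always picks a clicked ad when one exists.

```python
import random

def make_synthetic_clicks(n_rows, click_probs, seed=0):
    """ASSUMPTION: stand-in for the real dataset; each cell is an independent 0/1 draw."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in click_probs] for _ in range(n_rows)]

def replay_epsilon_greedy(rows, epsilon, seed=0):
    """Show one ad per row; the reward is that row's 0/1 entry for the chosen ad."""
    rng = random.Random(seed)
    k = len(rows[0])
    q = [0.0] * k   # estimated click rate of each ad
    n = [0] * k     # number of times each ad was shown
    total = 0
    for row in rows:
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore
        else:
            best = max(q)         # exploit, ties broken randomly
            a = rng.choice([i for i in range(k) if q[i] == best])
        r = row[a]
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
        total += r
    return total, q, n

# ASSUMPTION: hypothetical click probabilities, not taken from the assignment data.
ASSUMED_CLICK_PROBS = [0.17, 0.13, 0.07, 0.12, 0.27, 0.01, 0.11, 0.21, 0.10, 0.05]
rows = make_synthetic_clicks(10000, ASSUMED_CLICK_PROBS)

total, q, n = replay_epsilon_greedy(rows, epsilon=0.1)
best_fixed_ad = max(sum(col) for col in zip(*rows))      # best single ad in hindsight
per_row_oracle = sum(1 for row in rows if any(row))      # upper bound for any policy
print(total, best_fixed_ad, per_row_oracle)
```

With the real data, `rows` would instead be loaded from the provided file, one list of ten 0/1 values per user.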

