Proximal Policy Optimization
In this project, you will use the open-source RL library stable-baselines3 to learn a policy for the same arm goal-reaching task as in the previous project.
For this project, we will again use Anaconda as our Python virtual environment manager.
Check out Project 5 from the SVN server:
Install the virtual environment:
conda create --name project5 --file spec-file.txt
You can then activate and deactivate the virtual environment anywhere in the terminal with:
conda activate project5
conda deactivate
DO NOT install any other libraries or dependencies, or a different version of an already provided package. The autograder will give you a 0 if you import libraries that are not specified in spec-file.txt.
If you are concerned that you may have accidentally imported something that isn’t in spec-file.txt, delete your conda environment, re-create it, and then re-run your code to check that it still runs without errors. You can also test your code on the MechTech lab machines, which don’t have any additional libraries installed.
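One way to see what your code actually imports, before re-creating the environment, is to scan your source files with Python's standard ast module. This is a minimal sketch; the helper names top_level_imports and scan_project are hypothetical and not part of the starter code. Standard-library modules such as os or math will also show up, since they come with Python itself. The remaining names still have to be compared by hand against spec-file.txt, because import names and conda package names do not always match.

```python
import ast
from pathlib import Path


def top_level_imports(source: str) -> set:
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" -> "a"; relative imports (level > 0) are skipped
            modules.add(node.module.split(".")[0])
    return modules


def scan_project(root: str = ".") -> set:
    """Hypothetical helper: collect imports from every .py file under root."""
    found = set()
    for path in Path(root).rglob("*.py"):
        found |= top_level_imports(path.read_text(encoding="utf-8"))
    return found
```

For example, calling print(sorted(scan_project("."))) from the project root lists every module imported anywhere in the project, including the starter files.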
Starter Code Explanation
Make sure to watch the lecture from 4/26 for details on how to run this project.
In addition to code you are already familiar with from the previous project (e.g. the arm dynamics), we provide a partially implemented environment in the ArmEnv class. The environment “wraps around” the arm dynamics to provide the key functions that an RL algorithm expects. Your implementation must follow the Gym API.
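As a rough illustration of the reset/step contract an RL library expects, here is a dependency-free sketch. ToyReachEnv and its point-mass "dynamics" are hypothetical stand-ins, not the project's ArmEnv or arm dynamics. The real ArmEnv would subclass gym.Env, declare action_space and observation_space (for example with gym.spaces.Box), and step the provided arm dynamics instead. The sketch follows the classic Gym convention in which reset() returns an observation and step() returns (observation, reward, done, info); newer Gym and Gymnasium releases changed these signatures, so match whichever version spec-file.txt pins.

```python
import math
import random


class ToyReachEnv:
    """Hypothetical 2-D point 'end-effector' that must reach a random goal.

    Mirrors the classic Gym API: reset() -> obs, step(a) -> (obs, reward, done, info).
    """

    def __init__(self, max_steps=200, dt=0.05, max_speed=1.0, seed=None):
        self.max_steps = max_steps
        self.dt = dt
        self.max_speed = max_speed
        self.rng = random.Random(seed)
        self.pos = [0.0, 0.0]
        self.goal = [0.0, 0.0]
        self.steps = 0

    def _obs(self):
        # Observation: current position, goal, and the vector from position to goal
        dx = self.goal[0] - self.pos[0]
        dy = self.goal[1] - self.pos[1]
        return [self.pos[0], self.pos[1], self.goal[0], self.goal[1], dx, dy]

    def reset(self):
        # Start at the origin and sample a new goal for each episode
        self.pos = [0.0, 0.0]
        self.goal = [self.rng.uniform(-1.0, 1.0), self.rng.uniform(-1.0, 1.0)]
        self.steps = 0
        return self._obs()

    def step(self, action):
        # Clip the commanded velocity, then integrate one time step
        vx, vy = (max(-self.max_speed, min(self.max_speed, a)) for a in action)
        self.pos[0] += vx * self.dt
        self.pos[1] += vy * self.dt
        self.steps += 1
        dist = math.hypot(self.goal[0] - self.pos[0], self.goal[1] - self.pos[1])
        reward = -dist  # closer to the goal is better
        done = self.steps >= self.max_steps  # fixed-length episodes
        return self._obs(), reward, done, {"distance": dist}
```

Fixed-length episodes fit a "reach and then stay there" task: the episode does not end on first contact with the goal, so the policy is also rewarded for holding its position.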
You must finish implementing ArmEnv in arm_env.py and train() in train_ppo.py. Details are below.
Unlike in the previous project, you will implement the majority of the key functions yourself. You are also expected to think carefully about the various choices involved in setting up the environment: you decide which components make up the observation space, and you can choose whatever reward function you deem appropriate for the task.
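Since the observation and reward are your design choices, here is one possible starting point, sketched as plain functions. The inputs (joint angles, joint velocities, end-effector position, goal) and every weight are assumptions for illustration; the quantities actually available depend on the arm dynamics code from the previous project. The distance term gives a dense learning signal, while the action and velocity penalties are meant to encourage the policy to settle at the goal rather than oscillate around it, which matters because the grader checks where the end-effector stabilizes.

```python
import math


def make_observation(joint_angles, joint_velocities, ee_pos, goal):
    """One candidate observation: joint state, end-effector position, and the
    vector from the end-effector to the goal (all plain lists of floats)."""
    to_goal = [g - p for g, p in zip(goal, ee_pos)]
    return list(joint_angles) + list(joint_velocities) + list(ee_pos) + to_goal


def shaped_reward(ee_pos, goal, action, joint_velocities,
                  w_dist=1.0, w_action=0.01, w_vel=0.01,
                  bonus_radius=0.02, bonus=1.0):
    """Hypothetical shaped reward for reaching a goal and then staying there.
    All weights and the bonus radius are untuned placeholders."""
    dist = math.sqrt(sum((g - p) ** 2 for g, p in zip(goal, ee_pos)))
    reward = -w_dist * dist                                 # dense term: closer is better
    reward -= w_action * sum(a * a for a in action)         # penalize large actions
    reward -= w_vel * sum(v * v for v in joint_velocities)  # penalize motion to encourage settling
    if dist < bonus_radius:
        reward += bonus                                     # bonus for being very close
    return reward
```

Including the end-effector-to-goal vector directly in the observation spares the policy from having to learn that subtraction itself; whether it helps for this arm is something to verify empirically.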
In train_ppo.py, you must fill in the train() function, which actually trains a policy using PPO. You can refer to the stable-baselines3 PPO documentation:
https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html?highlight=ppo
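A minimal sketch of what train() might look like, assuming the environment passed in is your completed ArmEnv and that stable-baselines3's PPO class is used with its built-in "MlpPolicy". The function signature, the timestep budget, and the models/<timestamp>/models.zip naming in run_path are hypothetical; the starter code's train() may be structured differently. The hyperparameters listed are SB3's PPO defaults, written out only so they are easy to find and tune. check_env, from stable_baselines3.common.env_checker, flags common Gym API mistakes before training starts.

```python
import os
from datetime import datetime

# Starting hyperparameters: these are SB3's PPO defaults, not tuned values.
PPO_KWARGS = dict(
    learning_rate=3e-4,
    n_steps=2048,    # rollout length per environment before each update
    batch_size=64,   # minibatch size; ideally divides n_steps * n_envs
    n_epochs=10,     # optimization passes over each rollout
    gamma=0.99,      # discount factor
    gae_lambda=0.95, # GAE smoothing parameter
    clip_range=0.2,  # PPO clipping parameter (epsilon)
)


def run_path(root="models", now=None):
    """Hypothetical naming scheme: <root>/<timestamp>/models.zip."""
    now = now or datetime.now()
    return os.path.join(root, now.strftime("%Y-%m-%d_%H-%M-%S"), "models.zip")


def train(env, total_timesteps=1_000_000, root="models"):
    """Sketch of a train() body; the starter code's real signature may differ."""
    # Imported inside the function so the helpers above load without SB3 installed
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_checker import check_env

    check_env(env)  # catches Gym API mistakes (spaces, return types) early
    model = PPO("MlpPolicy", env, verbose=1, **PPO_KWARGS)
    model.learn(total_timesteps=total_timesteps)

    path = run_path(root)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    model.save(path)
    return path
```

The returned path is what you would hand to enjoy_ppo.py to evaluate the policy; the timestep budget usually needs experimentation, since reaching and holding a goal precisely can take far more samples than reaching its neighborhood.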
The script enjoy_ppo.py can be used to test your code. This is how we will run your code for grading:
python3 enjoy_ppo.py --model_path final.zip
While developing, you can also test a pre-saved model, like so:
python3 enjoy_ppo.py --model_path models/2022-04-10_12-04-17/models.zip
You can pass the --gui flag to enjoy_ppo.py to also see what the policy is doing.
You MUST take the final model you want to be scored with, copy it into the project root folder, rename it to final.zip, and commit it to SVN. Failure to do so will result in 0 points on the project. Remember to test your submission with a clean SVN checkout: check out your submission into a new directory, create and activate the virtual environment, and run the grader, without a single modification or addition to these steps.
The grader will run five episodes, each with a different goal. For each goal, we expect the end-effector to reach the desired location and then stay there. If the distance at which the end-effector stabilizes is below what we consider an easy-to-reach threshold, the script awards 1.5 points. If the distance is also below a tighter threshold, it awards an additional 1.5 points. The maximum is thus 3 points per goal, for a total of 15 points for the project.
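The scoring rule above is simple arithmetic, sketched below for one goal and for all five. The handout does not state the two threshold values, so easy_threshold and tight_threshold are placeholders, and the sketch assumes tight_threshold < easy_threshold. How the grader measures the distance the end-effector stabilizes at is also not specified; the sketch simply takes one final distance per goal.

```python
def goal_points(final_distance, easy_threshold, tight_threshold):
    """Points for one goal: 1.5 below the easy threshold, +1.5 below the tight one.
    Threshold values are placeholders; the grader's actual values are not given."""
    points = 0.0
    if final_distance < easy_threshold:
        points += 1.5
    if final_distance < tight_threshold:
        points += 1.5
    return points


def total_points(final_distances, easy_threshold, tight_threshold):
    # Five goals x at most 3 points each = at most 15 points
    return sum(goal_points(d, easy_threshold, tight_threshold) for d in final_distances)
```

Because the tight threshold only adds points on top of the easy one, a policy that reliably gets near every goal already earns half the credit; the remaining half depends on precision once it has settled.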