
Proximal Policy Optimization

In this project, you will be using the open-source RL library stable-baselines3 to learn a policy for the same arm goal-reaching task from the previous project.

Project Setup

For this project, we will again use Anaconda as our Python virtual environment manager.

Check out Project 5 from the SVN server:

Install the virtual environment:

cd project5
conda create --name project5 --file spec-file.txt

You can then activate and deactivate the virtual environment anywhere in the terminal with:

conda activate project5
conda deactivate

Important:

DO NOT install any other libraries or dependencies, or a different version of an already provided package. The autograder will give you a 0 if you import libraries that are not specified in the spec-file.txt.
If you are concerned you may have accidentally imported something that isn’t in the spec-file.txt, delete your conda environment and re-create it, then re-run your code to see if it still runs without error. You can also test your code on the MechTech lab machines, which don’t have any additional libraries installed.

Starter Code Explanation

Make sure to watch the lecture from 4/26 for details on how to run this project.

In addition to code you are already familiar with from the previous project (i.e. arm dynamics, etc.), we are providing a partially implemented environment in the ArmEnv class. The environment “wraps around” the arm dynamics to provide the key functions that an RL algorithm expects. Your implementation must follow the gym API.
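As a quick illustration of that contract, the sketch below shows how an RL algorithm typically drives a gym environment: reset() starts an episode and step() advances it. This assumes the classic gym interface (step() returning a 4-tuple) and that ArmEnv can be constructed with no arguments, which may differ from the actual starter code.

from arm_env import ArmEnv  # assumption: module and class name from the starter code

env = ArmEnv()                                  # constructor arguments may differ
obs = env.reset()                               # begin an episode, get the first observation
for _ in range(200):                            # 200 steps is an arbitrary illustration
    action = env.action_space.sample()          # a trained policy would choose the action instead
    obs, reward, done, info = env.step(action)  # classic gym API: observation, reward, done, info
    if done:
        obs = env.reset()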

Instructions

You must complete the implementation of ArmEnv in arm_env.py and train() in train_ppo.py. Details are below.

ArmEnv

Unlike the previous project, you will be implementing the majority of the key functions. You are also expected to deliberate over various choices for setting up the environment. You get to decide the components of the observation space, and you can choose whatever reward function you deem appropriate for the task.
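To make the design space concrete, here is a minimal sketch of one possible choice: an observation made of the joint angles, joint velocities, and the goal position, with a dense reward equal to the negative distance from the end-effector to the goal. The names below are placeholders rather than the actual starter-code identifiers, and these are not the only (or necessarily the best) choices.

import numpy as np

def get_obs(joint_angles, joint_velocities, goal):
    # One possible observation: [q, q_dot, goal_x, goal_y].
    return np.concatenate([joint_angles, joint_velocities, goal]).astype(np.float32)

def compute_reward(end_effector_pos, goal):
    # Dense reward: closer to the goal gives a larger (less negative) reward.
    return -float(np.linalg.norm(np.asarray(end_effector_pos) - np.asarray(goal)))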

train()

Here, you must fill in the train() function that actually trains a policy using PPO. You can refer to the stable-baselines3 documentation; a minimal training sketch is included after the links below.

Documentation:

https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html?highlight=ppo

gym api:

https://gym.openai.com/docs/
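As a minimal sketch (the hyperparameters, policy choice, and save path are illustrative assumptions, not required values), a train() built on stable-baselines3 might look like this:

from stable_baselines3 import PPO

def train(env, total_timesteps=500_000, save_path="models/ppo_arm"):
    # Train a PPO policy on the provided environment and save it to disk.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)  # writes save_path + ".zip"
    return model

The saved .zip file is what enjoy_ppo.py loads, and it is the kind of artifact you would eventually copy and rename to final.zip for grading.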

Grading

The script enjoy_ppo.py can be used to test your code. This is how we will run your code for grading:

python3 enjoy_ppo.py --model_path final.zip

While developing, you can also test a pre-saved model, like so:

python3 enjoy_ppo.py --model_path models/2022-04-10.12-04-17/models.zip

You can pass the --gui flag to enjoy_ppo.py and then you will also see what the policy is doing.

You MUST take the final model you want to be scored with, copy it into the project root folder, rename it to final.zip, and commit it to SVN. Failure to do so will result in getting 0 points on the project. Remember to test your grade by doing a clean SVN checkout: check out your submission into a new directory, create and activate the virtual environment, and run the grader, without a single modification or addition to these steps.

The grader will run five episodes, each with a different goal. For each goal, we expect the end-effector to reach the desired location and then stay there. If the distance at which the end-effector stabilizes is below what we consider an easy-to-reach threshold, the script will award 1.5 points. If the distance is below a tighter threshold, it will award an additional 1.5 points. The maximum is thus 3 points for each goal, for a total of 15 for the project.
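For illustration only, the scoring described above amounts to something like the sketch below. The actual thresholds and the way the grader measures the stabilized distance are not published, so the values here are placeholder assumptions.

EASY_THRESHOLD = 0.05   # placeholder value, not the real grading threshold
TIGHT_THRESHOLD = 0.02  # placeholder value, not the real grading threshold

def score_goal(stabilized_distance):
    # 1.5 points for the easy threshold, 1.5 more for the tight one: max 3 per goal.
    points = 0.0
    if stabilized_distance < EASY_THRESHOLD:
        points += 1.5
    if stabilized_distance < TIGHT_THRESHOLD:
        points += 1.5
    return points  # five goals, so at most 15 points total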