# Robot Learning SS 2020

• Explain Bayesian Model Inference
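
A minimal sketch, assuming "Bayesian model inference" refers to learning a model with Bayesian linear regression: a Gaussian prior on the weights (precision `alpha`) combined with Gaussian observation noise (precision `beta`) gives a closed-form Gaussian posterior. The function names and the feature matrix `Phi` are illustrative assumptions, not the lecture's notation.

```python
import numpy as np

def bayesian_linear_regression(Phi, y, alpha, beta):
    """Posterior over weights for prior w ~ N(0, alpha^-1 I) and noise precision beta."""
    d = Phi.shape[1]
    A = alpha * np.eye(d) + beta * Phi.T @ Phi   # posterior precision
    S = np.linalg.inv(A)                          # posterior covariance
    m = beta * S @ Phi.T @ y                      # posterior mean
    return m, S

def predictive(phi_x, m, S, beta):
    """Predictive mean and variance for one feature vector phi_x."""
    mean = phi_x @ m
    var = 1.0 / beta + phi_x @ S @ phi_x          # noise + model uncertainty
    return mean, var
```

The predictive variance is never smaller than the noise variance `1/beta`; the second term shrinks as more data constrains the weights.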

• Difference between value and reward
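
To make the distinction concrete: the reward is the immediate scalar signal at one step, while the value of a state is the expected discounted sum of future rewards. The sketch below (hypothetical helper names) computes the discounted return of one episode and a Monte Carlo value estimate as the average return over episodes that start in the same state.

```python
def discounted_return(rewards, gamma):
    """G = r_1 + gamma * r_2 + gamma^2 * r_3 + ... for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def mc_value(episodes, gamma):
    """Value estimate: average return over episodes starting in the same state."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)
```

For example, the reward sequence `[1, 1, 1]` with `gamma = 0.5` has return `1.75`, although no single reward exceeds `1`.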

• Complete the max-margin formulation (inverse reinforcement learning): how is the value calculated, and what can be done about reward ambiguity and the teacher's suboptimality?
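
One common way to write this, assuming the reward is linear in state features $\phi(s)$ (the lecture's parametrization and notation may differ): the value of a policy is then linear in its discounted feature expectations $\mu(\pi)$.

$$
R_w(s) = w^\top \phi(s), \qquad
\mu(\pi) = \mathbb{E}\Big[\textstyle\sum_{t=0}^{\infty} \gamma^t \phi(s_t) \,\Big|\, \pi\Big], \qquad
V_w(\pi) = w^\top \mu(\pi)
$$

$$
\begin{aligned}
\min_{w,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C \sum_i \xi_i \\
\text{s.t.}\quad & w^\top \mu(\pi^{\ast}) \ge w^\top \mu(\pi_i) + 1 - \xi_i, \qquad \xi_i \ge 0 \quad \forall i
\end{aligned}
$$

Here $\pi^{\ast}$ is the teacher's policy and $\pi_i$ are alternative policies. Requiring a margin of 1 while minimizing $\lVert w\rVert$ rules out the trivial solution $w = 0$ and fixes the scale of $w$, which addresses part of the reward ambiguity. The slack variables $\xi_i$, weighted by $C$, allow constraint violations and so tolerate a suboptimal teacher.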

• Difference between Q-learning and TD(0)
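
The two update rules side by side, as a minimal tabular sketch (function names are hypothetical). TD(0) estimates state values $V$ of the policy being followed; Q-learning estimates action values $Q$ off-policy, bootstrapping from the greedy (max) action in the next state regardless of which action is actually taken.

```python
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha, gamma, done=False):
    """TD(0): move V(s) toward r + gamma * V(s')."""
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])

def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma, done=False):
    """Q-learning: move Q(s, a) toward r + gamma * max_b Q(s', b)."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```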

• Apply the synchronous backup rule for iterative policy evaluation.
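
A sketch of the synchronous backup $V_{k+1}(s) = \sum_a \pi(a \mid s)\big[R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V_k(s')\big]$, assuming array layouts `P[a, s, s']`, `R[s, a]` (expected reward) and `pi[s, a]`. "Synchronous" means every state is updated from the old vector `V_k`; the new values are only used in the next sweep.

```python
import numpy as np

def policy_evaluation(P, R, pi, gamma, tol=1e-10, max_iter=10_000):
    """Iterative policy evaluation with synchronous (full-sweep) backups."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q(s, a) computed from the OLD V for all states at once
        q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = np.sum(pi * q, axis=1)   # average over actions under pi
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

With one action, state 0 moving to state 1 and state 1 absorbing with reward 1 per step, `gamma = 0.9` gives `V = [9, 10]`.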

• What needs to be defined for a POMDP?
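
A POMDP is usually specified by states, actions, observations, a transition model, an observation model, a reward function, a discount factor and an initial belief. The dataclass below is a hypothetical container for these pieces; the sets of states, actions and observations are implied by the array shapes, and `is_valid` only checks that the distributions are normalized.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class POMDP:
    T: np.ndarray    # T[a, s, s'] = p(s' | s, a), transition model
    Z: np.ndarray    # Z[a, s', o] = p(o | s', a), observation model
    R: np.ndarray    # R[s, a], reward function
    gamma: float     # discount factor
    b0: np.ndarray   # initial belief over states

    @property
    def n_states(self) -> int:
        return self.T.shape[1]

    @property
    def n_actions(self) -> int:
        return self.T.shape[0]

    @property
    def n_observations(self) -> int:
        return self.Z.shape[2]

    def is_valid(self) -> bool:
        """Check that all probability tables sum to one."""
        return bool(np.allclose(self.T.sum(axis=2), 1.0)
                    and np.allclose(self.Z.sum(axis=2), 1.0)
                    and np.isclose(self.b0.sum(), 1.0)
                    and 0.0 <= self.gamma <= 1.0)
```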

• Assume a POMDP with 2 underlying states and 3 possible observations. What are the probabilities of the states after an observation?
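
The observation step is Bayes' rule, $b'(s) \propto p(o \mid s)\, b(s)$, followed by normalization. A minimal sketch, assuming an observation table `obs_model[s, o] = p(o | s)`; the numbers in the usage note are made up for illustration.

```python
import numpy as np

def observe(belief, obs_model, o):
    """Belief after observing o: b'(s) = p(o|s) b(s) / sum_s p(o|s) b(s)."""
    unnormalized = obs_model[:, o] * belief
    return unnormalized / unnormalized.sum()
```

For example, with a uniform prior `[0.5, 0.5]` and `obs_model = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]`, observing `o = 0` gives `[0.875, 0.125]`.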

• Assume a POMDP with 3 underlying states and 3 possible actions. What is the probability of state 3 after an action? Given were the beliefs for state 1 and state 2, and 18 probabilities of being in a state conditioned on the previous state and the chosen action. (The belief for state 3 and the 9 remaining conditional probabilities had to be calculated.)
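
The missing numbers follow from normalization: $b(3) = 1 - b(1) - b(2)$, and for each of the $3 \times 3$ (state, action) pairs the third next-state probability is one minus the other two ($27 - 18 = 9$ values). The prediction step is then $b'(s') = \sum_s p(s' \mid s, a)\, b(s)$. The sketch assumes, purely for illustration, that the given 18 values are the first two next-state columns of `T_partial[a, s, :]`; all function names are hypothetical.

```python
import numpy as np

def complete_belief(b1, b2):
    """Belief for state 3 follows from normalization."""
    return np.array([b1, b2, 1.0 - b1 - b2])

def complete_transitions(T_partial):
    """T_partial[a, s, :2] given (18 values); add the 9 missing p(s'=3 | s, a)."""
    third = 1.0 - T_partial.sum(axis=2, keepdims=True)
    return np.concatenate([T_partial, third], axis=2)

def predict(belief, T, a):
    """Belief after action a: b'(s') = sum_s p(s' | s, a) b(s)."""
    return belief @ T[a]
```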

• Show that the temporal difference method TD(1) is equivalent to Monte Carlo sampling.
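
One way to show it, assuming offline updates (the value estimate $V$ is held fixed during the episode) and $V(S_T) = 0$ for the terminal state. With TD errors $\delta_k = R_{k+1} + \gamma V(S_{k+1}) - V(S_k)$ and return $G_t = \sum_{k=t}^{T-1} \gamma^{k-t} R_{k+1}$, the discounted sum of TD errors telescopes:

$$
\begin{aligned}
\sum_{k=t}^{T-1} \gamma^{k-t} \delta_k
&= \sum_{k=t}^{T-1} \gamma^{k-t} R_{k+1} + \sum_{k=t}^{T-1} \gamma^{k-t+1} V(S_{k+1}) - \sum_{k=t}^{T-1} \gamma^{k-t} V(S_k) \\
&= G_t + \sum_{j=t+1}^{T} \gamma^{j-t} V(S_j) - \sum_{k=t}^{T-1} \gamma^{k-t} V(S_k) \\
&= G_t + \gamma^{T-t} V(S_T) - V(S_t) \\
&= G_t - V(S_t).
\end{aligned}
$$

With $\lambda = 1$ the eligibility trace of $S_t$ decays by $\gamma\lambda = \gamma$ per step, so the total increment to $V(S_t)$ over the episode is $\alpha \sum_{k=t}^{T-1} \gamma^{k-t}\delta_k = \alpha\big(G_t - V(S_t)\big)$, which is exactly the Monte Carlo update.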

• Explain a linear state feedback controller
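
A linear state feedback controller computes the control as a linear function of the measured state, $u = -Kx$. The sketch below closes the loop on an assumed discrete-time double integrator (time step 0.1) with a hand-picked gain `K`; the plant and gain are illustrative, chosen so that the closed-loop matrix $A - BK$ has eigenvalues inside the unit circle.

```python
import numpy as np

def simulate(A, B, K, x0, steps):
    """Roll out x_{t+1} = A x_t + B u_t with linear feedback u_t = -K x_t."""
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        u = -K @ x          # control is linear in the state
        x = A @ x + B @ u
        traj.append(x.copy())
    return np.array(traj)

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # position/velocity, dt = 0.1
B = np.array([[0.0], [0.1]])
K = np.array([[10.0, 5.0]])              # hypothetical stabilizing gain
traj = simulate(A, B, K, [1.0, 0.0], 200)
```

With this gain, the characteristic polynomial of $A - BK$ is $\lambda^2 - 1.5\lambda + 0.6$, so both eigenvalues have magnitude $\sqrt{0.6} \approx 0.77$ and the state decays to zero.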

• How is the function value computed with a function approximator such as tile coding?
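
With tile coding, the input activates exactly one tile in each of several offset tilings, and the approximated value is the sum of the weights of the active tiles (a linear function of binary features). A minimal one-dimensional sketch; the class name, the number of tilings and tiles, and the offset scheme are assumptions for illustration.

```python
import numpy as np

class TileCoder1D:
    """Hypothetical 1-D tile coder: n_tilings offset grids over [low, high]."""

    def __init__(self, low, high, n_tilings=4, n_tiles=8):
        self.low, self.n_tilings, self.n_tiles = low, n_tilings, n_tiles
        self.tile_width = (high - low) / n_tiles
        # each tiling is shifted by a fraction of one tile width
        self.offsets = np.arange(n_tilings) * self.tile_width / n_tilings
        # one extra tile per tiling so shifted tilings still cover [low, high]
        self.weights = np.zeros(n_tilings * (n_tiles + 1))

    def active_tiles(self, x):
        """Index of the single active tile in every tiling."""
        idx = []
        for t in range(self.n_tilings):
            i = int((x - self.low + self.offsets[t]) // self.tile_width)
            i = min(max(i, 0), self.n_tiles)
            idx.append(t * (self.n_tiles + 1) + i)
        return idx

    def value(self, x):
        """Approximated value: sum of the weights of the active tiles."""
        return self.weights[self.active_tiles(x)].sum()

    def update(self, x, target, alpha):
        """Gradient step; the gradient w.r.t. each active weight is 1."""
        error = target - self.value(x)
        self.weights[self.active_tiles(x)] += alpha / self.n_tilings * error
```

Dividing the step size by the number of tilings means `alpha = 1` moves the value at `x` all the way to the target, while nearby inputs that share tiles also change (generalization).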

• How is an eligibility trace calculated?
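
For tabular TD(λ) with accumulating traces, at every step all traces decay by $\gamma\lambda$, the trace of the current state is incremented by one, and every state is updated in proportion to its trace: $e \leftarrow \gamma\lambda e$, $e(s) \leftarrow e(s) + 1$, $V \leftarrow V + \alpha\,\delta\, e$. (With function approximation, the increment is the gradient $\nabla_w V(s)$ instead of 1.) The episode format `(s, r, s_next, done)` is an assumption of this sketch.

```python
import numpy as np

def td_lambda_episode(transitions, n_states, alpha, gamma, lam, V=None):
    """Run TD(lambda) with accumulating eligibility traces over one episode."""
    V = np.zeros(n_states) if V is None else V.copy()
    e = np.zeros(n_states)
    for s, r, s_next, done in transitions:
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
        e *= gamma * lam      # decay all traces
        e[s] += 1.0           # accumulate trace of the visited state
        V += alpha * delta * e
    return V, e
```

With `lam = 1` the reward at the end of a two-step episode also updates the first state; with `lam = 0` only the last state changes, which is ordinary TD(0).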

• Backup diagram for SARSA
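
The SARSA backup diagram starts at a state-action node $(s, a)$, passes through the reward $r$ and next state $s'$, and ends at the next state-action node $(s', a')$, where $a'$ is the action actually chosen by the current policy. The update below follows that chain; the function name is hypothetical.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """On-policy backup along (s, a) -> r, s' -> a'."""
    target = r + (0.0 if done else gamma * Q[(s_next, a_next)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Unlike Q-learning, the target uses $Q(s', a')$ for the sampled next action rather than a max over actions.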

• How can softmax action selection be shifted between greedy and soft action choice?
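
In softmax (Boltzmann) selection, $\pi(a) \propto \exp\big(Q(a)/\tau\big)$, the temperature $\tau$ controls the trade-off: as $\tau \to 0$ the distribution concentrates on the greedy action, and as $\tau \to \infty$ it approaches uniform. A minimal sketch:

```python
import numpy as np

def softmax_probs(q, tau):
    """Boltzmann action probabilities with temperature tau."""
    z = np.asarray(q, dtype=float) / tau
    z -= z.max()              # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```

For `q = [1, 2, 3]`, `tau = 1e-3` puts practically all probability on the last action, while `tau = 1e3` gives nearly `1/3` each.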

• Write down the Bellman equation with and without the expectation operator.
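
In standard MDP notation (the course notation may differ slightly), the Bellman equation for a policy $\pi$ with the expectation operator is

$$
V^{\pi}(s) = \mathbb{E}_{\pi}\big[\, R_{t+1} + \gamma V^{\pi}(S_{t+1}) \,\big|\, S_t = s \,\big],
$$

and with the expectation written out over actions and successor states:

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\big[\, r(s, a, s') + \gamma V^{\pi}(s') \,\big].
$$

The Bellman optimality equation replaces the average over actions by a maximum:

$$
V^{\ast}(s) = \max_{a} \sum_{s'} p(s' \mid s, a)\big[\, r(s, a, s') + \gamma V^{\ast}(s') \,\big].
$$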

• Which policy should an agent follow if it is given the optimal value function?
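
Acting greedily with respect to the optimal value is optimal. With $V^{\ast}$ this needs a one-step lookahead through the model; with $Q^{\ast}$ it is a plain argmax. A sketch assuming array layouts `P[a, s, s']` and `R[s, a]` (hypothetical function names):

```python
import numpy as np

def greedy_from_v(V, P, R, gamma):
    """pi(s) = argmax_a [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]."""
    q = R + gamma * np.einsum("ast,t->sa", P, V)
    return np.argmax(q, axis=1)

def greedy_from_q(Q):
    """pi(s) = argmax_a Q(s, a); no model needed."""
    return np.argmax(Q, axis=1)
```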

• Explain QMDP
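
QMDP first solves the underlying fully observable MDP for $Q^{\ast}(s, a)$ and then scores each action under the current belief as $Q_{\text{MDP}}(b, a) = \sum_s b(s)\, Q^{\ast}(s, a)$. Because it assumes the state becomes fully observable after one step, it does not value information-gathering actions. A minimal sketch:

```python
import numpy as np

def qmdp_action(belief, Q):
    """Pick argmax_a sum_s b(s) Q[s, a], with Q from the fully observable MDP."""
    scores = belief @ Q
    return int(np.argmax(scores)), scores
```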

• Explain exploitation and exploration. How are they used in RL?
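
A common way to balance the two is ε-greedy action selection: with probability ε the agent explores by taking a random action, otherwise it exploits the action with the highest current value estimate. A minimal sketch (the function name and signature are hypothetical):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploit
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible; ε is often decayed over time so the agent explores early and exploits later.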