Python Reinforcement Learning

State value function

A state value function is also simply called a value function. It specifies how good it is for an agent to be in a particular state when following a policy π. A value function is often denoted by V(s), or by V^π(s) when the policy is made explicit. It denotes the value of a state under that policy.

We can define a state value function as follows:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ R_t \mid s_t = s \right]$$

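This specifies the expected return starting from state s according to policy π. We can substitute the value of R_t in the value function from (2), the discounted sum of future rewards, as follows:

$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right]$$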

Note that the state value function depends on the policy: the same state can have a different value under a different policy.
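To make the dependence on the policy concrete, here is a minimal sketch that estimates the value of a state by averaging discounted returns over many sampled episodes. Everything in it is an assumption made for illustration: the two-state environment, the 0.8 success probability, the reward of 1 for landing in state 2, the discount factor of 0.9, and the function names `discounted_return`, `step`, and `estimate_value`. The infinite sum is also truncated at a finite horizon, so the result is only an approximation.

```python
import random


def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence: sum of gamma^k * r_k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def step(action, rng):
    """Hypothetical two-state environment (states 1 and 2).

    The action names the state the agent wants to reach. The move succeeds
    with probability 0.8; otherwise the agent ends up in the other state.
    Landing in state 2 pays a reward of 1, landing in state 1 pays 0.
    """
    intended = 2 if action == "go_2" else 1
    other = 1 if intended == 2 else 2
    next_state = intended if rng.random() < 0.8 else other
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward


def estimate_value(policy, start_state, gamma=0.9, episodes=2000,
                   horizon=20, seed=0):
    """Average the discounted return over sampled episodes from start_state."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        state, rewards = start_state, []
        for _ in range(horizon):  # truncate the infinite sum at `horizon`
            action = policy(state)
            state, reward = step(action, rng)
            rewards.append(reward)
        total += discounted_return(rewards, gamma)
    return total / episodes


def always_go_2(state):
    """Policy that always tries to reach the rewarding state."""
    return "go_2"


def always_go_1(state):
    """Policy that always tries to reach the non-rewarding state."""
    return "go_1"


v_good = estimate_value(always_go_2, start_state=1)
v_bad = estimate_value(always_go_1, start_state=1)
print(f"V(state 1) under always_go_2: {v_good:.2f}")
print(f"V(state 1) under always_go_1: {v_bad:.2f}")
```

Both estimates start from the same state 1, yet the first policy yields a clearly higher value than the second, which is the sense in which the value of a state is always tied to a policy.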

We can view value functions in a table. Let us say we have two states and our agent follows the policy π in both of them. Based on the values of these two states, we can tell how good it is for our agent to be in each state when following that policy. The greater the value, the better the state is:

 

Based on the preceding table, we can tell that it is better to be in state 2, as it has the higher value. We will see how to estimate these values intuitively in the upcoming sections.
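A tabular value function maps naturally onto a dictionary. The sketch below shows one way to store such a table and read off the better state. The numbers are illustrative placeholders chosen only so that state 2 has the higher value; the names `value_table` and `best_state` are likewise assumptions.

```python
# Hypothetical value table for a fixed policy: state -> V(s).
# The numbers are placeholders for illustration only.
value_table = {
    "state 1": 0.3,
    "state 2": 0.9,
}

# The better state is the one with the greater value.
best_state = max(value_table, key=value_table.get)

for state, value in value_table.items():
    print(f"{state}: {value}")
print(f"Better state to be in: {best_state}")
```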