The theory behind policy gradient