Here we present a map of the results shown in the paper.
Config:
- Center: False
- Policy init: zeros
- Njobs: 1
- Gamma: 1.0
## Deep policies per-decision
Note: results are logged w.r.t. iterations, not timesteps
Config:
- Center: False
- Policy init: xavier
- Gamma: 1.0
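
For readers unfamiliar with the `Policy init` and `Gamma` entries, the following is a minimal sketch of what these settings usually mean. It is not the code used for the experiments: the helper names `init_policy_weights` and `discounted_return` are hypothetical, and the uniform (Glorot) variant of Xavier initialization is an assumption, since the config does not say which variant was used.

```python
import numpy as np


def init_policy_weights(shape, method="zeros", seed=0):
    """Hypothetical helper: initial weights for one policy layer."""
    fan_in, fan_out = shape
    if method == "zeros":
        return np.zeros(shape)
    if method == "xavier":
        # Assumed variant: Xavier/Glorot uniform, U(-limit, limit)
        # with limit = sqrt(6 / (fan_in + fan_out)).
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        rng = np.random.default_rng(seed)
        return rng.uniform(-limit, limit, size=shape)
    raise ValueError(f"unknown init method: {method}")


def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**t * r_t; with gamma = 1.0 this is the plain sum."""
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))
```

With `Gamma: 1.0`, as in both configs above, rewards are not discounted, so the return of an episode is simply the sum of its rewards.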
Other experiments are available in the notebook.