
# Log of the results in the paper

Here we present a map of the results shown in the paper.

## Linear policies per-decision

Config:

- Center: False
- Policy init: zeros
- Njobs: 1
- Gamma: 1.0

## Deep policies per-decision

Note: results are logged w.r.t. iterations, not timesteps

Config:

- Center: False
- Policy init: xavier
- Gamma: 1.0
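
For reference, the two configurations listed in this log could be written down in code roughly as follows. This is only a sketch: the `ExperimentConfig` class and its field names are hypothetical and are not taken from the repository, and `n_jobs` is left unset for the deep-policy runs because the log does not list it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container; field names are illustrative, not the repo's."""
    center: bool
    policy_init: str
    gamma: float
    n_jobs: Optional[int] = None  # not listed for the deep-policy runs


# Values copied from the "Linear policies per-decision" config in this log.
LINEAR_PER_DECISION = ExperimentConfig(
    center=False, policy_init="zeros", gamma=1.0, n_jobs=1
)

# Values copied from the "Deep policies per-decision" config in this log.
DEEP_PER_DECISION = ExperimentConfig(
    center=False, policy_init="xavier", gamma=1.0
)
```

Keeping both settings side by side makes it easy to see that they differ only in the policy initialization (and in whether a job count is given).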

Other experiments are also available in the notebook.

Multiple