## Problem
`PuffeRL` stores two different representations of the same config data, and different evaluators read `use_rnn`, `eval_interval`, `device`, etc. from different places. This makes it easy to write code that works against one shape and breaks against the other, and it's already caused review confusion and one genuine bug in PR #371.
## Concretely
`load_config(env_name)` returns a nested dict shaped like the INI:
```python
args = {
    "env": {"num_agents": 1024, "dynamics_model": "jerk", ...},
    "train": {"learning_rate": 3e-4, "device": "cuda", "use_rnn": True, ...},
    "eval": {"eval_interval": 1000, ...},
    "safe_eval": {"enabled": True, "interval": 250, ...},
    "vec": {"num_workers": 16, ...},
    "render": {"map_dir": "...", ...},
    ...
}
```
In `pufferl.train()` (around line 1084), we build a flattened version:
```python
train_config = dict(
    **args["train"],                  # args["train"] keys spread into the top level
    env=env_name,                     # env_name (a string) at the top level
    env_config=args.get("env", {}),   # args["env"] renamed to env_config
    eval=args.get("eval", {}),
    safe_eval=args.get("safe_eval", {}),
)
pufferl = PuffeRL(train_config, vecenv, policy, logger, full_args=args)
```
`PuffeRL` then stores both:
```python
self.config = train_config   # flat; use_rnn at the top level
self.full_args = full_args   # nested; use_rnn at full_args["train"]["use_rnn"]
```
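To see the duplication concretely (illustration only; `pufferl` is the instance constructed above):

```python
# Two live copies of the same logical data on one object.
assert pufferl.config["use_rnn"] == pufferl.full_args["train"]["use_rnn"]
# The section sub-dicts are even the same objects, since train_config
# stores references to them rather than copies.
assert pufferl.config["eval"] is pufferl.full_args["eval"]
```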
## Consequences
Evaluators receive whichever one they were built against, and they access the same logical data via different paths:
| Evaluator | Input | Reads `use_rnn` via | Reads `eval_interval` via |
| --- | --- | --- | --- |
| `Evaluator` (in `benchmark/evaluator.py`) | `self.full_args` | `self.configs["train"]["use_rnn"]` | `self.configs["eval"]["eval_interval"]` |
| `SafeEvaluator` | `self.config` (passed as `full_config`) | `full_config["use_rnn"]` | n/a (uses `safe_eval.interval`) |
| `PuffeRL.train` (top of loop) | `self.config` | `config["use_rnn"]` | `self.config["eval"]["eval_interval"]` |
Even the keys that exist differ:

- `train_config["env"]` = the string `"puffer_drive"` (the env name)
- `args["env"]` = the full env config dict

The two collide, so `args["env"]` got renamed to `train_config["env_config"]` to disambiguate. That's the kind of workaround that accumulates.
## Proposed fix
Pick one canonical shape for config and use it everywhere:
**Option A: always-nested.** Drop `train_config`; just pass the full nested `args` dict to `PuffeRL`. Internal code reads `self.args["train"]["use_rnn"]`, `self.args["eval"]["eval_interval"]`, etc. Advantage: matches what `load_config` returns and what `drive.ini` looks like. Downside: touches every line in `PuffeRL.__init__` / `PuffeRL.train` that currently reads from `self.config`.
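A minimal sketch of Option A, assuming only that `PuffeRL` takes the nested dict directly (the helper method is hypothetical):

```python
class PuffeRL:
    def __init__(self, args, vecenv, policy, logger):
        self.args = args  # exactly what load_config(env_name) returned

    def _eval_due(self, global_step):
        # Hypothetical helper illustrating the nested read pattern.
        return global_step % self.args["eval"]["eval_interval"] == 0
```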
**Option B: always-flat with namespaces.** Flatten everything into a single-level dict with dotted keys (`"train.use_rnn"`, `"eval.eval_interval"`). This is how the CLI overrides already work (`--train.use-rnn`, `--eval.eval-interval`). Downside: the biggest diff of the three, and hash lookups on strings with dots feel ugly.
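A sketch of the flatten step Option B implies; `flatten` is an assumption, not existing code:

```python
def flatten(nested, prefix=""):
    """Collapse a nested config dict into dotted keys like "train.use_rnn"."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# flatten(args)["train.use_rnn"] mirrors the existing --train.use-rnn override.
```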
**Option C (minimal):** keep `PuffeRL.__init__` as-is but store only `self.full_args`, and derive the flat train fields as `self.train` (a namespace/`SimpleNamespace`, or just direct attribute access). Every evaluator gets `self.full_args`. `train_config` stops existing. Internal reads change from `self.config["use_rnn"]` to `self.full_args["train"]["use_rnn"]`.
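A hedged sketch of Option C; the exact constructor signature and the `self.train` view are assumptions about how the proposal would land:

```python
from types import SimpleNamespace

class PuffeRL:
    def __init__(self, vecenv, policy, logger, full_args):
        self.full_args = full_args  # the nested dict is the single source of truth
        # Convenience view over the train section: self.train.use_rnn, etc.
        self.train = SimpleNamespace(**full_args["train"])
```

Every evaluator would then be handed `self.full_args`, collapsing the table above to a single access pattern.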