PuffeRL carries two config shapes (flat train_config + nested full_args) that drift #396

@eugenevinitsky

Description

Problem

PuffeRL stores two different representations of the same config data, and different evaluators read use_rnn, eval_interval, device, etc. from different places. This makes it easy to write code that works against one shape and breaks against the other, and it's already caused review confusion and one genuine bug in PR #371.

Concretely

load_config(env_name) returns a nested dict shaped like the INI:

args = {
    "env":       {"num_agents": 1024, "dynamics_model": "jerk", ...},
    "train":     {"learning_rate": 3e-4, "device": "cuda", "use_rnn": True, ...},
    "eval":      {"eval_interval": 1000, ...},
    "safe_eval": {"enabled": True, "interval": 250, ...},
    "vec":       {"num_workers": 16, ...},
    "render":    {"map_dir": "...", ...},
    ...
}

In pufferl.train() (around line 1084), we build a flattened version:

train_config = dict(
    **args["train"],                    # args["train"] keys spread into top level
    env=env_name,                       # env_name (string) at top level
    env_config=args.get("env", {}),     # args["env"] renamed to env_config
    eval=args.get("eval", {}),
    safe_eval=args.get("safe_eval", {}),
)
pufferl = PuffeRL(train_config, vecenv, policy, logger, full_args=args)

PuffeRL then stores both:

self.config    = train_config  # flat; use_rnn at top level
self.full_args = full_args     # nested; use_rnn at full_args["train"]["use_rnn"]
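A minimal sketch of the drift this creates: the same logical value is reachable at two different paths depending on which of the two objects a piece of code holds. The dict contents here are illustrative, not the full config.

```python
# Illustrative subset of the two shapes PuffeRL ends up storing.
args = {
    "train": {"use_rnn": True, "learning_rate": 3e-4, "device": "cuda"},
    "eval": {"eval_interval": 1000},
}
# The flattened copy built in pufferl.train(): train keys spread to the top level.
train_config = dict(**args["train"], eval=args["eval"])

# Flat shape: use_rnn sits at the top level.
assert train_config["use_rnn"] is True
# Nested shape: the same flag lives under "train".
assert args["train"]["use_rnn"] is True
# Any consumer must know which shape it was handed, or it KeyErrors.
```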

Consequences

Evaluators receive whichever one they were built against, and they access the same logical data via different paths:

| Evaluator | Input | Reads `use_rnn` via | Reads `eval_interval` via |
| --- | --- | --- | --- |
| `Evaluator` (in benchmark/evaluator.py) | `self.full_args` | `self.configs["train"]["use_rnn"]` | `self.configs["eval"]["eval_interval"]` |
| `SafeEvaluator` | `self.config` (passed as `full_config`) | `full_config["use_rnn"]` | n/a (uses `safe_eval.interval`) |
| `PuffeRL.train` (top of loop) | `self.config` | `config["use_rnn"]` | `self.config["eval"]["eval_interval"]` |

Even the keys that exist differ:

  • train_config["env"] = the string "puffer_drive" (env name)
  • args["env"] = the full env config dict

...which collide, so args["env"] got renamed to train_config["env_config"] to disambiguate. That's the kind of workaround that accumulates.
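A hypothetical illustration of that collision: under the two shapes, `"env"` means two different things, which is what forced the `env_config` rename.

```python
# "env" is a string in the flat shape, but a dict in the nested shape.
env_name = "puffer_drive"
args = {"env": {"num_agents": 1024}}

# The flat build renames the nested dict to env_config to avoid clobbering.
train_config = {"env": env_name, "env_config": args["env"]}

assert isinstance(train_config["env"], str)          # the env *name*
assert isinstance(train_config["env_config"], dict)  # the env *config*
assert isinstance(args["env"], dict)                 # nested shape: config dict
```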

Proposed fix

Pick one canonical shape for config and use it everywhere:

Option A: always-nested. Drop train_config, just pass the full nested args dict to PuffeRL. Internal code reads self.args["train"]["use_rnn"], self.args["eval"]["eval_interval"], etc. Advantage: matches what load_config returns and what drive.ini looks like. Downside: touches every line in PuffeRL.__init__ / PuffeRL.train that currently reads from self.config.
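A sketch of what Option A's internal reads would look like, assuming the constructor can take the nested dict directly; the class and method here are illustrative stand-ins, not the real API.

```python
# Option A sketch: store only the nested dict, read nested paths everywhere.
class PuffeRL:
    def __init__(self, args):
        # args is the nested dict straight from load_config, unmodified.
        self.args = args

    def should_eval(self, epoch):
        # All reads use the nested shape; no flat copy exists.
        return epoch % self.args["eval"]["eval_interval"] == 0

p = PuffeRL({"train": {"use_rnn": True}, "eval": {"eval_interval": 1000}})
assert p.should_eval(2000)
assert not p.should_eval(1500)
```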

Option B: always-flat with namespaces. Flatten everything into a single-level dict with dotted keys ("train.use_rnn", "eval.eval_interval"). This is how the CLI overrides already work (--train.use-rnn, --eval.eval-interval). Downside: biggest diff; hash lookups on strings with dots feel ugly.
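A sketch of the Option B flattening step, assuming a simple recursive walk; the helper name is hypothetical.

```python
# Option B sketch: collapse the nested dict into one level with dotted keys,
# mirroring the CLI override style (--train.use-rnn, --eval.eval-interval).
def flatten(d, prefix=""):
    flat = {}
    for k, v in d.items():
        key = prefix + k
        if isinstance(v, dict):
            flat.update(flatten(v, prefix=key + "."))
        else:
            flat[key] = v
    return flat

config = flatten({"train": {"use_rnn": True}, "eval": {"eval_interval": 1000}})
assert config["train.use_rnn"] is True
assert config["eval.eval_interval"] == 1000
```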

Option C (minimal): keep PuffeRL.__init__ as-is but store only self.full_args, and derive the flat train fields as self.train (a namespace/SimpleNamespace, or just direct attribute access). Every evaluator gets self.full_args. train_config stops existing. Internal reads change from self.config["use_rnn"] to self.full_args["train"]["use_rnn"].
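A sketch of Option C's storage, assuming `types.SimpleNamespace` for the derived train view; the constructor shown is a simplified stand-in for the real `PuffeRL.__init__`.

```python
# Option C sketch: one canonical nested dict, plus a derived attribute view
# of the train section for ergonomic internal reads.
from types import SimpleNamespace

class PuffeRL:
    def __init__(self, full_args):
        self.full_args = full_args                       # single source of truth
        self.train = SimpleNamespace(**full_args["train"])  # derived, read-only view

p = PuffeRL({"train": {"use_rnn": True, "device": "cuda"}, "eval": {}})
assert p.train.use_rnn is True                   # internal ergonomic read
assert p.full_args["train"]["use_rnn"] is True   # what evaluators receive
```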
