## Problem
`PuffeRL` stores two different representations of the same config data, and different evaluators read `use_rnn`, `eval_interval`, `device`, etc. from different places. This makes it easy to write code that works against one shape and breaks against the other, and it's already caused review confusion and one genuine bug in PR #371.
## Concretely
`load_config(env_name)` returns a nested dict shaped like the INI:
```python
args = {
    "env": {"num_agents": 1024, "dynamics_model": "jerk", ...},
    "train": {"learning_rate": 3e-4, "device": "cuda", "use_rnn": True, ...},
    "eval": {"eval_interval": 1000, ...},
    "safe_eval": {"enabled": True, "interval": 250, ...},
    "vec": {"num_workers": 16, ...},
    "render": {"map_dir": "...", ...},
    ...
}
```
In `pufferl.train()` (around line 1084), we build a flattened version:
```python
train_config = dict(
    **args["train"],                  # args["train"] keys spread into the top level
    env=env_name,                     # env_name (a string) at the top level
    env_config=args.get("env", {}),   # args["env"] renamed to env_config
    eval=args.get("eval", {}),
    safe_eval=args.get("safe_eval", {}),
)
pufferl = PuffeRL(train_config, vecenv, policy, logger, full_args=args)
```
`PuffeRL` then stores both:
```python
self.config = train_config   # flat; use_rnn at the top level
self.full_args = full_args   # nested; use_rnn at full_args["train"]["use_rnn"]
```
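To see the duplication concretely (illustration only; `pufferl` is the instance constructed above):

```python
# Two live copies of the same logical data on one object.
assert pufferl.config["use_rnn"] == pufferl.full_args["train"]["use_rnn"]
# The section sub-dicts are even the same objects, since train_config
# stores references to them rather than copies.
assert pufferl.config["eval"] is pufferl.full_args["eval"]
```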
## Consequences
Evaluators receive whichever one they were built against, and they access the same logical data via different paths:
| Evaluator | Input | Reads `use_rnn` via | Reads `eval_interval` via |
| --- | --- | --- | --- |
| `Evaluator` (in `benchmark/evaluator.py`) | `self.full_args` | `self.configs["train"]["use_rnn"]` | `self.configs["eval"]["eval_interval"]` |
| `SafeEvaluator` | `self.config` (passed as `full_config`) | `full_config["use_rnn"]` | n/a (uses `safe_eval.interval`) |
| `PuffeRL.train` (top of loop) | `self.config` | `config["use_rnn"]` | `self.config["eval"]["eval_interval"]` |
Even the keys that exist differ:

- `train_config["env"]` = the string `"puffer_drive"` (the env name)
- `args["env"]` = the full env config dict

The two collide, so `args["env"]` got renamed to `train_config["env_config"]` to disambiguate. That's the kind of workaround that accumulates.
## Proposed fix
Pick one canonical shape for config and use it everywhere:
**Option A: always-nested.** Drop `train_config`; just pass the full nested `args` dict to `PuffeRL`. Internal code reads `self.args["train"]["use_rnn"]`, `self.args["eval"]["eval_interval"]`, etc. Advantage: matches what `load_config` returns and what `drive.ini` looks like. Downside: touches every line in `PuffeRL.__init__` / `PuffeRL.train` that currently reads from `self.config`.
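A minimal sketch of Option A, assuming only that `PuffeRL` takes the nested dict directly (the helper method is hypothetical):

```python
class PuffeRL:
    def __init__(self, args, vecenv, policy, logger):
        self.args = args  # exactly what load_config(env_name) returned

    def _eval_due(self, global_step):
        # Hypothetical helper illustrating the nested read pattern.
        return global_step % self.args["eval"]["eval_interval"] == 0
```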
**Option B: always-flat with namespaces.** Flatten everything into a single-level dict with dotted keys (`"train.use_rnn"`, `"eval.eval_interval"`). This is how the CLI overrides already work (`--train.use-rnn`, `--eval.eval-interval`). Downside: the biggest diff of the three, and hash lookups on strings with dots feel ugly.
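A sketch of the flatten step Option B implies; `flatten` is an assumption, not existing code:

```python
def flatten(nested, prefix=""):
    """Collapse a nested config dict into dotted keys like "train.use_rnn"."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

# flatten(args)["train.use_rnn"] mirrors the existing --train.use-rnn override.
```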
**Option C (minimal):** keep `PuffeRL.__init__` as-is but store only `self.full_args`, and derive the flat train fields as `self.train` (a namespace/`SimpleNamespace`, or just direct attribute access). Every evaluator gets `self.full_args`. `train_config` stops existing. Internal reads change from `self.config["use_rnn"]` to `self.full_args["train"]["use_rnn"]`.
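A hedged sketch of Option C; the exact constructor signature and the `self.train` view are assumptions about how the proposal would land:

```python
from types import SimpleNamespace

class PuffeRL:
    def __init__(self, vecenv, policy, logger, full_args):
        self.full_args = full_args  # the nested dict is the single source of truth
        # Convenience view over the train section: self.train.use_rnn, etc.
        self.train = SimpleNamespace(**full_args["train"])
```

Every evaluator would then be handed `self.full_args`, collapsing the table above to a single access pattern.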