[Phase 1.3] Joint modeling for nodematch + absdiff + dissolution (dyad-level target stats) #63

## Context

After #61/#62, all *ego-level* target statistics (`edges`, `nodefactor_*`, `concurrent`) are produced under `method = "joint"` via g-computation on joint Poisson / binomial GLMs. What remains are the **dyad-level** target statistics — every ERGM term whose value depends on *both* partners' attributes:

1. Nodematch stats (`nm.age.grp`, `nm.race`, `nm.race_diffF`, `nm.role.class`) are currently estimated via univariate logistic GLMs on partnership-level data, e.g. `same.age.grp ~ index.age.grp`.
2. Absdiff stats (`absdiff.age`, `absdiff.sqrt.age`) are currently a single `lm(ad ~ 1)` scalar scaled by edges at netstats time.
3. Durations / dissolution (`durs.<layer>.byage`) are currently empirical means/medians stratified only by `(age-match × index.age.grp)` — no regression structure at all.

All three suffer the same marginal-vs-joint bias as the ego-level target stats did before #61: when the target population's joint distribution differs from ARTnet's, these statistics still carry ARTnet's conditional dyad-attribute distribution.

This issue is the dyad-level counterpart to #61/#62. Originally scoped to nodematch only; expanded on 2026-04-19 to cover durations after PI review of #68 noted the gap.

## Proposed approach

All three sub-areas share the same long-form data unit (ARTnet partnership records with `(ego_attrs, alter_attrs)` pairs) and the same g-computation template:
1. Fit a joint regression on partnership-level data with both ego and partner attributes on the RHS.
2. Predict per-dyad for the synthetic population's implied mixing structure.
3. Aggregate to target statistics.
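
The three steps can be sketched on toy data as follows. This is a minimal illustration, not the package implementation: the data frames `pairs_artnet` and `pairs_target`, the two-level attributes `ego_grp` / `alter_grp`, and the simulated coefficients are all hypothetical stand-ins for the ARTnet long-form partnership data and the synthetic population.

```r
set.seed(1)

# Hypothetical stand-in for ARTnet partnership records: one row per
# (ego, alter) pair with a binary dyad outcome (e.g. a same-group match).
n <- 2000
grp <- c("a", "b")
pairs_artnet <- data.frame(
  ego_grp   = factor(sample(grp, n, replace = TRUE, prob = c(0.7, 0.3)), levels = grp),
  alter_grp = factor(sample(grp, n, replace = TRUE, prob = c(0.6, 0.4)), levels = grp)
)
lp <- -0.5 + 0.8 * (pairs_artnet$ego_grp == "b") + 0.4 * (pairs_artnet$alter_grp == "b")
pairs_artnet$y <- rbinom(n, 1, plogis(lp))

# Step 1: joint regression with ego and partner attributes on the RHS.
fit <- glm(y ~ ego_grp + alter_grp, data = pairs_artnet, family = binomial())

# Step 2: predict per dyad on synthetic pairs whose composition differs
# from the source data (here group "b" is more common).
m <- 5000
pairs_target <- data.frame(
  ego_grp   = factor(sample(grp, m, replace = TRUE, prob = c(0.4, 0.6)), levels = grp),
  alter_grp = factor(sample(grp, m, replace = TRUE, prob = c(0.4, 0.6)), levels = grp)
)
p_hat <- predict(fit, newdata = pairs_target, type = "response")

# Step 3: aggregate to a target statistic (expected proportion and count).
target_prop  <- mean(p_hat)
target_count <- sum(p_hat)
```

Because the conditional model is carried over and only the covariate distribution changes, `target_prop` moves with the target population's composition instead of staying fixed at the source-data rate.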

### 1. Nodematch

Fit joint logistic models:

```r
# `...` marks further covariates elided in this sketch; replace before running.
m_nm_age <- glm(same.age.grp ~ index.age.grp + index.race.cat.num +
                  part.age.grp + part.race.cat.num + ...,
                data = lmain, family = binomial())
```

Then predict on synthetic partnership pairs. Generating synthetic pairs is more complex than generating synthetic nodes, because it requires the joint distribution of `(ego, alter)` attributes. Two options:

**Option A (simpler)**: keep partnership-pair modeling marginal (current approach), but ensure ego-side attributes come from the corrected target joint distribution.

**Option B (fully joint)**: fit joint ego-alter pair model; generate synthetic pairs from p(ego, alter) in the target population; predict match probabilities.

Recommend Option A for first pass; escalate to Option B if Option A still shows substantial bias after joint-nodefactor correction.
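
A rough sketch of what Option A could look like on toy data: the match model keeps ego-side covariates only, and the correction enters through the synthetic egos. The objects `lmain_toy`, `synth`, and the per-ego expected degree `exp_deg` are hypothetical; only `same.age.grp` and `index.age.grp` are names taken from this issue.

```r
set.seed(2)

# Hypothetical partnership-level data with ego attributes only (Option A).
n <- 1500
lmain_toy <- data.frame(
  index.age.grp = factor(sample(1:3, n, replace = TRUE), levels = 1:3)
)
p_true <- c(0.6, 0.4, 0.5)[as.integer(lmain_toy$index.age.grp)]
lmain_toy$same.age.grp <- rbinom(n, 1, p_true)

fit_nm <- glm(same.age.grp ~ index.age.grp, data = lmain_toy, family = binomial())

# Hypothetical synthetic egos drawn from the corrected target joint
# distribution, each with an expected degree (e.g. from ego-level models).
synth <- data.frame(
  index.age.grp = factor(sample(1:3, 10000, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
                         levels = 1:3),
  exp_deg = runif(10000, 0.2, 0.8)
)
p_match <- predict(fit_nm, newdata = synth, type = "response")

# A matched edge is seen from both of its endpoints, and both endpoints
# fall in the same group, hence the division by 2.
nodematch_by_grp <- tapply(synth$exp_deg * p_match, synth$index.age.grp, sum) / 2
edges <- sum(synth$exp_deg) / 2
```

Under Option B the `predict()` call would instead run on explicit synthetic `(ego, alter)` rows, so the aggregation step would change but the fitted-model interface would not.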

### 2. Absdiff

Joint regression on the partnership age gap:

```r
# `...` marks further covariates elided in this sketch; replace before running.
m_ad_age <- lm(ad ~ index.age.grp + index.race.cat.num +
                 part.age.grp + part.race.cat.num + ...,
               data = lmain)
```

Then under joint netstats: predict the per-dyad `|age_i - age_j|` for the synthetic mixing structure and aggregate. The infrastructure choice (Option A vs Option B) is the same as for nodematch, so both should probably share whichever approach we pick.
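
To illustrate why the scalar and the regression versions can diverge, here is a small sketch contrasting `lm(ad ~ 1)` scaled by edges with a regression-based aggregate when the target age mix is older than the source. All data are simulated, and `toy`, `synth`, and `exp_deg` are hypothetical names.

```r
set.seed(3)

# Hypothetical partnership data: the age gap `ad` grows with ego age group.
n <- 1500
toy <- data.frame(
  index.age.grp = factor(sample(1:3, n, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
                         levels = 1:3)
)
toy$ad <- rgamma(n, shape = 2, rate = 1 / c(1.5, 3, 5)[as.integer(toy$index.age.grp)])

fit_marg  <- lm(ad ~ 1, data = toy)              # current scalar approach
fit_joint <- lm(ad ~ index.age.grp, data = toy)  # regression analog

# Hypothetical synthetic egos with an older age mix and an expected degree.
synth <- data.frame(
  index.age.grp = factor(sample(1:3, 8000, replace = TRUE, prob = c(0.2, 0.3, 0.5)),
                         levels = 1:3),
  exp_deg = runif(8000, 0.2, 0.8)
)
edges <- sum(synth$exp_deg) / 2

# Scalar mean gap times edges vs. degree-weighted predicted gaps
# (each edge is counted from both endpoints, hence / 2).
absdiff_marg  <- unname(coef(fit_marg)[1]) * edges
absdiff_joint <- sum(synth$exp_deg * predict(fit_joint, newdata = synth)) / 2
```

In this toy setup the scalar version understates the target statistic because it carries the source population's younger age mix.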

### 3. Durations / dissolution (new sub-scope)

Currently `netparams$<layer>$durs.<layer>.byage` is a summary stat:

```r
library(dplyr)

# Mean and median duration among ongoing partnerships, by ego age group
durs.main.byage <- lmain |>
  filter(ongoing2 == 1) |>
  group_by(index.age.grp) |>
  summarise(mean.dur = mean(duration.time), median.dur = median(duration.time))
```

There is no adjustment for race, HIV concordance, or other dyad attributes, so this is the same marginal-vs-joint problem at the dyad level.

Joint analog:

```r
m_dur_main <- lm(log(duration.time) ~ index.age.grp + index.race.cat.num +
                   part.age.grp + part.race.cat.num +
                   hiv.concord + same.race + same.age.grp,
                 data = lmain[lmain$ongoing2 == 1, ])
```

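Under `method = "joint"` in build_netstats: predict the expected log-duration per edge given the synthetic network's mixing structure, exponentiate, then run the result through the existing geometric-distribution rate math (`rates.adj = 1 - 2^(-1/median)`, `mean.dur.adj = 1/(1 - 2^(-1/median))`) to produce dissolution coefs. Note that `exp()` of a predicted mean log-duration is a geometric mean (the median when log-scale errors are symmetric), not the arithmetic mean, so it lines up with the `median` input of these formulas; an arithmetic mean would need a retransformation correction.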
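
The rate math quoted above can be written as a small helper. This is a sketch only: `median_to_diss` and the example `pred_log_dur` values are hypothetical, the back-transformed prediction is treated as a median-type quantity, and the final coefficient uses the edges-only logit form `log(mean duration - 1)` as an assumption about how the rates are consumed downstream.

```r
# Median duration (in time steps) -> per-step dissolution rate under a
# geometric (memoryless) duration model, using the formulas quoted above.
median_to_diss <- function(median_dur) {
  rates.adj    <- 1 - 2^(-1 / median_dur)
  mean.dur.adj <- 1 / rates.adj
  data.frame(median = median_dur, rates.adj = rates.adj, mean.dur.adj = mean.dur.adj)
}

# Hypothetical predicted log-durations for three strata.
pred_log_dur <- c(3.2, 4.0, 4.6)
out <- median_to_diss(exp(pred_log_dur))

# Assumption: edges-only dissolution coefficient on the logit scale,
# i.e. logit(1 - rate) = log(mean duration - 1), computed by hand here.
out$coef.diss <- log(out$mean.dur.adj - 1)
```

A quick sanity check on the helper is that `(1 - rates.adj)^median` equals 0.5, which is the definition of the median under a constant per-step dissolution rate.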

Open design question: log-linear on `duration.time` assumes multiplicative attribute effects on duration. Alternative: Weibull or other survival models on the censored data. The `ongoing2 == 1` filter keeps only ongoing partnerships, whose `duration.time` is elapsed time to date and therefore right-censored, and it drops completed partnerships entirely (that's a pre-existing issue with the current summary-stat approach too).
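
For reference, a minimal sketch of the Weibull alternative using `survival::survreg`, which accepts right-censored rows directly. The data are simulated, the column names `time`, `ended`, and `obs_window` are hypothetical, and the censoring mechanism is deliberately simplified: it does not reproduce ARTnet's cross-sectional sampling of ongoing partnerships.

```r
library(survival)
set.seed(4)

# Hypothetical partnership records: true durations are Weibull, and
# partnerships still ongoing at interview are right-censored.
n <- 2000
toy <- data.frame(same.race = rbinom(n, 1, 0.5))
true_dur   <- rweibull(n, shape = 0.8, scale = exp(4 + 0.5 * toy$same.race))
obs_window <- runif(n, 0, 150)                   # time from start to interview
toy$time   <- pmin(true_dur, obs_window)
toy$ended  <- as.integer(true_dur <= obs_window) # 0 = ongoing (censored)

# Weibull accelerated-failure-time model fit on censored and completed rows.
fit_wb <- survreg(Surv(time, ended) ~ same.race, data = toy, dist = "weibull")

# Predicted median duration per stratum.
med <- predict(fit_wb, newdata = data.frame(same.race = c(0, 1)),
               type = "quantile", p = 0.5)
```

The predicted medians could in principle feed the same `median`-based rate formulas, although a non-exponential Weibull shape would be in tension with the geometric (constant-hazard) assumption behind them.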

## Tasks

- [x] Joint logistic for `same.age.grp`, `same.race` in build_netparams under `method = "joint"`. Store at `netparams$<layer>$joint_nm_age_model`, `joint_nm_race_model`.
- [x] Joint `lm` for `ad` (age absdiff) in build_netparams. Store at `netparams$<layer>$joint_absdiff_age_model`.
- [x] Joint `lm` for `log(duration.time)` among ongoing partnerships. Store at `netparams$<layer>$joint_duration_model`.
- [~] Under `method = "joint"` in build_netstats: predict dyad-level stats on synthetic pairs, aggregate to `nodematch_*`, `absdiff_*`, `diss.<layer>.byage`. Retire the univariate-ratio × new-edges shortcut currently in PR #68. **(partial: nodematch and absdiff done in PR #69 with ego-attr aggregation over the synthetic population. Duration model fit in PR #71 but its consumption on the synthetic population is deferred to issue #73.)**
- [x] Decide Option A vs Option B for generating synthetic partnership pairs (probably same answer for all three sub-areas).
- [x] Unit tests: on ARTnet-own distribution, recover marginal observed same-race / same-age rates, mean absdiff, and mean duration per stratum.
- [x] End-to-end EpiModelHIV-Template estimation with `method = "joint"`. Verified on PR #71: all 6 ERGMs (3 layers x default + joint methods) converge cleanly under `Stochastic-Approximation`; netdx static diagnostics on the default main model show all target stats matched within |Z| <= 2.05 and |% diff| <= 4.2% across 1000 sims.
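
One way the recovery test could be phrased, shown on simulated data: for a canonical-link GLM with an intercept, predictions averaged over the fitting data reproduce the observed mean, and the same holds within each level of any factor included in the model. The data frame `toy` and the column `index.race` are hypothetical; this is not the package's actual test code.

```r
set.seed(5)
n <- 1000
toy <- data.frame(
  index.age.grp = factor(sample(1:3, n, replace = TRUE), levels = 1:3),
  index.race    = factor(sample(c("x", "y"), n, replace = TRUE), levels = c("x", "y"))
)
toy$same.race <- rbinom(n, 1, ifelse(toy$index.race == "x", 0.7, 0.4))

fit <- glm(same.race ~ index.age.grp + index.race, data = toy, family = binomial())
p_hat <- predict(fit, newdata = toy, type = "response")

# Overall observed rate is recovered up to fitting tolerance.
stopifnot(isTRUE(all.equal(mean(p_hat), mean(toy$same.race), tolerance = 1e-6)))

# Per-stratum rates are recovered for a factor that is in the model.
obs_by_age  <- as.numeric(tapply(toy$same.race, toy$index.age.grp, mean))
pred_by_age <- as.numeric(tapply(p_hat, toy$index.age.grp, mean))
stopifnot(isTRUE(all.equal(pred_by_age, obs_by_age, tolerance = 1e-6)))
```

The same identity holds for `lm` fits, so the mean absdiff and mean log-duration checks can follow the same pattern.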

> Validation suite work — comparing univariate vs joint across multiple target-population scenarios, ablating interaction terms, generating the methods-paper figures — is handled separately on #65 (Phase 1.5), which is explicitly blocked by this issue and lives downstream.

## Acceptance criteria

- Under `method = "joint"`, all of `nodematch_*`, `absdiff_*`, and `diss.<layer>.byage` come from joint models rather than univariate marginals.
- Internal consistency: `Σ_k nodematch_age.grp[k] ≤ Σ_k nodefactor_age.grp[k] / 2` etc.
- End-to-end `netest()` run on EpiModelHIV-Template converges and diagnostics look reasonable (`dx_main`, `dx_casl`, `dx_inst`).
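
The internal-consistency criterion could be checked with a helper along these lines. The function name `check_nodematch_bound` and the example vectors are hypothetical, both vectors are assumed to cover all groups, and the group-wise bound is an additional check implied by the same counting argument rather than something stated above.

```r
# Hypothetical target-stat vectors, one element per age group.
nodefactor_age.grp <- c(120, 200, 160)  # node-edge incidences per group
nodematch_age.grp  <- c(35, 60, 50)     # within-group edges per group

check_nodematch_bound <- function(nodematch, nodefactor, tol = 1e-8) {
  # Matched edges cannot exceed total edges, and each edge contributes two
  # incidences to nodefactor, so sum(nodematch) <= sum(nodefactor) / 2.
  total_ok <- sum(nodematch) <= sum(nodefactor) / 2 + tol
  # Group-wise: a within-group edge uses two incidences from that group.
  group_ok <- all(nodematch <= nodefactor / 2 + tol)
  total_ok && group_ok
}

ok <- check_nodematch_bound(nodematch_age.grp, nodefactor_age.grp)
```

A small tolerance is included because the joint-method statistics are sums of predicted values rather than integer counts.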

## Related

- Blocked by: #62 (build_netstats joint infra) — now in PR #68.
- May spawn: follow-up issue for full ego-alter pair joint generation (Option B) if Option A bias residual is large.


