Context
After #61/#62, all ego-level target statistics (edges, nodefactor_*, concurrent) are produced under method = "joint" via g-computation on joint Poisson / binomial GLMs. What remains are the dyad-level target statistics — every ERGM term whose value depends on both partners' attributes:
- Nodematch stats (nm.age.grp, nm.race, nm.race_diffF, nm.role.class) are currently estimated via univariate logistic GLMs on partnership-level data, e.g. same.age.grp ~ index.age.grp.
- Absdiff stats (absdiff.age, absdiff.sqrt.age) are currently a single lm(ad ~ 1) scalar scaled by edges at netstats time.
- Durations / dissolution (durs.<layer>.byage) are currently empirical means/medians stratified only by (age-match × index.age.grp), with no regression structure at all.
All three suffer from the same marginal-vs-joint bias as the ego-level target stats did before #61: when the target population's joint attribute distribution differs from ARTnet's, these statistics still carry ARTnet's conditional dyad-attribute distribution.
This issue is the dyad-level counterpart to #61/#62. Originally scoped to nodematch only; expanded on 2026-04-19 to cover durations after PI review of #68 noted the gap.
Proposed approach
All three sub-areas share the same long-form data unit (ARTnet partnership records with (ego_attrs, alter_attrs) pairs) and the same g-computation template:
- Fit a joint regression on partnership-level data with both ego and partner attributes on the RHS.
- Predict per-dyad for the synthetic population's implied mixing structure.
- Aggregate to target statistics.
1. Nodematch
Fit joint logistic models:
    m_nm_age <- glm(same.age.grp ~ index.age.grp + index.race.cat.num +
                      part.age.grp + part.race.cat.num + ...,
                    data = lmain, family = binomial())
Then predict on synthetic partnership pairs. Generating synthetic pairs is more complex than synthetic nodes — need the joint distribution of (ego, alter) attributes. Two options:
Option A (simpler): keep partnership-pair modeling marginal (current approach), but ensure ego-side attributes come from the corrected target joint distribution.
Option B (fully joint): fit joint ego-alter pair model; generate synthetic pairs from p(ego, alter) in the target population; predict match probabilities.
Recommend Option A for first pass; escalate to Option B if Option A still shows substantial bias after joint-nodefactor correction.
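A minimal sketch of what Option A could look like, under stated assumptions. The toy data stands in for ARTnet's lmain, and `synth`, `deg_main`, and the simulated effect sizes are hypothetical names and values, not part of the package. Only ego-side attributes appear on the RHS here because, under Option A, those are the only attributes available on the synthetic population.

```r
set.seed(1)

# Toy stand-in for the partnership-level data (one row per reported partnership).
n <- 2000
lmain <- data.frame(
  index.age.grp      = factor(sample(1:5, n, replace = TRUE), levels = 1:5),
  index.race.cat.num = factor(sample(1:3, n, replace = TRUE), levels = 1:3)
)
# Simulated outcome: age matching is more likely in the youngest group (assumed effect).
p_true <- plogis(-0.5 + 0.8 * (lmain$index.age.grp == "1"))
lmain$same.age.grp <- rbinom(n, 1, p_true)

# Joint logistic model with several ego attributes on the RHS.
m_nm_age <- glm(same.age.grp ~ index.age.grp + index.race.cat.num,
                data = lmain, family = binomial())

# Hypothetical synthetic target population: ego attributes plus an expected
# main-partnership degree per ego (assumed to come from the ego-level joint model).
n_syn <- 5000
synth <- data.frame(
  index.age.grp      = factor(sample(1:5, n_syn, replace = TRUE,
                                     prob = c(0.30, 0.25, 0.20, 0.15, 0.10)),
                              levels = 1:5),
  index.race.cat.num = factor(sample(1:3, n_syn, replace = TRUE), levels = 1:3),
  deg_main           = runif(n_syn, 0.2, 0.6)
)

# g-computation: predicted match probability per ego, weighted by expected degree.
synth$p_match <- predict(m_nm_age, newdata = synth, type = "response")

# Each edge is counted once from each end, hence the division by 2.
edges_target      <- sum(synth$deg_main) / 2
nodematch_by_grp  <- tapply(synth$deg_main * synth$p_match, synth$index.age.grp, sum) / 2
nodefactor_by_grp <- tapply(synth$deg_main, synth$index.age.grp, sum)

# Internal consistency: matched edges cannot exceed total edges.
stopifnot(sum(nodematch_by_grp) <= sum(nodefactor_by_grp) / 2)
```

Because the match probabilities lie in [0, 1], the final check holds by construction in this sketch; in the real pipeline it would only be informative if nodematch and nodefactor targets come from separately fitted models.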
2. Absdiff
Joint regression on the partnership age gap:
    m_ad_age <- lm(ad ~ index.age.grp + index.race.cat.num +
                     part.age.grp + part.race.cat.num + ...,
                   data = lmain)
Then under joint netstats: predict per-dyad |age_i - age_j| for the synthetic mixing structure and aggregate. Same infrastructure choice (Option A vs Option B) as nodematch — probably share whichever approach we pick.
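The contrast between the current scalar approach and a joint prediction can be sketched as follows. This is an illustration on simulated data, assuming ego-attribute aggregation as in Option A; `synth`, `deg_main`, and the age-gap generating process are hypothetical.

```r
set.seed(2)

# Toy partnership-level data; ad = |age_i - age_j|.
n <- 2000
lmain <- data.frame(
  index.age.grp      = factor(sample(1:5, n, replace = TRUE), levels = 1:5),
  index.race.cat.num = factor(sample(1:3, n, replace = TRUE), levels = 1:3)
)
# Simulated so that older egos have wider age gaps (assumed effect).
lmain$ad <- abs(rnorm(n, mean = 0, sd = 3 + as.integer(lmain$index.age.grp)))

# Joint regression on the age gap.
m_ad_age <- lm(ad ~ index.age.grp + index.race.cat.num, data = lmain)

# Hypothetical synthetic population, skewed young relative to the toy sample.
n_syn <- 5000
synth <- data.frame(
  index.age.grp      = factor(sample(1:5, n_syn, replace = TRUE,
                                     prob = c(0.30, 0.25, 0.20, 0.15, 0.10)),
                              levels = 1:5),
  index.race.cat.num = factor(sample(1:3, n_syn, replace = TRUE), levels = 1:3),
  deg_main           = runif(n_syn, 0.2, 0.6)
)

# Joint version: expected gap per ego, weighted by expected degree, halved
# because each edge is seen from both ends.
synth$ad_hat   <- predict(m_ad_age, newdata = synth)
absdiff_joint  <- sum(synth$deg_main * synth$ad_hat) / 2

# Current version: one intercept-only scalar multiplied by the edge count.
ad_scalar      <- coef(lm(ad ~ 1, data = lmain))[["(Intercept)"]]
absdiff_scalar <- ad_scalar * sum(synth$deg_main) / 2
```

The two targets diverge to the extent that the synthetic population's age composition differs from the sample's, which is the marginal-vs-joint gap described in the Context section.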
3. Durations / dissolution (new sub-scope)
Currently netparams$<layer>$durs.<layer>.byage is a summary stat:
    library(dplyr)

    # Current approach: duration summaries by ego age group only
    durs.main.byage <- lmain |>
      filter(ongoing2 == 1) |>
      group_by(index.age.grp) |>
      summarise(mean.dur = mean(duration.time), median.dur = median(duration.time))
No adjustment for race, HIV concordance, or other dyad attributes. Marginal-vs-joint problem at a dyad level.
Joint analog:
    m_dur_main <- lm(log(duration.time) ~ index.age.grp + index.race.cat.num +
                       part.age.grp + part.race.cat.num +
                       hiv.concord + same.race + same.age.grp,
                     data = lmain[lmain$ongoing2 == 1, ])
Under method = "joint" in build_netstats: predict expected log-duration per edge given the synthetic network's mixing structure, exponentiate to get a typical duration (strictly the geometric mean, which equals the median only if log-durations are symmetric, so using it as a mean or median is an approximation), then run it through the existing geometric-distribution rate math (rates.adj = 1 - 2^(-1/median), mean.dur.adj = 1/(1 - 2^(-1/median))) to produce dissolution coefs.
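The rate math quoted above can be worked through on a few hypothetical predicted values. The durations below are made up for illustration. The last step (log of mean duration minus one) is my understanding of the crude coefficient for a homogeneous dissolution model with no departure-rate adjustment, and should be checked against EpiModel's dissolution_coefs before relying on it.

```r
# Hypothetical predicted log-durations (in time steps) for four synthetic edges.
log_dur_hat <- log(c(26, 52, 104, 208))

# exp() of a mean log-duration is a geometric mean; it is treated as the
# median here, which is an assumption (exact only for symmetric log-durations).
median_dur <- exp(log_dur_hat)

# Geometric distribution: if P(still together after m steps) = 0.5,
# the per-step dissolution probability is 1 - 2^(-1/m).
rates.adj    <- 1 - 2^(-1 / median_dur)
mean.dur.adj <- 1 / rates.adj

# Crude dissolution coefficient: logit of the per-step persistence probability,
# which simplifies to log(mean duration - 1).
diss_coef <- log(mean.dur.adj - 1)

print(data.frame(median_dur, rates.adj, mean.dur.adj, diss_coef))
```

Note that the implied mean is always longer than the median it was derived from, roughly by a factor of 1/log(2).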
Open design question: log-linear on duration.time assumes multiplicative attribute effects on mean duration. Alternative: Weibull or other survival models on the censored data (many partnerships are ongoing, so ongoing2 == 1 filtering to get durations is already dropping right-censored observations — that's a pre-existing issue with the current summary-stat approach too).
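One way the survival-model alternative could look is sketched below, using survreg from the survival package on simulated data. The censoring scheme is a simplification (a random observation window rather than ARTnet's cross-sectional sampling), and the coding of ongoing2 == 1 as "still ongoing at interview, hence right-censored" is an assumption made for this sketch.

```r
library(survival)  # recommended package, shipped with standard R installations

set.seed(3)
n <- 1500
d <- data.frame(index.age.grp = factor(sample(1:3, n, replace = TRUE), levels = 1:3))

# True durations: Weibull with a scale that grows with age group (assumed effect).
true_dur <- rweibull(n, shape = 0.8, scale = 50 * as.integer(d$index.age.grp))

# Partnerships still running at the end of a random observation window are censored.
obs_window      <- runif(n, 10, 300)
d$duration.time <- pmin(true_dur, obs_window)
d$ongoing2      <- as.integer(true_dur > obs_window)  # 1 = ongoing = right-censored

# Weibull accelerated failure time model; the event indicator is 1 - ongoing2.
fit <- survreg(Surv(duration.time, 1 - ongoing2) ~ index.age.grp,
               data = d, dist = "weibull")

# Predicted median duration per age group, usable as input to the rate math.
newd       <- data.frame(index.age.grp = factor(1:3, levels = 1:3))
median_hat <- predict(fit, newdata = newd, type = "quantile", p = 0.5)

# Naive comparison: medians among completed partnerships only, ignoring censoring.
median_naive <- tapply(d$duration.time[d$ongoing2 == 0],
                       d$index.age.grp[d$ongoing2 == 0], median)
```

Comparing median_hat with median_naive shows how much dropping censored records can shorten the apparent durations in this toy setup; whether the same holds for the real data depends on the sampling design.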
Tasks
- [...] same.age.grp, same.race in build_netparams under method = "joint". Store at netparams$<layer>$joint_nm_age_model, joint_nm_race_model.
- [...] lm for ad (age absdiff) in build_netparams. Store at netparams$<layer>$joint_absdiff_age_model.
- [...] lm for log(duration.time) among ongoing partnerships. Store at netparams$<layer>$joint_duration_model.
- [...] method = "joint" in build_netstats: predict dyad-level stats on synthetic pairs, aggregate to nodematch_*, absdiff_*, diss.<layer>.byage. Retire the univariate-ratio × new-edges shortcut currently in PR #68 ("Use joint GLM g-computation for build_netstats target stats (#62)"). (Partial: nodematch and absdiff done in PR #69 ("Joint dyad-level modeling: nodematch + absdiff (#63 phases 1 & 2)") with ego-attr aggregation over the synthetic population. Duration model fit in PR #71 ("Duration methods: empirical + joint_lm (#63 phase 3)"), but its consumption on the synthetic population is deferred to issue #73 ("Duration g-computation: predict joint_lm per-dyad on synthetic population in build_netstats").)
- [...] method = "joint". Verified on PR #71: all 6 ERGMs (3 layers x default + joint methods) converge cleanly under Stochastic-Approximation; netdx static diagnostics on the default main model show all target stats matched within |Z| <= 2.05 and |% diff| <= 4.2% across 1000 sims.
Validation suite work (comparing univariate vs joint across multiple target-population scenarios, ablating interaction terms, generating the methods-paper figures) is handled separately on #65 (Phase 1.5), which is explicitly blocked by this issue and lives downstream.
Acceptance criteria
- Under method = "joint", all of nodematch_*, absdiff_*, and diss.<layer>.byage come from joint models rather than univariate marginals.
- Internal consistency: Σ_k nodematch_age.grp[k] ≤ Σ_k nodefactor_age.grp[k] / 2 etc.
- End-to-end netest() run on EpiModelHIV-Template converges and diagnostics look reasonable (dx_main, dx_casl, dx_inst).
Related