[Phase 1.3] Joint modeling for nodematch + absdiff + dissolution (dyad-level target stats) #63

@smjenness

Context

After #61/#62, all ego-level target statistics (edges, nodefactor_*, concurrent) are produced under method = "joint" via g-computation on joint Poisson / binomial GLMs. What remains are the dyad-level target statistics — every ERGM term whose value depends on both partners' attributes:

  1. Nodematch stats (nm.age.grp, nm.race, nm.race_diffF, nm.role.class) are currently estimated via univariate logistic GLMs on partnership-level data, e.g. same.age.grp ~ index.age.grp.
  2. Absdiff stats (absdiff.age, absdiff.sqrt.age) are currently a single lm(ad ~ 1) scalar scaled by edges at netstats time.
  3. Durations / dissolution (durs.<layer>.byage) are currently empirical means/medians stratified only by (age-match × index.age.grp) — no regression structure at all.

All three suffer the same marginal-vs-joint bias as the ego-level target stats did before #61: when the target population joint distribution differs from ARTnet's, these statistics carry ARTnet's conditional dyad attribute distribution baked in.

This issue is the dyad-level counterpart to #61/#62. Originally scoped to nodematch only; expanded on 2026-04-19 to cover durations after PI review of #68 noted the gap.

Proposed approach

All three sub-areas share the same long-form data unit (ARTnet partnership records with (ego_attrs, alter_attrs) pairs) and the same g-computation template:

  1. Fit a joint regression on partnership-level data with both ego and partner attributes on the RHS.
  2. Predict per-dyad for the synthetic population's implied mixing structure.
  3. Aggregate to target statistics.
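A minimal sketch of the three-step template on simulated data, using nodematch as the example. Everything here is hypothetical: the toy data frame, the `target` population, and `edges_target` are made up for illustration and are not ARTnet data or package code.

```r
# Toy g-computation for a dyad-level statistic (simulated data only).
set.seed(1)

# Simulated partnership-level records with ego and partner attributes.
n <- 2000
toy <- data.frame(
  index.age.grp = factor(sample(1:3, n, replace = TRUE)),
  index.race    = factor(sample(1:2, n, replace = TRUE)),
  part.race     = factor(sample(1:2, n, replace = TRUE))
)
# Simulated outcome: older egos are more likely to have a same-age partner.
p_true <- plogis(-0.5 + 0.4 * as.integer(toy$index.age.grp))
toy$same.age.grp <- rbinom(n, 1, p_true)

# Step 1: joint regression with ego and partner attributes on the RHS.
fit <- glm(same.age.grp ~ index.age.grp + index.race + part.race,
           data = toy, family = binomial())

# Step 2: predict per dyad for a synthetic target whose attribute mix
# differs from the sample (here: skewed toward the oldest age group).
target <- data.frame(
  index.age.grp = factor(sample(1:3, 5000, replace = TRUE,
                                prob = c(0.2, 0.3, 0.5)), levels = 1:3),
  index.race    = factor(sample(1:2, 5000, replace = TRUE), levels = 1:2),
  part.race     = factor(sample(1:2, 5000, replace = TRUE), levels = 1:2)
)
target$p_match <- predict(fit, newdata = target, type = "response")

# Step 3: aggregate to a target statistic, i.e. expected matched edges
# given an assumed (hypothetical) edge count for the target network.
edges_target <- 400
nodematch_target <- edges_target * mean(target$p_match)

# For comparison: the marginal estimate that ignores the target's mix.
nodematch_naive <- edges_target * mean(toy$same.age.grp)
```

In this toy setup the two estimates differ only because the target population's age mix differs from the sample's, which is the bias the joint approach is meant to remove.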

1. Nodematch

Fit joint logistic models:

m_nm_age <- glm(same.age.grp ~ index.age.grp + index.race.cat.num +
                  part.age.grp + part.race.cat.num + ...,
                data = lmain, family = binomial())

Then predict on synthetic partnership pairs. Generating synthetic pairs is more complex than generating synthetic nodes, because it requires the joint distribution of (ego, alter) attributes. Two options:

Option A (simpler): keep partnership-pair modeling marginal (current approach), but ensure ego-side attributes come from the corrected target joint distribution.

Option B (fully joint): fit joint ego-alter pair model; generate synthetic pairs from p(ego, alter) in the target population; predict match probabilities.

Recommend Option A for first pass; escalate to Option B if Option A still shows substantial bias after joint-nodefactor correction.

2. Absdiff

Joint regression on the partnership age gap:

m_ad_age <- lm(ad ~ index.age.grp + index.race.cat.num +
                 part.age.grp + part.race.cat.num + ...,
               data = lmain)

Then under joint netstats: predict per-dyad |age_i - age_j| for the synthetic mixing structure and aggregate. Same infrastructure choice (Option A vs Option B) as nodematch — probably share whichever approach we pick.
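A rough sketch of the absdiff aggregation on simulated data. The mixing table `mix`, its proportions, and `edges_target` are hypothetical stand-ins for whatever the chosen option (A or B) produces; only `ad`, `index.age.grp`, and `part.age.grp` echo names used in this issue.

```r
# Toy absdiff g-computation (simulated data only).
set.seed(2)
n <- 1500
toy <- data.frame(
  index.age.grp = factor(sample(1:3, n, replace = TRUE)),
  part.age.grp  = factor(sample(1:3, n, replace = TRUE))
)
# Simulated absolute age gap with additive ego and partner effects.
mu <- 2 + 1.5 * (as.integer(toy$index.age.grp) - 1) +
  1.0 * (as.integer(toy$part.age.grp) - 1)
toy$ad <- abs(rnorm(n, mean = mu, sd = 1))

# Joint regression on the age gap.
fit_ad <- lm(ad ~ index.age.grp + part.age.grp, data = toy)

# Hypothetical target mixing structure: proportion of edges in each
# (ego group, partner group) cell. Rows = ego, columns = partner.
prop_mat <- matrix(c(0.10, 0.05, 0.02,
                     0.05, 0.20, 0.08,
                     0.02, 0.08, 0.40), nrow = 3)
mix <- expand.grid(index.age.grp = factor(1:3), part.age.grp = factor(1:3))
mix$prop <- as.vector(prop_mat)

# Predict the expected gap per cell, then aggregate over the mixing
# structure and scale by an assumed edge count.
mix$ad_hat <- predict(fit_ad, newdata = mix)
edges_target <- 400
absdiff_target <- edges_target * sum(mix$prop * mix$ad_hat)

# Current-style scalar for comparison: intercept-only model times edges.
absdiff_scalar <- edges_target * coef(lm(ad ~ 1, data = toy))[[1]]
```

The scalar version carries the sample's mixing pattern forward unchanged, while the cell-weighted version responds to the target's mixing proportions.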

3. Durations / dissolution (new sub-scope)

Currently netparams$<layer>$durs.<layer>.byage is a summary stat:

durs.main.byage <- lmain |>
  filter(ongoing2 == 1) |>
  group_by(index.age.grp) |>
  summarise(mean.dur = mean(duration.time), median.dur = median(duration.time))

This applies no adjustment for race, HIV concordance, or other dyad attributes, so it has the same marginal-vs-joint problem at the dyad level.

Joint analog:

m_dur_main <- lm(log(duration.time) ~ index.age.grp + index.race.cat.num +
                   part.age.grp + part.race.cat.num +
                   hiv.concord + same.race + same.age.grp,
                 data = lmain[lmain$ongoing2 == 1, ])

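Under method = "joint" in build_netstats: predict expected log-duration per edge given the synthetic network's mixing structure, exponentiate to get a duration on the original scale, then run that through the existing geometric-distribution rate math (rates.adj = 1 - 2^(-1/median), mean.dur.adj = 1/(1 - 2^(-1/median))) to produce dissolution coefs. Note that exp(E[log T]) is the geometric mean of duration, which equals the median (not the arithmetic mean) when durations are log-normal, so it enters the rate math as the median.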
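A sketch of that pipeline on simulated log-normal durations. The toy data, the `edges` frame, and the reduced covariate set are hypothetical; only the two rate formulas are taken from the text above. Exponentiating a mean log-duration gives the geometric mean, which is treated here as the median.

```r
# Toy duration model feeding the geometric rate math (simulated data only).
set.seed(3)
n <- 1200
toy <- data.frame(
  index.age.grp = factor(sample(1:3, n, replace = TRUE)),
  same.race     = rbinom(n, 1, 0.6)
)
# Simulated durations: longer for older egos and same-race pairs.
toy$duration.time <- exp(rnorm(
  n,
  mean = 3 + 0.5 * as.integer(toy$index.age.grp) + 0.3 * toy$same.race,
  sd = 0.8
))

# Log-linear duration model.
fit_dur <- lm(log(duration.time) ~ index.age.grp + same.race, data = toy)

# Hypothetical synthetic edges with a different attribute mix.
edges <- data.frame(
  index.age.grp = factor(sample(1:3, 3000, replace = TRUE,
                                prob = c(0.2, 0.3, 0.5)), levels = 1:3),
  same.race     = rbinom(3000, 1, 0.7)
)
edges$log_dur_hat <- predict(fit_dur, newdata = edges)

# Average predicted log-duration by ego age group, then back-transform.
by_age <- aggregate(log_dur_hat ~ index.age.grp, data = edges, FUN = mean)
by_age$median.dur <- exp(by_age$log_dur_hat)

# Geometric-distribution rate math as written in this issue.
by_age$rates.adj    <- 1 - 2^(-1 / by_age$median.dur)
by_age$mean.dur.adj <- 1 / by_age$rates.adj
```

`by_age$mean.dur.adj` is the quantity that would then be passed on to produce dissolution coefs; that downstream step is not shown.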

Open design question: log-linear on duration.time assumes multiplicative attribute effects on mean duration. Alternative: Weibull or other survival models on the censored data. Many partnerships are ongoing, and the ongoing2 == 1 filter keeps only those, so every retained duration is right-censored and completed partnerships are dropped. That is a pre-existing issue with the current summary-stat approach too.
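To illustrate the survival-model alternative, here is a generic sketch of a Weibull fit with right-censoring using `survival::survreg`. The simulated follow-up design and the names `obs_time`, `ended`, and `x` are assumptions for illustration; they do not reproduce ARTnet's sampling of ongoing partnerships, which would need its own treatment.

```r
# Toy Weibull model with right-censoring (simulated data only).
library(survival)
set.seed(4)

n <- 1500
x <- rbinom(n, 1, 0.5)                      # hypothetical dyad attribute
true_time <- rweibull(n, shape = 1.2, scale = exp(4 + 0.5 * x))
cens_time <- runif(n, 0, 150)               # hypothetical observation window

# Observed duration is the smaller of the true and censoring times;
# ended == 1 marks partnerships whose end was actually observed.
obs_time <- pmin(true_time, cens_time)
ended    <- as.integer(true_time <= cens_time)

# Weibull accelerated failure time model that uses censored records too.
fit_wb <- survreg(Surv(obs_time, ended) ~ x, dist = "weibull")

# Model-based median duration per attribute level.
med_wb <- predict(fit_wb, newdata = data.frame(x = 0:1),
                  type = "quantile", p = 0.5)

# Naive median from observed endings only, ignoring censoring.
med_naive <- tapply(obs_time[ended == 1], x[ended == 1], median)
```

In this simulation the naive medians come out shorter than the model-based ones, because long partnerships are the ones most likely to be censored and so are under-represented among observed endings.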

Tasks

Validation suite work — comparing univariate vs joint across multiple target-population scenarios, ablating interaction terms, generating the methods-paper figures — is handled separately on #65 (Phase 1.5), which is explicitly blocked by this issue and lives downstream.

Acceptance criteria

  • Under method = "joint", all of nodematch_*, absdiff_*, and diss.<layer>.byage come from joint models rather than univariate marginals.
  • Internal consistency: Σ_k nodematch_age.grp[k] ≤ Σ_k nodefactor_age.grp[k] / 2 etc.
  • End-to-end netest() run on EpiModelHIV-Template converges and diagnostics look reasonable (dx_main, dx_casl, dx_inst).
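The internal-consistency criterion can be checked with a small helper. This is a sketch: `check_nodematch_bound` is a hypothetical function name, and it assumes the nodefactor vector includes every attribute level (no dropped reference level), so that its sum counts each edge twice.

```r
# Check that total matched edges do not exceed total edges, where total
# edges is recovered as sum(nodefactor) / 2. Assumes nodefactor contains
# all levels of the attribute.
check_nodematch_bound <- function(nodematch, nodefactor, tol = 1e-8) {
  sum(nodematch) <= sum(nodefactor) / 2 + tol
}

# Illustrative (made-up) target statistics for a three-level attribute.
ok  <- check_nodematch_bound(nodematch  = c(50, 80, 40),
                             nodefactor = c(150, 220, 130))
bad <- check_nodematch_bound(nodematch  = c(200, 100),
                             nodefactor = c(150, 220, 130))
```

A failed check would indicate that the nodematch and nodefactor models disagree about the implied edge count.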

Related
