Add survey-aware bootstrap for all estimators (Phase 6) #237

Open
igerber wants to merge 6 commits into main from survey-improvements

igerber (Owner) commented Mar 25, 2026

Summary

  • Add survey-aware bootstrap inference for all 8 bootstrap-using estimators
  • Two strategies: PSU-level multiplier bootstrap (CS, ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD) and Rao-Wu rescaled bootstrap (SunAbraham, SyntheticDiD, TROP)
  • Expand CallawaySantAnna analytical survey support to full strata/PSU/FPC via compute_survey_if_variance()
  • Add shared infrastructure: generate_survey_multiplier_weights_batch, generate_rao_wu_weights, compute_survey_if_variance, aggregate_to_psu
  • Thread survey weights through bootstrap aggregation/IF/GMM score computation for all estimators
  • Add edge-case guards: lonely_psu="adjust" rejection, FPC validation, single-PSU handling, SyntheticDiD placebo+full-design guard
  • Update REGISTRY.md with Phase 6 survey bootstrap methodology section
  • Iteratively refined through 5 rounds of AI review (gpt-5.4-pro)
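
A minimal sketch of the PSU-level multiplier idea, for illustration only. It assumes a mean-type estimator whose per-unit influence values are already available, draws one Rademacher multiplier per PSU, and applies it to the PSU totals. The function name `psu_multiplier_se` and its arguments are hypothetical; this is not the signature of `generate_survey_multiplier_weights_batch` or `aggregate_to_psu`, and strata and FPC are deliberately left out.

```python
import numpy as np


def psu_multiplier_se(psi, psu, n_boot=999, seed=0):
    """Bootstrap SE from unit-level influence values, perturbed at the PSU level.

    Hypothetical helper: `psi` holds one influence value per unit and
    `psu` holds the PSU label of each unit.
    """
    rng = np.random.default_rng(seed)
    psi = np.asarray(psi, dtype=float)
    # Map arbitrary PSU labels to 0..n_psu-1.
    _, psu_idx = np.unique(np.asarray(psu), return_inverse=True)
    n_psu = int(psu_idx.max()) + 1
    # Aggregate unit-level influence values to PSU totals.
    psu_totals = np.bincount(psu_idx, weights=psi, minlength=n_psu)
    # One Rademacher multiplier per PSU and per bootstrap draw.
    xi = rng.choice([-1.0, 1.0], size=(n_boot, n_psu))
    # Each row of xi gives one perturbed estimate (centered at zero).
    draws = xi @ psu_totals / psi.size
    return float(draws.std(ddof=1))
```

Because every unit in a PSU shares one multiplier, within-PSU variation that cancels in the PSU total contributes nothing to the bootstrap spread, which is the point of perturbing at the PSU level rather than the unit level.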

Methodology references (required if estimator / math changes)

  • Method name(s): Rao-Wu rescaled bootstrap, PSU-level multiplier bootstrap, Taylor Series Linearization
  • Paper / source link(s):
    • Rao & Wu (1988) "Resampling Inference with Complex Survey Data", JASA 83(401)
    • Rao, Wu & Yue (1992) "Some Recent Work on Resampling Methods for Complex Surveys", Survey Methodology 18(2)
    • Kolenikov (2010) "Resampling Variance Estimation for Complex Survey Data"
    • Shao (2003) "Impact of the Bootstrap on Sample Surveys", Statistical Science 18(2)
  • Any intentional deviations from the source (and why):
    • FPC enters Rao-Wu via adjusted resample size m_h = round((1-f_h)*(n_h-1)) per Rao-Wu-Yue (1992) Section 3
    • TROP uses cross-classified pseudo-strata (survey_stratum × treatment_group) for Rao-Wu
    • lonely_psu="adjust" rejected for bootstrap paths (analytical path supports it)
    • Rust TROP bootstrap remains pweight-only; full survey designs fall back to the Python path
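
To make the first two deviations concrete, here is a small sketch under stated assumptions. `resample_sizes` applies the m_h rule quoted above to arrays of per-stratum PSU counts and sampling fractions, and `pseudo_strata` builds cross-classified labels from a stratum column and a treatment-group column. Both function names are hypothetical and are not taken from the PR's code.

```python
import numpy as np


def resample_sizes(n_h, f_h):
    """m_h = round((1 - f_h) * (n_h - 1)) for each stratum (hypothetical helper)."""
    n_h = np.asarray(n_h)
    f_h = np.asarray(f_h, dtype=float)
    return np.rint((1.0 - f_h) * (n_h - 1)).astype(int)


def pseudo_strata(stratum, treated):
    """Integer label for each (stratum, treatment group) pair (hypothetical helper)."""
    pairs = np.column_stack([stratum, treated])
    # Unique rows define the pseudo-strata; the inverse index is the label.
    _, labels = np.unique(pairs, axis=0, return_inverse=True)
    return labels.ravel()


# Three strata: 25% sampled, negligible sampling fraction, full census.
print(resample_sizes([10, 5, 4], [0.25, 0.0, 1.0]))
print(pseudo_strata([0, 0, 1, 1], [0, 1, 0, 1]))
```

Note that the rule gives m_h = 0 when f_h = 1, so any implementation has to decide explicitly what a full-census stratum should do.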

Validation

  • Tests added/updated: tests/test_survey.py, tests/test_survey_phase3.py, tests/test_survey_phase4.py, tests/test_survey_phase5.py
  • 285 survey tests passing across all phases
  • All deferral tests converted to positive tests
  • Smoke tests + scale invariance + uniform weight equivalence tests

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 6 commits March 24, 2026 13:42
Implement bootstrap + survey interaction for all 8 bootstrap-using
estimators. Two strategies: PSU-level multiplier bootstrap (CS,
ImputationDiD, TwoStageDiD, ContinuousDiD, EfficientDiD) and Rao-Wu
rescaled bootstrap (SunAbraham, SyntheticDiD, TROP). Expand CS
analytical support to full strata/PSU/FPC via compute_survey_if_variance.
Add shared infrastructure: generate_survey_multiplier_weights_batch,
generate_rao_wu_weights, compute_survey_if_variance, aggregate_to_psu.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P0: Thread survey weights through bootstrap aggregation/IF paths
- CS: use survey_weight_sum for bootstrap re-aggregation weights
- ImputationDiD: pass survey_weights_0 to _precompute_bootstrap_psi
- TwoStageDiD: add survey weights to _compute_cluster_S_scores

P1: Fix CS df_survey inconsistency (use unit-level df everywhere),
fix ContinuousDiD event-study bootstrap weights

P2: Update REGISTRY.md deferred language, clean TODO.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P0: Pass survey weights to TwoStageDiD Stage-2 solve_ols calls
P1: ImputationDiD event-study/group bootstrap uses survey-weighted
    target weights; SunAbraham collapses to unit-level before Rao-Wu
    and stores NaN for failed draws
P2: CS metadata from unit-level resolved survey; registry CS note
    updated for consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P0: SunAbraham pairs bootstrap now passes survey weights through to
    _fit_saturated_regression, _compute_iw_effects, _compute_overall_att
P1: ImputationDiD passes both treated (sw_1) and untreated (sw_0)
    survey weights to _precompute_bootstrap_psi, fixing array indexing
P1: ContinuousDiD IF scores now include per-unit w_i factor in
    sandwich meat (w_i * X_i * u_i), fixing weighted IF consistency
P3: REGISTRY.md CS section updated for consistent survey support docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P0: TwoStage solve_ols now passes survey weights for all 3 Stage-2
    paths (static, event-study, group)
P0: SunAbraham pairs bootstrap passes resolved_survey=None to avoid
    stale design weights overriding bootstrap-resampled weights
P1: CS bootstrap uses fixed cohort masses from precomputed survey
    weights (not per-cell survey_weight_sum) for overall and event study
P1: Single-PSU unstratified guard in generate_survey_multiplier_weights_batch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reject lonely_psu="adjust" for bootstrap with NotImplementedError
- Add FPC validation in multiplier and Rao-Wu bootstrap generators
- Gate SyntheticDiD placebo + full survey design (require bootstrap)
- Update REGISTRY.md for SyntheticDiD, TROP, and CS survey support
- Update TROP fit() docstring for full design support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⛔ Blocker

The PR introduces one unmitigated P0 in the shared Rao-Wu bootstrap path and three P1 methodology/documentation issues in the new survey-bootstrap support. The P0 is a silent statistical correctness bug in shared code, so this should not merge as-is.

Executive Summary

  • The shared Rao-Wu rescaled bootstrap cannot produce the zero-variance full-census FPC case, so estimators that use it can report nonzero bootstrap uncertainty where the survey design implies zero variance.
  • ContinuousDiD’s new survey bootstrap targets the wrong ACRT^{glob} estimand under non-uniform survey weights: the point estimator and analytical IF are weighted, but the bootstrap perturbation is unweighted.
  • Both new survey-bootstrap generators reject lonely_psu="adjust", but the new Phase 6 registry text does not disclose that restriction.
  • TROP’s Rao-Wu path changes the no-strata design by treating treatment status as strata, which changes the bootstrap/FPC construction in the strata=None case without a clear Phase 6 note.
  • The new tests are mostly smoke tests; they do not cover Rao-Wu full-census FPC or ContinuousDiD’s survey-bootstrap overall_acrt_* outputs, which is why the methodology regressions above slip through.
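
The weighted-versus-unweighted mismatch described for ContinuousDiD can be illustrated with a toy example. The sketch below is not the ACRT^{glob} influence function; it assumes the estimand is a plain survey-weighted mean of per-unit values `d`, and compares the perturbation implied by the weighted influence values with the one implied by unweighted influence values. The function name `perturbations` and its arguments are hypothetical.

```python
import numpy as np


def perturbations(d, w, xi):
    """Return (weighted, unweighted) bootstrap perturbations for a mean of d.

    Hypothetical toy: `w` are survey weights and `xi` is a
    (n_boot, n_units) matrix of multipliers.
    """
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    theta_w = np.sum(w * d) / np.sum(w)
    # Influence values of the survey-weighted mean.
    psi_w = w * (d - theta_w) / np.sum(w)
    # Influence values of the unweighted mean.
    psi_u = (d - d.mean()) / d.size
    return xi @ psi_w, xi @ psi_u


d = [1.0, 2.0, 3.0, 4.0]
xi = np.eye(4)  # identity rows expose the per-unit influence values directly
same_w, same_u = perturbations(d, [1, 1, 1, 1], xi)
diff_w, diff_u = perturbations(d, [1, 1, 1, 5], xi)
print(np.allclose(same_w, same_u), np.allclose(diff_w, diff_u))
```

With uniform weights the two perturbations coincide, which is why a uniform-weight equivalence test alone cannot detect this kind of mismatch; a regression test needs non-uniform weights.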

Methodology

Code Quality

  • No additional findings beyond the methodology issues above.

Performance

  • No findings.

Maintainability

  • No additional findings beyond the methodology/doc alignment issues above.

Tech Debt

  • No TODO.md mitigation applies to the P0/P1 items above. Under the stated rubric, these are correctness or undocumented-methodology issues, not deferrable tech debt.

Security

  • No findings.

Documentation/Tests

Path to Approval

  1. Fix diff_diff/bootstrap_utils.py so Rao-Wu handles f_h >= 1 as a census case with zero perturbation, then add stratified and unstratified full-census regression tests in at least one Rao-Wu estimator and a generator-level unit test.
  2. Fix diff_diff/continuous_did.py so survey bootstrap ACRT^{glob} uses the same weighted treated derivative average as the point estimator / analytical IF, then add a regression test that asserts overall_acrt_se, CI, and p-value under non-uniform survey weights.
  3. Resolve the lonely_psu="adjust" mismatch by either implementing it in both survey-bootstrap generators or documenting the unsupported mode with explicit **Note:** / **Deviation from R:** labels in docs/methodology/REGISTRY.md, plus tests for the current behavior.
  4. Resolve the TROP strata=None design mismatch by either keeping the Rao-Wu path unstratified in that case or explicitly documenting the treatment-group pseudo-strata rule, including FPC semantics, and add SurveyDesign(weights=..., fpc=...) no-strata tests for both TROP methods.
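
For item 1, a census guard could look roughly like the sketch below. It generates Rao-Wu weight multipliers for the PSUs of a single stratum and returns all ones when f_h >= 1, so a fully enumerated stratum contributes no bootstrap variation. The helper name `rao_wu_multipliers` is hypothetical, and the rescaling factor used here is the general Rao-Wu form sqrt(m_h (1 - f_h) / (n_h - 1)); whether diff_diff/bootstrap_utils.py uses exactly this factor is an assumption, since the code is not shown in this thread.

```python
import numpy as np


def rao_wu_multipliers(n_h, f_h, rng):
    """Rao-Wu weight multipliers for the n_h PSUs of one stratum.

    Hypothetical helper: `f_h` is the stratum sampling fraction and
    `rng` is a numpy.random.Generator.
    """
    if n_h < 2:
        raise ValueError("need at least two PSUs in the stratum")
    if f_h >= 1.0:
        # Census stratum: no sampling variability, so no perturbation.
        return np.ones(n_h)
    # Adjusted resample size from the PR description, floored at 1.
    m_h = max(1, round((1.0 - f_h) * (n_h - 1)))
    # Draw m_h PSUs with replacement and count how often each is hit.
    hits = np.bincount(rng.integers(0, n_h, size=m_h), minlength=n_h)
    # General Rao-Wu rescaling factor (assumed, see lead-in).
    lam = np.sqrt(m_h * (1.0 - f_h) / (n_h - 1))
    return 1.0 - lam + lam * (n_h / m_h) * hits


rng = np.random.default_rng(0)
print(rao_wu_multipliers(5, 0.2, rng))
print(rao_wu_multipliers(4, 1.0, rng))
```

The multipliers always sum to n_h, so the stratum's total weight is preserved in every draw; a generator-level unit test can assert both that property and the all-ones census case.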
