Skip to content

proxy: unify GUI-saved proxy as the single source of truth for HTTP#149

Merged
TigerInYourDream merged 7 commits into
mainfrom
feat/proxy
May 27, 2026
Merged

proxy: unify GUI-saved proxy as the single source of truth for HTTP#149
TigerInYourDream merged 7 commits into
mainfrom
feat/proxy

Conversation

@TigerInYourDream
Copy link
Copy Markdown

Problems this PR fixes

The current proxy implementation has six concrete defects that let proxy_state.json (the GUI's "saved proxy" file) drift out of sync with what the running process and the live Matrix client actually use. Each one is verified by a unit test in src/proxy_config.rs or by cargo build --features tsp.

1. TLS minimum and User-Agent regression (security)

When build_client and restore_session switched from Client::builder().proxy(url) to .http_client(custom), they lost matrix-sdk's HttpSettings defaults — min_tls_version(TLS_1_2) (BCP 195) and user_agent("matrix-rust-sdk"). The HttpConfig::Custom branch in matrix-sdk passes the supplied reqwest client through verbatim and never layers these defaults on top. Robrix could negotiate sub-1.2 TLS to homeservers on systems whose TLS backend allows it.

2. CLI --proxy override silently dropped on session restore and media download

build_client correctly resolved cli.proxy → CLI override or saved. restore_session and MatrixRequest::DownloadAndSaveFile instead called load_saved_proxy_url() directly, ignoring --proxy. Users launching with robrix --proxy http://A whose saved value was different (or None) had their main client route through A while downloads bypassed it — or vice versa.

3. SSO pre-built client cached with stale proxy

DEFAULT_SSO_CLIENT is built proactively at startup with whatever proxy was saved at that moment. spawn_sso_server then reused the cached client whenever effective_proxy.is_some() == false. If the user cleared the proxy in Settings after boot, the rebuild condition was false and SSO routed through the deleted proxy — exactly the failure mode the spec is meant to eliminate.

4. socks5/socks5h accepted in validation but unreachable at runtime

validate_proxy_url accepted four schemes, but Cargo.lock confirms reqwest 0.12.28 is compiled without tokio-socks. Proxy::all("socks5://...") fails at first request with an opaque connector error, hidden behind a generic anyhow message. The PR drops socks5 from the allow-list rather than ship a promise the runtime can't keep.

5. PROXY_ENV_LOCK poisoning permanently disabled proxy management

A Mutex<()> poisoned by a panic-while-held caused every subsequent apply_proxy_to_process_env call to fail with "Failed to lock proxy env: poisoned ...". The user could not change or clear their proxy until restart.

6. save_proxy_url file write outside the env lock

The file write happened before the env apply and outside the mutex. Two concurrent saves (UI thread vs. OIDC worker thread) could interleave write(A) → write(B) → apply(B) → apply(A), leaving file=B and env=A. The spec's "single source of truth" contract was violable.

Additional cleanups in scope

  • restore_session was missing the 60s reqwest timeout that build_client sets; restored sessions could hang indefinitely on a slow CONNECT/TLS phase.
  • TSP's create_reqwest_client (src/tsp/mod.rs, gated on --features tsp) was missing min_tls_version(TLS_1_2) and the loopback NoProxy bypass. Mirrored from build_policy_reqwest_client.
  • The updater (src/updater.rs) had already been routed through build_policy_reqwest_client in an earlier change but the spec did not list it in Allowed; spec extended to legitimize.

Design

A single helper, proxy_config::build_policy_reqwest_client, now builds every reqwest client used by Matrix / TSP / updater code with consistent TLS, UA, NoProxy bypass, and proxy resolution. A startup-time OnceLock<Option<String>> named CLI_PROXY_OVERRIDE holds the parsed --proxy value so any later resolver picks it up without re-parsing argv or threading the value through deep call chains.

DEFAULT_SSO_CLIENT now carries the proxy URL it was built with as a third tuple element. spawn_sso_server reuses the cached client only when the homeserver is default and the cached proxy still matches the current effective proxy; otherwise it rebuilds.

save_proxy_url holds PROXY_ENV_LOCK across both the file write and the env apply. The lock helper auto-recovers from a poisoned state via clear_poison() + into_inner().

Spec coverage

All seven scenarios in specs/task-proxy-policy.spec.md map to tests or to cargo build:

Scenario Test
Boot with no saved proxy clears inherited env proxy_state_none_clears_inherited_env_proxy_vars
Save null forces direct save_proxy_url_none_clears_env_proxy_vars
Save Some sets env + bypass save_proxy_url_some_sets_proxy_env_and_bypass_rules
Explicit reqwest disables system proxy when policy is None build_policy_reqwest_client_disables_system_proxy_when_proxy_is_none
Explicit reqwest carries minimal loopback bypass only build_policy_reqwest_client_attaches_no_proxy_bypass_for_local_addresses
Invalid proxy URL rejected validate_proxy_url_rejects_socks_schemes + discovery_http_client_rejects_invalid_proxy_override
cargo build passes CI

Three additional tests cover behaviors added in this PR:

  • policy_user_agent_carries_robrix_identity_and_sdk_family
  • validate_proxy_url_rejects_socks_schemes
  • apply_proxy_to_process_env_recovers_from_poisoned_lock

What this PR deliberately does not do

  • Hot-swap the live Matrix Client when the user saves a new proxy in Settings. The existing i18n string for settings.preferences.proxy.popup.saved already tells the user "Re-login to apply to the current Matrix session." Implementing in-place rebuild would require dropping CLIENT, SYNC_SERVICE, and the worker task — out of scope here.
  • Preserve shell-inherited HTTP_PROXY when no GUI proxy is configured. This is the spec's deliberate "GUI is the single source of truth" decision (Decisions §11–13), not a regression. Users who want their shell proxy honored should save it in the GUI.
  • Address std::env::set_var UB on Rust 2024. The env-mutation surface is reduced and serialized via PROXY_ENV_LOCK, but the underlying race with third-party crates reading env (reqwest, hyper) cannot be eliminated while the spec mandates env vars be set. Documented as a known limit.
  • TSP cache invalidation on proxy change. TSP caches its reqwest client for the worker lifetime. Changing the proxy at runtime requires a process restart for TSP to pick it up. Same restart-required UX as the live Matrix Client.

Test plan

  • cargo build — passes locally
  • cargo build --features tsp — passes locally
  • cargo test --lib proxy_config:: — 8/8 passes locally
  • Manual: launch with no saved proxy and shell HTTPS_PROXY set — confirm Robrix routes direct (spec-mandated)
  • Manual: save a proxy in Settings, observe the "Re-login" toast, re-login, confirm sliding sync routes through the saved proxy
  • Manual: clear the proxy in Settings, restart, confirm Robrix routes direct
  • Manual: robrix --proxy http://localhost:8080 with a different saved value — confirm CLI override wins for both login and downloads
  • Manual: trigger SSO login after clearing the proxy — confirm the cached SSO client is rebuilt (no stale proxy reuse)

Restore the matrix-sdk security defaults (TLS 1.2 minimum, matrix-rust-sdk
user agent) that were lost when build_client switched from .proxy(url) to
.http_client(custom). Make the same policy reach every HTTP call site —
session restore, media download, homeserver discovery, SSO pre-build, the
TSP worker, and the updater check — so GUI state can't drift from what
sliding sync actually uses.

Plumb a CLI --proxy override stash through proxy_config::CLI_PROXY_OVERRIDE
so restore_session and DownloadAndSaveFile honor it the same way build_client
does, instead of silently dropping it.

Invalidate the DEFAULT_SSO_CLIENT cache when its baked-in proxy no longer
matches the current effective proxy — previously, clearing the proxy after
boot left the cached client routing SSO through the deleted proxy.

Drop socks5/socks5h from validate_proxy_url: reqwest is not compiled with
the socks feature, so accepting them in the GUI just promises connectivity
the runtime can't deliver. http/https only until the feature is enabled.

Harden save_proxy_url: hold the env lock across the file write so two
concurrent saves cannot leave proxy_state.json and the process env
disagreeing, and recover the lock automatically when a prior panic poisoned
it instead of permanently disabling proxy management.

Match build_client's 60s reqwest timeout in restore_session so restored
sessions don't hang indefinitely on a slow CONNECT/TLS phase.

Spec specs/task-proxy-policy.spec.md captures the contract: proxy_state.json
wins over inherited env, null means forced direct, and one helper builds the
reqwest client for every Robrix HTTP path.
# Conflicts:
#	src/persistence/matrix_state.rs
#	src/sliding_sync.rs
…ource of truth

Replace the env-var-mutating proxy approach with explicit reqwest builder
configuration so every HTTP construction site routes through one helper
(`build_policy_reqwest_client`) with predictable, testable behavior.

Why:
- The previous design wrote/cleared `http_proxy` / `https_proxy` etc. on the
  process, which is fragile (shell-launch vs IDE-launch behavior diverges)
  and leaks into uncontrolled subprocesses and third-party crates.
- Worse, env-var proxy was applied indiscriminately to ALL homeservers
  including LAN addresses, which was the root cause of restore_session
  failures for users running a self-hosted homeserver while a system proxy
  was active.

What changed:
- `proxy_config.rs`: remove `PROXY_ENV_LOCK`, `apply_proxy_to_process_env`,
  `build_env_proxy_config`, and the `robius_proxy` dependency. The new
  helpers (`build_policy_reqwest_client`, `apply_policy_to_reqwest_builder`,
  `build_reqwest_proxy`) call `.proxy(...)` explicitly when the GUI has a
  proxy and `.no_proxy()` when it doesn't — env vars are never consulted.
- `sliding_sync.rs`, `persistence/matrix_state.rs`: build_client and
  restore_session now go through `build_policy_reqwest_client` and pass the
  result via `.http_client(http_client)` on the Matrix SDK builder.
- `tsp/mod.rs`: TSP uses a different reqwest version than matrix-sdk, so it
  mirrors the same policy via `build_tsp_reqwest_proxy` /
  `apply_tsp_proxy_policy` plus TLS 1.2 minimum.
- `updater.rs`: wraps the shared helper as `build_updater_http_client` with
  a 10s timeout.
- `Cargo.toml`/`Cargo.lock`: drop `robius-proxy` (now dead code).
- `validate_proxy_url`: socks5/socks5h schemes are no longer accepted
  (reqwest is not compiled with the `socks` feature).
- Spec updated to reflect "GUI is the single source of truth; no env mutation".

Tests:
- proxy_config: 6 unit tests cover policy UA, scheme rejection, save/load
  round-trip, no-proxy when GUI empty, loopback bypass on explicit proxy.
- tsp: 2 unit tests cover mirrored policy.
- sliding_sync: discovery_http_client_{accepts_valid,rejects_invalid}_proxy_override.
- updater: updater_http_client_disables_system_proxy_when_proxy_is_none.
…y direct

When a user has Clash (or any other public HTTP proxy) saved in Robrix and
also connects to a self-hosted homeserver on the LAN (e.g. Palpo at
192.168.1.58), the previous loopback-only bypass routed the LAN request
through the public proxy, which almost always fails to forward to RFC 1918
addresses. The user had to choose between "proxy on for matrix.org" OR
"proxy off for LAN" — not both.

Expand `DEFAULT_NO_PROXY_BYPASS` to include the standard private-network
ranges, so a single GUI proxy configuration handles both cases:
- IPv4 RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- IPv4 link-local: 169.254.0.0/16
- IPv6 ULA: fc00::/7
- IPv6 link-local: fe80::/10

reqwest's `NoProxy::from_string` accepts CIDR notation, so this is a pure
string-list change with no new parser logic.

Trade-off (documented in spec): if a corporate VPN resolves a public
matrix.* hostname to an RFC 1918 address via split DNS, that traffic will
also be treated as direct. This is the right default 99% of the time (VPN
already provides encrypted transport), and a future GUI exception list can
override it if needed.

Test updates:
- `build_policy_reqwest_client_attaches_no_proxy_bypass_for_local_addresses`
  and the TSP mirror now assert every entry in `DEFAULT_NO_PROXY_BYPASS`
  appears in the proxy debug representation. The "unexpected" guard list
  switches from "private ranges shouldn't be there" to "specific LAN IPs
  shouldn't be hardcoded" — CIDR ranges cover those addresses, but the
  verbatim IPs must not appear.
A previous version of `validate_proxy_url` accepted `socks5://` and
`socks5h://` URLs. After this branch dropped that support (reqwest is not
compiled with the `socks` feature), any user who had saved a socks5 URL via
the old GUI would, on next launch, see all reqwest client builds fail with
an opaque "Unsupported proxy URL scheme" error originating deep inside
build_policy_reqwest_client — no proxy-related message in the UI.

Add a forward-compatible guard in `load_saved_proxy_url_from_path`: after
normalizing the saved URL, run it through `validate_proxy_url` and, on
rejection, log a `warning!` and return `None`. The user can then re-save a
supported scheme via Settings; meanwhile the rest of the app behaves as if
no proxy were configured (which is correct — the saved one isn't usable).

This guard also auto-handles any future scheme tightening: no second list
of "deprecated schemes" to maintain.

Test: `load_saved_proxy_url_ignores_legacy_socks_scheme` writes a real
`proxy_state.json` with a `socks5://` entry and asserts the load returns
None, catching regressions in the validate→load wiring.
The post-save success popup in Settings → Preferences → Proxy previously
told users to "re-login to apply to the current Matrix session". This was
technically functional but semantically wrong: in Matrix terminology
"re-login" means logout + supply credentials again, whereas all the user
actually needs is to restart the app so a fresh Matrix Client is built
with the new proxy (PR #140's non-destructive restore preserves the
session across restarts).

Fix the misleading verb in both en.json and zh-CN.json. No code or UI
behavior changes — the popup still fires from the same save-success path.
@TigerInYourDream
Copy link
Copy Markdown
Author

Post-merge audit: what "inconsistency" actually meant

Recording for future reference — this is the concrete picture of what was inconsistent before this PR and what got aligned after. Useful when revisiting proxy decisions or debugging proxy-adjacent issues.

Pre-PR state (at commit 1d09b90f)

7 HTTP entry points, none agreed on how to handle proxy:

# Entry point Honored GUI proxy? Honored CLI --proxy? Had LAN bypass? Mechanism
1 build_client (main Matrix Client) env mutation + .proxy() (triple coverage)
2 restore_session ✅ (saved only) load_saved_proxy_url (skipped CLI override resolver)
3 URL preview download N/A reqwest::Client::new() — zero proxy awareness
4 build_discovery_http_client .proxy(), no bypass
5 DEFAULT_SSO_CLIENT prebuild ✅ via fallback build_client(&Cli::default(), …) — lost CLI args
6 Updater N/A matrix_sdk::reqwest::get() — zero proxy awareness
7 TSP N/A ClientBuilder::new() — zero proxy, no TLS min, no bypass

Headline stats:

  • 3 of 7 entries had zero proxy awareness (URL preview, Updater, TSP) — filling proxy in GUI did nothing for them
  • 1 of 7 dropped CLI override (restore_session) — --proxy worked on first login, was ignored on next launch's restore
  • 6 of 7 had no LAN bypass — filling a public proxy made LAN homeservers unreachable
  • 3 of 7 depended on process-env mutation — cross-platform fragility (shell-launch vs IDE-launch divergence)

User-visible bugs this caused

  1. Fill GUI proxy → main Matrix works ✓, but Check for updates fails silently ✗
  2. Fill GUI proxy → main Matrix works ✓, but URL previews / image downloads in rooms fail ✗
  3. Launch with --proxy http://… → first login works ✓, next launch's session restore fails ✗
  4. Fill GUI proxy and want to connect to LAN Palpo → fails ✗ (everything routed through public proxy)

Post-PR state

All 7 entries route through one helper:

build_policy_reqwest_client(
    resolve_effective_proxy_url(proxy_override).as_deref(),
    Some(timeout),
)

which automatically applies:

  • .proxy(...) with NoProxy bypass (RFC 1918 + loopback + link-local + IPv6 ULA)
  • .no_proxy() when no proxy is set
  • Shared POLICY_USER_AGENT
  • Shared TLS 1.2 minimum

TSP uses a different reqwest version, so it mirrors via apply_tsp_proxy_policy with equivalent semantics.

Result: all 7 entries agree on "does this go through proxy", "which proxy", and "is LAN bypassed" — 100% consistency on the three dimensions that matter.

Known accepted limitations (not bugs)

  • No hot-switch — matrix-sdk Client caches the reqwest Client, so GUI proxy changes require a restart. Documented in the "restart Robrix" save popup.
  • No socks5 support — reqwest is not compiled with the socks feature. 99% of users use Clash mixed-port (HTTP+SOCKS on same port) so http://127.0.0.1:7890 works.
  • System proxy (Wi-Fi settings on macOS/Windows, system VPN config) is not auto-detected — only TUN-mode VPN tools (OS-layer) work transparently without configuration; everything else requires explicit GUI fill.

Unrelated bugs surfaced during testing (filed separately)

Both are orthogonal to this PR and pre-existed it — discovered during manual testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant