This repo includes a small k6 suite for exercising both the public frontend and the backend API at the same time.
The suite is designed to answer two related questions:
- Can the rendered frontend continue serving page traffic under load?
- Can the API continue serving search/resource traffic under load?
By default, the k6 targets hit dev1:
https://lib-btaageoapi-dev-app-01.oit.umn.edu
The frontend scenario requests:
- `/`
- `/manifest.webmanifest`
- `/registerSW.js`
- `/search?q=<query>`
- `/search?q=<query>&include_filters[...][]=...`
- `/resources/<resource_id>`
This gives us light coverage of the public page shell, static frontend assets, search results page rendering, faceted search page rendering, and resource detail rendering.
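For orientation, here is a minimal sketch of what one frontend iteration can look like against those paths. The variable names and defaults are illustrative, not the suite's internals:

```javascript
// Hypothetical sketch of a single frontend iteration; names and defaults
// are illustrative and do not mirror the suite's actual code.
import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
const QUERY = __ENV.K6_QUERY || 'minnesota';
const RESOURCE_ID = __ENV.K6_RESOURCE_ID || 'p16022coll206:283';

export default function () {
  // Page shell and static frontend assets.
  http.get(`${BASE}/`);
  http.get(`${BASE}/manifest.webmanifest`);
  http.get(`${BASE}/registerSW.js`);

  // Search results page rendering and a resource detail page.
  const search = http.get(`${BASE}/search?q=${encodeURIComponent(QUERY)}`);
  check(search, { 'search page 200': (r) => r.status === 200 });
  http.get(`${BASE}/resources/${RESOURCE_ID}`);

  sleep(1); // think time between iterations
}
```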
The API scenario requests:
- `/api/v1/search`
- `/api/v1/search` with live discovered `include_filters[...][]` facet params
- `/api/v1/search/facets/<facet_name>`
- `/api/v1/suggest`
- `/api/v1/resources/`
- `/api/v1/resources/<resource_id>`
- `/api/v1/home/blog-posts`
This gives us direct backend pressure on the core search/resource endpoints plus the homepage blog payload used by the frontend. The faceted calls are built from live facet values returned by the seed search response so the suite keeps testing realistic filter combinations instead of relying on brittle hard-coded facet values.
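A sketch of the live facet/resource discovery pattern follows. The response field names (`facets`, `results`, `resource_class`, `value`, `id`) are assumptions about the payload shape, not the suite's exact code:

```javascript
// Illustrative only: the facet/result field names below are assumptions
// about the /api/v1/search payload, not a guaranteed contract.
import http from 'k6/http';

const BASE = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
const QUERY = __ENV.K6_QUERY || 'minnesota';

export function setup() {
  // Seed search: discover a live facet value and a resource id once per run.
  const seed = http.get(`${BASE}/api/v1/search?q=${encodeURIComponent(QUERY)}`);
  const body = seed.json();
  const facets = body.facets || {};
  const firstClass = (facets.resource_class || [])[0] || {}; // assumed facet name
  const results = body.results || [];
  return {
    facetValue: firstClass.value,
    resourceId: (results[0] || {}).id,
  };
}

export default function (data) {
  // Faceted search built from the live facet value discovered in setup().
  if (data.facetValue) {
    http.get(
      `${BASE}/api/v1/search?q=${encodeURIComponent(QUERY)}` +
        `&include_filters[resource_class][]=${encodeURIComponent(data.facetValue)}`
    );
  }
  if (data.resourceId) {
    http.get(`${BASE}/api/v1/resources/${data.resourceId}`);
  }
}
```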
Run one end-to-end iteration:
```
make k6-smoke
```

Run the default concurrent frontend + API stress mix:

```
make k6-stress
```

The suite uses a local k6 binary when present. If k6 is not installed, the Makefile falls back to Docker with `grafana/k6:latest`.
Summaries are written to:
- `tmp/k6/smoke-summary.json`
- `tmp/k6/stress-summary.json`
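These files come from k6's end-of-test summary hook. A minimal sketch of that pattern, assuming a hard-coded output path (the suite's Makefile targets parameterize the smoke vs stress paths):

```javascript
// End-of-test summary written to JSON; the path here is illustrative.
import http from 'k6/http';

export default function () {
  http.get(__ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu');
}

export function handleSummary(data) {
  return {
    'tmp/k6/stress-summary.json': JSON.stringify(data, null, 2),
  };
}
```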
Point the suite at a specific environment, or override the search query and resource id:

```
make k6-stress K6_BASE_URL=https://lib-btaageoapi-dev-app-01.oit.umn.edu
make k6-stress K6_BASE_URL=https://lib-geoportal-prd-web-01.oit.umn.edu
make k6-stress K6_QUERY=chicago
make k6-stress K6_RESOURCE_ID=p16022coll206:283
```

If K6_RESOURCE_ID is not provided, the suite discovers one from the first search result for K6_QUERY.
For varied-query stress runs, pass a comma- or pipe-separated query pool. The
seed search still uses K6_QUERY to discover a stable resource id and facet
values, while each frontend/API iteration rotates the actual search and suggest
terms through K6_QUERY_POOL:
```
make k6-stress K6_QUERY_POOL="minnesota|chicago|water|roads|imagery"
```
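A sketch of how per-iteration query rotation can be implemented; the pool parsing and variable names are assumptions, not the suite's exact code:

```javascript
// Rotate search terms through the pool using the global iteration counter.
import http from 'k6/http';
import exec from 'k6/execution';

const BASE = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
const POOL = (__ENV.K6_QUERY_POOL || __ENV.K6_QUERY || 'minnesota')
  .split(/[|,]/)
  .map((q) => q.trim())
  .filter((q) => q.length > 0);

export default function () {
  const query = POOL[exec.scenario.iterationInTest % POOL.length];
  http.get(`${BASE}/api/v1/search?q=${encodeURIComponent(query)}`);
}
```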
Scale VU targets and hold durations per scenario:

```
make k6-stress K6_FRONTEND_TARGET_VUS=6 K6_API_TARGET_VUS=12
make k6-stress K6_FRONTEND_HOLD=5m K6_API_HOLD=5m
```

Available knobs:

- `K6_FRONTEND_TARGET_VUS`
- `K6_FRONTEND_RAMP_UP`
- `K6_FRONTEND_HOLD`
- `K6_FRONTEND_RAMP_DOWN`
- `K6_FRONTEND_THINK_TIME_SECONDS`
- `K6_FRONTEND_P95_THRESHOLD_MS`
- `K6_FRONTEND_P99_THRESHOLD_MS`
- `K6_API_TARGET_VUS`
- `K6_API_RAMP_UP`
- `K6_API_HOLD`
- `K6_API_RAMP_DOWN`
- `K6_API_THINK_TIME_SECONDS`
- `K6_API_P95_THRESHOLD_MS`
- `K6_API_P99_THRESHOLD_MS`
- `K6_QUERY`
- `K6_QUERY_POOL`
- `K6_SUGGEST_QUERY`
- `K6_RESOURCE_ID`
- `K6_API_KEY`
- `K6_SEARCH_PER_PAGE`
- `K6_CACHE_BUST_SEARCH`
- `K6_CACHE_BUST_SEED`
- `K6_ENDPOINT_BREAKDOWN`
- `K6_SEARCH_DIAGNOSTICS`
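For reference, a minimal sketch of how the ramp/hold knobs above typically map onto a k6 ramping-vus scenario, assuming the suite reads them from `__ENV`; the defaults shown are illustrative, not the suite's actual defaults:

```javascript
// Illustrative mapping of API ramp/hold env knobs onto a ramping-vus scenario.
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    api: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: __ENV.K6_API_RAMP_UP || '30s', target: Number(__ENV.K6_API_TARGET_VUS || 12) },
        { duration: __ENV.K6_API_HOLD || '2m', target: Number(__ENV.K6_API_TARGET_VUS || 12) },
        { duration: __ENV.K6_API_RAMP_DOWN || '30s', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: [
      `p(95)<${__ENV.K6_API_P95_THRESHOLD_MS || 1500}`,
      `p(99)<${__ENV.K6_API_P99_THRESHOLD_MS || 3000}`,
    ],
  },
};

export default function () {
  const base = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
  http.get(`${base}/api/v1/search?q=${encodeURIComponent(__ENV.K6_QUERY || 'minnesota')}`);
  sleep(Number(__ENV.K6_API_THINK_TIME_SECONDS || 1));
}
```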
Use make k6-endpoint-capacity when you need a defensible endpoint p95 at a
known request rate instead of a blended scenario p95 from VU-based stress. The
target endpoint is exercised with k6's constant-arrival-rate executor, so
K6_REQUEST_RATE=50 means k6 tries to start 50 iterations per second for that
single endpoint:
```
make k6-endpoint-capacity \
  K6_ENDPOINT_TARGET=frontend_search_results_api \
  K6_REQUEST_RATE=50 \
  K6_ENDPOINT_DURATION=3m \
  K6_BASE_URL=https://lib-geoportal-prd-web-01.oit.umn.edu \
  K6_API_KEY=...
```

Supported K6_ENDPOINT_TARGET values:

- `frontend_search_results_api`
- `frontend_faceted_search_results_api`
- `frontend_resource_page`
- `api_search`
- `api_faceted_search`
Useful fixed-rate knobs:
- `K6_REQUEST_RATE`
- `K6_RATE_TIME_UNIT`
- `K6_ENDPOINT_DURATION`
- `K6_PRE_ALLOCATED_VUS`
- `K6_MAX_VUS`
- `K6_ENDPOINT_P95_THRESHOLD_MS`
- `K6_ENDPOINT_P99_THRESHOLD_MS`
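A minimal sketch of how these fixed-rate knobs map onto k6's constant-arrival-rate executor; the env pass-through and defaults shown here are illustrative, not the suite's exact wiring:

```javascript
// Fixed-rate capacity sketch: k6 starts K6_REQUEST_RATE iterations per
// K6_RATE_TIME_UNIT regardless of response latency, growing VUs up to K6_MAX_VUS.
import http from 'k6/http';

export const options = {
  scenarios: {
    endpoint_capacity: {
      executor: 'constant-arrival-rate',
      rate: Number(__ENV.K6_REQUEST_RATE || 50),
      timeUnit: __ENV.K6_RATE_TIME_UNIT || '1s',
      duration: __ENV.K6_ENDPOINT_DURATION || '3m',
      preAllocatedVUs: Number(__ENV.K6_PRE_ALLOCATED_VUS || 50),
      maxVUs: Number(__ENV.K6_MAX_VUS || 200),
    },
  },
  thresholds: {
    http_req_duration: [
      `p(95)<${__ENV.K6_ENDPOINT_P95_THRESHOLD_MS || 1500}`,
      `p(99)<${__ENV.K6_ENDPOINT_P99_THRESHOLD_MS || 3000}`,
    ],
  },
};

export default function () {
  // Single target endpoint per run; api_search shown as an example.
  const base = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
  http.get(`${base}/api/v1/search?q=${encodeURIComponent(__ENV.K6_QUERY || 'minnesota')}`);
}
```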
To bypass the URL-level endpoint response cache for search-like (search/facet) requests, enable cache busting:

```
make k6-stress K6_CACHE_BUST_SEARCH=1
```

On production-like environments with RATE_LIMIT_ENABLED=true, pass a service-tier key so the direct API scenario measures backend capacity instead of the anonymous 10 requests/minute throttle:

```
make k6-stress K6_BASE_URL=https://lib-geoportal-prd-web-01.oit.umn.edu K6_API_KEY=...
```

For local repeat runs, store a k6-only key in an ignored file such as `tmp/k6/prd-k6.env`:
```
set -a
. tmp/k6/prd-k6.env
set +a
make k6-stress K6_BASE_URL=https://lib-geoportal-prd-web-01.oit.umn.edu
```

dev2 is the rate-limiting capacity proving ground. It keeps rate limiting
enabled and caps per-worker database pools so k6 can exercise the production
API-key path without using production traffic:
```
make k6-stress K6_BASE_URL=https://geodev.btaa.org K6_API_KEY=...
```

The relevant server-side knobs are:

- `API_KEY_TIER_CACHE_TTL_SECONDS`: short process-local cache for successful API-key/tier lookups.
- `API_KEY_LAST_USED_UPDATE_INTERVAL_SECONDS`: minimum interval between `api_keys.last_used_at` writes per key/process.
- `DB_POOL_MAX`: cap for the shared `databases` pool in each API worker.
- `SQLALCHEMY_ASYNC_POOL_SIZE` and `SQLALCHEMY_ASYNC_MAX_OVERFLOW`: caps for SQLAlchemy async engine pools used by search/resource/cache helpers.
Enabling K6_CACHE_BUST_SEARCH appends an ignored `k6cb=...` query param to:

- `/search`
- faceted `/search`
- `/api/v1/search`
- faceted `/api/v1/search`
- `/api/v1/search/facets/<facet_name>`
That preserves the existing API/HTML payload shape while forcing unique
search/facet request URLs. The backend may still reuse semantic search-result
cache entries when the actual query/filter/page/sort inputs are unchanged; this
is intentional because it represents protection against transport noise and
duplicated frontend/API/QGIS/MCP search intent. Disable SEARCH_RESULT_CACHE
only when you need to measure the full search-result assembly miss path.
The cache-bust token includes a generated per-run seed so repeated smoke/stress
runs do not accidentally reuse old endpoint-cache entries. Set
K6_CACHE_BUST_SEED=<value> only when you intentionally need reproducible URLs.
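A sketch of how a seeded cache-bust token can be appended per request; the `k6cb` param name is the one described above, while the seed and counter construction here are assumptions:

```javascript
// Append a unique k6cb token per request: per-run seed plus VU/iteration
// counters, so repeated runs do not collide on old endpoint-cache entries.
import http from 'k6/http';
import exec from 'k6/execution';

const SEED = __ENV.K6_CACHE_BUST_SEED || `${Date.now()}`; // generated per run
const CACHE_BUST = (__ENV.K6_CACHE_BUST_SEARCH || '0') === '1';

function withCacheBust(url) {
  if (!CACHE_BUST) return url;
  const sep = url.includes('?') ? '&' : '?';
  return `${url}${sep}k6cb=${SEED}-${exec.vu.idInTest}-${exec.scenario.iterationInTest}`;
}

export default function () {
  const base = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
  http.get(withCacheBust(`${base}/api/v1/search?q=minnesota`));
}
```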
Add K6_ENDPOINT_BREAKDOWN=1 when you need per-endpoint p95/p99 rows in
the k6 summary. This is useful after scenario-level thresholds fail and you
need to see which tagged endpoint is carrying the tail latency.
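Per-endpoint rows lean on k6's tag-scoped sub-metric thresholds. A sketch of the pattern, where the endpoint tag values mirror the K6_ENDPOINT_TARGET names but the threshold values are illustrative:

```javascript
// Tag each request with an endpoint name, then threshold/report per tag.
import http from 'k6/http';

export const options = {
  thresholds: {
    'http_req_duration{endpoint:api_search}': ['p(95)<800', 'p(99)<2000'],
    'http_req_duration{endpoint:frontend_resource_page}': ['p(95)<1200'],
  },
};

export default function () {
  const base = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
  http.get(`${base}/api/v1/search?q=minnesota`, { tags: { endpoint: 'api_search' } });
  http.get(`${base}/resources/p16022coll206:283`, {
    tags: { endpoint: 'frontend_resource_page' },
  });
}
```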
Search diagnostics are enabled by default with K6_SEARCH_DIAGNOSTICS=1.
Search-like requests record custom k6 metrics from X-Search-Semantic-Cache
and Server-Timing headers:
- `search_semantic_cache_hit_rate`
- `search_semantic_cache_hits`
- `search_semantic_cache_misses`
- `search_semantic_cache_observed`
- `search_semantic_cache_unknown`
- `search_response_duration`
- `search_server_timing_total`
- `search_server_timing_search`
- `search_server_timing_response_build`
- `search_server_timing_resource_cache_lookup`
- `search_server_timing_db_fallback`
- `search_server_timing_miss_prefetch`
- `search_server_timing_miss_build`
- `search_server_timing_semantic_cache_lookup`
- `search_server_timing_semantic_cache_wait`
- `search_server_timing_semantic_cache_store`
These metrics help distinguish true cache misses from cache hits that are slow
because of worker queueing, frontend fetch timing, or downstream response
composition. Disable them with K6_SEARCH_DIAGNOSTICS=0 only when you need the
lowest possible test-client overhead.
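A sketch of how such metrics can be derived from the response headers; the Server-Timing parsing is simplified, and only a small subset of the metrics listed above is shown:

```javascript
// Record semantic-cache hit rate and one Server-Timing stage from headers.
import http from 'k6/http';
import { Rate, Trend } from 'k6/metrics';

const semanticCacheHitRate = new Rate('search_semantic_cache_hit_rate');
const serverTimingSearch = new Trend('search_server_timing_search', true);

export default function () {
  const base = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
  const res = http.get(`${base}/api/v1/search?q=minnesota`);

  const cacheHeader = res.headers['X-Search-Semantic-Cache'];
  if (cacheHeader) {
    semanticCacheHitRate.add(cacheHeader.toLowerCase() === 'hit');
  }

  // Assumed header shape: "search;dur=123.4, response_build;dur=56.7, ..."
  const serverTiming = res.headers['Server-Timing'] || '';
  const match = serverTiming.match(/(?:^|,\s*)search;dur=([\d.]+)/);
  if (match) {
    serverTimingSearch.add(parseFloat(match[1]));
  }
}
```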
The frontend scenario treats /search as a browser flow: it requests the HTML
shell and then requests /search/results for the JSON data that the hydrated
client fetches through the keyed frontend BFF route. This keeps the API-key
throttling path represented in frontend load tests without blocking SSR on the
search payload. In Kamal single-host deployments, nginx handles the exact
/search/results JSON route directly: it injects the server-side API key and
proxies to the internal FastAPI pool. That keeps the browser/API-key contract
intact while avoiding the React Router worker queue for facet-heavy result
payloads. For keyed k6 runs, /search/results preserves the caller's
X-API-Key and does not add the frontend Turnstile gate markers, so mixed tests
can exercise the frontend JSON route without needing a Cloudflare browser
challenge session.
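A sketch of that two-step browser flow from the k6 side; the `/search/results` query params and header handling here are illustrative assumptions:

```javascript
// Browser-flow sketch: fetch the /search HTML shell, then the JSON the
// hydrated client would request from the keyed /search/results BFF route.
import http from 'k6/http';
import { check } from 'k6';

export default function () {
  const base = __ENV.K6_BASE_URL || 'https://lib-btaageoapi-dev-app-01.oit.umn.edu';
  const q = encodeURIComponent(__ENV.K6_QUERY || 'minnesota');

  const shell = http.get(`${base}/search?q=${q}`);
  check(shell, { 'search shell 200': (r) => r.status === 200 });

  // The caller's X-API-Key is preserved on the JSON route for keyed runs.
  const headers = __ENV.K6_API_KEY ? { 'X-API-Key': __ENV.K6_API_KEY } : {};
  const results = http.get(`${base}/search/results?q=${q}`, { headers });
  check(results, { 'search results JSON 200': (r) => r.status === 200 });
}
```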
When mixed frontend/API stress shows high client-observed latency but low
search_server_timing_total, check for worker queue contention before changing
search payload shape. The Kamal web container separates public API traffic from
frontend BFF traffic with:
- `WEB_UVICORN_WORKERS`: public `/api/...` FastAPI workers.
- `WEB_INTERNAL_UVICORN_WORKERS`: loopback-only FastAPI workers used by SSR/BFF fetches through `API_BASE_URL` and nginx's direct `/search/results` BFF proxy.
On the current 12-vCPU production host, the serving profile is:

- Public API: `WEB_UVICORN_WORKERS=4`
- Internal frontend API pool: `WEB_INTERNAL_UVICORN_WORKERS=6`
- Frontend SSR/BFF: `WEB_SSR_WORKERS=4`
- Web container ceiling: `cpus: 8`, `memory: 5120m`
The worker container is capped at cpus: 1.75, giving Celery more room for
bridge syncs, cache warming, and maintenance while still reserving host CPU for
Elasticsearch, Postgres, Redis, and the OS. The 8-vCPU dev hosts keep the
smaller 3 / 4 / 3 web-worker profile with web cpus: 5 unless a test run
explicitly changes those destination overrides. The next validation step after
changing these values is to rerun the mixed 18 API VUs + 6 frontend VUs
profile and compare API p95 against the API-only baseline.
The production Postgres accessory currently has max_connections=100, so each
Kamal destination must keep per-process DB pools bounded. Production uses:
- `DB_POOL_MAX=2` for the shared `databases` async pool.
- `SQLALCHEMY_ASYNC_POOL_SIZE=1` and `SQLALCHEMY_ASYNC_MAX_OVERFLOW=0` for SQLAlchemy async engines.
- `SQLALCHEMY_SYNC_POOL_SIZE=1` and `SQLALCHEMY_SYNC_MAX_OVERFLOW=0` for sync SQLAlchemy engines used by request-path thumbnail/visual-asset helpers.
These caps intentionally trade some request queueing inside each app process
for a hard ceiling on database connections. If p95 rises under load, increase
Postgres max_connections or add a pooler before raising these per-process
pool values.
If the API pool remains healthy but frontend resource page tails stay high,
check the SSR layer next. WEB_SSR_WORKERS controls how many local
react-router-serve processes nginx balances across. Raising this reduces
single-process Node queueing for HTML routes, while /search/results data
traffic should stay on the nginx -> internal FastAPI path.
These backend env vars are useful when tuning facet-heavy search behavior:
- `SEARCH_FACET_CACHE_TTL`: controls how long normalized search facet blocks and facet-value buckets stay hot in Redis. Default: `3600` seconds.
- `SEARCH_RESULT_CACHE`: enables the semantic `/api/v1/search` response-core cache below the endpoint response cache. Default: `true`.
- `SEARCH_RESULT_CACHE_TTL`: controls how long semantic search response cores stay hot in Redis. Default: same as `SEARCH_CACHE_TTL` (`3600` seconds).
- `SEARCH_RESULT_CACHE_VERSION`: bumps only the semantic search response-core namespace when the cached core format changes. Default: `v1`.
- `SEARCH_RESULT_CACHE_LOCK_WAIT_SECONDS`: controls how briefly a request waits for another worker to fill the semantic cache before computing the response itself. Default: `0.25` seconds.
- `SEARCH_TIMING_LOG_THRESHOLD_MS`: controls when the backend logs aggregation timing summaries at `info` instead of `debug`. Default: `750` milliseconds.
- `SEARCH_RESPONSE_TIMING_LOG_THRESHOLD_MS`: controls when `/api/v1/search` response-assembly timing logs are emitted at `info` instead of `debug`. Default: `750` milliseconds.
- `SEARCH_TIMING_HEADERS`: adds live `Server-Timing` and `X-Search-Semantic-Cache` headers to search responses. Default: `true`. These diagnostics are not stored in endpoint response-cache records.
The new aggregation timing logs are emitted from the backend search layer and distinguish cache hits from full Elasticsearch aggregation misses, which makes it easier to compare warm and miss-path behavior during k6 runs.
The API search endpoint also emits search_response_timing logs that break down
the /api/v1/search assembly path into semantic-cache lookup/wait/store, search,
representation-cache lookup, DB fallback, miss prefetch, miss build, and
response-build stages.
Run only one of the two scenarios when needed:

```
make k6-stress K6_ENABLE_FRONTEND=0
make k6-stress K6_ENABLE_API=0
```

At least one of K6_ENABLE_FRONTEND or K6_ENABLE_API must remain enabled.
For warm-cache performance runs on dev1, prime the major caches first:
```
make kamal-prime-resource-cache KAMAL_DEST=dev1
make kamal-prime-visual-caches KAMAL_DEST=dev1
```

That helps separate cache-warming misses from steady-state behavior when you are collecting comparison baselines.
When the Makefile falls back to Docker, the k6 container can hit public URLs without extra setup. If you want to point the suite at a local app instead of a public host, use a Docker-reachable hostname such as:
```
make k6-smoke K6_BASE_URL=http://host.docker.internal:8000
```