Commit 6211ad5

📊 Add metrics docs
1 parent 1d12d8b commit 6211ad5

2 files changed

Lines changed: 229 additions & 4 deletions

docs/editor/authentication.md

Lines changed: 7 additions & 3 deletions
@@ -28,9 +28,13 @@ Setting your password is as simple as defining an `env` variable:
2828
- <EnvVar group="auth" name="password" />
2929
- <EnvVar group="auth" name="password_hashed" />
3030

31-
::: info NOTE
32-
All sensitive environment variables are purged before deploying Kloud Workspace, ensuring
33-
that your password is absent from future shell sessions.
31+
::: info Automatic Cleanup
32+
33+
Authentication-related environment variables are automatically removed once the workspace
34+
finishes starting up.
35+
36+
This means they will not be visible in your terminal sessions, child processes, or any
37+
application running inside the workspace.
3438
:::
3539

3640
```sh{2}

docs/editor/metrics.md

Lines changed: 222 additions & 1 deletion
@@ -1,5 +1,226 @@
1+
---
2+
see:
3+
- name: Configuration
4+
link: /settings/configuration
5+
---
6+
17
# Metrics
28

39
![Metrics](/icons/grafana.svg){.doc-image}
410

5-
Coming soon...
11+
Kloud Workspace ships with a built-in **Prometheus-compatible metrics exporter** that
12+
exposes resource and session telemetry for your running workspace.
13+
14+
The exporter is **disabled by default** and must be explicitly enabled via an environment
15+
variable.
16+
17+
## Enabling Metrics
18+
19+
Set `WS_METRICS_ENABLE` to `true` when launching the container:
20+
21+
```sh{2}
22+
docker run \
23+
-e WS_METRICS_ENABLE=true \
24+
ghcr.io/kloudkit/workspace:v0.1.2
25+
```
26+
27+
Once enabled, the exporter serves metrics at `/` on port **9100** *(default)*.
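
A quick way to confirm the exporter is up is to fetch the endpoint and pull out a single value. The sketch below is illustrative only: `metric_value` is a hypothetical helper, not something shipped with Kloud Workspace, and the `localhost:9100` address in the comment assumes the port was published to the host (for example with `-p 9100:9100`), which the command above does not do.

```sh
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text exposition format read on stdin.
metric_value() {
  # Comment lines start with "#", so their first field never equals the name.
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example against a live workspace (host and port are assumptions):
#   curl -fsS http://localhost:9100/ | metric_value workspace_session_uptime_seconds
```

The helper only matches metrics without labels, which is enough for a smoke test but not for series such as `workspace_info`.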
28+
29+
## Configuration
30+
31+
Three environment variables control the metrics exporter:
32+
33+
- <EnvVar group="metrics" name="enable" />
34+
- <EnvVar group="metrics" name="port" />
35+
- <EnvVar group="metrics" name="collectors" />
36+
37+
To expose metrics on a custom port with only CPU and memory collectors:
38+
39+
```sh{2-4}
40+
docker run \
41+
-e WS_METRICS_ENABLE=true \
42+
-e WS_METRICS_PORT=9200 \
43+
-e WS_METRICS_COLLECTORS="container.cpu,container.memory" \
44+
ghcr.io/kloudkit/workspace:v0.1.2
45+
```
46+
47+
## Collectors
48+
49+
Collectors are organized hierarchically.
50+
Enabling a parent collector *(e.g. `container`)* implicitly enables all its children.
51+
52+
| Collector | Description |
53+
| ---------------------- | ----------------------------------- |
54+
| `workspace` | All workspace metrics |
55+
| `workspace.info` | Workspace build info |
56+
| `workspace.session` | Session timestamp and uptime |
57+
| `workspace.extensions` | Installed extensions count |
58+
| `container` | All container metrics |
59+
| `container.cpu` | CPU usage metrics |
60+
| `container.memory` | Memory usage metrics |
61+
| `container.fs` | Filesystem usage metrics |
62+
| `container.fd` | File descriptor metrics |
63+
| `container.pids` | Process metrics |
64+
| `pressure` | All pressure metrics *(cgroup v2)* |
65+
| `pressure.cpu` | CPU pressure metrics |
66+
| `pressure.memory` | Memory pressure metrics |
67+
| `pressure.io` | IO pressure metrics |
68+
| `network` | Network metrics |
69+
| `sockets` | Socket metrics |
70+
| `io` | IO metrics |
71+
| `gpu` | GPU metrics *(requires nvidia-smi)* |
72+
73+
When `WS_METRICS_COLLECTORS` is unset or set to `*`, all collectors are enabled *(gpu only when hardware is available)*.
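
The parent/child rule can be illustrated with a small shell sketch. `collector_enabled` is a hypothetical function written for this example; it is not the exporter's actual implementation and it ignores the GPU hardware check. It assumes the selection is a comma-separated list without spaces, as in the `WS_METRICS_COLLECTORS` example above.

```sh
# Hypothetical helper: succeeds when collector NAME is enabled by SELECTION.
# Usage: collector_enabled SELECTION NAME
collector_enabled() {
  selection="$1"
  candidate="$2"

  # Unset (empty) or "*" selects every collector.
  if [ -z "$selection" ] || [ "$selection" = "*" ]; then
    return 0
  fi

  # Check the collector itself, then each parent (a.b.c -> a.b -> a).
  while :; do
    case ",$selection," in
      *",$candidate,"*) return 0 ;;
    esac
    case "$candidate" in
      *.*) candidate="${candidate%.*}" ;;
      *) return 1 ;;
    esac
  done
}

collector_enabled "container" "container.cpu" && echo "container.cpu enabled"
```

Under this sketch, selecting `container` enables `container.cpu`, while selecting only `container.cpu` leaves `container.fs` disabled.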
74+
75+
## Exposed Metrics
76+
77+
### Workspace
78+
79+
| Metric | Type | Description |
80+
| ------------------------------------------------- | ----- | -------------------------------------------------- |
81+
| `workspace_info` | gauge | Build info *(labels: `version`, `vscode_version`)* |
82+
| `workspace_session_initialized_timestamp_seconds` | gauge | Unix timestamp when initialized |
83+
| `workspace_session_uptime_seconds` | gauge | Seconds since initialization |
84+
| `workspace_extensions_installed_total` | gauge | VS Code extensions installed |
85+
86+
### Container — CPU
87+
88+
| Metric | Type | Description |
89+
| ------------------------------------------------- | ------- | ---------------------------- |
90+
| `workspace_container_cpu_usage_seconds_total` | counter | Total CPU time consumed |
91+
| `workspace_container_cpu_user_seconds_total` | counter | CPU time in user mode |
92+
| `workspace_container_cpu_system_seconds_total` | counter | CPU time in system mode |
93+
| `workspace_container_cpu_periods_total` | counter | Total CPU scheduling periods |
94+
| `workspace_container_cpu_throttled_periods_total` | counter | Throttled CPU periods |
95+
| `workspace_container_cpu_throttled_seconds_total` | counter | Total time throttled |
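
Because these are counters, a single reading says little; the useful figure is the change between two scrapes divided by the time between them. Prometheus normally does this with `rate()`. The sketch below shows the same arithmetic by hand. `cpu_rate` is a hypothetical helper and the sample values are made up for illustration.

```sh
# Hypothetical helper: average CPU cores used between two samples of
# workspace_container_cpu_usage_seconds_total.
# Usage: cpu_rate PREV_SECONDS CURR_SECONDS INTERVAL_SECONDS
cpu_rate() {
  awk -v prev="$1" -v curr="$2" -v dt="$3" 'BEGIN {
    # A non-positive interval or a decreasing counter (reset) is rejected.
    if (dt <= 0 || curr < prev) { exit 1 }
    printf "%.2f\n", (curr - prev) / dt
  }'
}

# Two samples taken 15 seconds apart (illustrative values).
cpu_rate 120.0 127.5 15
```

With the sample values above, 7.5 CPU-seconds over 15 seconds corresponds to half a core on average.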
96+
97+
### Container — Memory
98+
99+
| Metric | Type | Description |
100+
| --------------------------------------------- | ------- | -------------------------- |
101+
| `workspace_container_memory_usage_bytes` | gauge | Current memory usage |
102+
| `workspace_container_memory_limit_bytes` | gauge | Memory limit |
103+
| `workspace_container_memory_rss_bytes` | gauge | Resident set size |
104+
| `workspace_container_memory_cache_bytes` | gauge | Page cache memory |
105+
| `workspace_container_memory_swap_bytes` | gauge | Swap usage |
106+
| `workspace_container_memory_swap_limit_bytes` | gauge | Swap limit |
107+
| `workspace_container_memory_anon_bytes` | gauge | Anonymous memory |
108+
| `workspace_container_memory_kernel_bytes` | gauge | Kernel memory |
109+
| `workspace_container_memory_slab_bytes` | gauge | Slab allocator memory |
110+
| `workspace_container_memory_oom_total` | counter | OOM events |
111+
| `workspace_container_memory_oom_kill_total` | counter | OOM kill events |
112+
| `workspace_container_memory_max_total` | counter | Times memory limit was hit |
113+
114+
### Container — Filesystem
115+
116+
| Metric | Type | Description |
117+
| ------------------------------------ | ----- | ----------------------------------- |
118+
| `workspace_container_fs_usage_bytes` | gauge | Filesystem usage on `/workspace` |
119+
| `workspace_container_fs_limit_bytes` | gauge | Filesystem capacity on `/workspace` |
120+
121+
### Container — File Descriptors
122+
123+
| Metric | Type | Description |
124+
| -------------------------------------------- | ----- | --------------------- |
125+
| `workspace_container_file_descriptors_open` | gauge | Open file descriptors |
126+
| `workspace_container_file_descriptors_limit` | gauge | File descriptor limit |
127+
128+
### Container — Processes
129+
130+
| Metric | Type | Description |
131+
| ---------------------------------- | ----- | --------------------------- |
132+
| `workspace_container_pids_current` | gauge | Current number of processes |
133+
| `workspace_container_pids_limit` | gauge | Process limit |
134+
135+
### Pressure — CPU
136+
137+
| Metric | Type | Description |
138+
| ---------------------------------------------- | ------- | ----------------------------------- |
139+
| `workspace_pressure_cpu_waiting_seconds_total` | counter | Total time tasks waited for CPU |
140+
| `workspace_pressure_cpu_stalled_seconds_total` | counter | Total time all tasks stalled on CPU |
141+
142+
### Pressure — Memory
143+
144+
| Metric | Type | Description |
145+
| ------------------------------------------------- | ------- | -------------------------------------- |
146+
| `workspace_pressure_memory_waiting_seconds_total` | counter | Total time tasks waited for memory |
147+
| `workspace_pressure_memory_stalled_seconds_total` | counter | Total time all tasks stalled on memory |
148+
149+
### Pressure — IO
150+
151+
| Metric | Type | Description |
152+
| --------------------------------------------- | ------- | ----------------------------------- |
153+
| `workspace_pressure_io_waiting_seconds_total` | counter | Total time tasks waited for I/O |
154+
| `workspace_pressure_io_stalled_seconds_total` | counter | Total time all tasks stalled on I/O |
155+
156+
::: tip
157+
Pressure metrics require **cgroup v2** and are sourced from the
158+
[PSI (Pressure Stall Information)](https://docs.kernel.org/accounting/psi.html)
159+
subsystem.
160+
:::
161+
162+
### Network
163+
164+
| Metric | Type | Description |
165+
| ------------------------------------------ | ------- | ------------------------- |
166+
| `workspace_network_receive_bytes_total` | counter | Total bytes received |
167+
| `workspace_network_transmit_bytes_total` | counter | Total bytes transmitted |
168+
| `workspace_network_receive_packets_total` | counter | Total packets received |
169+
| `workspace_network_transmit_packets_total` | counter | Total packets transmitted |
170+
| `workspace_network_receive_errors_total` | counter | Total receive errors |
171+
| `workspace_network_transmit_errors_total` | counter | Total transmit errors |
172+
173+
### Sockets
174+
175+
| Metric | Type | Description |
176+
| ----------------------------------- | ----- | --------------------------- |
177+
| `workspace_sockets_tcp_established` | gauge | Established TCP connections |
178+
| `workspace_sockets_tcp_listen` | gauge | Listening TCP sockets |
179+
| `workspace_sockets_udp` | gauge | UDP sockets |
180+
181+
### IO
182+
183+
| Metric | Type | Description |
184+
| -------------------------------- | ------- | --------------------------- |
185+
| `workspace_io_read_bytes_total` | counter | Total bytes read from disk |
186+
| `workspace_io_write_bytes_total` | counter | Total bytes written to disk |
187+
| `workspace_io_read_ops_total` | counter | Total disk read operations |
188+
| `workspace_io_write_ops_total` | counter | Total disk write operations |
189+
190+
### GPU
191+
192+
| Metric | Type | Description |
193+
| ----------------------------------- | ----- | ----------------------- |
194+
| `workspace_gpu_utilization_ratio` | gauge | GPU utilization *(0–1)* |
195+
| `workspace_gpu_memory_used_bytes` | gauge | GPU memory used |
196+
| `workspace_gpu_memory_total_bytes` | gauge | GPU memory total |
197+
| `workspace_gpu_temperature_celsius` | gauge | GPU temperature |
198+
| `workspace_gpu_power_watts` | gauge | GPU power consumption |
199+
200+
::: tip
201+
202+
GPU metrics require `nvidia-smi` to be available in the container.
203+
204+
When the GPU collector is enabled but no GPU hardware is detected, these metrics are
205+
silently omitted.
206+
207+
:::
208+
209+
## Scraping and Visualization
210+
211+
::: tip
212+
Prometheus and Grafana are **not bundled** with Kloud Workspace.
213+
214+
The workspace only exposes a `/metrics`-compatible endpoint; scraping, alerting, and
215+
dashboards are managed by your own infrastructure.
216+
:::
217+
218+
Point your Prometheus instance at the workspace metrics port:
219+
220+
```yaml
221+
scrape_configs:
222+
- job_name: workspace
223+
static_configs:
224+
- targets:
225+
- <workspace-host>:9100
226+
```
