---
see:
  - name: Configuration
    link: /settings/configuration
---

# Metrics

Kloud Workspace ships with a built-in **Prometheus-compatible metrics exporter** that
exposes resource and session telemetry for your running workspace.

The exporter is **disabled by default** and must be explicitly enabled via an environment
variable.

## Enabling Metrics

Set `WS_METRICS_ENABLE` to `true` when launching the container:

```sh{2}
docker run \
  -e WS_METRICS_ENABLE=true \
  ghcr.io/kloudkit/workspace:v0.1.2
```

Once enabled, the exporter serves metrics at `/` on port **9100** *(default)*.

## Configuration

Three environment variables control the metrics exporter:

- <EnvVar group="metrics" name="enable" />
- <EnvVar group="metrics" name="port" />
- <EnvVar group="metrics" name="collectors" />

To expose metrics on a custom port with only CPU and memory collectors:

```sh{2-4}
docker run \
  -e WS_METRICS_ENABLE=true \
  -e WS_METRICS_PORT=9200 \
  -e WS_METRICS_COLLECTORS="container.cpu,container.memory" \
  ghcr.io/kloudkit/workspace:v0.1.2
```

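If you manage the container with Docker Compose instead, the same settings can be expressed as a service definition. This is a minimal sketch, not an official Compose file: the service name `workspace` is an arbitrary choice, and the `ports` mapping is an addition (absent from the `docker run` example) so that a Prometheus outside the container can reach the custom port.

```yaml
# compose.yaml: sketch equivalent of the docker run example above
services:
  workspace: # arbitrary service name
    image: ghcr.io/kloudkit/workspace:v0.1.2
    environment:
      WS_METRICS_ENABLE: "true"
      WS_METRICS_PORT: "9200"
      WS_METRICS_COLLECTORS: "container.cpu,container.memory"
    ports:
      # Publish the metrics port to the host (not part of the docker run example)
      - "9200:9200"
```
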
## Collectors

Collectors are organized hierarchically.
Enabling a parent collector *(e.g. `container`)* implicitly enables all its children.

| Collector              | Description                         |
| ---------------------- | ----------------------------------- |
| `workspace`            | All workspace metrics               |
| `workspace.info`       | Workspace build info                |
| `workspace.session`    | Session timestamp and uptime        |
| `workspace.extensions` | Installed extensions count          |
| `container`            | All container metrics               |
| `container.cpu`        | CPU usage metrics                   |
| `container.memory`     | Memory usage metrics                |
| `container.fs`         | Filesystem usage metrics            |
| `container.fd`         | File descriptor metrics             |
| `container.pids`       | Process metrics                     |
| `pressure`             | All pressure metrics *(cgroup v2)*  |
| `pressure.cpu`         | CPU pressure metrics                |
| `pressure.memory`      | Memory pressure metrics             |
| `pressure.io`          | IO pressure metrics                 |
| `network`              | Network metrics                     |
| `sockets`              | Socket metrics                      |
| `io`                   | IO metrics                          |
| `gpu`                  | GPU metrics *(requires nvidia-smi)* |

When `WS_METRICS_COLLECTORS` is unset or set to `*`, all collectors are enabled
*(`gpu` only when GPU hardware is available)*.

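As an illustration of the hierarchy, the following Compose sketch (the service name is an arbitrary choice) enables every `workspace.*` and `pressure.*` collector through their parents, while picking only `container.memory` from the `container` group:

```yaml
services:
  workspace:
    image: ghcr.io/kloudkit/workspace:v0.1.2
    environment:
      WS_METRICS_ENABLE: "true"
      # Parents (workspace, pressure) enable all of their children;
      # container.memory enables only that single child of container.
      WS_METRICS_COLLECTORS: "workspace,pressure,container.memory"
```
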
## Exposed Metrics

### Workspace

| Metric                                            | Type  | Description                                        |
| ------------------------------------------------- | ----- | -------------------------------------------------- |
| `workspace_info`                                  | gauge | Build info *(labels: `version`, `vscode_version`)* |
| `workspace_session_initialized_timestamp_seconds` | gauge | Unix timestamp when initialized                    |
| `workspace_session_uptime_seconds`                | gauge | Seconds since initialization                       |
| `workspace_extensions_installed_total`            | gauge | VS Code extensions installed                       |

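Because `workspace_session_initialized_timestamp_seconds` should only change when the workspace is initialized again, a Prometheus alerting rule can use `changes()` to flag restarts. A sketch of a Prometheus rule file; the group name, alert name, one-hour window, and `severity` label are illustrative choices, not part of Kloud Workspace:

```yaml
groups:
  - name: workspace-session # illustrative group name
    rules:
      # Fires if the initialization timestamp changed value within the last hour,
      # i.e. the workspace was presumably re-initialized during that window.
      - alert: WorkspaceSessionRestarted
        expr: changes(workspace_session_initialized_timestamp_seconds[1h]) > 0
        labels:
          severity: info
        annotations:
          summary: Workspace session was re-initialized within the last hour
```
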
### Container – CPU

| Metric                                            | Type    | Description                  |
| ------------------------------------------------- | ------- | ---------------------------- |
| `workspace_container_cpu_usage_seconds_total`     | counter | Total CPU time consumed      |
| `workspace_container_cpu_user_seconds_total`      | counter | CPU time in user mode        |
| `workspace_container_cpu_system_seconds_total`    | counter | CPU time in system mode      |
| `workspace_container_cpu_periods_total`           | counter | Total CPU scheduling periods |
| `workspace_container_cpu_throttled_periods_total` | counter | Throttled CPU periods        |
| `workspace_container_cpu_throttled_seconds_total` | counter | Total time throttled         |

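Counters like these are typically consumed through `rate()`. The following Prometheus recording rules are a sketch that derives the average number of CPU cores in use and the share of scheduling periods that were throttled; the rule names are hypothetical, following Prometheus' `level:metric:operations` naming convention:

```yaml
groups:
  - name: workspace-cpu
    rules:
      # CPU seconds consumed per second = average number of cores in use
      - record: workspace:container_cpu_usage_cores:rate5m
        expr: rate(workspace_container_cpu_usage_seconds_total[5m])
      # Share of scheduling periods in which the container was throttled (0–1)
      - record: workspace:container_cpu_throttled_periods:ratio5m
        expr: >-
          rate(workspace_container_cpu_throttled_periods_total[5m])
          / rate(workspace_container_cpu_periods_total[5m])
```

If the container has no CPU quota, the period counters may not advance, in which case the throttling ratio evaluates to `NaN` rather than `0`.
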
### Container – Memory

| Metric                                        | Type    | Description                |
| --------------------------------------------- | ------- | -------------------------- |
| `workspace_container_memory_usage_bytes`      | gauge   | Current memory usage       |
| `workspace_container_memory_limit_bytes`      | gauge   | Memory limit               |
| `workspace_container_memory_rss_bytes`        | gauge   | Resident set size          |
| `workspace_container_memory_cache_bytes`      | gauge   | Page cache memory          |
| `workspace_container_memory_swap_bytes`       | gauge   | Swap usage                 |
| `workspace_container_memory_swap_limit_bytes` | gauge   | Swap limit                 |
| `workspace_container_memory_anon_bytes`       | gauge   | Anonymous memory           |
| `workspace_container_memory_kernel_bytes`     | gauge   | Kernel memory              |
| `workspace_container_memory_slab_bytes`       | gauge   | Slab allocator memory      |
| `workspace_container_memory_oom_total`        | counter | OOM events                 |
| `workspace_container_memory_oom_kill_total`   | counter | OOM kill events            |
| `workspace_container_memory_max_total`        | counter | Times memory limit was hit |

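A common use of these metrics is alerting before the container reaches its memory limit and whenever a process is OOM-killed. A sketch of Prometheus alerting rules; the 90% threshold, durations, and labels are arbitrary, and the `> 0` filter only guards against dividing by a zero limit, since this page does not specify what the exporter reports when no limit is set:

```yaml
groups:
  - name: workspace-memory
    rules:
      - alert: WorkspaceMemoryNearLimit
        # Usage above 90% of the limit for 5 minutes; `> 0` drops series with a zero limit
        expr: >-
          workspace_container_memory_usage_bytes
          / (workspace_container_memory_limit_bytes > 0)
          > 0.9
        for: 5m
        labels:
          severity: warning
      - alert: WorkspaceOOMKill
        # At least one OOM kill recorded during the last 15 minutes
        expr: increase(workspace_container_memory_oom_kill_total[15m]) > 0
        labels:
          severity: critical
```
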
### Container – Filesystem

| Metric                               | Type  | Description                         |
| ------------------------------------ | ----- | ----------------------------------- |
| `workspace_container_fs_usage_bytes` | gauge | Filesystem usage on `/workspace`    |
| `workspace_container_fs_limit_bytes` | gauge | Filesystem capacity on `/workspace` |

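Both gauges describe `/workspace`, so besides a simple usage ratio, `predict_linear()` can estimate whether the volume is on track to fill up. A sketch of an alerting rule; the 6-hour lookback, 24-hour horizon, and `for` duration are arbitrary choices:

```yaml
groups:
  - name: workspace-fs
    rules:
      - alert: WorkspaceDiskFillingUp
        # Linear extrapolation of the last 6h of usage, 24h (86400 s) into the future
        expr: >-
          predict_linear(workspace_container_fs_usage_bytes[6h], 86400)
          > workspace_container_fs_limit_bytes
        for: 30m
        labels:
          severity: warning
```
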
### Container – File Descriptors

| Metric                                       | Type  | Description           |
| -------------------------------------------- | ----- | --------------------- |
| `workspace_container_file_descriptors_open`  | gauge | Open file descriptors |
| `workspace_container_file_descriptors_limit` | gauge | File descriptor limit |

### Container – Processes

| Metric                             | Type  | Description                 |
| ---------------------------------- | ----- | --------------------------- |
| `workspace_container_pids_current` | gauge | Current number of processes |
| `workspace_container_pids_limit`   | gauge | Process limit               |

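File descriptor and process counts are easiest to interpret relative to their limits. A sketch of Prometheus recording rules (hypothetical names) that compute both as 0–1 ratios, skipping series whose reported limit is zero:

```yaml
groups:
  - name: workspace-limits
    rules:
      # Open file descriptors as a fraction of the limit
      - record: workspace:container_file_descriptors:usage_ratio
        expr: >-
          workspace_container_file_descriptors_open
          / (workspace_container_file_descriptors_limit > 0)
      # Running processes as a fraction of the PID limit
      - record: workspace:container_pids:usage_ratio
        expr: >-
          workspace_container_pids_current
          / (workspace_container_pids_limit > 0)
```
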
### Pressure – CPU

| Metric                                         | Type    | Description                         |
| ---------------------------------------------- | ------- | ----------------------------------- |
| `workspace_pressure_cpu_waiting_seconds_total` | counter | Total time tasks waited for CPU     |
| `workspace_pressure_cpu_stalled_seconds_total` | counter | Total time all tasks stalled on CPU |

### Pressure – Memory

| Metric                                            | Type    | Description                            |
| ------------------------------------------------- | ------- | -------------------------------------- |
| `workspace_pressure_memory_waiting_seconds_total` | counter | Total time tasks waited for memory     |
| `workspace_pressure_memory_stalled_seconds_total` | counter | Total time all tasks stalled on memory |

### Pressure – IO

| Metric                                        | Type    | Description                         |
| --------------------------------------------- | ------- | ----------------------------------- |
| `workspace_pressure_io_waiting_seconds_total` | counter | Total time tasks waited for I/O     |
| `workspace_pressure_io_stalled_seconds_total` | counter | Total time all tasks stalled on I/O |

::: tip
Pressure metrics require **cgroup v2** and are sourced from the
[PSI (Pressure Stall Information)](https://docs.kernel.org/accounting/psi.html)
subsystem.
:::

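Since the pressure counters accumulate seconds, `rate()` over them yields affected seconds per second, i.e. the fraction of wall-clock time *(0–1)* spent waiting or stalled. A sketch with a hypothetical rule name and an arbitrary 10% alert threshold:

```yaml
groups:
  - name: workspace-pressure
    rules:
      # Fraction of time tasks waited for memory, averaged over 5 minutes
      - record: workspace:pressure_memory_waiting:ratio5m
        expr: rate(workspace_pressure_memory_waiting_seconds_total[5m])
      - alert: WorkspaceIOStalled
        # All tasks stalled on I/O for more than 10% of the time, sustained for 10 minutes
        expr: rate(workspace_pressure_io_stalled_seconds_total[5m]) > 0.1
        for: 10m
        labels:
          severity: warning
```
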
### Network

| Metric                                     | Type    | Description               |
| ------------------------------------------ | ------- | ------------------------- |
| `workspace_network_receive_bytes_total`    | counter | Total bytes received      |
| `workspace_network_transmit_bytes_total`   | counter | Total bytes transmitted   |
| `workspace_network_receive_packets_total`  | counter | Total packets received    |
| `workspace_network_transmit_packets_total` | counter | Total packets transmitted |
| `workspace_network_receive_errors_total`   | counter | Total receive errors      |
| `workspace_network_transmit_errors_total`  | counter | Total transmit errors     |

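These counters become throughput and error signals through `rate()` and `increase()`. A sketch; rule names, windows, and labels are illustrative, and the error alert assumes the receive and transmit series carry identical label sets so that `+` can match them:

```yaml
groups:
  - name: workspace-network
    rules:
      # Throughput in bytes per second, averaged over 5 minutes
      - record: workspace:network_receive_bytes:rate5m
        expr: rate(workspace_network_receive_bytes_total[5m])
      - record: workspace:network_transmit_bytes:rate5m
        expr: rate(workspace_network_transmit_bytes_total[5m])
      - alert: WorkspaceNetworkErrors
        # Any receive or transmit errors during the last 15 minutes
        expr: >-
          increase(workspace_network_receive_errors_total[15m])
          + increase(workspace_network_transmit_errors_total[15m])
          > 0
        labels:
          severity: warning
```
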
### Sockets

| Metric                              | Type  | Description                 |
| ----------------------------------- | ----- | --------------------------- |
| `workspace_sockets_tcp_established` | gauge | Established TCP connections |
| `workspace_sockets_tcp_listen`      | gauge | Listening TCP sockets       |
| `workspace_sockets_udp`             | gauge | UDP sockets                 |

### IO

| Metric                           | Type    | Description                 |
| -------------------------------- | ------- | --------------------------- |
| `workspace_io_read_bytes_total`  | counter | Total bytes read from disk  |
| `workspace_io_write_bytes_total` | counter | Total bytes written to disk |
| `workspace_io_read_ops_total`    | counter | Total disk read operations  |
| `workspace_io_write_ops_total`   | counter | Total disk write operations |

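Dividing the byte rates by the operation rates gives the average size of each read and write, which helps distinguish many small I/Os from a few large ones. A sketch with hypothetical rule names; when no operations happen within the window, the result is `NaN`:

```yaml
groups:
  - name: workspace-io
    rules:
      # Average bytes per read operation over 5 minutes
      - record: workspace:io_read_bytes_per_op:rate5m
        expr: >-
          rate(workspace_io_read_bytes_total[5m])
          / rate(workspace_io_read_ops_total[5m])
      # Average bytes per write operation over 5 minutes
      - record: workspace:io_write_bytes_per_op:rate5m
        expr: >-
          rate(workspace_io_write_bytes_total[5m])
          / rate(workspace_io_write_ops_total[5m])
```
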
### GPU

| Metric                              | Type  | Description             |
| ----------------------------------- | ----- | ----------------------- |
| `workspace_gpu_utilization_ratio`   | gauge | GPU utilization *(0–1)* |
| `workspace_gpu_memory_used_bytes`   | gauge | GPU memory used         |
| `workspace_gpu_memory_total_bytes`  | gauge | GPU memory total        |
| `workspace_gpu_temperature_celsius` | gauge | GPU temperature         |
| `workspace_gpu_power_watts`         | gauge | GPU power consumption   |

::: tip

GPU metrics require `nvidia-smi` to be available in the container.

When the GPU collector is enabled but no GPU hardware is detected, these metrics are
silently omitted.

:::

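A sketch of GPU alerting rules with arbitrary thresholds and hypothetical names. Because these metrics are omitted when no GPU is detected, the rules simply never fire on GPU-less workspaces:

```yaml
groups:
  - name: workspace-gpu
    rules:
      - alert: WorkspaceGPUMemoryNearFull
        # More than 95% of GPU memory in use for 10 minutes
        expr: >-
          workspace_gpu_memory_used_bytes
          / workspace_gpu_memory_total_bytes
          > 0.95
        for: 10m
        labels:
          severity: warning
      - alert: WorkspaceGPUHot
        # Temperature threshold in degrees Celsius (arbitrary example value)
        expr: workspace_gpu_temperature_celsius > 85
        for: 5m
        labels:
          severity: warning
```
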
## Scraping and Visualization

::: tip
Prometheus and Grafana are **not bundled** with Kloud Workspace.

The workspace only exposes a Prometheus-compatible metrics endpoint; scraping, alerting,
and dashboards are managed by your own infrastructure.
:::

Point your Prometheus instance at the workspace metrics port, making sure the port is
reachable from Prometheus *(e.g. published with `-p 9100:9100`)*:

```yaml
scrape_configs:
  - job_name: workspace
    # The exporter serves metrics at `/`, not Prometheus' default `/metrics`
    metrics_path: /
    static_configs:
      - targets:
          - <workspace-host>:9100
```
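
If Prometheus runs in the same Docker Compose project as the workspace, it can reach the exporter by service name, so the metrics port does not need to be published to the host. A minimal sketch, assuming the official `prom/prometheus` image (which loads `/etc/prometheus/prometheus.yml`) and a local `prometheus.yml` holding the scrape config above with `<workspace-host>` replaced by `workspace`:

```yaml
# compose.yaml: workspace plus a Prometheus instance on the same Compose network
services:
  workspace:
    image: ghcr.io/kloudkit/workspace:v0.1.2
    environment:
      WS_METRICS_ENABLE: "true"
  prometheus:
    image: prom/prometheus
    volumes:
      # Local prometheus.yml whose scrape target is workspace:9100
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      # Prometheus web UI
      - "9090:9090"
```

To load recording or alerting rules as well, list the rule files under `rule_files` in `prometheus.yml` and mount them into the Prometheus container.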