Lumen is a self-hosted AI chat portal for research institutions. It lets your users chat with AI models through a web browser, while giving administrators control over who can access which models and how many tokens each user or group can spend.
Key features:
- Chat interface for AI models (OpenAI-compatible endpoints, Ollama, vLLM, etc.)
- Login via your institution's identity provider through CILogon
- Token budgets per user and group — with optional auto-refresh
- Admin panel to manage users, groups, and usage
- Round-robin load balancing across multiple model backends
Requirements:
- Docker and Docker Compose
- A public domain name (required for CILogon OAuth)
CILogon provides federated login for research institutions (universities, national labs, etc.).
- Register your application at https://cilogon.org/oauth2/register
- Set the callback URL to `https://your-domain/callback`
- Request these scopes: `openid email profile org.cilogon.userinfo`
- Note your `client_id` and `client_secret`
Copy the example config and edit it:
```bash
cp config.yaml.example lumen/config.yaml
```

At minimum, set:

- `app.secret_key` — a long random string
- `oauth2.client_id` and `oauth2.client_secret` — from CILogon
- `oauth2.redirect_uri` — `https://your-domain/callback`
- `admins` — your email address
- `models` — at least one model endpoint (see below)

Then start the stack:

```bash
docker compose up -d
```

Lumen will be available at `https://your-domain`.
If you want to run Lumen locally without Docker or CILogon credentials:
```bash
uv sync
cp config.yaml.example config.yaml
```

Edit `config.yaml` with at minimum:

```yaml
app:
  secret_key: "any-random-string"
  encryption_key: "another-random-string"
  database_url: sqlite:///lumen_dev.db
  debug: true
  dev_user: dev@example.com # bypasses OAuth — logs in as this email automatically
```

And at least one model under `models:`. Two options:
Option A: Built-in echo server (no external dependencies)
The repo includes a lightweight echo server that mirrors your message back with sample math. Add this to your config.yaml:
```yaml
models:
  - name: dummy
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://localhost:9999/v1
        api_key: dummy
```

Start it in a separate terminal before running Lumen:

```bash
uv run dummy
```

Option B: Ollama (real local models)
Install Ollama, pull a model, and keep the `llama3` entry in `config.yaml` pointing at `http://localhost:11434/v1`:

```bash
ollama pull llama3.2
```

With either option configured, run the database migrations and start Lumen:

```bash
uv run flask db upgrade
uv run lumen
```

Visit http://localhost:5000, click Login, and you'll be auto-logged in as dev@example.com.
Note: The `dev_user` option skips OAuth entirely. Remove it (or leave it empty) to use normal CILogon authentication.
App settings:

```yaml
app:
  name: Lumen
  tagline: Illuminating AI access
  secret_key: change-me-to-something-random # any long random string; used for session cookies
  encryption_key: change-me-to-something-different # separate secret used to hash user API keys
  database_url: sqlite:///lumen.db # or a postgres:// URL
  debug: false
```

`encryption_key` can also be supplied via the `LUMEN_ENCRYPTION_KEY` environment variable, which takes precedence over the value in `config.yaml`. This is useful for injecting secrets at deploy time (e.g. via Docker secrets or a Kubernetes secret) without writing them into the config file.
Warning: Rotating `encryption_key` (or `LUMEN_ENCRYPTION_KEY`) invalidates all existing user API keys — users will need to generate new ones.
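For example, in a Docker Compose deployment the key can be passed through from the host environment rather than written into the config file. This is a sketch; the `lumen` service name is an assumption, so match it to your own `docker-compose.yml`:

```yaml
services:
  lumen:
    environment:
      # Takes precedence over app.encryption_key in config.yaml
      LUMEN_ENCRYPTION_KEY: ${LUMEN_ENCRYPTION_KEY}
```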
```yaml
oauth2:
  client_id: cilogon:/client_id/...
  client_secret: ...
  server_metadata_url: https://cilogon.org/.well-known/openid-configuration
  redirect_uri: https://your-domain/callback
  scopes: openid email profile org.cilogon.userinfo
  # Optional: restrict login to one institution
  # params:
  #   idphint: urn:mace:incommon:uiuc.edu
```

```yaml
admins:
  - you@example.edu
```

Admins have full access to the admin panel (users, groups, usage stats).
Each model entry defines a name users will see and one or more backend endpoints. Lumen round-robins across endpoints and skips unhealthy ones.
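The selection logic is not spelled out here, but round-robin with health skipping can be pictured roughly like this (an illustrative Python sketch, not Lumen's actual implementation):

```python
class EndpointPool:
    """Round-robin over a model's endpoints, skipping unhealthy ones
    (illustrative sketch only)."""

    def __init__(self, endpoints):
        self.endpoints = endpoints     # endpoint URLs, in config order
        self.healthy = set(endpoints)  # assume all healthy at startup
        self._cursor = 0

    def mark_unhealthy(self, url):
        self.healthy.discard(url)

    def mark_healthy(self, url):
        self.healthy.add(url)

    def pick(self):
        # Advance the round-robin cursor, trying each endpoint at most once.
        for _ in range(len(self.endpoints)):
            url = self.endpoints[self._cursor % len(self.endpoints)]
            self._cursor += 1
            if url in self.healthy:
                return url
        raise RuntimeError("no healthy endpoints for this model")
```

A real implementation would also need periodic health checks to move recovered endpoints back into the healthy set.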
```yaml
models:
  - name: gpt-4o
    active: true
    input_cost_per_million: 5.0 # for usage tracking only
    output_cost_per_million: 15.0
    endpoints:
      - url: https://api.openai.com/v1
        api_key: sk-...
        # model: gpt-4o # optional — overrides the name sent to this endpoint
  - name: llama3
    active: true
    input_cost_per_million: 0.0
    output_cost_per_million: 0.0
    endpoints:
      - url: http://localhost:11434/v1
        api_key: ollama
        model: llama3.2
```

Set `active: false` to hide a model without removing it.
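The cost fields translate into a per-request charge for usage tracking. Under standard per-million-token pricing, the arithmetic looks like this (a sketch; the function name is illustrative):

```python
def usage_cost(prompt_tokens: int, completion_tokens: int,
               input_cost_per_million: float,
               output_cost_per_million: float) -> float:
    """Dollar cost of one request under per-million-token pricing."""
    return (prompt_tokens * input_cost_per_million
            + completion_tokens * output_cost_per_million) / 1_000_000

# With the gpt-4o rates above ($5 in / $15 out per million tokens),
# a request with 2,000 prompt and 500 completion tokens costs:
cost = usage_cost(2_000, 500, 5.0, 15.0)  # 0.0175 dollars
```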
Groups control how many tokens users can spend. Every user gets the default group. You can create additional groups and assign users manually via the admin panel, or auto-assign them based on CILogon attributes.
```yaml
groups:
  default:
    default: # applies to all models
      max: 0 # token budget (0 = no access)
      refresh: 0 # tokens added per hour (0 = no auto-refresh)
      starting: 0 # tokens granted on first login
  faculty:
    default:
      max: 1000000
      refresh: 50000
      starting: 1000000
```

Automatically add users to a group at login based on their CILogon attributes (requires the `org.cilogon.userinfo` scope):
```yaml
groups:
  uiuc-staff:
    rules:
      - field: affiliation
        contains: staff@illinois.edu # substring match
      - field: idp
        equals: urn:mace:incommon:uiuc.edu # exact match
    default:
      max: 500000
      refresh: 10000
```

Supported fields: `affiliation`, `member_of`, `idp`, `ou`. Groups assigned by rules are automatically removed if the rule no longer matches on next login.
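One way to picture rule evaluation, as an illustrative sketch: whether multiple rules combine with AND or OR is not documented here, so the sketch assumes all rules must match.

```python
def rule_matches(rule: dict, userinfo: dict) -> bool:
    """`contains` is a substring match, `equals` an exact match,
    against the named CILogon userinfo field (sketch only)."""
    value = str(userinfo.get(rule["field"], ""))
    if "contains" in rule:
        return rule["contains"] in value
    if "equals" in rule:
        return rule["equals"] == value
    return False

def group_matches(rules: list, userinfo: dict) -> bool:
    # Assumption: every rule must match for the group to apply.
    return all(rule_matches(r, userinfo) for r in rules)
```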
Chat settings:

```yaml
chat:
  remove: hide # "hide" = soft-delete (recoverable) | "delete" = permanent
```

All endpoints are rate-limited per authenticated user (API key ID for `/v1/*` routes, session user ID for `/chat/*` routes). The limit is a single string in flask-limiter notation (`N per second/minute/hour`):
```yaml
rate_limiting:
  limit: "30 per minute"
  # storage_url: redis://localhost:6379/0 # optional; use Redis in multi-worker deployments
```

By default, limits are tracked in-memory (per-process). For multi-worker deployments (e.g. gunicorn with multiple workers), set `storage_url` to a shared Redis instance so limits are enforced across all workers. Changing `storage_url` requires a restart; changing `limit` takes effect within ~5 seconds (hot-reloaded).
Database pool:

The `app.db_pool` section controls how SQLAlchemy manages database connections. The defaults (pool size 5, overflow 10) are fine for light use; increase them for production or high-concurrency workloads. Changes require a restart.
```yaml
app:
  db_pool:
    pool_size: 20 # persistent connections kept open
    max_overflow: 30 # burst connections allowed above pool_size
    pool_timeout: 10 # seconds to wait for a free connection before returning an error
    pool_recycle: 1800 # recycle connections after 30 min to avoid stale-connection errors
    pool_pre_ping: true # test each connection before use; silently replaces stale ones
```

`pool_size + max_overflow` is the maximum number of simultaneous DB connections per process. For 50 concurrent requests, set these to at least 50 combined. Note that each worker process keeps its own pool, so multi-worker deployments multiply this total by the worker count.
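The sizing arithmetic, using the values above and a hypothetical four-worker gunicorn deployment:

```python
pool_size, max_overflow = 20, 30   # values from the config above
workers = 4                        # hypothetical: gunicorn -w 4

per_process = pool_size + max_overflow  # max simultaneous connections per worker
total = workers * per_process           # each worker keeps its own pool

print(per_process, total)  # 50 200
```

Make sure your database server's connection limit can accommodate the combined total across all workers.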