
KeyAuth.__is_token_expired uses wrong leeway direction, causing persistent 401 errors #3535

@Seyda92


Bug Report: KeyAuth.__is_token_expired uses wrong leeway direction, causing persistent 401 errors

Repository: stackitcloud/stackit-sdk-python
Component: core/src/stackit/core/auth_methods/key_auth.py
Affected version: stackit-core 0.2.0 (also present on main branch as of 2026-03-23)
Severity: High – silently causes all API requests to fail with HTTP 401 for extended periods


Summary

The EXPIRATION_LEEWAY in KeyAuth.__is_token_expired is added to the token's exp claim instead of subtracted. This inverts the intended behavior: rather than triggering a proactive refresh before the token expires, the method considers the token valid for an additional 5 minutes after it has already expired. During this window, every API request is sent with an expired bearer token and receives a 401 Unauthorized response.

Root Cause

In key_auth.py, __is_token_expired:

EXPIRATION_LEEWAY = timedelta(minutes=5)

def __is_token_expired(self, token: str) -> bool:
    try:
        decoded_token = jwt.decode(token, options={"verify_signature": False})
        exp = decoded_token.get("exp")
        if exp:
            return time.time() > (exp + self.EXPIRATION_LEEWAY.total_seconds())
            #                          ^^^
            #  BUG: should be minus (-) to refresh BEFORE expiry
    except jwt.ExpiredSignatureError:
        return True
    except jwt.DecodeError:
        return True
    return False

Current behavior: time.time() > (exp + 300) → token is considered expired 5 minutes after actual JWT expiry.

Expected behavior: time.time() > (exp - 300) → token is considered expired 5 minutes before actual JWT expiry, ensuring proactive refresh.

Suggested Fix

- return time.time() > (exp + self.EXPIRATION_LEEWAY.total_seconds())
+ return time.time() > (exp - self.EXPIRATION_LEEWAY.total_seconds())

Impact Timeline

Using a concrete example where the access token's exp claim is T:

Time              | Current behavior (+ leeway)                           | Expected behavior (- leeway)
T - 6 min         | Token seen as valid ✅                                | Token seen as valid ✅
T - 4 min         | Token seen as valid ✅                                | Token seen as expired → refresh triggered
T (actual expiry) | Token seen as valid ❌                                | Fresh token already in use ✅
T + 3 min         | Token seen as valid ❌ (all API calls return 401)     | Fresh token in use ✅
T + 6 min         | Token finally seen as expired → refresh triggered     | Fresh token in use ✅

In the worst case, the SDK sends expired tokens for up to 5 minutes + 60 seconds (leeway + background check interval) before a refresh is triggered.
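
The timeline can be checked with a small standalone sketch. It does not import the SDK: is_expired_current and is_expired_fixed are illustrative names that mirror only the comparison in __is_token_expired, and T is an arbitrary timestamp chosen for the example.

```python
from datetime import timedelta

EXPIRATION_LEEWAY = timedelta(minutes=5)


def is_expired_current(now: float, exp: float) -> bool:
    """Mirrors the reported comparison: the leeway is added to exp."""
    return now > exp + EXPIRATION_LEEWAY.total_seconds()


def is_expired_fixed(now: float, exp: float) -> bool:
    """Mirrors the suggested fix: the leeway is subtracted from exp."""
    return now > exp - EXPIRATION_LEEWAY.total_seconds()


# Arbitrary exp claim (seconds since epoch), used only for illustration.
T = 1_000_000.0

# Same offsets as the rows of the timeline: T-6, T-4, T, T+3, T+6 minutes.
for offset_min in (-6, -4, 0, 3, 6):
    now = T + offset_min * 60
    print(
        f"T{offset_min:+d} min: "
        f"current={is_expired_current(now, T)} "
        f"fixed={is_expired_fixed(now, T)}"
    )
```

With the added leeway, the check only returns True in the last row, which matches the window of 401 responses described above.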

Additional Issues in KeyAuth (same file)

While investigating the root cause above, we identified two additional issues that compound the problem:

Issue 2: __call__ does not wait for token refresh before using the token

def __call__(self, r: Request) -> Request:
    with self.lock:
        if self.refresh_future is not None and self.refresh_future.done():
            self.refresh_future.result()
        if self.__is_token_expired(self.access_token):
            if self.refresh_future is None or self.refresh_future.done():
                self.refresh_future = self.executor.submit(self.__refresh_token)
        r.headers["Authorization"] = f"Bearer {self.access_token}"  # ← uses stale token
    return r

When a token is detected as expired, the refresh is submitted to the thread pool, but the request proceeds immediately with the old (expired) token. The caller has no way to know that the token is stale.

Expected behavior: When the token is expired and a refresh is triggered, __call__ should wait for the refresh to complete (or use a synchronous refresh path) before attaching the Authorization header.
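
One possible shape for the synchronous path is sketched below. This is not the SDK's code: BlockingTokenAuth, fetch_token, and authorization_header are hypothetical names, and fetch_token is assumed to return a (token, exp) pair. The sketch refreshes inside the lock before building the header, so a caller never receives a token that is already known to be stale.

```python
import threading
import time
from typing import Callable, Optional, Tuple


class BlockingTokenAuth:
    """Hypothetical stand-in for KeyAuth that refreshes synchronously."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],  # assumed: returns (token, exp)
        leeway_seconds: float = 300.0,
    ) -> None:
        self._fetch_token = fetch_token
        self._leeway = leeway_seconds
        self._lock = threading.Lock()
        self._token, self._exp = fetch_token()

    def authorization_header(self, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        with self._lock:
            # Refresh BEFORE expiry (leeway subtracted) and wait for the result.
            if now > self._exp - self._leeway:
                self._token, self._exp = self._fetch_token()
            return f"Bearer {self._token}"


# Demo with a fake fetcher that hands out two tokens in sequence.
tokens = iter([("token-1", 1000.0), ("token-2", 5000.0)])
auth = BlockingTokenAuth(lambda: next(tokens))
print(auth.authorization_header(now=600.0))  # outside the leeway window
print(auth.authorization_header(now=800.0))  # inside the window: refreshes first
```

The trade-off in this sketch is that other threads block on the lock while the fetch is in flight. A background refresh triggered early enough (with the corrected leeway) avoids that, and the blocking path would then only be needed once the token is actually past exp.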

Issue 3: __refresh_token writes self.access_token without holding the lock

def __refresh_token(self):
    # ...
    new_token = response_data.get("access_token")
    self.access_token = new_token  # ← no lock held

The __refresh_token method runs in the ThreadPoolExecutor (i.e., a separate thread) but writes to self.access_token without acquiring self.lock. This is a data race with __call__, which reads self.access_token under the lock.
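
A minimal sketch of a lock-protected write follows. TokenStore, refresh, and read are hypothetical names rather than SDK API. The slow fetch runs outside the lock, and only the assignment is guarded.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


class TokenStore:
    """Hypothetical holder for a token shared between threads."""

    def __init__(self, initial_token: str) -> None:
        self.lock = threading.Lock()
        self.access_token = initial_token

    def refresh(self, fetch_token: Callable[[], str]) -> None:
        # Do the slow work (the network call in a real client) without the lock.
        new_token = fetch_token()
        # Publish the result under the same lock that readers use.
        with self.lock:
            self.access_token = new_token

    def read(self) -> str:
        with self.lock:
            return self.access_token


store = TokenStore("old-token")
with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(store.refresh, lambda: "new-token")
    future.result()  # wait here, NOT while holding store.lock
print(store.read())
```

One caveat with this pattern: if the caller waits on the future while holding the same non-reentrant lock, the worker thread can never acquire it and both sides deadlock. Any fix that combines waiting in __call__ with locking in __refresh_token would need to wait outside the lock or restructure the locking.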

Steps to Reproduce

  1. Create a long-running application using any STACKIT service SDK with key-flow authentication.
  2. Wait for the access token to expire.
  3. Observe that all API calls start returning 401 Unauthorized.
  4. The errors persist for approximately 5–6 minutes until the SDK's __is_token_expired finally detects the token as expired (due to the inverted leeway) and triggers a refresh.
  5. Even after the refresh is triggered, the first request still fails because __call__ does not wait for the refresh to complete.

Environment

  • Python: 3.12
  • stackit-core: 0.2.0
  • stackit-kms: 0.8.1
  • OS: Linux (also reproducible on macOS)

Workaround

Until the SDK is fixed, consumers can work around this by accessing the KeyAuth instance and forcing a synchronous token refresh after receiving a 401:

from stackit.core.auth_methods.key_auth import KeyAuth

auth = api.api_client.rest_client.session.auth
if isinstance(auth, KeyAuth):
    with auth.lock:
        auth._KeyAuth__create_initial_token()
        auth._KeyAuth__fetch_token_from_endpoint()

This bypasses the broken expiry check and obtains a fresh token before retrying the failed request.
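
The retry step can be wrapped in a small helper. The sketch below is generic and makes no assumptions about the SDK's exception types: UnauthorizedError is a hypothetical placeholder for whatever the service client raises on HTTP 401, and force_refresh stands for a callable that performs the forced refresh shown above.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


class UnauthorizedError(Exception):
    """Hypothetical placeholder for the exception raised on HTTP 401."""


def call_with_forced_refresh(
    call: Callable[[], R],
    force_refresh: Callable[[], None],
) -> R:
    """Run call(); on a 401, force a token refresh and retry exactly once."""
    try:
        return call()
    except UnauthorizedError:
        force_refresh()
        # A second 401 is not caught, so it propagates to the caller.
        return call()


# Demo with fakes: the first attempt fails, the retry succeeds after refresh.
state = {"fresh": False}


def fake_api_call() -> str:
    if not state["fresh"]:
        raise UnauthorizedError("401 Unauthorized")
    return "ok"


def fake_refresh() -> None:
    state["fresh"] = True


print(call_with_forced_refresh(fake_api_call, fake_refresh))
```

Retrying only once keeps a genuinely invalid credential from causing a retry loop.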
