
bindflt: concurrent relative-path NtCreateFile on a mapped directory intermittently returns STATUS_INVALID_PARAMETER #635

@MarshallOfSound

Description

Inside a process-isolated Windows Server container, when the process's current directory is on a bindflt-mapped volume (e.g. a Kubernetes emptyDir or Docker bind mount), CreateFileW / CreateDirectoryW called with a relative path intermittently fail with ERROR_INVALID_PARAMETER (87) under concurrent load. The same path passed absolute never fails. The same files on container scratch (wcifs only, no bindflt mapping) never fail. The same files accessed from the host directly (no container) never fail.

This points at bindflt.sys's handling of handle-relative NtCreateFile (where RootDirectory is the cwd handle and ObjectName is the relative path) when multiple threads operate on overlapping targets.

Environment

  • Host: Windows Server 2022 10.0.20348.4893 and Windows Server 2025 10.0.26100.x (both reproduce; AKS Windows node pools)
  • Container: process-isolated, mcr.microsoft.com/windows/servercore:ltsc2022
  • Mount: Kubernetes emptyDir: {} mounted at C:\work (HCS mapped directory)
  • Runtime: containerd (AKS default)
  • Filter stack on host (fltmc filters from a hostProcess pod):
    Filter Name     Num Instances   Altitude
    bindflt                    29     409800
    FsDepends                  35     407000
    WdFilter                   35     328010
    wcifs                      17     189900
    Wof                         1      40700
    

Symptom

CreateFileW("src\third_party\…\foo.h", GENERIC_READ, …, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL)
  → INVALID_HANDLE_VALUE, GetLastError() = 87 (ERROR_INVALID_PARAMETER)

The failure is transient — retrying the same call after ~5 ms succeeds. Hit rate is ~0.02–0.1 % of opens under the conditions below.
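Because the error clears on its own, a caller can absorb it with a bounded retry. The following is a minimal sketch, not a definitive implementation: `retry_err87`, `open_fn`, `sleep_fn`, and `flaky_open` are hypothetical names introduced here, and the opener is passed as a callback so the loop can be exercised without a container. On Windows the callback would wrap CreateFileW and return GetLastError() on failure, and the sleep callback would call Sleep.

```c
#include <stddef.h>

#define ERR_INVALID_PARAMETER 87u  /* same value as ERROR_INVALID_PARAMETER */

/* Hypothetical callback: attempts the open once. Returns 0 on success or
 * the Win32 error code on failure (on Windows: CreateFileW + GetLastError). */
typedef unsigned (*open_fn)(void *ctx);

/* Hypothetical callback: sleeps for ms milliseconds (on Windows: Sleep). */
typedef void (*sleep_fn)(unsigned ms);

/* Retries only while the opener reports error 87. Any other result,
 * success included, is returned immediately. Returns the last error code,
 * or 87 without calling the opener if max_attempts <= 0. */
static unsigned retry_err87(open_fn try_open, void *ctx, sleep_fn nap,
                            int max_attempts, unsigned delay_ms)
{
    unsigned err = ERR_INVALID_PARAMETER;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        err = try_open(ctx);
        if (err != ERR_INVALID_PARAMETER)
            return err;            /* success or a non-transient error */
        if (attempt + 1 < max_attempts && nap)
            nap(delay_ms);         /* back off before the next try */
    }
    return err;
}

/* Demo stub standing in for the real open: fails with error 87 for the
 * first *ctx calls, then succeeds. */
static unsigned flaky_open(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return ERR_INVALID_PARAMETER;
    }
    return 0;
}
```

Restricting the retry to error 87 keeps genuine failures (missing file, access denied) visible instead of hiding them behind extra attempts.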

Trigger (all required)

  1. Process runs inside a process-isolated container
  2. cwd is on a bindflt-mapped directory (k8s emptyDir / bind mount)
  3. Path is passed relative (not absolute, not \\?\)
  4. Multiple threads concurrently open overlapping targets (same file with an existing open handle, or shared parent directories)

Bisection — what it is not

variation | err87
--- | ---
same files, absolute path (C:\work\src\…) | 0 / 1.44 M
same files, relative path, on container scratch (wcifs-only VHDX, no bindflt mapping) | 0 / 130 k
same files, relative path, from hostProcess pod against the emptyDir's host-side path (no container filters) | 0 / 836 k
same files, relative path, on bindflt-mapped emptyDir | ~0.1 % of 130 k–1.44 M
pure C / Win32 (CreateFileW + _beginthreadex), no Go runtime | reproduces (159 / 836 k)
FILE_FLAG_BACKUP_SEMANTICS set vs. cleared | irrelevant (cleared actually hit slightly more)
WS2022 vs WS2025 host | both reproduce
files written by tar / gn / robocopy / Go os.WriteFile | all reproduce (writer irrelevant)
CreateFileW (OPEN_EXISTING) vs CreateDirectoryW | both reproduce (20 / ~117 k mkdir)

Minimal reproduction

Pod

apiVersion: v1
kind: Pod
metadata:
  name: bindflt-repro
spec:
  restartPolicy: Never
  nodeSelector:
    kubernetes.io/os: windows
  containers:
    - name: repro
      image: mcr.microsoft.com/windows/servercore:ltsc2022
      command: ["cmd", "/c", "ping -t 127.0.0.1 > nul"]
      volumeMounts:
        - name: work
          mountPath: 'C:\work'
  volumes:
    - name: work
      emptyDir: {}

C reproducer

Build with cl /O2 repro.c or zig cc -target x86_64-windows-gnu -O2 -municode repro.c.

// repro.c — concurrent relative-path double-open on a bindflt mapped dir.
// Usage: cd into the mapped volume, then:  repro.exe <relative-dir> 32 12
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <process.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FILES 200000
static wchar_t *g_files[MAX_FILES];
static volatile LONG g_nfiles, g_idx, g_err87, g_recovered, g_other;
static HANDLE g_inner_sema;

static HANDLE open_ro(const wchar_t *p) {
  return CreateFileW(p, GENERIC_READ, FILE_SHARE_READ | FILE_SHARE_WRITE,
                     NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
}

// Recursively collect every regular file under dir into g_files.
static void walk(const wchar_t *dir) {
  wchar_t pat[1024]; _snwprintf(pat, 1024, L"%ls\\*", dir);
  pat[1023] = L'\0';  // _snwprintf does not NUL-terminate on truncation
  WIN32_FIND_DATAW fd; HANDLE h = FindFirstFileW(pat, &fd);
  if (h == INVALID_HANDLE_VALUE) return;
  do {
    if (!wcscmp(fd.cFileName, L".") || !wcscmp(fd.cFileName, L"..")) continue;
    wchar_t full[1024]; _snwprintf(full, 1024, L"%ls\\%ls", dir, fd.cFileName);
    full[1023] = L'\0';
    if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) walk(full);
    else if (g_nfiles < MAX_FILES) g_files[InterlockedIncrement(&g_nfiles)-1] = _wcsdup(full);
  } while (FindNextFileW(h, &fd));
  FindClose(h);
}

struct inner { const wchar_t *p; DWORD err; };
static unsigned __stdcall inner_thread(void *a) {
  struct inner *ia = a;
  WaitForSingleObject(g_inner_sema, INFINITE);
  HANDLE h = open_ro(ia->p);
  ia->err = (h == INVALID_HANDLE_VALUE) ? GetLastError() : (CloseHandle(h), 0);
  ReleaseSemaphore(g_inner_sema, 1, NULL);
  return 0;
}

static unsigned __stdcall worker(void *_) {
  for (;;) {
    LONG i = InterlockedIncrement(&g_idx) - 1;
    if (i >= g_nfiles) return 0;
    const wchar_t *p = g_files[i];
    HANDLE h1 = open_ro(p);
    if (h1 == INVALID_HANDLE_VALUE) {
      (GetLastError()==ERROR_INVALID_PARAMETER ? InterlockedIncrement(&g_err87)
                                               : InterlockedIncrement(&g_other));
      continue;
    }
    struct inner a = { p, 0 };
    HANDLE th = (HANDLE)_beginthreadex(NULL, 0, inner_thread, &a, 0, NULL);
    WaitForSingleObject(th, INFINITE); CloseHandle(th);
    if (a.err == ERROR_INVALID_PARAMETER) {
      InterlockedIncrement(&g_err87);
      Sleep(5);
      HANDLE h2 = open_ro(p);
      if (h2 != INVALID_HANDLE_VALUE) { InterlockedIncrement(&g_recovered); CloseHandle(h2); }
      fwprintf(stderr, L"err87 %ls\n", p);
    } else if (a.err) InterlockedIncrement(&g_other);
    CloseHandle(h1);
  }
}

int wmain(int argc, wchar_t **argv) {
  const wchar_t *dir = argc>1 ? argv[1] : L"tree";
  int nw = argc>2 ? _wtoi(argv[2]) : 32, rounds = argc>3 ? _wtoi(argv[3]) : 12;
  walk(dir);
  wprintf(L"files=%ld workers=%d rounds=%d\n", g_nfiles, nw, rounds);
  if (!g_nfiles) return 1;
  g_inner_sema = CreateSemaphoreW(NULL, nw, nw, NULL);
  HANDLE *ths = calloc(nw, sizeof(HANDLE));
  for (int r=1; r<=rounds; r++) {
    g_idx = 0;
    for (int w=0; w<nw; w++) ths[w] = (HANDLE)_beginthreadex(NULL,0,worker,NULL,0,NULL);
    for (int w=0; w<nw; w++) { WaitForSingleObject(ths[w], INFINITE); CloseHandle(ths[w]); }
    wprintf(L"round %d/%d  err87=%ld\n", r, rounds, g_err87);
  }
  wprintf(L"\nopens=%ld err87=%ld recovered=%ld other=%ld\n",
          (long)g_nfiles*rounds, g_err87, g_recovered, g_other);
  return 0;
}

Steps

# inside the container
cd C:\work
# create a deep-ish tree (~60k files across 4096 dirs)
0..63 | % { $a=$_; 0..63 | % { mkdir "tree\d$a\d$_" -Force } } | Out-Null
0..59999 | % { Set-Content ("tree\d{0}\d{1}\f{2}.txt" -f ($_ % 64), ([math]::Floor($_ / 64) % 64), $_) "x" }
# relative path, from inside the mapped volume:
.\repro.exe tree 32 12
# → nonzero err87

# controls (all zero):
.\repro.exe C:\work\tree 32 12              # absolute path
robocopy C:\work\tree C:\scratch\tree /E    # scratch = wcifs-only
cd C:\scratch ; ..\work\repro.exe tree 32 12

Real-world impact

Chromium's build tool siso (and by extension Electron's CI) opens ~10–90 k .ninja files from a relative cwd under out/, with each file opened twice (outer open + chunked re-open). On Windows ARC runners backed by a k8s emptyDir, ~3–8 of those opens fail per build with "The parameter is incorrect.", aborting the manifest load.

Any tool that does concurrent relative-path file or directory operations on a bind-mounted volume inside a Windows container is exposed — build systems, parallel extractors, test harnesses, etc.

Workarounds

  • Pass absolute paths to CreateFileW / CreateDirectoryW
  • Avoid opening a path while another handle to the same target is live in the process
  • Retry on ERROR_INVALID_PARAMETER (clears in ≤5 ms)
  • Place the working tree on container scratch instead of a bind mount (requires bumping containerd sandbox size; scratch is wcifs-only and does not reproduce)
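The first workaround amounts to resolving each relative path against a known base directory before it reaches CreateFileW. The following is a minimal sketch of that step as a pure string join; `is_absolute` and `to_absolute` are hypothetical helpers introduced here, and the sketch assumes the caller captured the base directory (for example C:\work) once at startup. It does not normalize ".." components or handle drive-relative forms such as C:foo. On Windows, GetFullPathNameW performs a fuller version of this resolution against the process's current directory.

```c
#include <stddef.h>
#include <wchar.h>

/* Returns 1 if p starts with a drive root ("C:\" or "C:/") or with a
 * double-backslash prefix (UNC or device path), otherwise 0. */
static int is_absolute(const wchar_t *p)
{
    if (p[0] == L'\\' && p[1] == L'\\')
        return 1;
    int letter = (p[0] >= L'A' && p[0] <= L'Z') ||
                 (p[0] >= L'a' && p[0] <= L'z');
    return letter && p[1] == L':' && (p[2] == L'\\' || p[2] == L'/');
}

/* Writes an absolute form of rel into out. cap is the capacity of out in
 * wchar_t units, including the terminating NUL. Paths that are already
 * absolute are copied unchanged. Returns 0 on success, -1 if the result
 * does not fit. */
static int to_absolute(const wchar_t *base, const wchar_t *rel,
                       wchar_t *out, size_t cap)
{
    if (is_absolute(rel)) {
        size_t n = wcslen(rel);
        if (n + 1 > cap)
            return -1;
        wmemcpy(out, rel, n + 1);
        return 0;
    }
    size_t nb = wcslen(base), nr = wcslen(rel);
    /* Insert a separator unless base already ends with one. */
    size_t need_sep = (nb > 0 && base[nb - 1] != L'\\') ? 1 : 0;
    if (nb + need_sep + nr + 1 > cap)
        return -1;
    wmemcpy(out, base, nb);
    if (need_sep)
        out[nb] = L'\\';
    wmemcpy(out + nb + need_sep, rel, nr + 1);  /* copies the NUL too */
    return 0;
}
```

Resolving against a base captured once avoids depending on the process-wide current directory at open time, which is shared by all threads.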
