Simplify API to worker mode with transparent parallel pool #57

Open
benoitc wants to merge 11 commits into main from feature/parallel-python-pool

benoitc commented Mar 26, 2026

Summary

  • Simplify API to worker mode only, removing subinterpreter mode from public API
  • Add transparent parallel Python pool using OWN_GIL subinterpreters (enabled via the ENABLE_PARALLEL_PYTHON CMake flag)
  • Remove ASGI/WSGI runner support
  • Fix parallel pool initialization to properly restore main interpreter state
  • Remove subinterpreter-specific tests that are no longer applicable

Changes

API Simplification:

  • Worker mode is now the only public context mode
  • Subinterpreters are used internally for parallel pool when enabled
  • Removed the mode parameter from py:context/2; contexts now always use worker mode

Parallel Pool (ENABLE_PARALLEL_PYTHON=ON):

  • Creates OWN_GIL subinterpreters for true parallel Python execution
  • Transparent to user code: just enable the CMake flag
  • Fixed GIL state restoration using PyEval_RestoreThread instead of PyGILState_Ensure

Removed:

  • ASGI/WSGI runner support
  • Public subinterpreter mode API
  • OWN_GIL-specific test suites

All 415 tests pass in both standard and parallel modes.

benoitc added 11 commits March 25, 2026 11:01
Implement transparent parallel Python execution when built with
-DENABLE_PARALLEL_PYTHON=ON (requires Python 3.14+):

- Create py_parallel_pool.c/h with OWN_GIL subinterpreter pool
- Each slot has its own GIL enabling true parallel execution
- Contexts assigned to slots via round-robin
- User API unchanged - parallelism is transparent
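
The round-robin assignment of contexts to slots mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in py_parallel_pool.c: `POOL_SIZE`, `next_slot`, and `assign_slot` are hypothetical names, and the real pool may track slots differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical slot count; the real pool size is not stated in this PR. */
#define POOL_SIZE 4

/* Hypothetical shared counter, zero-initialized as a file-scope object. */
atomic_size_t next_slot;

/* Return the slot index for a newly created context.
 * atomic_fetch_add returns the previous value, so concurrent callers
 * each receive a distinct ticket and the first caller gets slot 0. */
size_t assign_slot(void)
{
    size_t ticket = atomic_fetch_add(&next_slot, 1);
    return ticket % POOL_SIZE;
}
```

With a pool size of 4, successive calls return 0, 1, 2, 3, then wrap back to 0.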

Build modes:
- Default: Current worker mode unchanged (~400K calls/sec)
- Parallel: OWN_GIL subinterpreters for true parallelism

Usage:
  rebar3 compile                              # default mode
  CMAKE_OPTIONS="-DENABLE_PARALLEL_PYTHON=ON" rebar3 compile  # parallel

Performance verified: 4 concurrent 100ms sleeps complete in ~108ms
(vs ~400ms if serial), confirming true parallel execution.

Note: Known shutdown race condition with OWN_GIL subinterpreters
during VM termination - core functionality works correctly.

Address multiple race conditions identified during shutdown:

1. Add active_count tracking to parallel slots
   - Increment before acquiring GIL, decrement after releasing
   - Shutdown waits for all slots to become idle (with timeout)

2. Add shutdown_requested flag to slots
   - Prevents new acquisitions during shutdown
   - parallel_slot_acquire returns false if shutdown requested

3. Fix context destructor for parallel mode
   - Clean up context objects using the correct slot's GIL
   - Skip cleanup if slot is shutting down

4. Simplify finalize for ENABLE_PARALLEL_PYTHON
   - Just shutdown parallel pool directly (no ASGI/WSGI/numpy cleanup)
   - OWN_GIL subinterpreters handle their own cleanup via Py_EndInterpreter
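
Items 1 and 2 above (the active_count tracking and the shutdown_requested flag) might look something like the sketch below. It is an assumption-laden illustration: `slot_t`, `slot_acquire`, `slot_release`, and `slot_shutdown` are hypothetical names, the GIL calls are left as comments, and the timeout is modeled as a bounded number of polls rather than a real clock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical slot record; field names follow the commit message. */
typedef struct {
    atomic_int  active_count;        /* callers currently using the slot */
    atomic_bool shutdown_requested;  /* set once shutdown has begun */
} slot_t;

/* Returns false if the slot is shutting down. */
bool slot_acquire(slot_t *s)
{
    /* Count ourselves in first so a concurrent shutdown cannot miss us. */
    atomic_fetch_add(&s->active_count, 1);
    if (atomic_load(&s->shutdown_requested)) {
        atomic_fetch_sub(&s->active_count, 1);
        return false;
    }
    /* The real code would acquire the slot's GIL here. */
    return true;
}

void slot_release(slot_t *s)
{
    /* The real code would release the slot's GIL before this point. */
    atomic_fetch_sub(&s->active_count, 1);
}

/* Flag the slot, then poll until it is idle or max_polls is exhausted.
 * Returns true if the slot became idle. A real implementation would
 * sleep between polls instead of spinning. */
bool slot_shutdown(slot_t *s, int max_polls)
{
    atomic_store(&s->shutdown_requested, true);
    for (int i = 0; i < max_polls; i++) {
        if (atomic_load(&s->active_count) == 0)
            return true;
    }
    return atomic_load(&s->active_count) == 0;
}
```

Incrementing before checking the flag means a caller that races with shutdown either sees the flag and backs out, or is already counted and shutdown waits for it.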

Note: Process exit may still crash due to Python OWN_GIL cleanup
internals - this is outside our control and doesn't affect the
actual functionality (all tests pass before the crash).

Wrap all OWN_GIL thread-per-context functions and their call sites in
#ifndef ENABLE_PARALLEL_PYTHON guards. When the parallel pool is enabled,
these functions are unused since the pool replaces the dedicated thread
approach.

This eliminates unused function warnings in parallel builds and reduces
binary size. The code remains available for non-parallel builds.
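
The guard pattern described above, where both a function and its call site sit behind the same `#ifndef`, can be sketched like this. The function names are hypothetical stand-ins, not the actual owngil_* or dispatch_* functions in py_nif.c.

```c
#include <assert.h>
#include <stdbool.h>

#ifndef ENABLE_PARALLEL_PYTHON
/* Hypothetical stand-in for a dedicated-thread helper.
 * Compiled only when the parallel pool is disabled, so parallel
 * builds never see an unused static function. */
static bool dedicated_thread_init(void)
{
    return true;
}
#endif

/* Returns true when the dedicated-thread path was compiled in and used. */
bool context_init_uses_dedicated_thread(void)
{
#ifndef ENABLE_PARALLEL_PYTHON
    return dedicated_thread_init();  /* guarded call site */
#else
    return false;                    /* the pool handles this context */
#endif
}
```

Guarding the definition and the call site together is what keeps both build modes free of unused-function warnings and undefined references.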

Files changed:
- py_nif.c: wrap owngil_execute_*, dispatch_*_to_owngil, owngil_context_init/shutdown
- py_nif.h: wrap reactor dispatch function declarations
- py_event_loop.c: wrap reactor dispatch call sites

Remove subinterp/owngil modes from public API. True parallel Python
execution is now enabled via:
- ENABLE_PARALLEL_PYTHON=ON build flag (Python 3.14+)
- Free-threaded Python (3.13t+) - automatic

Changes:
- Remove py:subinterp_* functions (13 functions)
- Remove mode parameter from py_context:new/1
- Remove 5 owngil/subinterp test suites
- Remove 4 owngil/subinterp example files
- Update docs to reflect simplified API
- Add CI jobs for parallel Python testing
- README: Remove SHARED_GIL/OWN_GIL mode selection references
- README: Update parallelism options to worker/free-threaded/parallel pool
- py_nif.erl: Mark subinterp/owngil NIFs as @private internal

This removes the ASGI and WSGI application runner functionality:
- Delete py_asgi.c, py_asgi.h, py_wsgi.c, py_wsgi.h
- Remove ASGI/WSGI NIF exports and stubs from py_nif.erl
- Remove ASGI/WSGI processing from py_worker_pool.c/h
- Remove ASGI scope atoms and buffer resource type
- Clean up py_nif.c includes and initialization

The PyBuffer API for zero-copy WSGI input is retained as a
separate feature for passing body data to Python apps.

- Remove test_subinterp_supported and test_parallel_execution tests
- Remove all test_asgi_* tests (response extraction, caching, etc.)
- Simplify test_memory_stats and test_reload by removing subinterp checks

The removed tests covered functionality that was dropped in the
ASGI/WSGI runner removal commit.

Remove subinterpreter-specific tests that are no longer applicable:
- py_context_process_SUITE: Remove subinterp test group
- py_channel_SUITE: Remove subinterp_sync_receive_wait_test
- py_import_SUITE: Remove registry_applied_to_subinterp_test
- py_reactor_SUITE: Remove reactor_context_subinterp_isolation_test

Fix parallel pool crash by properly restoring main interpreter state:
- Save main_tstate before creating OWN_GIL subinterpreters
- Use PyEval_RestoreThread(main_tstate) instead of PyGILState_Ensure()
- Defer event loop initialization to avoid NULL env parameter

All 415 tests pass in both standard and parallel modes.

Since the API now only exposes worker mode, the SHARED_GIL subinterpreter
pool (py_subinterp_pool.c/h) is no longer reachable from user code.

Removed:
- c_src/py_subinterp_pool.c and .h
- All subinterp_pool references from py_nif.c and py_nif.h
- test/py_web_frameworks_SUITE.erl (tested removed py_asgi/py_wsgi)

The parallel pool (py_parallel_pool.c) remains for ENABLE_PARALLEL_PYTHON
builds which use OWN_GIL subinterpreters for true parallel execution.

… 3.0.0

Documentation updates after API simplification:
- Remove WSGI/ASGI references from buffer.md, getting-started.md
- Remove py_asgi examples from asyncio.md
- Mark subinterp/OWN_GIL as internal in scalability.md, process-bound-envs.md
- Update migration.md with simplified mode descriptions
- Update source comments in py_buffer.erl, py_nif.erl, py_reactor_context.erl
- Remove missing owngil_internals.md from rebar.config

New documentation:
- Add docs/parallel-execution.md explaining parallel pool architecture

Version bump:
- Bump version to 3.0.0 in app.src and getting-started.md