This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
flox is a Python library providing fast GroupBy reduction operations for dask.array. It implements parallel-friendly GroupBy reductions using the MapReduce paradigm and integrates with xarray for labeled multidimensional arrays.
# Create and activate development environment
mamba env create -f ci/environment.yml
conda activate flox-tests
python -m pip install --no-deps -e .
# Run full test suite (as used in CI)
pytest --durations=20 --durations-min=0.5 -n auto --cov=./ --cov-report=xml --hypothesis-profile ci
# Run tests without coverage
pytest -n auto
# Run single test file
pytest tests/test_core.py
# Run specific test
pytest tests/test_core.py::test_function_name
# Run all pre-commit hooks
pre-commit run --all-files
# Format code with ruff
ruff format .
# Lint and fix with ruff
ruff check --fix .
# Type checking
mypy flox/
# Spell checking
codespell
# Performance benchmarking (from asv_bench/ directory)
cd asv_bench
asv run
asv publish
asv preview
CI workflows:
- ci.yaml: Main CI pipeline with test matrix across Python versions (3.11, 3.13) and operating systems (Ubuntu, Windows)
- ci-additional.yaml: Additional CI jobs including doctests and mypy type checking
- upstream-dev-ci.yaml: Tests against development versions of upstream dependencies
- pypi.yaml: PyPI publishing workflow
- testpypi-release.yaml: Test PyPI release workflow
- benchmarks.yml: Performance benchmarking workflow
Environment files (ci/):
- environment.yml: Main test environment with all dependencies
- minimal-requirements.yml: Minimal requirements testing (pandas==1.5, numpy==1.22, etc.)
- no-dask.yml: Testing without dask dependency
- no-numba.yml: Testing without numba dependency
- no-xarray.yml: Testing without xarray dependency
- env-numpy1.yml: Testing with numpy<2 constraint
- docs.yml: Documentation building environment
- upstream-dev-env.yml: Development versions of dependencies
- benchmark.yml: Benchmarking environment
.readthedocs.yml: ReadTheDocs configuration using the ci/docs.yml environment
Core modules:
- core.py: Main reduction logic, central orchestrator of groupby operations
- aggregations.py: Defines the Aggregation class and built-in aggregation operations
- xarray.py: Primary integration with xarray, provides the xarray_reduce() API
- dask_array_ops.py: Dask-specific array operations and optimizations
Aggregation backends:
- aggregate_flox.py: Native flox implementation
- aggregate_npg.py: numpy-groupies backend
- aggregate_numbagg.py: numbagg backend for JIT-compiled operations
- aggregate_sparse.py: Support for sparse arrays
Utilities:
- cache.py: Caching mechanisms for performance
- visualize.py: Tools for visualizing groupby operations
- lib.py: General utility functions
- xrutils.py and xrdtypes.py: xarray-specific utilities and types
API entry points:
- flox.groupby_reduce(): Pure dask array interface
- flox.xarray.xarray_reduce(): Pure xarray interface
Engine Selection: The library supports multiple computation backends ("flox", "numpy", "numbagg") that can be chosen based on data characteristics and performance requirements.
MapReduce Strategy: Implements groupby reductions using a two-stage approach (blockwise + tree reduction) to avoid expensive sort/shuffle operations in parallel computing.
Chunking Intelligence: Automatically rechunks data to optimize groupby operations, particularly important for the current auto-blockwise-rechunk branch.
Integration Testing: Extensive testing against xarray's groupby functionality to ensure compatibility with the broader scientific Python ecosystem.
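The MapReduce strategy described above can be illustrated with a small pure-Python sketch. It is not flox's implementation: the function names (`blockwise_sum`, `combine`, `tree_reduce`, `groupby_sum`) are hypothetical, plain lists stand in for dask chunks, and only a sum reduction is shown. The point is that each block is reduced independently to per-group intermediates, and those intermediates are then merged pairwise, so no global sort or shuffle of the data is needed.

```python
from collections import defaultdict


def blockwise_sum(values, labels):
    """Map step: reduce a single block to per-group partial sums."""
    partial = defaultdict(float)
    for value, label in zip(values, labels):
        partial[label] += value
    return dict(partial)


def combine(left, right):
    """Merge two sets of partial sums (addition is associative)."""
    merged = dict(left)
    for label, total in right.items():
        merged[label] = merged.get(label, 0.0) + total
    return merged


def tree_reduce(partials):
    """Reduce step: merge intermediates pairwise until one remains."""
    if not partials:
        return {}
    while len(partials) > 1:
        next_level = []
        for i in range(0, len(partials), 2):
            pair = partials[i:i + 2]
            # An odd block at the end of a level is carried up unchanged.
            next_level.append(combine(*pair) if len(pair) == 2 else pair[0])
        partials = next_level
    return partials[0]


def groupby_sum(values, labels, block_size):
    """Split the input into blocks, reduce each, then tree-combine."""
    partials = [
        blockwise_sum(values[i:i + block_size], labels[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]
    return tree_reduce(partials)


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
labels = ["a", "b", "a", "c", "b", "a", "c"]
result = groupby_sum(values, labels, block_size=3)
print(result)
```

Because the combine step only sees small per-group intermediates, the answer does not depend on how the input is split into blocks; changing `block_size` changes the shape of the tree but not the result.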
- Framework: pytest with coverage, parallel execution (pytest-xdist), and property-based testing (hypothesis)
- Coverage Target: 95%
- Test Environments: Multiple conda environments test optional dependencies (no-dask, no-numba, no-xarray)
- CI Matrices: Tests across Python 3.11-3.13, Ubuntu/Windows, multiple dependency configurations
Core: pandas>=1.5, numpy>=1.22, numpy_groupies>=0.9.19, scipy>=1.9, toolz, packaging>=21.3
Optional: cachey, dask, numba, numbagg, xarray (enable with pip install flox[all])
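When debugging one of the reduced test environments, it can help to confirm which of the optional dependencies listed above are importable. The following is a minimal sketch using only the standard library; the helper name `available_optional_dependencies` is hypothetical and not part of flox.

```python
from importlib.util import find_spec

# Optional dependencies as listed in this file.
OPTIONAL_DEPENDENCIES = ("cachey", "dask", "numba", "numbagg", "xarray")


def available_optional_dependencies(names=OPTIONAL_DEPENDENCIES):
    """Map each package name to True if it can be imported, else False."""
    return {name: find_spec(name) is not None for name in names}


status = available_optional_dependencies()
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

`find_spec` only locates the package without importing it, so the check stays cheap even for heavy packages such as numba.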
- Uses setuptools_scm for automatic versioning from git tags
- Heavy emphasis on performance with ASV benchmarking infrastructure
- Type hints throughout with mypy checking
- Pre-commit hooks enforce code quality (ruff, prettier, codespell)
- Integration testing with xarray upstream development branch
- Python Support: Minimum version 3.11 (updated from 3.10)
- Git Worktrees: The worktrees/ directory is ignored for development workflows