
Reduce peak memory usage during release builds to fix OOM on manylinux runners #1445

Open
kevinjqliu wants to merge 2 commits into apache:main from kevinjqliu:kevinjqliu/more-ci-optimizations

kevinjqliu (Contributor) commented Mar 27, 2026

Which issue does this PR close?

Follow-up to #1443
Closes #1429

Rationale

As the dependency tree has grown (DataFusion + Substrait + Arrow + object_store with aws/gcp/azure/http features), the release build's peak memory during LTO linking has exceeded what the GitHub runner can provide.

Fixes the OOM kill (Killed process ... (rustc) total-vm:25086084kB, anon-rss:15361808kB) during manylinux x86_64 release builds, where rustc consumed ~15 GB and exhausted the runner's memory.
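The figures in that kernel log are in KiB (the kernel prints them as "kB"), so anon-rss:15361808kB works out to about 14.65 GiB, or roughly 15.7 GB. When triaging similar failures, a small bash helper can pull that number out of a runner log line. A minimal sketch, assuming bash with sed and awk on the PATH; oom_rss_gib is a hypothetical name, not something in the repository:

```shell
#!/usr/bin/env bash
# Convert the anon-rss field of a kernel OOM-killer line to GiB.
# The kernel reports these sizes in KiB even though it labels them "kB".
oom_rss_gib() {
  local line="$1" kb
  # Extract the digits that follow "anon-rss:" (empty if the field is missing)
  kb=$(printf '%s\n' "$line" | sed -n 's/.*anon-rss:\([0-9][0-9]*\)kB.*/\1/p')
  [ -n "$kb" ] || return 1
  # 1 GiB = 1024 * 1024 KiB
  awk -v kb="$kb" 'BEGIN { printf "%.2f\n", kb / 1048576 }'
}

# Fragment of the log line quoted in this PR
oom_rss_gib "(rustc) total-vm:25086084kB, anon-rss:15361808kB"   # prints 14.65
```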

What changes are included in this PR?

Cargo profile (Cargo.toml):

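  • Switch from fat LTO (lto = true, which is the same as lto = "fat") to thin LTO (lto = "thin"). This is the biggest win, reducing peak memory by ~50-70%: with thin LTO, LLVM no longer merges all bitcode into a single module and instead optimizes each module separately, using a compact summary for cross-module decisions
  • Increase codegen-units from 1 to 2, which splits LLVM's workload into smaller pieces and further reduces peak RSS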
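Both changes are keys under [profile.release] in Cargo.toml. Cargo also accepts per-profile overrides from environment variables named CARGO_PROFILE_<NAME>_<KEY>, which makes it possible to compare peak memory for the old and new settings locally without editing the file. A minimal sketch, assuming bash and GNU time at /usr/bin/time; release_profile_env is a hypothetical helper, and it is an assumption that the project's maturin-driven build passes these variables through to cargo unchanged:

```shell
#!/usr/bin/env bash
# Cargo.toml settings after this PR:
#   [profile.release]
#   lto = "thin"
#   codegen-units = 2
#
# Print the equivalent env-var overrides. Cargo reads CARGO_PROFILE_RELEASE_LTO
# and CARGO_PROFILE_RELEASE_CODEGEN_UNITS and lets them take precedence over
# the values in Cargo.toml. ("fat" is the string form of lto = true.)
release_profile_env() {
  local lto="${1:-thin}" codegen_units="${2:-2}"
  printf 'CARGO_PROFILE_RELEASE_LTO=%s CARGO_PROFILE_RELEASE_CODEGEN_UNITS=%s\n' \
    "$lto" "$codegen_units"
}

# Usage (not executed here): build with the old and new settings and compare
# the "Maximum resident set size (kbytes)" line printed by GNU time -v.
#   env $(release_profile_env fat 1)  /usr/bin/time -v cargo build --release
#   env $(release_profile_env thin 2) /usr/bin/time -v cargo build --release
```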

CI workflow (.github/workflows/build.yml):

  • Add 8 GB swap to the build-manylinux-x86_64 job as a safety net (matching the existing pattern in the aarch64 job)
  • Reduce build-manylinux-aarch64 swap from 16 GB to 8 GB for consistency
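The swap step follows the usual swap-file recipe. A minimal sketch of it as a reusable bash function, assuming a Linux runner with passwordless sudo. The PR excerpt only shows the swapoff, rm and fallocate/dd lines, so the chmod/mkswap/swapon steps, the setup_swap and run helpers, and the DRY_RUN switch are all assumptions. The dd fallback writes size * 1024 blocks of 1 MiB, which is where count=8192 for an 8 GiB file comes from:

```shell
#!/usr/bin/env bash
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create and enable a swap file of SIZE_GB GiB at PATH (default /swapfile).
setup_swap() {
  local size_gb="${1:-8}" path="${2:-/swapfile}"
  # dd fallback: SIZE_GB * 1024 blocks of 1 MiB each (8 GiB -> count=8192)
  local count=$(( size_gb * 1024 ))

  run sudo swapoff -a || true          # disable any existing swap first
  run sudo rm -f "$path"
  # fallocate is fast but unsupported on some filesystems; dd always works
  run sudo fallocate -l "${size_gb}G" "$path" \
    || run sudo dd if=/dev/zero of="$path" bs=1M count="$count"
  run sudo chmod 600 "$path"           # mkswap warns about insecure permissions
  run sudo mkswap "$path"
  run sudo swapon "$path"
}

# Example: print the commands for an 8 GiB swap file without running them
#   DRY_RUN=1 setup_swap 8
```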

Tradeoffs

Thin LTO with codegen-units=2 may produce binaries that are ~1-4% slower in micro-benchmarks than fat LTO with codegen-units=1. In practice, this is unlikely to be measurable for a Python extension, where crossing the Python-Rust FFI boundary and PyArrow serialization dominate execution time.
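One way to check the slowdown estimate on a real workload is to time the same query against wheels built with each profile and compute the relative difference. A minimal sketch of the comparison step, assuming bash and awk; pct_slower is a hypothetical helper, and the timings in the example call are made-up placeholders, not measurements from this PR:

```shell
#!/usr/bin/env bash
# Percentage by which NEW is slower than BASELINE (negative means faster).
# Both arguments are timings in the same unit, e.g. seconds.
pct_slower() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    if (base + 0 <= 0) exit 1      # refuse a zero or negative baseline
    printf "%.1f\n", (new - base) / base * 100
  }'
}

# Placeholder numbers: 2.00 s with fat LTO vs 2.06 s with thin LTO
pct_slower 2.00 2.06   # prints 3.0
```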

Review comment on .github/workflows/build.yml (build-manylinux-aarch64 swap step):

    sudo swapoff -a || true
    sudo rm -f /swapfile
  - sudo fallocate -l 16G /swapfile || sudo dd if=/dev/zero of=/swapfile bs=1M count=16384
  + sudo fallocate -l 8G /swapfile || sudo dd if=/dev/zero of=/swapfile bs=1M count=8192

kevinjqliu (Contributor Author):
We don't need all 16 GB; a smaller swap file takes less disk space.


Development: successfully merging this pull request may close the issue "main branch has errors".